I have an application in which the user can describe an analysis job via a yaml like
analysis:
learner:
name: LinearRegression
args:
solver: liblinear
and then it is parsed to instantiate an sklearn LinearRegression model. Basically I want a whitelisted eval
where I am “instantiating” user-input strings subject to the constraint that they are in the whitelist. The whitelist is managed via a known types dictionary:
from hdbscan import HDBSCAN
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from umap import UMAP
from xgboost import XGBClassifier
KNOWN_TYPES = {
"kmeans": KMeans,
"hdbscan": HDBSCAN,
"logistic_regression": LogisticRegression,
"xgb_classifier": XGBClassifier,
"mlp_classifier": MLPClassifier,
"svm": SVC,
"umap": UMAP,
"pca": PCA,
}
This works fine. It’s easy to extend with new types and I get the blend I need of walled-garden and flexibility of analysis which I can control via yaml job files without touching the code.
However, the import time is like 20s (!) and in most runs most of those imports are never used.
My question is what are my options for lazy loading or dynamically loading the types?
A list of strings with importlib
doing the importing? Lazy importing features within importlib? I’m just not sure what the best approach is here.
The easiest solution is indeed going to be to describe the model via strings describing the module that you then import. For example, something like:
import importlib
KNOWN_TYPES = {
"kmeans": "sklearn.cluster:KMeans",
...
}
def load_model(name: str):
module_name, name = KNOWN_TYPES[name].split(':', 1)
module = importlib.import_module(module_name)
return getattr(module, name)
However, it may be preferable to avoid such dynamic features and use normal imports. Then you can delay the import via a function. For example:
def load_kmeans():
from sklearn.cluster import KMeans
return KMeans
KNOWN_TYPES = {
"kmeans": load_kmeans,
...
}
def load_model(name: str):
return KNOWN_TYPES[name]()
Importlib provides a built-in LazyLoader that initially looks like another alternative, but it only delays loading until the module’s attributes are accessed – which an from … import …
statement does immediately.
1