I’m working on some code that grabs data from a website and transforms it into a data object that works with whatever machine learning library the user is using. I also want to do some preprocessing on this data to make it easier to work with, but many preprocessing tools are library-specific. I don’t want users to have to install a library they aren’t going to use, so I don’t want to import every library I’m trying to support at the top of the module. For example, a user of scikit-learn shouldn’t need to have PyTorch installed. I was wondering if I could instead put each import statement inside the if branch that does the preprocessing with that library’s tools. The code would look something like this:
from enum import Enum

class ML_Library(Enum):
    SKLEARN = 1
    PYTORCH = 2

def DataReturningFunction(Library):
    # some code that gets the raw data from the website
    if Library == ML_Library.SKLEARN:
        import sklearn  # only imported when this branch runs
        # some code that does data preprocessing with sklearn tools
        return ...  # placeholder: DataObject that works with Scikit-Learn
    if Library == ML_Library.PYTORCH:
        import torch  # the PyTorch package is imported as "torch"
        # some code that does data preprocessing with pytorch tools
        return ...  # placeholder: DataObject that works with Pytorch
I’ve tested this with libraries I don’t have installed, and it seems to work fine as long as the control flow doesn’t take the path that imports them, but I feel like there’s some underlying performance issue with this approach that I’m unaware of.
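For what it's worth, here is a minimal sketch of how the performance worry could be checked. Python caches every imported module in `sys.modules`, so after the first call a repeated import inside a function is essentially a dictionary lookup. The helper name `load_backend` is hypothetical (not part of any library), and the standard-library module `json` is only a stand-in for sklearn or torch so the snippet runs without either installed.

```python
import importlib
import sys


def load_backend(module_name):
    """Import a module on demand (hypothetical helper, for illustration only).

    Raises ImportError with a clearer message if the module is not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this backend but is not installed"
        ) from exc


# "json" is a stand-in for an optional dependency such as sklearn or torch.
first = load_backend("json")
second = load_backend("json")

# Both calls return the very same cached module object.
same_object = first is second
cached = "json" in sys.modules
print(same_object, cached)
```

Wrapping the import this way also gives a user who picks a backend they have not installed a readable error message instead of a bare `ModuleNotFoundError` from deep inside the function.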