I retrieve time series data from a Yahoo Finance endpoint and store it in an Azure blob that requires a connection string to access. I then create a dataframe from this time series and add it to MLflow tracking using log_input.
When I create the dataset with mlflow.data.from_pandas, I register the blob URL as its source:
dataset = mlflow.data.from_pandas(
    data_df, source=source, name=f"{stock_ticker}_{input_or_pred}_{train_or_test}"
)
mlflow.log_input(dataset, context=train_or_test_or_all)
Since the blob is protected and its URI can only be accessed with a connection string, the dataset source registry in dataset_source_registry.py cannot resolve the source and emits this warning:
C:\ProgramData\miniconda3\envs\mlflow-spark\Lib\site-packages\mlflow\data\dataset_source_registry.py:150: UserWarning: Failed to determine whether UCVolumeDatasetSource can resolve source information for 'https://<storagename>.blob.core.windows.net/<container>/<filename>.csv'. Exception:
return _dataset_source_registry.resolve(
Is there a way to register a protected URI with mlflow.data.from_pandas, or a different way to log_input this pandas dataframe? I would rather not suppress warnings altogether, because I might miss something else relevant.
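The only workaround I can think of so far is to filter this one warning by its message instead of disabling warnings globally. A minimal sketch using only the standard library follows. The helper name ignore_source_resolution_warnings is hypothetical, and the pattern assumes the message keeps starting with the text quoted above, which may change between mlflow versions:

```python
import warnings

# Assumption: the warning text keeps starting with this phrase. It is copied
# from the message quoted above and may differ in other mlflow versions.
RESOLVE_WARNING_PATTERN = (
    r"Failed to determine whether .* can resolve source information"
)


def ignore_source_resolution_warnings():
    """Hide only the dataset-source resolution UserWarning.

    Hypothetical helper: every other warning stays visible.
    """
    warnings.filterwarnings(
        "ignore",
        message=RESOLVE_WARNING_PATTERN,  # regex matched at the start of the message
        category=UserWarning,
    )
```

I would call ignore_source_resolution_warnings() once before mlflow.data.from_pandas. This only hides the message and does not change how the source is resolved, so I would still prefer a proper way to register the protected URI.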