I am in the process of deploying a machine learning model for study purposes and I have some questions about it:
- My POST method will send my original features to the API, without any transformations applied (a sketch of the request side follows the screenshot below):
(screenshot: untransformed data)
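For context, this is roughly what the request side looks like; the endpoint URL and the example values are assumptions for illustration, not my exact code:

import requests

# Raw, untransformed features, exactly as they appear in the original dataset
# (example values assumed for illustration).
payload = {
    "tenure": 12,
    "OnlineSecurity": "No",
    "TechSupport": "Yes",
    "Contract": "Month-to-month",
}

# Hypothetical local endpoint; the API is expected to apply the same
# preprocessing that was used during training.
response = requests.post("http://localhost:8000/predict", json=payload)
print(response.json())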
- I’m using the same pipeline from the training phase and getting the ColumnTransformer and the best model from it (a sketch of the assumed pipeline layout follows the snippet):
preprocessor = pipeline.named_steps["columntransformer"]
model = pipeline.named_steps["xgbclassifier"]
(screenshot: pipeline)
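To make the setup clearer, the training pipeline is built roughly along these lines; the concrete transformers, column lists and k are assumptions for illustration, not my exact training code:

from sklearn.compose import make_column_transformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBClassifier

# Assumed layout: the ColumnTransformer is fitted on the full training frame,
# and SelectKBest only filters its output afterwards.
columntransformer = make_column_transformer(
    (StandardScaler(), ["tenure", "MonthlyCharges", "TotalCharges"]),
    (OneHotEncoder(handle_unknown="ignore"),
     ["OnlineSecurity", "TechSupport", "Contract"]),
)

pipeline = make_pipeline(
    columntransformer,
    SelectKBest(score_func=f_classif, k=4),  # k assumed for illustration
    XGBClassifier(),
)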
- Inside the API I’m getting the POSTed data and want to transform it with the same preprocessor used in the pipeline, but I get the error below (a sketch of the failing cell follows the traceback):
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-29-f928ce436ece> in <cell line: 15>()
13
14 # preprocessor.fit(df[["tenure", "OnlineSecurity", "TechSupport", "Contract"]])
16 print(preprocessed_df)
17
17 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in _raise_if_missing(self, key, indexer, axis_name)
5939
5940 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 5941 raise KeyError(f"{not_found} not in index")
5942
5943 @overload
KeyError: "['MonthlyCharges', 'TotalCharges'] not in index"
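The failing cell does essentially this (a sketch; preprocessor is the fitted ColumnTransformer extracted above, and the example values are assumed):

import pandas as pd

# Assumed shape of the incoming request body: only the four raw features
# from the commented-out fit call above, without "MonthlyCharges" and
# "TotalCharges".
posted = {
    "tenure": 12,
    "OnlineSecurity": "No",
    "TechSupport": "Yes",
    "Contract": "Month-to-month",
}
df = pd.DataFrame([posted])

# This is the call that raises the KeyError shown above: the fitted
# ColumnTransformer asks for "MonthlyCharges" and "TotalCharges", which are
# not in the POSTed frame.
preprocessed_df = preprocessor.transform(df)
print(preprocessed_df)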
- Verifying the KBest features, MonthlyCharges and TotalCharges are not there! (A feature-name check is sketched after the snippet below.)
kbest = final_estimator2.named_steps["selectkbest"].get_support(indices=True)
used_df = transformed_df_columns.iloc[:, kbest]
(screenshot: kbest features)
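To double-check which features survive the selection, I map the KBest indices back to the transformed column names roughly like this (variable names reused from the snippets above; assumes the fitted ColumnTransformer exposes get_feature_names_out, i.e. scikit-learn >= 1.0):

# Names of the columns produced by the fitted ColumnTransformer.
feature_names = preprocessor.get_feature_names_out()

# Indices of the columns kept by SelectKBest, as in the snippet above.
kbest = final_estimator2.named_steps["selectkbest"].get_support(indices=True)

# Readable list of the selected features; "MonthlyCharges" and
# "TotalCharges" do not appear in it.
print([feature_names[i] for i in kbest])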
Is there a step I’m forgetting?
I double-checked all the code and the official documentation.
I’m trying to understand why my preprocessor is asking for two features that, in theory, were not used or selected by KBest during the training phase.