Intermediate model that filters y in Scikit-learn pipelines
I want to implement a prediction architecture that includes an intermediate classifier. This model is fitted on one part of the training set and then predicts class probabilities for a binary feature on another part of the training set. A transformer then removes every instance whose predicted probability is smaller than a given threshold. The y values corresponding to those instances must be removed as well. This is not supported by the default sklearn behaviour: a pipeline step cannot transform y, so a filter like this cannot be optimized through, e.g., GridSearchCV.
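
To make the setup concrete, here is a minimal sketch of one possible workaround, not an established sklearn feature: instead of a pipeline step that transforms y, the filter and the final model are wrapped in a single meta-estimator whose fit drops rows of X and y together. Everything named here is hypothetical and chosen for illustration: the class FilteredClassifier, its parameters feature_idx, threshold and random_state, the 50/50 split, LogisticRegression for both models, and the synthetic data. It also assumes the binary feature is a column of X and that no filtering happens at predict time.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split


class FilteredClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical meta-estimator: filter rows of X and y, then fit a final model."""

    def __init__(self, filter_estimator=None, final_estimator=None,
                 feature_idx=0, threshold=0.5, random_state=0):
        self.filter_estimator = filter_estimator
        self.final_estimator = final_estimator
        self.feature_idx = feature_idx    # assumed: column of X holding the binary feature
        self.threshold = threshold        # rows with probability below this are dropped
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        target = X[:, self.feature_idx].astype(int)       # binary feature to predict
        others = np.delete(X, self.feature_idx, axis=1)   # remaining columns

        # One half fits the intermediate classifier, the other half gets filtered.
        fit_idx, rest_idx = train_test_split(
            np.arange(len(X)), test_size=0.5,
            stratify=target, random_state=self.random_state)

        filt = (LogisticRegression() if self.filter_estimator is None
                else clone(self.filter_estimator))
        self.filter_ = filt.fit(others[fit_idx], target[fit_idx])

        # Probability of the larger class label (classes_[1]) on the held-out half.
        proba = self.filter_.predict_proba(others[rest_idx])[:, 1]
        keep = rest_idx[proba >= self.threshold]
        self.n_kept_ = len(keep)

        # X and y are indexed with the same mask, so they stay aligned.
        final = (LogisticRegression() if self.final_estimator is None
                 else clone(self.final_estimator))
        self.final_ = final.fit(X[keep], y[keep])
        self.classes_ = self.final_.classes_
        return self

    def predict(self, X):
        # Assumption: every row is predicted; nothing is filtered at predict time.
        return self.final_.predict(np.asarray(X))


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 400
signal = rng.normal(size=n)
binary = (signal + 0.5 * rng.normal(size=n) > 0).astype(float)
other = rng.normal(size=n)
X = np.column_stack([binary, signal, other])
y = (other + 0.3 * rng.normal(size=n) > 0).astype(int)

# The filtering threshold is an ordinary hyperparameter of the wrapper.
search = GridSearchCV(FilteredClassifier(),
                      param_grid={"threshold": [0.3, 0.5, 0.7]}, cv=3)
search.fit(X, y)
preds = search.predict(X)
```

Because the row removal happens inside fit, GridSearchCV only ever sees a regular estimator, and the threshold can be searched like any other parameter. A very high threshold can leave too few rows, or a single class, for the final model; the sketch does not guard against that.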