A number of scikit-learn's classes have a feature_names_in_ attribute, which would be a real time saver if I could understand it better. Specifically, assume your X is a nested list of strings, [['A', 'B', 'C'], ['A', 'B', 'D']], and your y is a list of labels, ['Blue', 'Green'].

Now, assume you are doing feature selection using, for example, scikit-learn's SelectKBest class. Assume you choose the chi2 univariate approach, ask for the top 2 features (i.e., k=2), and fit to get your k_best_object. That fitted k_best_object has an attribute called feature_names_in_, which would be really helpful if it returned the "names" of the top 2 features. The problem is that the documentation says this attribute is only defined when X has feature names that are all strings.

That would be fine, except for the fact that I haven't been able to get SelectKBest (or other scikit-learn classes) to work on strings. Instead, I have only been able to get them to work by converting the X values into a numeric array using CountVectorizer or TfidfVectorizer (i.e., either counts or TF-IDF). So, my question is: how would this attribute ever be used? If it is only available when the feature names are all strings, but the only X the estimator will accept is numeric, then how does it ever apply?