import statsmodels.tools.tools as stattools
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz
adult_tr = pd.read_csv('~/Desktop/Dataset_Assignment/adult_ch6_training',
                       usecols=['Marital status', 'Income', 'Cap_Gains_Losses'])
y = adult_tr[['Income']]
mar_np = np.array(adult_tr['Marital status'])
(mar_cat, mar_cat_dict) = stattools.categorical(mar_np, drop=True, dictnames = True)
mar_cat_pd = pd.DataFrame(mar_cat)
X = pd.concat((adult_tr[['Cap_Gains_Losses']], mar_cat_pd), axis = 1)
X_names = ["Cap_Gains_Losses", "Divorced", "Married", "Never-married", "Separated", "Widowed"]
y_names = ["<=50K", ">50K"]
cart01 = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=5).fit(X, y)
data = export_graphviz(cart01, out_file=None, feature_names=X_names, class_names=y_names)
predIncomeCART = cart01.predict(X)
I am struggling to run this program and wonder whether something is missing from the code. When I run it, I get the following error:
NotImplementedError Traceback (most recent call last)
Cell In[11], line 11
8 y = adult_tr[['Income']]
10 mar_np = np.array(adult_tr['Marital status'])
---> 11 (mar_cat, mar_cat_dict) = stattools.categorical(mar_np, drop=True, dictnames = True)
12 mar_cat_pd = pd.DataFrame(mar_cat)
14 X = pd.concat((adult_tr[['Cap_Gains_Losses']], mar_cat_pd), axis = 1)
File /opt/anaconda3/lib/python3.11/site-packages/statsmodels/tools/tools.py:151, in categorical(data, col, dictnames, drop)
71 def categorical(data, col=None, dictnames=False, drop=False):
72 """
73 Construct a dummy matrix from categorical variables
74
(...)
149 >>> design2 = sm.tools.categorical(struct_ar, col='str_instr', drop=True)
150 """
--> 151 raise NotImplementedError("categorical has been removed")
NotImplementedError: categorical has been removed
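The last line of the traceback states the cause directly: in the installed statsmodels, `categorical` is only a stub that raises `NotImplementedError`, so the call on line 11 fails regardless of the data. A minimal NumPy-only sketch of a replacement follows. `categorical_dummies` is a hypothetical helper (not a statsmodels or NumPy API) that returns the two values the original line unpacks: a 0/1 dummy matrix and a dict mapping each column index to its category label. It assumes the columns should follow sorted category order, which is what the hand-typed `X_names` list appears to expect.

```python
import numpy as np


def categorical_dummies(data):
    """Hypothetical stand-in for the removed
    statsmodels.tools.tools.categorical(data, drop=True, dictnames=True).

    Returns a 0/1 dummy matrix with one column per distinct value
    (columns in the sorted order produced by np.unique) and a dict
    mapping each column index to its category label.
    """
    data = np.asarray(data)
    # levels: sorted unique labels; codes: position of each row's label in levels
    levels, codes = np.unique(data, return_inverse=True)
    # One row per observation, one column per level, all zeros to start
    dummies = np.zeros((data.shape[0], levels.shape[0]), dtype=float)
    # Set a single 1 per row, in the column of that row's level
    dummies[np.arange(data.shape[0]), codes] = 1.0
    names = {i: level for i, level in enumerate(levels)}
    return dummies, names


# Usage on a small made-up array (illustrative values only)
mar_np = np.array(["Married", "Divorced", "Married", "Never-married"])
mar_cat, mar_cat_dict = categorical_dummies(mar_np)
```

With a helper like this, line 11 would become `(mar_cat, mar_cat_dict) = categorical_dummies(mar_np)`, and the rest of the script could stay as it is.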
I searched the Internet for a clue but have not found anything yet.
Any help would be appreciated!
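Alternatively, pandas can build the dummy columns directly, which avoids the intermediate NumPy array altogether. The sketch below assumes a small made-up `adult_tr` frame (the values are illustrative only, not taken from adult_ch6_training) and uses `pd.get_dummies` to encode `Marital status`. It also reads the feature names from the resulting columns instead of typing them by hand, so `X_names` cannot drift out of sync with the data.

```python
import pandas as pd

# Hypothetical stand-in for adult_tr; the real data comes from the CSV in the question
adult_tr = pd.DataFrame({
    "Marital status": ["Married", "Divorced", "Never-married", "Married"],
    "Income": ["<=50K", ">50K", "<=50K", ">50K"],
    "Cap_Gains_Losses": [0.0, 0.02, 0.0, 0.1],
})

# One 0/1 column per marital-status level, named after the level (sorted by name)
mar_cat_pd = pd.get_dummies(adult_tr["Marital status"], dtype=float)

# Both frames share adult_tr's index, so concat lines the rows up correctly
X = pd.concat((adult_tr[["Cap_Gains_Losses"]], mar_cat_pd), axis=1)

# Feature names taken from the data rather than a hand-typed list
X_names = list(X.columns)
```

On the real data, `X` and `X_names` built this way could be passed to `DecisionTreeClassifier(...).fit(X, y)` and `export_graphviz(..., feature_names=X_names, ...)` in place of the originals.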