I am working on a Random Forest classification model in ECL. My dataset is split into training and test sets, and I am trying to extract the target variable y (whether the patient has diabetes) from both. However, my current approach, which uses ‘ML_Core.Discretize.ByRounding’, does not produce the expected results.
I extract the target from the ‘TrainNF’ and ‘TestNF’ datasets with the following ECL code:
```ecl
IMPORT ML_Core;

independent_cols := 8;  // feature columns 1-8; column 9 holds the diabetes label
X_train := TrainNF(number < independent_cols + 1);
y_train := ML_Core.Discretize.ByRounding(TrainNF(number = independent_cols + 1));
X_test  := TestNF(number < independent_cols + 1);
y_test  := ML_Core.Discretize.ByRounding(TestNF(number = independent_cols + 1));
```
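To make the intent of the snippet concrete, here is a minimal, self-contained sketch on a hypothetical toy dataset (`SampleNF`: two records, two feature columns, label in column 3 instead of 9). `ByRounding` only rounds the values of the `NumericField` records it receives into `DiscreteField` records; it does not select anything itself, so `y` can only be non-empty if the `number = independent_cols + 1` filter matches some rows. The `PROJECT` that renumbers the label to field 1 is an assumption based on how single-label ML_Core classifiers are commonly fed, not something confirmed for this model.

```ecl
IMPORT ML_Core;

// Hypothetical toy data: 2 records, features in fields 1-2, label in field 3.
// Layout of ML_Core.Types.NumericField: wi, id, number, value.
SampleNF := DATASET([
    {1, 1, 1, 148.0}, {1, 1, 2, 33.6}, {1, 1, 3, 1.0},
    {1, 2, 1, 85.0},  {1, 2, 2, 26.6}, {1, 2, 3, 0.0}
  ], ML_Core.Types.NumericField);

independent_cols := 2;

// Features: every field up to and including the last independent column.
X_sample := SampleNF(number <= independent_cols);

// Label: select the label field, renumber it to 1 (assumed convention),
// then round the values into DiscreteField records.
y_sample := ML_Core.Discretize.ByRounding(
    PROJECT(SampleNF(number = independent_cols + 1),
            TRANSFORM(ML_Core.Types.NumericField,
                      SELF.number := 1,
                      SELF := LEFT)));

OUTPUT(X_sample, NAMED('X_sample'));
OUTPUT(y_sample, NAMED('y_sample'));
```

With this toy data the label filter matches two rows, so `y_sample` holds two records; if the filter matched nothing, `ByRounding` would simply return an empty dataset, which is the symptom described below.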
Both ‘y_train’ and ‘y_test’ come back empty: they contain no records at all, so the label extraction is clearly not working.
I expected ‘y_train’ and ‘y_test’ to contain the label values (field 9) from ‘TrainNF’ and ‘TestNF’, respectively. I need these values to train the Random Forest classifier and to evaluate its performance.
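A quick way to see why the filter returns nothing is to list which field numbers actually exist in the `NumericField` data. The sketch below uses a hypothetical stand-in (`CheckNF`) in which only fields 1 and 2 exist, so a filter on field 3 comes back empty exactly like `y_train` does. With the real data, substitute `TrainNF` and `TestNF` and compare the highest field number against `independent_cols + 1` (9). If `TrainNF` was created with `ML_Core.ToField`, one thing worth checking (an assumption, not confirmed here) is whether a column was taken as the record id, which would leave only 8 numbered fields.

```ecl
IMPORT ML_Core;

// Hypothetical stand-in for TrainNF; replace it with your own dataset.
CheckNF := DATASET([
    {1, 1, 1, 148.0}, {1, 1, 2, 33.6},
    {1, 2, 1, 85.0},  {1, 2, 2, 26.6}
  ], ML_Core.Types.NumericField);

// One row per field number, with how many records carry that field.
fieldCounts := TABLE(CheckNF,
                     {CheckNF.number, UNSIGNED cnt := COUNT(GROUP)},
                     CheckNF.number);

// Highest field number present. If it is below independent_cols + 1,
// the label filter (number = independent_cols + 1) cannot match anything.
maxField := MAX(CheckNF, number);

OUTPUT(SORT(fieldCounts, number), NAMED('FieldCounts'));
OUTPUT(maxField, NAMED('MaxField'));
```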