Sorry if this is more of a general/beginner question, but I’m not having any luck searching for an answer online. Maybe I am googling the wrong things.
So essentially let’s say I have a dataframe of:
-Subject id (int)
-Age (int)
-Sex (int: 1 for male, 2 for female)
-Pearson CC representing a functional network (matrix)
This means I have an array of arrays in the PearsonCC column. For example:
   p_id        sex  group  age   PearsonCC
0  128_S_0200  M    MCI    74.0  [0.5052435694128596, 0.3375816208945487, 0.206…
1  003_S_0908  F    MCI    74.0  [-0.18955977794142087, 0.01652734870786999, -0…
2  141_S_1052  F    MCI    79.0  [0.0562331642358682, 0.5698911953687733, -0.17…
3  021_S_0178  M    MCI    76.0  [-0.0025129520401882864, 0.4303185685918817, -…
4  141_S_1378  F    MCI    72.0  [0.37126555457245725, 0.5341560356568125, 0.00…
5  135_S_4723  F    MCI    74.0  [0.018847806142767695, 0.12456857296257934, -0…
6  037_S_0150  M    MCI    85.0  [0.39432071450343287, 0.21554601918874589, 0.1…
7  068_S_0802  F    MCI    91.0  [-0.14763038782883162, -0.013336047668838688, …
8  128_S_0205  F    MCI    28.0  [0.07919061297038239, 0.02223380119164608, 0.1…
So I understand we need to flatten the matrix (vectorize the upper triangle) and then turn it into a NumPy array in order to do feature scaling. I’ve tried something like this:
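import numpy as np
# Stack the per-subject vectors into a 2-D array (one row per subject).
X_mat_train = np.vstack(X_matrix_train['Z_new'].values)
X_mat_test = np.vstack(X_matrix_test['Z_new'].values)  # test split, assumed to be named X_matrix_test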
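For reference, the upper-triangle vectorization mentioned above could look roughly like the sketch below. It runs on synthetic matrices, not my real data, and the helper name vectorize_upper_triangle is just something made up for illustration. It assumes each subject has a square, symmetric correlation matrix and that the diagonal (always 1) should be dropped.

```python
import numpy as np

def vectorize_upper_triangle(corr, k=1):
    """Return the entries above the diagonal of a square matrix as a 1-D vector.

    Hypothetical helper for illustration; k=1 skips the diagonal.
    """
    corr = np.asarray(corr, dtype=float)
    rows, cols = np.triu_indices(corr.shape[0], k=k)
    return corr[rows, cols]

# Synthetic stand-ins for per-subject connectivity matrices (not real data).
rng = np.random.default_rng(0)
n_subjects, n_regions = 5, 4
matrices = [np.corrcoef(rng.normal(size=(n_regions, 50))) for _ in range(n_subjects)]

# One row per subject, one column per region pair: 4 * 3 / 2 = 6 columns.
X_mat = np.vstack([vectorize_upper_triangle(m) for m in matrices])
print(X_mat.shape)  # (5, 6)
```

Each row of X_mat is then an ordinary numeric feature vector for one subject.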
My question is:
Am I supposed to convert the vector back to the original form when using it to fit a model?
I am getting mixed results from searching online, and I am confused about what the best format for this particular column would be.
Any help is greatly appreciated!
If I use sklearn’s Logistic Regression:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000)
model.fit(X_train_final, y_train)
I get an error that suggests the PearsonCC column can only hold ints/floats:
TypeError Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars
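My guess is that the error comes from the column having dtype object, with each cell holding a whole array, so it cannot be cast to a single float per cell. Below is a small sketch on a toy frame (made-up values, and X_final is a hypothetical name) that reproduces the failed cast and shows one possible way around it: stack the vectors into their own 2-D block and join it with the scalar columns using np.hstack.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; 'Z_new' holds one vector per subject.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": [74.0, 79.0, 85.0],
    "sex": [1, 2, 1],
    "Z_new": [rng.normal(size=6) for _ in range(3)],
})

print(df["Z_new"].dtype)  # object

# Casting the object column straight to float fails, because each cell
# is an array rather than a single number.
try:
    df["Z_new"].to_numpy().astype(float)
    conversion_failed = False
except (TypeError, ValueError):
    conversion_failed = True

# Possible workaround: build one flat 2-D float array.
X_scalar = df[["age", "sex"]].to_numpy(dtype=float)  # shape (3, 2)
X_vec = np.vstack(df["Z_new"].to_numpy())            # shape (3, 6)
X_final = np.hstack([X_scalar, X_vec])               # shape (3, 8)
print(X_final.shape, X_final.dtype)  # (3, 8) float64
```

With this layout every column of X_final is a single number per subject.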
For scaling the feature, this was my approach:
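import numpy as np
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_mat_train = np.vstack(X_matrix_train['Z_new'].values)
X_mat_test = np.vstack(X_matrix_test['Z_new'].values)  # test split, assumed to be named X_matrix_test
# Fit the scaler on the training data only, then apply it to both splits.
X_mat_train_scaled = scaler.fit_transform(X_mat_train)
X_mat_test_scaled = scaler.transform(X_mat_test)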
I’m just not sure how to convert X_mat_train back into a format acceptable for ML/regression.
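As far as I can tell, scikit-learn estimators take a 2-D numeric array of shape (n_samples, n_features), so the scaled array may already be usable without converting it back. Here is a minimal sketch of that idea with random stand-in data and hypothetical binary labels; the array shapes and y_train are assumptions, not my actual data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Random stand-ins for the stacked upper-triangle vectors (not real data).
rng = np.random.default_rng(0)
X_mat_train = rng.normal(size=(40, 6))
X_mat_test = rng.normal(size=(10, 6))
y_train = np.tile([0, 1], 20)  # hypothetical binary labels

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = StandardScaler()
X_mat_train_scaled = scaler.fit_transform(X_mat_train)
X_mat_test_scaled = scaler.transform(X_mat_test)

# The scaled 2-D arrays are passed to the model directly.
model = LogisticRegression(max_iter=1000)
model.fit(X_mat_train_scaled, y_train)
y_pred = model.predict(X_mat_test_scaled)
print(y_pred.shape)  # (10,)
```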