I want to calculate the Matthews correlation coefficient (MCC) with sklearn between each column of a matrix X and an output vector y. Here is my code:
from sklearn.metrics import matthews_corrcoef
import numpy as np
X = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 0, 1],
              [1, 1, 0, 0, 0],
              [1, 1, 0, 1, 0],
              [1, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 0, 1, 0, 1],
              [1, 0, 0, 0, 0]])
# y is not shown in the post; these are example binary labels, one per row of X
y = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])
n_sample, n_feature = X.shape
rcf_all = []
for i in range(n_feature):
    coeff_c_f = abs(matthews_corrcoef(X[:, i], y))
    rcf_all.append(coeff_c_f)
rcf = np.mean(rcf_all)
This works well here, but when the matrix is very large and has many features, looping over one feature at a time is quite slow. What is the most efficient way to compute the MCC for all columns at once, without the loop, to speed up the calculation?
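One possible direction, offered as a sketch rather than a definitive answer: if X and y contain only 0/1 values (an assumption; the multiclass case that matthews_corrcoef also handles is not covered here), the confusion-matrix counts for every column can be obtained with a single matrix product, and the binary MCC formula can then be applied to all columns at once. The function name mcc_columns is hypothetical, and returning 0 when the denominator is zero is a choice I believe matches sklearn's behaviour for constant columns.

```python
import numpy as np

def mcc_columns(X, y):
    """Absolute binary MCC between each 0/1 column of X and a 0/1 vector y.

    Hypothetical helper; assumes X and y contain only 0 and 1.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = X.shape[0]

    tp = X.T @ y               # per column: rows where column and y are both 1
    col_pos = X.sum(axis=0)    # per column: number of ones (tp + fp)
    y_pos = y.sum()            # number of ones in y (tp + fn)

    fp = col_pos - tp
    fn = y_pos - tp
    tn = n - tp - fp - fn

    num = tp * tn - fp * fn
    den = np.sqrt(col_pos * (n - col_pos) * y_pos * (n - y_pos))

    # Constant columns (or constant y) give a zero denominator; return 0 there.
    mcc = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return np.abs(mcc)
```

With this, the loop would reduce to rcf = mcc_columns(X, y).mean(). It is worth checking the result against the loop version on a small sample before relying on it.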