I’m having trouble using RandomForestClassifier from scikit-learn for multivariate time series prediction. My dataset has multiple columns, and I’m using sliding windows for prediction. Since sklearn’s RandomForestClassifier expects a 2D input of shape (n_samples, n_features) and does not support multivariate windows directly, I have resorted to calling flatten() on each window of rows. However, I understand that this might significantly reduce the performance of my model, because the flattened vector no longer distinguishes time steps from columns. I am looking for a RandomForestClassifier that supports multivariate data.
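To make the flatten-based approach concrete, here is a minimal sketch of it. The helper name make_flat_windows, the random data, the window length of 10, and the choice of predicting the label of the step right after each window are all assumptions for illustration, not details of the actual dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_flat_windows(series, labels, window):
    """Turn a (n_timesteps, n_features) array into flattened sliding windows.

    Each sample is `window` consecutive rows flattened into one vector;
    its target is the label of the time step right after the window.
    (Hypothetical helper, written for this illustration.)
    """
    n_timesteps, _ = series.shape
    X = np.stack([
        series[start:start + window].flatten()
        for start in range(n_timesteps - window)
    ])
    y = labels[window:]
    return X, y


# Hypothetical data: 200 time steps, 3 columns, binary labels.
rng = np.random.default_rng(0)
series = rng.normal(size=(200, 3))
labels = (series[:, 0] > 0).astype(int)

X, y = make_flat_windows(series, labels, window=10)

# Chronological split (no shuffling), so the test windows come after
# the training windows in time.
split = int(0.8 * len(X))
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:split], y[:split])
pred = clf.predict(X[split:])
```

Each window of shape (10, 3) becomes a single row of 30 features, so the forest sees every (time step, column) pair as an unrelated feature, which is the limitation described above.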
I’ve already tried the following:
I asked ChatGPT, which suggested using TimeSeriesForestClassifier from tslearn.ensemble, but I couldn’t find any working example or documentation for it in the tslearn library.
Here are the other options I have explored:
TimeSeriesForestClassifier from sktime.classification.interval_based
ComposableTimeSeriesForestClassifier from sktime.classification.ensemble
Unfortunately, neither of these supports multivariate data.
I also came across this link, which uses the stretch function from numpy: https://piti118.github.io/babar_python_tutorial/notebooks/03_Multivariate_Analysis.html
However, this approach seems similar to using flatten().
There is also an old discussion on GitHub from 2019 indicating that a multivariate-supporting RandomForest does not exist in Python, only in R: https://github.com/scikit-learn/scikit-learn/discussions/22984.
Lastly, I found:
TimeSeriesForest from pyts.classification
but it also doesn’t support multivariate data.
Is there an existing implementation or workaround for using a RandomForestClassifier that supports multivariate time series data in Python? Any pointers or suggestions would be greatly appreciated.
Thank you!