I have a set of parameters that have time series data. I have rearranged the input in a way that looks like
A.1  B.1  C.1  A.2  B.2  C.2  A.3  B.3  C.3
3    2    1    2    1    1    4    4    4
2    1    4    4    4    2    2    4    4
where A, B, and C are my parameters, each with three time steps. Each row represents a different instance. I am training a random forest on this input data and then plotting the importance of the parameters. Initially, my plot showed one bar per parameter per time step, for example A.1, A.2, A.3, … But I wanted to plot a single bar for each of A, B, and C. What I did was reshape the importances and sum them per parameter, for example A = A.1 + A.2 + A.3, and likewise for the other parameters. I am new to machine learning, so I want to ask whether this procedure is appropriate or whether I need to change anything in my process. Thank you.
This is my code:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import numpy as np
import matplotlib.pyplot as plt
# Input data for 4 flights, each with 10 time steps and 3 parameters (Altitude, GroundSpeed, FlightCourse)
data = [
np.array([
[3.06, 110, 310],
[2.94, 128, 300],
[2.88, 144, 299],
[2.56, 128, 301],
[2.38, 129, 301],
[2.13, 142, 301],
[2.25, 127, 303],
[2.13, 162, 300],
[1.75, 138, 298],
[1.69, 142, 301]
]),
np.array([
[8, 135, 179],
[7, 161, 180],
[7, 117, 179],
[6, 122, 180],
[6, 132, 178],
[5, 131, 180],
[5, 136, 180],
[4, 134, 179],
[4, 136, 180],
[3, 113, 179]
]),
np.array([
[1.13, 148, 91],
[1.06, 113, 87],
[0.75, 164, 92],
[0.63, 141, 95],
[0.56, 139, 85],
[0.44, 142, 94],
[0.25, 140, 90],
[0.19, 144, 90],
[0.13, 140, 92],
[0 , 142, 93]
]),
np.array([
[1.13, 119, 299],
[1, 121, 299],
[0.88, 116, 299],
[0.69, 117, 299],
[0.63, 118, 301],
[0.44, 118, 300],
[0.38, 117, 299],
[0.19, 119, 299],
[0.13, 118, 300],
[0.06, 120, 300]
])
]
print(data)
# Reshape the 3D input into a 2D format
num_flights = len(data)
num_timesteps, num_params = data[0].shape
reshaped_data = np.concatenate(data).reshape(num_flights, -1)
print(reshaped_data)
# Create labels for the flights
labels = np.array([1, 1, 0, 0])
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(reshaped_data, labels, test_size=0.2, random_state=42)
# Train a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# Predict on the test set
y_pred = clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Get feature importances from the trained Random Forest model
feature_importances = clf.feature_importances_
print(feature_importances)
# Reshape feature importances to match the original data shape
reshaped_importances = feature_importances.reshape(num_timesteps, num_params)
print(reshaped_importances)
# Plot the feature importances for each parameter
plt.figure(figsize=(10, 6))
print(reshaped_importances.sum(axis=0))
plt.bar(range(num_params), reshaped_importances.sum(axis=0), tick_label=['Altitude', 'GroundSpeed','FlightCourse'])
plt.xlabel('Parameter')
plt.ylabel('Importance')
plt.title('Feature Importance of Parameters')
plt.show()
I wanted to plot the importance of each parameter. The model gave me one importance per parameter per time step, so I summed those importances by parameter.
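To check that the reshape really lines each importance up with the right parameter, here is a minimal sketch using the small A/B/C table from the top of the question. The importance values are made up for illustration (they are not output from my model), and the names `flights`, `names`, and `by_name` are only there for this example. It compares the reshape-and-sum result against a sum done explicitly by column name.

```python
import numpy as np

# Two instances from the A/B/C table, each as (time steps, parameters)
flights = [
    np.array([[3, 2, 1], [2, 1, 1], [4, 4, 4]]),
    np.array([[2, 1, 4], [4, 4, 2], [2, 4, 4]]),
]
num_timesteps, num_params = flights[0].shape

# Same flattening as in my code: one row per instance
flat = np.concatenate(flights).reshape(len(flights), -1)

# Column names in flattened order: A.1, B.1, C.1, A.2, ...
names = [f"{p}.{t + 1}" for t in range(num_timesteps) for p in "ABC"]

# Made-up importances (assumption), normalized to sum to 1
importances = np.arange(1, 10, dtype=float)
importances /= importances.sum()

# Reshape-and-sum, as in my code
per_param = importances.reshape(num_timesteps, num_params).sum(axis=0)

# Cross-check: sum explicitly by parameter name
by_name = {
    p: sum(imp for n, imp in zip(names, importances) if n.startswith(p))
    for p in "ABC"
}

print(dict(zip("ABC", per_param)))
print(by_name)
```

Both approaches give the same totals here, so at least the column ordering after the reshape seems to match the parameter labels.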