The input Excel file (trainn.xlsx) contains car percentages, and the output Excel file (pred.xlsx) contains the output values computed from those inputs. There are 160 inputs and 160 outputs. I want to generate 340 more inputs and the corresponding 340 outputs (for every column) by training on the current dataset. Is it possible to increase both the input and the output to 500 rows by generating new data? My current code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.utils import resample
# Load the data
train = pd.read_excel('/mnt/data/trainn.xlsx')
pred = pd.read_excel('/mnt/data/pred.xlsx')
# Separate features and target
X = train.drop(['Emission CO (gm)'], axis=1)
y = train['Emission CO (gm)']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.6, random_state=43)
# Train the RandomForestRegressor
rf = RandomForestRegressor()
rf.fit(X_train, y_train)
# Predict emissions on the prediction dataset
y_pred = rf.predict(pred)
pred['predicted Emission CO (gm)'] = y_pred
# Save the predictions to a CSV file
pred.to_csv('/mnt/data/predicted_emission_co_data.csv', index=False)
# Augment the dataset to 500 entries
while len(train) < 500:
    train = pd.concat([train, resample(train, replace=True, n_samples=500-len(train), random_state=43)], ignore_index=True)
# Save the augmented dataset to an Excel file
train.to_excel('/mnt/data/augmented_trainn.xlsx', index=False)
print("Model training complete and dataset augmented.")
The code increases the input data to 500 rows but doesn't increase the corresponding output data (pred.xlsx), which remains at 160 rows. Please help me out.
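For reference, here is a minimal sketch of one possible way to get matching inputs and outputs. It creates 340 new input rows by resampling the existing ones with a small amount of noise, then asks the trained model for the corresponding outputs. The synthetic data, the column names `car_pct` and `bus_pct`, and the 5% noise scale are assumptions made only so the sketch runs on its own; the real data would come from trainn.xlsx. Note that the generated outputs are model estimates, not new measurements.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in for trainn.xlsx: 160 rows with made-up feature columns.
rng = np.random.default_rng(43)
train = pd.DataFrame({
    "car_pct": rng.uniform(0, 100, 160),
    "bus_pct": rng.uniform(0, 100, 160),
})
target = "Emission CO (gm)"
train[target] = (2.0 * train["car_pct"] + 0.5 * train["bus_pct"]
                 + rng.normal(0, 1, 160))

# Fit the model on the original 160 rows.
X = train.drop(columns=[target])
y = train[target]
rf = RandomForestRegressor(n_estimators=100, random_state=43)
rf.fit(X, y)

# Build 340 new input rows: resample existing rows with replacement,
# then jitter each column by noise scaled to 5% of its standard deviation
# (the 5% figure is an arbitrary assumption).
n_new = 500 - len(train)
sampled = X.sample(n=n_new, replace=True, random_state=43).reset_index(drop=True)
noise = rng.normal(0, 0.05, sampled.shape) * X.std().to_numpy()
# Keep the jittered values inside the range seen in the original data.
X_new = (sampled + noise).clip(lower=X.min(), upper=X.max(), axis=1)

# Predict an output for every new input so both sides grow together.
new_rows = X_new.copy()
new_rows[target] = rf.predict(X_new)

# 160 original rows + 340 generated rows = 500 rows, inputs and outputs aligned.
augmented = pd.concat([train, new_rows], ignore_index=True)
print(augmented.shape)
```

If there are several output columns, the same idea should carry over by passing a multi-column `y` to `fit`, since `RandomForestRegressor` supports multi-output regression.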