I’m working on a team project on food recommendation. My task is to predict calories from two inputs: protein and sodium.
The data looks like this:
[Image: sample of the data]
I first tried various machine learning models, but the accuracy was very low: XGBoost gave an R² of 25%, or 37% after I removed outliers. I then tried a neural network regression and got the results below:
# Step 1: Import the required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
# Step 2: Split the data into features and target
X = df_clean[['protein', 'sodium']]
y = df_clean['calories']
# Step 3: Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Normalize the data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 5: Define the neural network model with additional layers and dropout
model = Sequential()
model.add(Dense(128, input_dim=2, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1)) # Output layer for regression
# Step 6: Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# Step 7: Define early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# Step 8: Train the model
epochs = 20
batch_size = 32
history = model.fit(
    X_train, y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(X_test, y_test),
    verbose=1,
    callbacks=[early_stopping],
)
# Step 9: Evaluate the model on the test set
test_loss, test_mae = model.evaluate(X_test, y_test, verbose=0)
print(f'Test loss: {test_loss:.4f}, Test MAE: {test_mae:.4f}')
# Step 10: Make predictions on the test set
y_pred = model.predict(X_test)
# Step 11: Visualize the results
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=3)
plt.xlabel('Actual Calories')
plt.ylabel('Predicted Calories')
plt.title('Neural Network Regression')
plt.show()
# Step 12: Plot training & validation loss values
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(loc='upper right')
plt.show()
[Image: "Neural Network Regression" scatter plot of actual vs. predicted calories]
[Image: "Model Loss" plot of training and validation loss per epoch]
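For a like-for-like comparison with the XGBoost R² figures quoted above, the network's predictions could be scored the same way. This is only a minimal sketch: the helper name `r_squared` and the sample numbers are made up for illustration, and with the real data the call would presumably be `r_squared(y_test, y_pred)`.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    # model.predict returns shape (n, 1) while the target is shape (n,).
    # Flatten both so the subtraction is element-wise, not broadcast to (n, n).
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only, not taken from the dataset in this post.
actual = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([[110.0], [190.0], [310.0], [390.0]])  # (n, 1), like model.predict

print(f"R^2: {r_squared(actual, predicted):.3f}")
```

An R² of 0 corresponds to always predicting the mean of the target, so a value near the XGBoost figures would suggest the limitation lies in the two input features rather than in the choice of model.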
Could anyone suggest an approach for this problem? I’m new to this field, so any help is really appreciated.