I’m working on a project involving a feedforward deep neural network (DNN) designed to learn the product of two input values. Theoretically, this should be straightforward, especially with the preprocessing and activation functions I’ve employed. However, the model fails to find the correct weights and biases, resulting in high errors during both interpolation and extrapolation.
Model Details:
- Input: `x1, x2`
- Output: `y = x1 * x2`
- Architecture:
  - Layer 1: 10 neurons
  - Layer 2: 10 neurons
  - Output layer: 1 neuron
- Activation functions: identity (linear) for all layers
- Preprocessing: none applied to the inputs directly, though I also tried logarithms and exponentials to aid learning (see "Attempts to Resolve" below)
- Optimizer: Adam with a learning rate of 0.1
- Loss function: mean squared error (MSE)
- Training data: `x1` and `x2` drawn uniformly from `[-100, 100]`
- Epochs: 50
- Batch size: 32
- Weight and bias initialization: random (Keras defaults)
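One thing I noticed while debugging: since every activation is the identity, I believe the three Dense layers compose into a single affine map, so the network can only ever represent linear functions of the inputs, and `x1 * x2` is not linear. A quick NumPy check of that composition (a standalone sketch with made-up random weights, not my actual model code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights for a 2 -> 10 -> 10 -> 1 stack with identity activations.
W1, b1 = rng.normal(size=(2, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 10)), rng.normal(size=10)
W3, b3 = rng.normal(size=(10, 1)), rng.normal(size=1)

x = rng.normal(size=(5, 2))

# Forward pass through the three "deep" linear layers.
deep = ((x @ W1 + b1) @ W2 + b2) @ W3 + b3

# The same map as a single affine layer: fold the weights and biases through.
W = W1 @ W2 @ W3
b = (b1 @ W2 + b2) @ W3 + b3
shallow = x @ W + b

print(np.max(np.abs(deep - shallow)))  # the two agree up to float rounding
```

If that reasoning is right, no amount of tuning the optimizer or initialization would let this architecture fit a product.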
Observations:
Despite these settings, the model produces significant errors:

- Interpolation:
  - For `x1 = 2`, `x2 = 1`, the expected output is 2, but the observed output is 15.2477.
  - For `x1 = 1`, `x2 = 2`, the expected output is 2, but the observed output is 15.193.
- Extrapolation:
  - For `x1 = 150`, `x2 = 200`, the expected output is 30,000, but the observed output is -75.753.
  - For `x1 = 200`, `x2 = 150`, the expected output is 30,000, but the observed output is -109.046.
Attempts to Resolve:
- Tried different weight initialization strategies.
- Experimented with different learning rates and optimizers.
- Applied logarithmic preprocessing to the inputs and used an exponential activation function.
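For reference, the log-preprocessing attempt looked roughly like this (a sketch, not my exact code; I take absolute values plus a small epsilon so the log is defined, which means the sign of the product is dropped entirely — possibly part of the problem):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

x_train = np.random.uniform(-100, 100, (1000, 2))
# Targets on magnitudes only: signs of negative inputs are discarded.
y_train = np.abs(x_train[:, 0] * x_train[:, 1])

eps = 1e-6
log_x = np.log(np.abs(x_train) + eps)  # features: log|x1|, log|x2|

# A single layer can represent exp(log|x1| + log|x2|) = |x1 * x2|
# when its weights are (1, 1) and its bias is 0.
model = Sequential([
    Dense(1, activation='exponential', input_shape=(2,)),
])
model.compile(optimizer=Adam(1e-3), loss='mse')
model.fit(log_x, y_train, epochs=20, batch_size=32, verbose=0)
```

Even with this representation available, training was unstable for me, and of course it can only recover the magnitude of the product, not its sign.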
Code Snippet:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Model definition: 2 -> 10 -> 10 -> 1, all linear activations
model = Sequential([
    Dense(10, activation='linear', input_shape=(2,)),
    Dense(10, activation='linear'),
    Dense(1, activation='linear')
])

# Compile the model
model.compile(optimizer=Adam(0.1), loss='mse')

# Sample training data: uniform inputs in [-100, 100], target is their product
x_train = np.random.uniform(-100, 100, (1000, 2))
y_train = x_train[:, 0] * x_train[:, 1]

# Train the model
model.fit(x_train, y_train, epochs=50, batch_size=32)

# Test the model on extrapolation points outside the training range
test_data = np.array([[150, 200], [200, 150]])
predictions = model.predict(test_data)
print(predictions)
```
Question:
Given the setup and the efforts to troubleshoot, why is the model failing to learn the product operation? Is there something fundamentally wrong with the architecture or preprocessing steps? How can I modify the model to correctly learn the multiplication of two inputs?
I appreciate any insights or suggestions to help solve this problem.