I have a model that adds two numbers between -10 and 10, using a neural network trained on a dataset of number pairs and their sums. However, when I print the training accuracy, it just shows 100 percent over and over. I am not sure whether the model is simply training very quickly, or whether something is wrong and it is not properly learning. Can anyone give some insight?
Here is my code:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader,TensorDataset
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import matplotlib_inline.backend_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
# Build a dataset of 2000 (x, y) pairs labeled with their sum.
data = []
labels = []
datasetAmount = 2000
for i in range(datasetAmount):
    x = np.random.randint(-10, 11)  # the high bound is exclusive, so use 11 to include +10
    y = np.random.randint(-10, 11)
    bothNumber = [x, y]
    data.append(bothNumber)
    labels.append(x + y)

data_np = np.array(data)
labels_np = np.array(labels).reshape(-1, 1)

train_data, test_data, train_labels, test_labels = train_test_split(data_np, labels_np, train_size=.9)
train_data = TensorDataset(torch.tensor(train_data), torch.tensor(train_labels))
test_data = TensorDataset(torch.tensor(test_data), torch.tensor(test_labels))

batchsize = 20
train_loader = DataLoader(train_data, batch_size=batchsize, shuffle=True, drop_last=True)
test_loader = DataLoader(test_data, batch_size=test_data.tensors[0].shape[0])
def createModel():
    class myModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.input = nn.Linear(2, 8)
            self.fc1 = nn.Linear(8, 8)
            self.output = nn.Linear(8, 1)

        def forward(self, x):
            x = F.relu(self.input(x))
            x = F.relu(self.fc1(x))
            return self.output(x)

    net = myModel()
    lossfun = nn.MSELoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=.001)
    return net, lossfun, optimizer
def trainModel():
    numepochs = 100
    net, lossfun, optimizer = createModel()
    losses = torch.zeros(numepochs)
    trainacc = []
    testacc = []

    for epochi in range(numepochs):
        batchLoss = []
        for X, y in train_loader:
            X = X.float()
            y = y.float()
            yHat = net(X)
            loss = lossfun(yHat, y)
            batchLoss.append(loss.item())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        losses[epochi] = np.mean(batchLoss)

        # Training accuracy: count a prediction as correct if it is within 1 of the
        # true sum. (The list is named train_targets so it does not shadow the
        # outer train_labels variable.)
        with torch.no_grad():
            train_predictions = []
            train_targets = []
            for x_train, y_train in train_loader:
                train_predictions.append(net(x_train.float()))
                train_targets.append(y_train.float())
            train_predictions = torch.cat(train_predictions)
            train_targets = torch.cat(train_targets)
            train_acc = 100 * torch.mean((torch.abs(train_predictions - train_targets) < 1).float())
            trainacc.append(train_acc.item())

        # Test accuracy: iterate the loader (not the dataset!) so we get the whole
        # test set as one batch instead of a single sample.
        X, y = next(iter(test_loader))
        X = X.float()
        y = y.float()
        with torch.no_grad():
            yHat = net(X)
        # Append per epoch instead of overwriting the testacc list with a scalar.
        testacc.append((100 * torch.mean((torch.abs(yHat - y) < 1).float())).item())

    return trainacc, testacc, losses, net

trainAcc, testAcc, losses, net = trainModel()
Does anything look wrong with the model?
Predicting the sum of two numbers is a linear task and very easy for a neural network: at its core, a neuron computes a weighted sum of its input values.
The "network" below is fully sufficient to solve the task in just one epoch.
The math behind it is x*weight1 + y*weight2 (plus a bias), so it just needs to learn to set both weights to 1.0 and the bias to 0.
class myModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Linear(2, 1)

    def forward(self, x):
        return self.input(x)
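As a quick sanity check, here is a minimal sketch (reusing the train_loader from your code; the learning rate of .01 is my choice) that trains this one-neuron model for a single epoch and then inspects its parameters:

# Train the one-neuron model for one epoch, then look at its weights.
net = myModel()
lossfun = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=.01)

for X, y in train_loader:
    loss = lossfun(net(X.float()), y.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Both weights should already be close to 1.0 and the bias close to 0.
print(net.input.weight.data, net.input.bias.data)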
By using ReLU you actually make the task harder for the network, because ReLU clamps negative values to zero, so they cannot pass through. Still, the network below with just two hidden units is enough for this easy task.
def createModel():
    class myModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.input = nn.Linear(2, 2)
            self.output = nn.Linear(2, 1)

        def forward(self, x):
            x = F.relu(self.input(x))
            return self.output(x)

    net = myModel()
    lossfun = nn.MSELoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=.001)
    return net, lossfun, optimizer
The math behind this network is relu([x, y] @ W1) @ W2, now with weight vectors of length two. Ideally, the first layer learns to output [-(x+y), x+y]; after the ReLU this contains 0 and abs(x+y), because exactly one of the two entries is negative and gets clamped to zero. The second layer then only needs to fix the sign, weighting the two units by -1 and 1 respectively, to yield x+y.
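You can verify this hand-crafted solution by setting the weights directly (a small sketch; createModel here is the two-unit variant above):

# Hard-code the "ideal" weights described above and check that the
# two-unit ReLU network computes x + y exactly.
net, lossfun, optimizer = createModel()
with torch.no_grad():
    net.input.weight.copy_(torch.tensor([[-1., -1.],   # unit 1: -(x+y)
                                          [1., 1.]]))  # unit 2: x+y
    net.input.bias.zero_()
    net.output.weight.copy_(torch.tensor([[-1., 1.]]))  # fix the sign
    net.output.bias.zero_()

print(net(torch.tensor([[3., -7.]])))  # tensor([[-4.]])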
In the first variant the network is basically a linear regression, and you can think of the second one as two chained regressions. I recommend that you familiarize yourself with this little bit of math, because in the end some basic things worth knowing are (see the sketch after this list):
- A linear neuron is just a weighted sum of its inputs, optionally followed by an activation.
- A layer of neurons is just a collection of different weight vectors applied to the same inputs.
- Other neuron types change only how inputs and weights are combined, e.g. convolutional neurons apply identical weights to inputs selected from a grid.
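To make the "weighted sum" point concrete, a nn.Linear layer is nothing more than a matrix multiply plus a bias (a tiny illustrative sketch, not part of your model):

# Each output unit of nn.Linear is a weighted sum of the inputs:
# one row of layer.weight per unit, plus the unit's bias.
layer = nn.Linear(2, 2)
x = torch.tensor([[3., -7.]])
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual))  # True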
So yes, your model learns this simple task very quickly, and it could learn it even much faster 😉