I am new to ML and just starting to learn it. I am taking Jon Krohn's course on the mathematical foundations of ML. The explanations are pretty clear to me, but I am struggling with one thing. In this exercise https://github.com/jonkrohn/ML-foundations/blob/master/notebooks/regression-in-pytorch.ipynb we used torch.optim.SGD, which ran through all of the example data on every epoch:
optimizer = torch.optim.SGD([m, b], lr=0.01)
epochs = 999
for epoch in range(epochs):
    optimizer.zero_grad()  # Reset gradients to zero; else they accumulate
    yhats = regression(xs, m, b)  # Step 1
    C = mse(yhats, ys)  # Step 2
    C.backward()  # Step 3
    optimizer.step()  # Step 4
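Here m and b are the trainable tensors (created with requires_grad_(), just like in the second snippet further down), regression and mse are the same helper functions, and xs, ys are the full training data.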
In the second exercise we were doing learning rate scheduling: https://github.com/jonkrohn/ML-foundations/blob/master/notebooks/learning-rate-scheduling.ipynb
There were 8,000,000 data points, so the data was split into batches and the code ran in rounds over sampled batches rather than in epochs over all the data. However, this was not done with torch.optim.SGD; the update was written out by hand to show how the math works. I am struggling to run it with torch.optim.SGD. How do I write the code to use the optimizer instead of spelling out the math as below, where the gradient, theta, and the update are all built by hand?
import numpy as np
import torch

n = 8000000
x = torch.linspace(0., 8., n)
y = -0.5*x + 2 + torch.normal(mean=torch.zeros(n), std=1)
indices = np.random.choice(n, size=2000, replace=False)  # subsample used for plotting in the notebook
def regression(my_x, my_m, my_b):
    return my_m*my_x + my_b
m = torch.tensor([0.9]).requires_grad_()
b = torch.tensor([0.1]).requires_grad_()
batch_size = 32 # model hyperparameter
batch_indices = np.random.choice(n, size=batch_size, replace=False)
yhat = regression(x[batch_indices], m, b)
def mse(my_yhat, my_y):
    sigma = torch.sum((my_yhat - my_y)**2)
    return sigma/len(my_y)
C = mse(yhat, y[batch_indices])
C.backward()
m.grad  # gradient of C with respect to m
b.grad  # gradient of C with respect to b
gradient = torch.tensor([[b.grad.item(), m.grad.item()]]).T
theta = torch.tensor([[b, m]]).T
lr = 0.01
new_theta = theta - lr*gradient
new_theta  # inspect the updated parameters
b = new_theta[0]
m = new_theta[1]
C = mse(regression(x[batch_indices], m, b), y[batch_indices])
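So a single manual step uses autograd to get the gradients, applies theta_new = theta - lr*gradient by hand, and then rebuilds b and m from new_theta. The training loop just repeats these steps for a number of rounds: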
rounds = 100
for r in range(rounds):
    # This sampling step is slow; we'll cover much quicker batch sampling later:
    batch_indices = np.random.choice(n, size=batch_size, replace=False)
    yhat = regression(x[batch_indices], m, b)  # Step 1
    C = mse(yhat, y[batch_indices])  # Step 2
    C.backward()  # Step 3
    gradient = torch.tensor([[b.grad.item(), m.grad.item()]]).T
    theta = torch.tensor([[b, m]]).T
    new_theta = theta - lr*gradient  # Step 4
    b = new_theta[0].requires_grad_()
    m = new_theta[1].requires_grad_()
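For what it is worth, here is a minimal sketch of what I imagine the torch.optim.SGD version would look like, reusing n, x, y, regression and mse from above and simply swapping the manual theta update for optimizer.step(); I am not sure whether this is the right way to do it:

import numpy as np
import torch

m = torch.tensor([0.9]).requires_grad_()
b = torch.tensor([0.1]).requires_grad_()

optimizer = torch.optim.SGD([m, b], lr=0.01)  # the optimizer updates m and b in place

batch_size = 32
rounds = 100
for r in range(rounds):
    batch_indices = np.random.choice(n, size=batch_size, replace=False)
    optimizer.zero_grad()                      # reset gradients to zero; else they accumulate
    yhat = regression(x[batch_indices], m, b)  # Step 1
    C = mse(yhat, y[batch_indices])            # Step 2
    C.backward()                               # Step 3
    optimizer.step()                           # Step 4: replaces the manual theta update

Is this equivalent to the manual version above, or am I missing something about how the batches should be handled?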