Learning stops when you multiply the weights of a layer by a scalar?
I am trying to implement sparsely connected weight matrices for my simple 3-layer feedforward model. To do this I implemented a mask for each of my layers with a certain percentage of zeros, the idea being that I would zero out the same set of weights after every optimizer step so that my layers are not fully connected. However, when I do an element-wise multiplication of the mask with the weight matrices, the weights stop changing in subsequent backward passes. To see whether my mask is causing the issue, I just multiplied my weight matrices by the scalar 1.0, and this reproduces the problem. What might be happening here? I checked, and gradients still get calculated; it's just that the loss doesn't go down anymore and the weights don't change. Does this multiplication somehow disconnect the weights from the graph?
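A common cause of this symptom is reassigning the parameter, e.g. `layer.weight = nn.Parameter(layer.weight * mask)`: the multiplication creates a *new* tensor, and the optimizer keeps updating the old one it was given at construction time, so the weights you observe never change. Below is a minimal sketch (layer sizes, mask density, and the toy data are made up) that applies the mask in place under `torch.no_grad()`, so `layer.weight` remains the same `Parameter` object the optimizer holds:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 4)
# Fixed sparsity pattern: roughly half the connections are removed.
mask = (torch.rand_like(layer.weight) > 0.5).float()
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = torch.randn(8, 4)

for _ in range(5):
    opt.zero_grad()
    loss = ((layer(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    # Re-apply the mask in place, outside autograd. layer.weight stays
    # the same Parameter the optimizer tracks, so later steps still
    # update it; the masked entries are simply re-zeroed each step.
    with torch.no_grad():
        layer.weight.mul_(mask)
```

After each step the masked entries are zero while the unmasked entries continue to train, which is the behavior the question is after.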
How to modify a neuron in the fully connected layer individually?
How can I modify a neuron in the fully connected layer individually and design it as a statistical indicator such as kurtosis or negative entropy, as shown in the figure? Is it just a simple calculation of the kurtosis value after the linear layer? Something like the following?
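One straightforward reading of "kurtosis after the linear layer" is a module that passes the input through a linear layer and then summarizes each sample's activations by their excess kurtosis. This is a hypothetical sketch (the module name, sizes, and the per-sample reduction axis are assumptions, since the figure is not available):

```python
import torch
import torch.nn as nn

class KurtosisHead(nn.Module):
    """Hypothetical sketch: a linear layer whose output vector is
    reduced to one scalar per sample, the excess kurtosis of its
    activations."""
    def __init__(self, in_features, hidden):
        super().__init__()
        self.fc = nn.Linear(in_features, hidden)

    def forward(self, x):
        h = self.fc(x)                                   # (batch, hidden)
        mu = h.mean(dim=1, keepdim=True)                 # per-sample mean
        var = h.var(dim=1, unbiased=False)               # per-sample variance
        # Fourth standardized moment minus 3 (excess kurtosis);
        # the small epsilon guards against zero variance.
        kurt = ((h - mu) ** 4).mean(dim=1) / (var ** 2 + 1e-8) - 3.0
        return kurt                                      # (batch,)

out = KurtosisHead(8, 16)(torch.randn(4, 8))
```

Because every operation here is differentiable, gradients flow through the kurtosis back into the linear layer's weights, so the statistic can sit inside a trained network rather than only being computed post hoc.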
Neural network training in PyTorch much slower when using batching with PyTorch’s DataLoader
I am training a feed-forward neural network in PyTorch. When I use PyTorch’s DataLoader class, which includes batching, I experience a much longer training time. By much longer I mean about 200x longer…
The network architecture is a sequential network consisting of Linear layers and ReLU activation functions. The exact number of layers and nodes per layer can vary. Yet, no matter what numbers I choose, it is always slower when I use PyTorch's DataLoader.
I'm running this code on an Apple MacBook Pro (M2, 2022). Therefore, I cannot use CUDA in PyTorch, but I have tried the mps backend, which does not speed things up.
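For data that already fits in memory as tensors, the per-item Python overhead of `DataLoader` iteration (dataset `__getitem__` calls plus collation) can dominate the actual math, especially with small batch sizes. A minimal sketch comparing the two, assuming a toy dataset and model (all sizes are made up):

```python
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X, y = torch.randn(2048, 16), torch.randn(2048, 1)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

def train_epoch(batches):
    for xb, yb in batches:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()

# DataLoader fetches samples one at a time via __getitem__ and then
# collates them into a batch, which costs Python overhead per sample.
loader = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True)
t0 = time.perf_counter()
train_epoch(loader)
t_loader = time.perf_counter() - t0

# Manual slicing of the pre-loaded tensors batches with one indexing
# op per batch instead of 256 per-item fetches.
idx = torch.randperm(len(X))
manual = [(X[idx[i:i + 256]], y[idx[i:i + 256]])
          for i in range(0, len(X), 256)]
t0 = time.perf_counter()
train_epoch(manual)
t_manual = time.perf_counter() - t0

print(f"DataLoader: {t_loader:.4f}s, manual slicing: {t_manual:.4f}s")
```

If the gap is large with `batch_size=1` or the default settings, that overhead, rather than the model itself, is the likely culprit; increasing `batch_size` or slicing tensors directly usually closes most of it.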
Neural network keeps predicting a similar result (but not the same) for different samples
I am building a neural network to predict drug response in cancer cell lines. The model was designed with two subnetworks: one describing genetic features of the cell line (gene expression) and one describing the chemical structure of the drug. These two subnetworks were then merged into an MLP to predict the drug responses (IC50 values). When I train the model, the loss (MSE) seems to decrease for both the training and validation sets (I use a validation set to test the model at the end of each epoch).
However, when I finished training and used the model to predict on both the training and testing sets, I observed that for each drug the predicted values across different cell lines were very similar. So it seems the model tends to predict the mean value for each drug. Furthermore, when I split the training and validation sets based on drugs, so that some drugs appeared only in the training set and others only in the validation set, the validation loss did not decrease at all during training. This issue did not appear when I divided training/validation by cell lines.
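The drug-based split described above amounts to a grouped split: entire drugs are held out so the validation loss measures generalization to unseen drugs rather than interpolation between known ones. A small sketch of such a split on synthetic (cell line, drug) pairs (the drug IDs and sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: each row is one (cell line, drug) pair,
# identified here only by its drug ID. 200 pairs over 10 drugs.
drug_ids = rng.integers(0, 10, size=200)

# Hold out whole drugs, not random rows: every pair for a held-out
# drug goes to validation, so no drug leaks across the split.
drugs = np.unique(drug_ids)
rng.shuffle(drugs)
val_drugs = drugs[:2]

val_mask = np.isin(drug_ids, val_drugs)
train_idx = np.where(~val_mask)[0]
val_idx = np.where(val_mask)[0]
```

If validation loss only stalls under this split (and not under a by-cell-line split), that points to the drug subnetwork's features not carrying information that transfers across drugs, with the model falling back on a per-drug mean.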