I am creating a CNN from scratch with PyTorch. I have a balanced dataset of images, split evenly between the two classes. I am trying to use the BCEWithLogitsLoss function from torch.nn, as I have read it is typically the best choice for use-cases like mine. However, for some reason my network does not seem to learn anything at all when I use this loss function! It stays at a stagnant ~50% accuracy and only ever guesses one class. When I instead use the regular CrossEntropyLoss function and expand my final layer to 2 output nodes, my network actually begins learning! Whereas with the "correct" loss function my network never reaches even 1% accuracy on the target class, with CrossEntropyLoss it reaches 90%+ after a few epochs.
From my understanding, cross-entropy loss is better suited for multi-class classification, whereas binary cross-entropy is, as the name suggests, better suited for binary classification, so I don't understand how this could be the case.
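For reference, here is roughly how I believe each setup is supposed to be wired (a minimal, self-contained sketch with random tensors standing in for my model's outputs and a batch of labels), in case I am misusing one of the loss functions:

import torch
import torch.nn as nn

# Stand-in batch: 8 samples with integer labels 0 or 1
labels = torch.randint(0, 2, (8,))

# Setup A: BCEWithLogitsLoss expects ONE logit per sample and FLOAT targets
logits_bce = torch.randn(8, 1)                       # model with 1 output node
bce = nn.BCEWithLogitsLoss()
loss_a = bce(logits_bce.squeeze(1), labels.float())  # targets converted to float
preds_a = (torch.sigmoid(logits_bce.squeeze(1)) > 0.5).long()

# Setup B: CrossEntropyLoss expects TWO logits per sample and LONG class indices
logits_ce = torch.randn(8, 2)                        # model with 2 output nodes
ce = nn.CrossEntropyLoss()
loss_b = ce(logits_ce, labels)                       # targets stay as class indices
preds_b = logits_ce.argmax(dim=1)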
Originally I started with a simpler CNN, since this is my first time building one. After some more research I concluded that the problem could be partly due to a lack of layers and model complexity, so I added more layers and ended up with this architecture:
import torch.nn as nn
import torch.nn.functional as F

class ConvolutionalNN(nn.Module):
    def __init__(self):
        super(ConvolutionalNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 9, 5)
        self.conv2 = nn.Conv2d(9, 27, 5)
        self.conv3 = nn.Conv2d(27, 54, 5)
        self.conv4 = nn.Conv2d(54, 108, 5)
        self.conv5 = nn.Conv2d(108, 216, 5)
        self.conv6 = nn.Conv2d(216, 432, 5)
        self.pool = nn.MaxPool2d(3, 3)
        self.fc1 = nn.Linear(432 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x))             # first convolutional layer, then activation function
        x = self.pool(F.relu(self.conv2(x)))  # second layer, activation function, then pooling layer
        x = F.relu(self.conv3(x))
        x = self.pool(F.relu(self.conv4(x)))
        x = F.relu(self.conv5(x))
        x = self.pool(F.relu(self.conv6(x)))
        x = x.reshape(-1, 432 * 4 * 4)        # flattens the tensor
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
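For reference, the 432 * 4 * 4 flatten size can be sanity-checked with a dummy forward pass (a sketch; 224x224 is just an example resolution that happens to produce 4x4 feature maps with this particular stack of convolutions and pools):

import torch

model = ConvolutionalNN()
dummy = torch.randn(1, 3, 224, 224)  # assumed example input size
out = model(dummy)
print(out.shape)                     # torch.Size([1, 2])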
I was partly inspired by the paired convolutional layers used in VGGNet. I lack experience here, so if anyone has suggestions about the architecture itself I am more than glad to take them.
I have tried learning rates of both 0.001 and 0.0001 with the Adam optimizer. My labels are not one-hot encoded. In the code above I use 2 output nodes to work with CrossEntropyLoss; previously I was using 1 output node for BCEWithLogitsLoss.
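For completeness, a simplified version of the training step I am describing looks like this (a sketch; the batch is random stand-in data at an assumed 224x224 resolution rather than my actual DataLoader):

import torch
import torch.nn as nn
import torch.optim as optim

model = ConvolutionalNN()
criterion = nn.CrossEntropyLoss()                 # paired with 2 output nodes
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# Stand-in batch: 4 RGB images and integer class labels (0 or 1), not one-hot
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 2, (4,))

optimizer.zero_grad()
outputs = model(images)            # shape: (4, 2), raw logits
loss = criterion(outputs, labels)  # labels are plain class indices
loss.backward()
optimizer.step()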
I look forward to any help at all! Thank you so much!