I have tried to train my model for 100 epochs using Segformer from the timm library.
However, after training, my loss graph looks like this:
I am using `torch.optim.AdamW` with `lr=1e-4` and `lr_scheduler = optim.lr_scheduler.PolynomialLR(optimizer, total_iters=100, power=0.9)`. The model is implemented with PyTorch Lightning.
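For context, here is roughly how the optimizer and scheduler are wired up in my LightningModule (the class name and model construction are simplified placeholders, and I assume the scheduler is stepped once per epoch):

```python
import torch
from torch import optim
import pytorch_lightning as pl


class SegformerLitModule(pl.LightningModule):  # placeholder name
    def __init__(self, model):
        super().__init__()
        self.model = model  # Segformer built elsewhere (omitted here)

    def configure_optimizers(self):
        optimizer = optim.AdamW(self.parameters(), lr=1e-4)
        # PolynomialLR decays the learning rate toward zero over `total_iters`
        # scheduler steps; stepping once per epoch makes total_iters=100 line
        # up with the 100 training epochs.
        scheduler = optim.lr_scheduler.PolynomialLR(
            optimizer, total_iters=100, power=0.9
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "epoch"},
        }
```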