I am working on a Transformer model for a translation task and want to track attention weights using Comet. My model consists of 2 layers with 2 attention heads each. I am interested in understanding how to log attention weights effectively during and after training.
Details:
- Model Setup:
  - I am using PyTorch with a Transformer architecture.
  - The model has 2 layers, and each layer has 2 attention heads.
- Tracking Attention:
  - During Training: I want to track attention weights periodically to monitor how they evolve.
  - At the End of Training: I also want to log attention weights from the final model for analysis.
- Data:
  - Let’s assume my dataset consists of 10 instances with a batch size of 2, so there are 5 batches.
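To make the data setup concrete, here is a minimal sketch in plain Python of one option for tracking: a fixed "probe" batch chosen once with a seed, so that the same instances are inspected at every logging point and the attention maps stay comparable over time. The names `make_batches` and `pick_probe_batch` and the seed value are hypothetical, not part of PyTorch or Comet.

```python
import random

NUM_INSTANCES = 10  # dataset size from the setup above
BATCH_SIZE = 2      # batch size from the setup above


def make_batches(num_instances, batch_size):
    """Split instance indices into consecutive, unshuffled batches."""
    indices = list(range(num_instances))
    return [indices[i:i + batch_size] for i in range(0, num_instances, batch_size)]


def pick_probe_batch(batches, seed=0):
    """Pick one batch index reproducibly (same seed, same batch)."""
    rng = random.Random(seed)
    return rng.randrange(len(batches))


batches = make_batches(NUM_INSTANCES, BATCH_SIZE)
probe_idx = pick_probe_batch(batches, seed=0)
probe_instances = batches[probe_idx]

print(f"{len(batches)} batches; tracking batch {probe_idx} with instances {probe_instances}")
```

Keeping the probe batch outside the shuffled training loader (for example, taken from a validation split) is an assumption of this sketch; a random batch per step would make the logged maps hard to compare.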
Questions:
- Batch/Instance Selection: How should I select which batch or instance to track? Should I use a specific batch, a random batch, or a representative batch?
- Tracking Attention Weights:
  - How can I modify my model to return attention weights during the forward pass?
  - What is the best way to log attention weights using Comet? Should I log them at regular intervals during training and also at the end of training?
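For the forward-pass question, here is a minimal sketch of an encoder whose layers return per-head attention weights. It assumes a recent PyTorch (1.11 or later, where `nn.MultiheadAttention.forward` accepts `average_attn_weights`) and covers encoder self-attention only; a full translation model would also have decoder self-attention and cross-attention. The class names `EncoderLayerWithAttn` and `TinyEncoder` and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class EncoderLayerWithAttn(nn.Module):
    """Encoder layer that returns its attention weights alongside the output."""

    def __init__(self, d_model=16, n_heads=2, d_ff=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # average_attn_weights=False keeps one map per head:
        # weights has shape (batch, n_heads, seq_len, seq_len).
        attn_out, weights = self.attn(
            x, x, x, need_weights=True, average_attn_weights=False
        )
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ff(x))
        return x, weights


class TinyEncoder(nn.Module):
    """Stack of layers that collects the attention weights of every layer."""

    def __init__(self, n_layers=2, d_model=16, n_heads=2):
        super().__init__()
        self.layers = nn.ModuleList(
            [EncoderLayerWithAttn(d_model, n_heads) for _ in range(n_layers)]
        )

    def forward(self, x):
        all_weights = []
        for layer in self.layers:
            x, w = layer(x)
            all_weights.append(w)
        # Stacked shape: (batch, n_layers, n_heads, seq_len, seq_len).
        return x, torch.stack(all_weights, dim=1)


model = TinyEncoder()            # 2 layers, 2 heads each, as in the setup
x = torch.randn(2, 5, 16)        # (batch=2, seq_len=5, d_model=16)
out, attn = model(x)
print(out.shape, attn.shape)
```

Returning the weights as an extra output keeps the training loop explicit about when they are used; forward hooks would be an alternative if the model's return signature cannot change.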
I attempted to track attention weights from a Transformer model during training and at the end using Comet. I modified the model to return attention weights and implemented code to log these weights periodically and at the final stage.
What I Expected:
I expected to successfully log attention weights for a specific batch or instance, both during training at specified intervals and at the end of the training process, and view these logged weights in the Comet dashboard.
What Actually Resulted:
I had trouble selecting the right batch or instance to track, and I could not log the attention weights in a form that is viewable in the Comet dashboard. I need guidance on best practices for selecting batches/instances and for logging attention weights.
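To make the logging part concrete, here is a minimal sketch that logs one heatmap image per layer and head, at a fixed interval and once more at the end of training. It assumes the attention weights arrive as an array of shape (batch, n_layers, n_heads, tgt_len, src_len) and that the experiment object exposes `log_image(image_data, name=..., step=...)`, as `comet_ml.Experiment` does. The `_RecordingExperiment` stand-in, `log_attention_maps`, `LOG_EVERY`, `NUM_EPOCHS`, and the random placeholder weights are all hypothetical, so the sketch runs without a Comet account.

```python
import numpy as np


def log_attention_maps(experiment, attn, step, instance=0, prefix="attn"):
    """Log one grayscale heatmap per (layer, head) for a single instance."""
    attn = np.asarray(attn, dtype=np.float32)
    _, n_layers, n_heads, _, _ = attn.shape
    names = []
    for layer in range(n_layers):
        for head in range(n_heads):
            name = f"{prefix}_layer{layer}_head{head}"
            # Weights lie in [0, 1]; scale to 0..255 for an 8-bit image.
            image = (attn[instance, layer, head] * 255).astype(np.uint8)
            experiment.log_image(image, name=name, step=step)
            names.append(name)
    return names


class _RecordingExperiment:
    """Offline stand-in for comet_ml.Experiment (hypothetical helper)."""

    def __init__(self):
        self.calls = []

    def log_image(self, image_data, name=None, step=None):
        self.calls.append((name, step, image_data.shape))


LOG_EVERY = 2   # hypothetical logging interval, in epochs
NUM_EPOCHS = 5  # hypothetical training length

experiment = _RecordingExperiment()  # swap in comet_ml.Experiment(...) for real runs
rng = np.random.default_rng(0)

for epoch in range(NUM_EPOCHS):
    # Placeholder for the attention weights computed on the fixed probe batch.
    raw = rng.random((2, 2, 2, 4, 4))
    attn = raw / raw.sum(axis=-1, keepdims=True)

    is_last = epoch == NUM_EPOCHS - 1
    if epoch % LOG_EVERY == 0 or is_last:
        log_attention_maps(experiment, attn, step=epoch)

print(f"logged {len(experiment.calls)} images")
```

With a real experiment, the weights would come from the model under `torch.no_grad()` and be converted with `.detach().cpu().numpy()` before logging; passing `step` lets the images from different points in training be told apart.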