I’m using code from the llama-recipes repo to run Llama 3 on Google Colab with Hugging Face Transformers. However, it is consuming more computational power than I expected.
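For reference, I haven’t pasted the full llama-recipes script, but it boils down to the standard Transformers loading pattern. A minimal sketch of what I believe it does (the model ID, dtype, and prompt here are my assumptions, not copied from the repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the 8B instruct checkpoint
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision: ~2 bytes per parameter
    device_map="auto",           # lets Accelerate place (and offload) layers
)

inputs = tokenizer("Explain attention in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```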
As a Colab Pro subscriber, I have access to the following accelerators:
- TPU v2 (couldn’t run the code on it)
- T4 GPU (takes forever and never returns a response; see the rough math after this list)
- A100 GPU (works)
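My back-of-envelope guess for why the T4 stalls (the parameter count is my assumption, not measured):

```python
# Rough VRAM estimate for Llama 3 8B weights in bf16/fp16.
# Assumption: ~8.03B parameters at 2 bytes each.
params = 8.03e9
bytes_per_param = 2
weights_gb = params * bytes_per_param / 1024**3
print(f"~{weights_gb:.1f} GB for weights alone")  # ~15 GB, vs. the T4's 16 GB total
```

If that’s right, the weights alone nearly fill the T4, so with `device_map="auto"` Accelerate presumably offloads layers to CPU RAM, which would explain the extreme slowness rather than an outright OOM.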
However, using the A100 with High-RAM means I only get about 8.5 hours of compute per month: as I understand the pricing, the $10 USD plan includes ~100 compute units, and the A100 burns roughly 11.8 units per hour.
What are better options? My goal is to explore as a student; for now I don’t need large inference jobs.