I am trying to integrate the meta-llama/Meta-Llama-3-8B-Instruct model with the FlashAttention implementation from https://github.com/shreyansh26/FlashAttention-PyTorch/blob/master/README.md, so that FlashAttention replaces the standard attention in Llama 3, and then run chat completion.
I am expecting a way to combine the two codebases and run a chat-completion script along the lines of https://github.com/meta-llama/llama3/blob/main/example_chat_completion.py.
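In case it helps clarify what I mean by "replace the attention technique": I imagine the integration would be a monkey-patching pattern like the minimal sketch below. The names `Attention` and `flash_attention_forward` here are hypothetical stand-ins for the real classes/functions (the actual ones live in the llama3 repo's `llama/model.py` and in the FlashAttention-PyTorch repo), not actual code from either repository.

```python
# Hypothetical sketch: `Attention` stands in for llama.model.Attention,
# and `flash_attention_forward` stands in for the FlashAttention-PyTorch
# forward pass. Neither name is taken from the real repositories.

class Attention:
    """Stand-in for the model's attention module."""
    def forward(self, x):
        # Placeholder for the standard scaled-dot-product attention.
        return ("standard", x)

def flash_attention_forward(self, x):
    """Stand-in replacement. The real replacement would need to accept
    the same arguments (hidden states, positional info, cache, mask)
    and return tensors of the same shape as the original forward."""
    return ("flash", x)

# Patch the class *before* the model is built, so every transformer
# layer constructed afterwards uses the replacement forward.
Attention.forward = flash_attention_forward

layer = Attention()
print(layer.forward([1, 2, 3]))  # → ('flash', [1, 2, 3])
```

Is this patching approach the right way to wire the two codebases together, or does the llama3 generation code need deeper changes (e.g. around the KV cache) for chat completion to still work?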