For a project, I recently downloaded the Meta Llama 3 8B model from Hugging Face into Google Colab, since my PC does not meet the model's high VRAM requirements and I need Colab's GPU to run it. The model loads successfully, but I cannot create an API endpoint in Colab through which my original project code, written in VS Code, can send user prompts to the model. I have searched YouTube videos, GitHub repos, and articles, but to no avail. Can anyone tell me what method I should apply here?
I am trying to use ngrok and FastAPI. This is the code in my Colab cell:
from fastapi import FastAPI
from pyngrok import ngrok
from transformers import pipeline
import uvicorn

app = FastAPI()
model_id = "meta-llama/Meta-Llama-3-8B"

# Load the model into a text-generation pipeline; this line was missing,
# so text_generator below was undefined
text_generator = pipeline("text-generation", model=model_id)

@app.post("/generate")
def generate_text(prompt: str):
    output = text_generator(prompt, max_length=100, num_return_sequences=1)
    return {"output": output}

public_url = ngrok.connect(8000).public_url
print(f"FastAPI running on {public_url}")
uvicorn.run(app, host="0.0.0.0", port=8000)
When I run this, I get an exception saying that authentication failed, even though I set the ngrok authtoken in the previous cell. I am also not sure whether this is even the right way to establish communication between my VS Code project and the LLM code in Colab.
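For reference, the previous cell sets the authtoken roughly like this (with my real token in place of the placeholder):

from pyngrok import ngrok

# Placeholder; my actual authtoken from the ngrok dashboard goes here
ngrok.set_auth_token("MY_NGROK_AUTHTOKEN")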
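On the VS Code side, I was planning to send prompts with a plain HTTP request, something like this (the URL is whatever ngrok prints when the cell runs; using the requests library is just my assumption for the client):

import requests

# Public URL printed by the Colab cell; the subdomain changes on every run
NGROK_URL = "https://<my-ngrok-subdomain>.ngrok-free.app"

# prompt goes in the query string, since the endpoint declares it as a plain str
response = requests.post(f"{NGROK_URL}/generate", params={"prompt": "Hello, Llama!"})
print(response.json())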