You can also use the Llama 3 model on SageMaker JumpStart, as shown below:
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-70b-instruct")
# Llama models are gated by an end-user license agreement; deployment fails unless you accept it.
predictor = model.deploy(accept_eula=True)
response = predictor.predict({
    "inputs": "this is where you place your prompt",
    "parameters": {"max_new_tokens": 128, "do_sample": True},  # do_sample is a boolean, not the string "true"
})
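For reference, a minimal sketch of reading the output, assuming the endpoint returns the usual JumpStart text-generation payload (a dict, or a list of dicts, with a generated_text field; the exact shape can vary by model version):

# The payload shape varies by model version; normalize a list response first.
if isinstance(response, list):
    response = response[0]
print(response["generated_text"])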
However, how can I improve the latency and/or throughput? Would SageMaker MultiDataModel help here?
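On the MultiDataModel question: multi-model endpoints are designed to host many model artifacts behind a single endpoint (trading some per-request latency for packing density), not to speed up a single model, and not every serving container supports them, so this is likely not the right lever for a single 70B model. For context, here is a minimal sketch of the MultiDataModel pattern, assuming hypothetical placeholder values for the S3 prefix, container image, IAM role, and the artifact name model-a.tar.gz:

from sagemaker.multidatamodel import MultiDataModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Hypothetical values: replace the S3 prefix, container image, and IAM role with your own.
mme = MultiDataModel(
    name="my-multi-model-endpoint",
    model_data_prefix="s3://my-bucket/models/",
    image_uri="<inference-container-image-uri>",
    role="<sagemaker-execution-role-arn>",
)
mme_predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)
# Each request names the artifact (relative to model_data_prefix) it should route to.
mme_response = mme_predictor.predict(
    {"inputs": "this is where you place your prompt"},
    target_model="model-a.tar.gz",  # hypothetical artifact name
)

For single-model latency and throughput, more common levers are typically a larger or different instance type, response streaming, or tuning the serving container's batching settings.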