Using Llama 3, I would like to perform inference on a very large dataset. When I run inference row by row, however, I get a warning that using a dataset would be faster:
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset.
Here is my inference pipeline:
import transformers
import torch
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
    token="xxx"
)
SYSTEM_PROMPT = "You are a Cybersecurity expert who will be asked to classify network flows as malicious or benign. If you think the network flow is benign, answer 0. If you believe the network flow is malicious, answer 1. For example if I say: IPV4_SRC_ADDR: 149.171.126.0, L4_SRC_PORT: 62073, IPV4_DST_ADDR: 59.166.0.5, L4_DST_PORT: 56082, PROTOCOL: 6, L7_PROTO: 0.0, IN_BYTES: 9672, OUT_BYTES: 416, IN_PKTS: 11, OUT_PKTS: 8, TCP_FLAGS: 25, FLOW_DURATION_MILLISECONDS: 15 and the flow is benign, you output 0. If it is malicious you output 1. You are not allowed to say anything else besides the number 1 or 0."

def classification_pipeline(netflow):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": netflow},
    ]
    prompt = pipeline.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]
    outputs = pipeline(
        prompt,
        max_new_tokens=100,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.01,
        top_p=0.9,
    )
    return outputs[0]["generated_text"][len(prompt):]
for index, row in df.iterrows():
    print(classification_pipeline(row['input']))
How should I perform batched inference using my custom dataset?
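My understanding is that I should wrap my prompts in a datasets.Dataset (or a KeyDataset) and pass a batch_size to the pipeline call. Something like the untested sketch below is what I have in mind; build_prompt, the pad-token line, and batch_size=8 are my own guesses:

from datasets import Dataset
from transformers.pipelines.pt_utils import KeyDataset

def build_prompt(netflow):
    # Same chat-template step as in classification_pipeline, but it only
    # returns the prompt string instead of running generation.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": netflow},
    ]
    return pipeline.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# I assume batching needs a pad token, which Llama 3 does not define by default.
pipeline.tokenizer.pad_token_id = pipeline.tokenizer.eos_token_id

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# Build all prompts up front and let the pipeline iterate/batch over them.
prompt_ds = Dataset.from_dict({"prompt": [build_prompt(x) for x in df["input"]]})

for out in pipeline(
    KeyDataset(prompt_ds, "prompt"),
    batch_size=8,               # my guess; presumably needs tuning for the GPU
    max_new_tokens=100,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.01,
    top_p=0.9,
    return_full_text=False,     # so I don't have to slice off the prompt myself
):
    print(out[0]["generated_text"])

Is this the intended way to batch a text-generation pipeline over a pandas dataframe, or am I missing something?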