I’m running Triton Inference Server with the vLLM backend, and each response contains the input prompt followed by the generated text. How can I configure the server so that it returns only the generated text?
Ref: https://github.com/triton-inference-server/vllm_backend?tab=readme-ov-file#sending-your-first-inference
Request:
<code>$ curl -X POST localhost:8000/v2/models/vllm_model/generate -d '{"text_input": "What is Triton Inference Server?", "parameters": {"stream": false, "temperature": 0}}'
</code>
Response:
<code>{"model_name":"vllm_model","model_version":"1","text_output":"What is Triton Inference Server?nnTriton Inference Server is a server that is used by many"}
</code>
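In other words, I’d like <code>text_output</code> to contain only the newly generated part, without the echoed prompt:
<code>{"model_name":"vllm_model","model_version":"1","text_output":"\n\nTriton Inference Server is a server that is used by many"}
</code>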
<code>{"model_name":"vllm_model","model_version":"1","text_output":"What is Triton Inference Server?nnTriton Inference Server is a server that is used by many"}
</code>
{"model_name":"vllm_model","model_version":"1","text_output":"What is Triton Inference Server?nnTriton Inference Server is a server that is used by many"}