I have set up the Ollama Docker image with Docker Compose both locally and on a VPS: same image (ollama/ollama:latest), same Ollama configuration (no environment variables set), same model (llama3:8b).
When I hit the local Ollama API, I get a streamed response with one JSON object per chunk, as expected:
Chunk 1: {"model":"llama3","created_at":"2024-04-25T18:51:50.232262674Z","response":"I","done":false}
Chunk 2: {"model":"llama3","created_at":"2024-04-25T18:51:50.337390507Z","response":"'m","done":false}
Chunk 3: {"model":"llama3","created_at":"2024-04-25T18:51:50.436351424Z","response":" happy","done":false}
...
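For context, the chunks above come from reading the raw response stream, roughly like this (a simplified Python sketch, not my exact client; the prompt is a placeholder):

```python
import requests

# Placeholder URL; the VPS test points at the remote host instead.
OLLAMA_URL = "http://localhost:11434/api/generate"

with requests.post(
    OLLAMA_URL,
    json={"model": "llama3:8b", "prompt": "Hello, what's my name?", "stream": True},
    stream=True,
) as resp:
    resp.raise_for_status()
    # Each iteration yields one raw HTTP chunk as received from the server.
    for i, chunk in enumerate(resp.iter_content(chunk_size=None), start=1):
        print(f"Chunk {i}: {chunk.decode('utf-8')}")
```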
When I switch from the local Ollama API to the VPS endpoint, I get multiple tokens per chunk, and objects are sometimes split across chunk boundaries:
Chunk 1:
{"model":"llama3","created_at":"2024-04-25T18:50:48.257558353Z","response":"'m","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:48.490515771Z","response":" happy","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:48.747192222Z","response":" to","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:48.980096938Z","response":" help","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:49.251317633Z","response":"!","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:49.489186528Z","response":" However","done":false}
{"model":"llama3","created_at": // Also cuts off like this
Chunk 2:
"2024-04-25T18:50:49.728896473Z","response":",","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:49.938308226Z","response":" I","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:50.177705683Z","response":" don","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:50.43456802Z","response":"'t","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:50.678656616Z","response":" think","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:50.922270909Z","response":" you","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:51.165693506Z","response":"'ve","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:51.391343641Z","response":" told","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:51.639056032Z","response":" me","done":false}
Chunk 3:
{"model":"llama3","created_at":"2024-04-25T18:50:51.881018398Z","response":" your","done":false}
{"model":"llama3","created_at":"2024-04-25T18:50:52.115575933Z","response":" name","done":false}
I couldn't find any documentation on this behaviour. Has anyone encountered this issue before?
Kind regards