Hi everyone,
I am new to Kafka. I am writing a Python module that will process messages from Kafka, and the module will be containerized using Docker. It will read a Kafka message, do some processing that involves a lot of I/O, and send a Kafka message to another topic. The module is expected to receive many messages per second (for example, 100). Each message takes about 3 seconds to process, which means I would need many containers to keep up. I want to increase the utilization of each container so that I can reduce the overall number of containers and cut costs. There are two approaches I have thought about:
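The container count follows from Little's law: with about 100 messages arriving per second and about 3 seconds spent on each, roughly 100 × 3 = 300 messages are in flight at any moment. The small sketch below works through that arithmetic as a lower bound with no headroom for bursts. The function names are hypothetical, and the per-container concurrency of 50 is an illustrative assumption, not a measurement.

```python
import math

def in_flight(rate_per_s, seconds_per_msg):
    # Little's law: average messages in flight = arrival rate x time per message
    return rate_per_s * seconds_per_msg

def containers_needed(rate_per_s, seconds_per_msg, per_container):
    # per_container (hypothetical): how many messages one container works on at once
    return math.ceil(in_flight(rate_per_s, seconds_per_msg) / per_container)

print(in_flight(100, 3))              # 300
print(containers_needed(100, 3, 1))   # 300 (one message at a time per container)
print(containers_needed(100, 3, 50))  # 6 (assumed 50 concurrent messages per container)
```

Because the work is I/O-bound, raising the number of messages each container handles concurrently is what lowers the container count.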
- Concurrent processing using asyncio:
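    # consumer is a KafkaConsumer instance
    async def consume(consumer):
        for msg in consumer:
            # call the async method to process the msg; awaiting it here
            # means the next message is not read until this one finishes
            await processMSG(msg)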
I do not think this is a very good approach, since it blocks the listener while processMSG is running.
- Use multithreading to process messages in parallel.
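With asyncio, the listener does not have to wait on each message. The minimal sketch below assumes an asyncio-native consumer: it schedules each message as its own task and caps how many are in flight with a semaphore, so the listener waits only when that cap is reached. `fake_consumer`, `process_msg`, `consume`, `max_in_flight`, and `delay` are hypothetical. With a real asyncio client such as aiokafka's `AIOKafkaConsumer` you would iterate with `async for msg in consumer`. If `KafkaConsumer` is kafka-python's synchronous consumer, iterating over it blocks the event loop while it waits for messages, so an async-native client would likely be needed.

```python
import asyncio

# Hypothetical stand-in for an asyncio-native Kafka consumer: yields n messages.
async def fake_consumer(n):
    for i in range(n):
        yield f"msg-{i}"

# Hypothetical I/O-bound processing; asyncio.sleep stands in for ~3 s of I/O.
async def process_msg(msg, delay):
    await asyncio.sleep(delay)
    return msg.upper()

async def consume(n_messages, max_in_flight=50, delay=0.1):
    sem = asyncio.Semaphore(max_in_flight)  # cap on messages processed at once
    results = []
    tasks = set()

    async def handle(msg):
        try:
            results.append(await process_msg(msg, delay))
        finally:
            sem.release()  # free a slot so the listener can take another message

    async for msg in fake_consumer(n_messages):
        await sem.acquire()  # only waits when max_in_flight messages are busy
        task = asyncio.create_task(handle(msg))
        tasks.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(tasks.discard)

    await asyncio.gather(*tasks)  # wait for the messages still in flight
    return results

if __name__ == "__main__":
    print(len(asyncio.run(consume(200, max_in_flight=50, delay=0.1))))  # 200
```

Because tasks finish out of order, it is worth checking how offsets get committed: relying on auto-commit may mark a message as consumed before its processing has completed.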
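The multithreading approach can be sketched with the standard library's `ThreadPoolExecutor`. The minimal sketch below uses a plain list of strings in place of the Kafka consumer, and `process_msg`, `consume`, `max_workers`, and `delay` are hypothetical. Threads fit this workload because blocking I/O in CPython releases the GIL, so I/O waits overlap even though Python bytecode does not run in parallel.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical blocking, I/O-bound work; time.sleep stands in for ~3 s of I/O
# and, like most blocking I/O calls, releases the GIL while it waits.
def process_msg(msg, delay):
    time.sleep(delay)
    return msg.upper()

def consume(messages, max_workers=50, delay=0.1):
    slots = threading.BoundedSemaphore(max_workers)  # cap on messages in flight
    results = []
    lock = threading.Lock()

    def handle(msg):
        try:
            out = process_msg(msg, delay)
            with lock:
                results.append(out)
        finally:
            slots.release()  # free a slot for the listener

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for msg in messages:  # a real module would iterate over the consumer here
            slots.acquire()  # blocks the listener only when every worker is busy
            pool.submit(handle, msg)
    # leaving the `with` block waits for all submitted work to finish
    return results

if __name__ == "__main__":
    msgs = [f"msg-{i}" for i in range(200)]
    print(len(consume(msgs, max_workers=50, delay=0.1)))  # 200
```

The semaphore keeps the listener from queuing unbounded work. Each thread costs more memory than an asyncio task, so the practical in-flight limit per container is likely lower. The upside is that this approach works with blocking client libraries unchanged.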
I am not sure my thought process is heading in the right direction, so I would appreciate guidance toward a better approach that improves server utilization.