Introduction
So, following my other question about managing consumers using group_id, I have some more questions regarding the design pattern of the consumer service (if one is needed at all) that handles messages and applies them to my services.
An overview of the application
So, the main application uses a microservice architecture; we have services for search, recommendation, single_products, etc., each with its own database. Using Kafka, we want to keep these services in sync. We have a topic named products with 3 partitions: insert, delete, and update.
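In case it is relevant, the producer side pins each operation type to its own partition. This is a minimal sketch of what I mean, using aiokafka; the op-to-partition mapping mirrors the consumer setup described below, and the payload is just a placeholder:

import asyncio, json
from aiokafka import AIOKafkaProducer

# operation type -> partition, matching the consumer assignments below
OP_PARTITION = {"update": 0, "insert": 1, "delete": 2}

async def publish(op, payload, server="192.168.1.122:9092", topic="products"):
    producer = AIOKafkaProducer(bootstrap_servers=server)
    await producer.start()
    try:
        # an explicit partition routes the message to the consumer for this op
        await producer.send_and_wait(
            topic,
            value=json.dumps(payload).encode("utf-8"),
            partition=OP_PARTITION[op],
        )
    finally:
        await producer.stop()

# e.g. asyncio.run(publish("update", {"id": 1, "name": "..."}))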
What have I done so far?
In my previous question, I got an answer about managing the consumers for the different partitions. In short, we will have 3 consumers for each service, each with its own group_id and assigned partition; e.g.:
update consumer for search service
group_id = search_update_consumer
partition = 0
insert consumer for search service
group_id = search_insert_consumer
partition = 1
delete consumer for search service
group_id = search_delete_consumer
partition = 2
And I will have similar ones for my other services. As you can tell by now, all of them will subscribe to a single topic that has only 3 partitions.
Enough intro, what is my question?
After deciding on the structure of my consumers, the next question is their architecture. Where should I run my consumers? Inside each service? As a separate service for each one? Or as a single, separate consumer service that handles all the services?
If I decide to include each service's consumer inside the service itself, I do not know how to do this (I am using FastAPI and aiokafka), and I am not sure it is the best way, since the consumer is only responsible for updating the db and has nothing to do with the API side.
If I decide to run a separate consumer service per service (a rec_consumer service for the rec app, a search_consumer for the search app, and so on), a lot of code duplication will happen, and that is by no means a good thing (unless there is an elegant way to handle shared modules across multiple standalone Docker instances, each running on a separate server).
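The elegant way I am hoping for would be a single parameterized consumer module shared by all the consumer images, where only the message handler differs per service; a hypothetical sketch (run_consumer and handle_message are names I made up):

# shared_consumer.py - hypothetical module installed in every consumer image
import json
from aiokafka import AIOKafkaConsumer, TopicPartition

async def run_consumer(group_id, partition, topic, server, handle_message):
    # identical plumbing for every service; only handle_message differs
    consumer = AIOKafkaConsumer(
        bootstrap_servers=server,
        group_id=group_id,
        enable_auto_commit=False,
    )
    consumer.assign([TopicPartition(topic, int(partition))])
    await consumer.start()
    try:
        async for msg in consumer:
            await handle_message(json.loads(msg.value))
            await consumer.commit()  # commit only after the db write succeeds
    finally:
        await consumer.stop()

# each service then ships only its own handler, e.g.:
# asyncio.run(run_consumer("search_update_consumer", 0, "products",
#                          "192.168.1.122:9092", update_search_db))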
If I decide to create a single service that handles the consumers for all the other services, it may actually be a good idea, but it also comes with a big downside: all the other services will depend on this one, so it may become a single point of failure.
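Concretely, I picture this single service as one process that gathers a task per (service, operation) pair, something like this (reusing the hypothetical run_consumer from the previous sketch; the handlers are stubs):

import asyncio
from shared_consumer import run_consumer  # hypothetical module from the sketch above

KAFKA_SERVER = "192.168.1.122:9092"

async def handle_search_update(payload):
    ...  # write to the search service's database

CONSUMERS = [
    ("search_update_consumer", 0, handle_search_update),
    # ... one (group_id, partition, handler) entry per service and operation
]

async def main():
    # one long-running task per consumer; if this process dies,
    # syncing stops for every service, which is exactly my worry
    await asyncio.gather(
        *(run_consumer(gid, part, "products", KAFKA_SERVER, handler)
          for gid, part, handler in CONSUMERS)
    )

asyncio.run(main())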
Thank you in advance for taking the time to read this and help me. If you do not have the time to explain your answer in detail, links are also highly appreciated.
I do not know if this is needed, but here is my consumer script (a basic implementation):
import sys, json, asyncio, os
from dotenv import load_dotenv
from aiokafka import AIOKafkaConsumer, TopicPartition

load_dotenv()

UPDATE_CONSUMER_GROUP_ID = os.getenv("UPDATE_CONSUMER_GROUP_ID", "rec_update_consumer")
UPDATE_CONSUMER_PARTITION = os.getenv("UPDATE_CONSUMER_PARTITION", 1)
KAFKA_SERVER = os.getenv("KAFKA_SERVER", "192.168.1.122:9092")
KAFKA_TOPIC = os.getenv("KAFKA_TOPIC", "products")


def encode_json(msg):
    # decode the raw message bytes and parse them as JSON
    to_load = msg.decode("utf-8")
    return json.loads(to_load)


def get_consumer(group_id, partition, topic, server):
    # pin this consumer to a single partition instead of subscribing
    # to the whole topic, so each operation type gets its own consumer
    topic_partition = TopicPartition(topic=topic, partition=int(partition))
    consumer = AIOKafkaConsumer(
        bootstrap_servers=server,
        enable_auto_commit=False,  # offsets are meant to be committed manually
        group_id=group_id,
        # auto_offset_reset="earliest",
    )
    consumer.assign([topic_partition])
    return consumer


async def update_consumer():
    consumer = get_consumer(
        group_id=UPDATE_CONSUMER_GROUP_ID,
        partition=UPDATE_CONSUMER_PARTITION,
        topic=KAFKA_TOPIC,
        server=KAFKA_SERVER,
    )
    await consumer.start()
    try:
        async for msg in consumer:
            print(
                msg.topic,
                msg.key,
                msg.headers,
                encode_json(msg.value),
                sys.getsizeof(msg.value),
            )
    finally:
        # stop() is a coroutine, so it has to be awaited
        await consumer.stop()


asyncio.run(update_consumer())