I’m currently working on a C++ codebase that processes high-throughput data, generating around 5000 records per second. These records are sent in batches to AWS Kinesis Firehose. The data has a column called A, which needs to be preprocessed by mapping it through two intermediate tables before being fully processed. The mapping keys are generated elsewhere: the A->B keys in another part of the C++ code, and the B->C keys in a separate C program:
- A to B
- B to C
Here’s a breakdown of the current implementation:
Data Sending (Place 1 – C++):
- Data with A is batched and sent to AWS Firehose.
Mapping A to B (Place 2 – C++):
- The mapping keys A to B are stored in DynamoDB.
Mapping B to C (Place 3 – C):
- The mapping keys B to C are also stored, but this part is implemented in C.
Problem:
The current approach involves querying DynamoDB in AWS Lambda functions, which introduces significant latency. The batch processing time is too high due to the DynamoDB lookups for mapping A to B and B to C.
This is the time the mapping from DynamoDB takes in the AWS Lambda function for 10K records:
[CloudWatch screenshot]
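To illustrate the lookup side of the problem: one thing I assume would cut the number of round trips is de-duplicating the keys in a batch and grouping them into chunks of at most 100 keys, which is the per-request limit of DynamoDB's BatchGetItem. Below is a minimal C++17 sketch of the grouping step only. It makes no AWS SDK calls, and the function name `chunk_unique_keys` is hypothetical, not something from my codebase.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: sort and de-duplicate the lookup keys of one batch,
// then split them into chunks of at most chunk_size keys each.
// 100 is the maximum number of items a single BatchGetItem request accepts.
std::vector<std::vector<std::string>> chunk_unique_keys(
    std::vector<std::string> keys, std::size_t chunk_size = 100) {
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());

    std::vector<std::vector<std::string>> chunks;
    if (chunk_size == 0) return chunks;  // nothing sensible to do

    for (std::size_t i = 0; i < keys.size(); i += chunk_size) {
        const std::size_t end = std::min(i + chunk_size, keys.size());
        chunks.emplace_back(keys.begin() + static_cast<std::ptrdiff_t>(i),
                            keys.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return chunks;
}
```

Each resulting chunk would correspond to one batched request instead of one request per record. Whether this helps in my case depends on how many distinct values of A actually occur in a batch.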
Considered Solutions:
- Using RDS instead of DynamoDB for storing and querying the mappings. (I don't know whether it would actually be much faster or would make little difference.)
- Preprocessing the mapping locally before sending the data to AWS Firehose. But can I do this without introducing high latency? And how, given that the data and the mapping keys live in different places and in different codebases?
- Forwarding the data and the mapping keys to a central place (locally) and then sending them in batches, together with the mapping keys, in a single Firehose stream.
- Caching the mapped values in AWS to avoid repeated lookups, and/or caching them locally.
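To make the local preprocessing / local caching idea concrete, here is a minimal sketch of what I have in mind. It assumes both mapping tables fit in memory and can somehow be loaded into the process that sends the data (that is exactly the part I am unsure about). It requires C++17, and all names (`Mapping`, `compose`, `resolve`, `Record`, `map_batch`) are hypothetical.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory copy of one mapping table (keys and values as strings).
using Mapping = std::unordered_map<std::string, std::string>;

// Compose A->B and B->C into one direct A->C table, so that each record
// needs a single hash lookup. A keys whose B value has no C entry are skipped.
Mapping compose(const Mapping& a_to_b, const Mapping& b_to_c) {
    Mapping a_to_c;
    a_to_c.reserve(a_to_b.size());
    for (const auto& [a, b] : a_to_b) {
        const auto it = b_to_c.find(b);
        if (it != b_to_c.end()) {
            a_to_c.emplace(a, it->second);
        }
    }
    return a_to_c;
}

// Resolve one value of column A; returns an empty optional if no mapping is known.
std::optional<std::string> resolve(const Mapping& a_to_c, const std::string& a) {
    const auto it = a_to_c.find(a);
    if (it == a_to_c.end()) return std::nullopt;
    return it->second;
}

// Hypothetical record layout: the raw column A plus the mapped value C.
struct Record {
    std::string a;
    std::optional<std::string> c;
};

// Fill in C for every record of a batch before it is sent to Firehose.
void map_batch(const Mapping& a_to_c, std::vector<Record>& batch) {
    for (auto& record : batch) {
        record.c = resolve(a_to_c, record.a);
    }
}
```

The composed table would have to be rebuilt or updated whenever new A->B or B->C keys are generated, and records whose `c` stays empty would still need a fallback lookup somewhere.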
Questions:
- What would be the most efficient way to handle this high-throughput data processing, considering the need for low-latency mapping?
- Are there AWS services or architectural patterns that can help optimize this process?
- Any suggestions or alternative approaches would be highly appreciated!
I was expecting lower mapping latency in AWS Lambda, but fetching from DynamoDB takes too much time.