I’m working on an API where I need to map a large number of strings (e.g., “Amazon”, “Walmart”, “Nike - Shoes”) to their corresponding URLs (e.g., “www.amazon.com”, “www.walmart.com”). Here’s what I need:
Store String-to-URL Mappings:
- I need a simple and scalable way to store these mappings.
- Each string should be easily retrievable with its associated URL.
Query Multiple Strings at Once:
- I need to be able to query thousands of these strings in one go and retrieve their URLs.
- The solution should efficiently handle large-scale data retrieval without performance issues.
Group Results by URL:
- When I query multiple strings, I want the results grouped by URL. For example, if several strings map to “www.amazon.com”, they should be grouped together under that URL in the results.
Challenges:
- I initially considered using Redis due to its speed and simplicity, but I’ve read that using MGET to retrieve thousands of keys could lead to memory issues or block the server. This is a concern for my use case, especially with the volume of data I expect to handle.
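To make the MGET concern concrete, here is a minimal sketch of splitting one large lookup into smaller batches so that no single command carries thousands of keys. The names `lookup_in_batches` and `fetch_many` and the batch size of 500 are assumptions for illustration; with redis-py, `fetch_many` could be the client's `mget` method, which returns values in key order with `None` for missing keys.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def lookup_in_batches(
    keys: Sequence[str],
    fetch_many: Callable[[List[str]], List[Optional[str]]],
    batch_size: int = 500,  # assumed value; tune against real measurements
) -> Dict[str, Optional[str]]:
    """Resolve keys in several small multi-get calls instead of one huge one.

    `fetch_many` must return one value per key, in the same order,
    using None for keys that have no mapping.
    """
    resolved: Dict[str, Optional[str]] = {}
    for batch in chunked(keys, batch_size):
        values = fetch_many(batch)
        resolved.update(zip(batch, values))
    return resolved
```

Whether batching like this is enough to avoid blocking the server depends on value sizes and load, so it would need to be measured rather than assumed.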
Question:
What’s the best approach or technology stack to efficiently manage and query these string-to-URL mappings, considering the need for:
- Efficient storage and retrieval.
- Handling large-scale queries.
- Grouping results by URL.
- Avoiding performance pitfalls like those that might arise with Redis when using MGET for large numbers of keys.
Here’s an example of what I’d like:
Payload:
{
  "strings": ["Amazon", "Walmart", "Nike - Shoes", "Amazon - Books"]
}
Response:
{
  "results": [
    {
      "url": "www.amazon.com",
      "strings": ["Amazon", "Amazon - Books"]
    },
    {
      "url": "www.walmart.com",
      "strings": ["Walmart"]
    },
    {
      "url": "www.nike.com",
      "strings": ["Nike - Shoes"]
    }
  ]
}
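Independent of where the mappings end up being stored, the grouping step itself can be sketched in a few lines. This is only an illustration of the payload and response shape above: the function name `group_by_url` is hypothetical, the in-memory dict stands in for whatever store is chosen, and skipping strings that have no mapping is an assumption.

```python
from typing import Dict, Iterable, List, Mapping


def group_by_url(strings: Iterable[str], mapping: Mapping[str, str]) -> dict:
    """Build the response shape: one entry per URL with the strings that map to it."""
    grouped: Dict[str, List[str]] = {}
    for s in strings:
        url = mapping.get(s)
        if url is None:
            continue  # assumption: strings without a mapping are left out
        grouped.setdefault(url, []).append(s)
    # Dicts keep insertion order, so URLs appear in first-seen order.
    return {"results": [{"url": u, "strings": names} for u, names in grouped.items()]}


# Stand-in data matching the example above.
MAPPING = {
    "Amazon": "www.amazon.com",
    "Amazon - Books": "www.amazon.com",
    "Walmart": "www.walmart.com",
    "Nike - Shoes": "www.nike.com",
}

response = group_by_url(["Amazon", "Walmart", "Nike - Shoes", "Amazon - Books"], MAPPING)
```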
I’m still trying to come up with a good solution. Firestore seems like the worst fit for something like this: since it bills per document read, a few weeks of use would surely rack up millions of billed reads.
I’ve not considered MySQL because I’m concerned about the performance impact of querying thousands of values at once, especially in terms of execution time and server load.
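For context, the relational version of this lookup would look roughly like the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL, and the table name `mappings`, its columns `name` and `url`, and the batch size are all assumptions. With `name` as the primary key, each `IN (...)` batch is resolved through the index rather than a full scan, but actual execution time and server load would still need to be measured.

```python
import sqlite3
from typing import Dict, List, Sequence


def fetch_grouped(
    conn: sqlite3.Connection,
    strings: Sequence[str],
    batch_size: int = 500,  # assumed value, kept below SQLite's bound-parameter limit
) -> Dict[str, List[str]]:
    """Look up strings in batched IN queries and group the matches by URL."""
    grouped: Dict[str, List[str]] = {}
    for start in range(0, len(strings), batch_size):
        batch = list(strings[start:start + batch_size])
        placeholders = ",".join("?" * len(batch))  # e.g. "?,?,?"
        rows = conn.execute(
            f"SELECT name, url FROM mappings WHERE name IN ({placeholders})",
            batch,
        )
        for name, url in rows:
            grouped.setdefault(url, []).append(name)
    return grouped


# Hypothetical schema and data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mappings (name TEXT PRIMARY KEY, url TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO mappings (name, url) VALUES (?, ?)",
    [
        ("Amazon", "www.amazon.com"),
        ("Amazon - Books", "www.amazon.com"),
        ("Walmart", "www.walmart.com"),
        ("Nike - Shoes", "www.nike.com"),
    ],
)
```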
Any advice or recommendations would be greatly appreciated!
Thank you