I am volunteering for a non-profit, building a pretty data-intensive app. This is my first time using Firestore, so I’m still learning best practices and optimization strategies. I’d like to reduce infrastructure costs as much as possible. Ideally, I’d be able to cut costs down to the free tier, since the app doesn’t have a stable source of funding, but I realize that may not be a realistic goal.
I have built the naive MVP version of this app, and it costs more than I feel like it should. I am completely willing to over-engineer the solution. I am even willing to switch the database to a different technology at this point.
Currently, I have a few collections with ~10k documents each. Each document is basically just a single key:value pair. To load the app, I need all (or most) of that data. The website gets maybe 500 visits per day, so at ~10k reads per visit that’s on the order of 5 million document reads a day, far beyond the free tier’s 50k reads/day, and I’m racking up more in read costs than I’d like.
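For concreteness, the load today looks roughly like this. The collection name `metrics` and the `{ value: ... }` document shape are stand-ins for my real data; I’m on the v9 modular web SDK:

```ts
import { initializeApp } from 'firebase/app';
import { getFirestore, collection, getDocs } from 'firebase/firestore';

const app = initializeApp({ /* my firebase config */ });
const db = getFirestore(app);

// Naive MVP load: one billed read per document, ~10k reads per page load.
async function naiveLoad(): Promise<Record<string, string>> {
  const snap = await getDocs(collection(db, 'metrics'));
  const out: Record<string, string> = {};
  snap.forEach((d) => { out[d.id] = d.data().value; });
  return out;
}
```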
The data in the biggest collection only changes about once a day, and serving data that’s a few days stale is acceptable, so a good caching strategy would be very welcome.
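As a strawman for what that could look like on the client: even a localStorage snapshot with a TTL would cut reads dramatically, given that staleness is fine. A minimal sketch, where the cache key, TTL, and `fetchAll` parameter are all hypothetical names of mine:

```ts
const CACHE_KEY = 'app-data-v1';     // hypothetical cache key
const TTL_MS = 24 * 60 * 60 * 1000;  // refetch at most once a day per browser

// Wraps whatever fetch strategy I end up with; only hits Firestore
// on a cache miss or after the TTL expires.
async function loadWithCache(
  fetchAll: () => Promise<Record<string, string>>
): Promise<Record<string, string>> {
  const raw = localStorage.getItem(CACHE_KEY);
  if (raw) {
    const { savedAt, data } = JSON.parse(raw);
    if (Date.now() - savedAt < TTL_MS) return data; // stale-but-acceptable copy
  }
  const data = await fetchAll();
  localStorage.setItem(CACHE_KEY, JSON.stringify({ savedAt: Date.now(), data }));
  return data;
}
```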
Here’s the main issue, though: most of the advice I’ve seen says to prefer many small documents over fewer documents with more data. Why is this the case? I’m wondering whether, in my case, it would be better to use a few large documents, each packed with nearly (but not quite) as much data as Firestore’s 1 MiB document limit allows. Specifically, I would have n = 10 or so documents in each collection that together hold the data currently spread across 10k documents. I could determine which document each key:value pair goes in by hashing the key and taking that mod n. Then, loading the data would cost just n reads instead of 10k (sketched below).

What are the downsides of this? Is there a better approach? Basically, I’m looking for perspectives other than my own so I don’t get stuck with a worse app later on.
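To make the scheme concrete, here’s roughly what I have in mind. The collection name `metrics-buckets` is a placeholder, djb2 is just an arbitrary string hash, and I’m assuming keys are valid Firestore field names:

```ts
import { initializeApp } from 'firebase/app';
import { getFirestore, doc, getDoc, setDoc } from 'firebase/firestore';

const app = initializeApp({ /* my firebase config */ });
const db = getFirestore(app);

const N_BUCKETS = 10;

// djb2 string hash, taken mod n to assign each key a bucket deterministically.
function bucketFor(key: string): number {
  let h = 5381;
  for (let i = 0; i < key.length; i++) {
    h = ((h << 5) + h + key.charCodeAt(i)) >>> 0; // h * 33 + c, unsigned 32-bit
  }
  return h % N_BUCKETS;
}

// Write path: each bucket document holds ~1k key:value pairs as plain fields.
async function putValue(key: string, value: string): Promise<void> {
  const ref = doc(db, 'metrics-buckets', String(bucketFor(key)));
  await setDoc(ref, { [key]: value }, { merge: true }); // upsert a single field
}

// Read path: the whole dataset is now N_BUCKETS reads instead of ~10k.
async function loadAll(): Promise<Record<string, string>> {
  const snaps = await Promise.all(
    Array.from({ length: N_BUCKETS }, (_, i) =>
      getDoc(doc(db, 'metrics-buckets', String(i)))
    )
  );
  return Object.assign({}, ...snaps.map((s) => s.data() ?? {}));
}
```

With ~10k pairs spread over 10 buckets, each bucket carries ~1k fields, which should stay well under the 1 MiB document limit as long as the values themselves are small.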