I’m looking to store a large amount of binned time-series data in Firestore. Each key will be the Unix timestamp (in seconds) that marks the start of its binning period (e.g. “1716505200”).
I want to retrieve the data efficiently while minimizing the number of document reads. I have seen that select() in the Node.js SDK can apply a field mask. Since I know every key in the database (the binning process always runs on a regular interval), I figured I could use select() to grab the set of keys in a specific time frame.
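To make this concrete, here is a rough sketch of how I would build the field mask. The helper name `binKeys`, the 5-minute bin width, and the `series/sensor-1` path are placeholders of mine, and it assumes bins are aligned to multiples of the bin width. The Firestore call is left as a comment because it needs a live project; it assumes the server SDK's `getAll()` with a `fieldMask` read option, since `select()` itself is defined on queries.

```javascript
// Hypothetical helper: list the bin-start keys (Unix seconds, as strings)
// for every bin that starts in [startSec, endSec).
// Assumes bins are aligned to multiples of binSec.
function binKeys(startSec, endSec, binSec) {
  if (binSec <= 0) throw new RangeError("binSec must be positive");
  // Round up to the first bin boundary at or after startSec.
  const first = Math.ceil(startSec / binSec) * binSec;
  const keys = [];
  for (let t = first; t < endSec; t += binSec) {
    keys.push(String(t));
  }
  return keys;
}

// One hour of 5-minute bins starting at the example key from above.
const mask = binKeys(1716505200, 1716505200 + 3600, 300);
console.log(mask.length, mask[0], mask[mask.length - 1]);

// Assumed usage with the Node.js server SDK (not run here):
//   const [snap] = await db.getAll(db.doc("series/sensor-1"), { fieldMask: mask });
```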
This leads me to my question: what are the downsides of packing close to the maximum number of data points into a single document (1 MiB) and using select() to grab certain sections of it? As far as I understand, this would minimize the number of reads compared to splitting the data across smaller documents, without incurring any further costs. The potential downside I see is a performance hit. Is there any information available on the efficiency of using select() versus fetching multiple documents in full? Also, am I wrong in assuming that select() does not incur further costs?
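For a rough sense of scale, this is my back-of-the-envelope estimate of how many bins would fit in one document. It assumes each bin holds a single numeric value and uses the storage-size rules as I understand them from the docs (a field name costs its UTF-8 bytes plus 1, a number costs 8 bytes, and each document has 32 bytes of overhead plus its name). The function name and the 32-byte document-name figure are made up for illustration, and a map or array per bin would change the numbers.

```javascript
const MAX_DOC_BYTES = 1024 * 1024; // 1 MiB document size limit

// Hypothetical estimate: how many { "<timestamp>": <number> } fields fit
// in one document under the assumed size rules.
function maxBins(keyChars, valueBytes, docNameBytes) {
  const perField = (keyChars + 1) + valueBytes; // field name bytes + 1, plus the value
  const overhead = docNameBytes + 32;           // document name plus fixed overhead
  return Math.floor((MAX_DOC_BYTES - overhead) / perField);
}

// 10-character keys, 8-byte numbers, an assumed 32-byte document name.
console.log(maxBins(10, 8, 32));
```

Under those assumptions the ceiling is on the order of 55,000 bins per document, so the real limit depends heavily on what each bin stores.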
I’m currently storing 24 hours of data per document. This works fine for now, but I am conscious of the number of reads I could save with the “mono-document” approach.
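To put a number on the reads I am trying to save, this sketch counts how many daily documents a query window touches under my current layout. It assumes each document covers one UTC day; `dailyDocsTouched` is just an illustrative name.

```javascript
const DAY_SEC = 86400;

// Hypothetical: number of daily documents touched by [startSec, endSec),
// assuming each document covers exactly one UTC day.
function dailyDocsTouched(startSec, endSec) {
  if (endSec <= startSec) return 0;
  const firstDay = Math.floor(startSec / DAY_SEC);
  const lastDay = Math.floor((endSec - 1) / DAY_SEC);
  return lastDay - firstDay + 1;
}

// A 30-day window starting at the example timestamp (23:00 UTC),
// so it spills into one extra day.
console.log(dailyDocsTouched(1716505200, 1716505200 + 30 * DAY_SEC)); // prints 31
```

With the mono-document layout, the same window would be a single document read as long as it falls inside one document.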