Setup
Imagine there are a bunch of IoT devices, each of which is continuously monitored, e.g. once every few minutes. Thus, at every timestamp, each device produces several floating-point numbers, which need to be stored. For a given device, the number of parameters is always the same. However, different devices may require a different number of parameters to be stored. For example, device A always writes exactly 5 parameters, and device B always writes exactly 12 parameters.
Some devices may collect a few thousand parameters per minute.
I can restrict the required queries to the following:
- Read/write queries happen once every few minutes
- A write query always writes all parameters for a single device at a given moment in time.
- A read query always reads all parameters for a single device in a given time interval. The time interval will not exceed half a year.
We already use influxdb for storing other kinds of measurement data, so I am trying to understand whether this problem can be solved efficiently with influxdb, and if so, how.
My ideas for the schemas are as follows:
- Table has 2 fields: parameter id and parameter value. For example
Time | DeviceID | ParameterID | Value |
---|---|---|---|
Now | A | 0 | 0.03 |
Now | A | 1 | 0.1 |
… | … | … | … |
Now | B | 0 | 1.2 |
Now | B | 1 | 1.3 |
… | … | … | … |
1 min ago | A | 0 | 0.04 |
1 min ago | A | 1 | 0.12 |
… | … | … | … |
- Table has Nmax fields: all possible parameter values, where Nmax is the upper bound on the total number of parameters that might be needed. Each device that uses N < Nmax parameters will fill in the first N parameters, and not the rest.
Time | DeviceID | P1 | P2 | … | P1000 |
---|---|---|---|---|---|
Now | A | 0.03 | 0.1 | … | 1.45 |
Now | B | 1.2 | 1.3 | … | NaN |
… | … | … | … | … | … |
1 min ago | A | 0.04 | 0.12 | … | 2.75 |
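To make the two ideas concrete, here is a minimal sketch of how I imagine the writes would look in InfluxDB line protocol (`measurement,tags fields timestamp`). The measurement names `device_params` and `device_params_wide`, the tag names `device` and `param`, and the helper functions are all hypothetical, chosen only for illustration. For idea 1 I assume ParameterID would have to be a tag rather than a field, since points that share measurement, tag set and timestamp overwrite each other. For idea 2 I assume a device with N < Nmax parameters simply omits the unused fields instead of writing NaN.

```python
from datetime import datetime, timezone


def to_ns(ts):
    """Convert a timezone-aware datetime to a nanosecond Unix timestamp."""
    return int(ts.timestamp()) * 1_000_000_000


def long_format_lines(device_id, values, ts):
    """Idea 1: one line per parameter; the parameter id is a tag (assumption)."""
    ns = to_ns(ts)
    return [
        f"device_params,device={device_id},param={i} value={float(v)} {ns}"
        for i, v in enumerate(values)
    ]


def wide_format_line(device_id, values, ts):
    """Idea 2: one line per device and timestamp with fields P1..PN.

    Only the N fields the device actually has are written (assumption).
    """
    fields = ",".join(f"P{i}={float(v)}" for i, v in enumerate(values, start=1))
    return f"device_params_wide,device={device_id} {fields} {to_ns(ts)}"


# Example values taken from the tables above; the timestamp is arbitrary.
ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
long_lines = long_format_lines("A", [0.03, 0.1], ts)
wide_line = wide_format_line("B", [1.2, 1.3], ts)

for line in long_lines:
    print(line)
print(wide_line)
```

With idea 1, every parameter of every device becomes its own series; with idea 2, each device is a single series with up to Nmax fields.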
Questions
- Is influxdb a good tool for my use case?
- If yes, what schema should I use: idea 1, idea 2, or something else?
- How does the performance of the different solutions roughly compare? The solution is intended to run for several years for a few dozen devices.
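To give a feeling for the scale, here is a rough back-of-the-envelope calculation. All the numbers are assumptions that I picked to match "a few dozen devices", "a few thousand parameters" and "several years"; they are an upper bound, since most devices write far fewer parameters.

```python
# Assumed workload; none of these figures are measured.
DEVICES = 36              # "a few dozen devices"
PARAMS_PER_DEVICE = 1000  # worst case, matches P1..P1000 above
SAMPLE_INTERVAL_S = 60    # one write per device per minute
YEARS = 3                 # "several years"

samples_per_day = 24 * 3600 // SAMPLE_INTERVAL_S

# Values written per day across all devices.
values_per_day = DEVICES * PARAMS_PER_DEVICE * samples_per_day

# Values accumulated over the whole lifetime.
values_total = values_per_day * 365 * YEARS

# Values touched by the largest read: one device, half a year (182 days).
values_per_read = PARAMS_PER_DEVICE * samples_per_day * 182

print(f"values per day:          {values_per_day:,}")
print(f"values over {YEARS} years:     {values_total:,}")
print(f"values per largest read: {values_per_read:,}")
```

The number of stored values is the same for both ideas; what differs is how they are split into series and points.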
What I have looked at already
A similar question has already been considered. However, none of the answers explicitly address performance differences between the approaches. Further, that question explicitly deals with an easier situation – a uniform number of parameters measured per timestamp. In my case, the number of parameters measured varies between devices, as stated above. So that question does not address my concerns.