We have thousands of IoT devices sending measurements throughout the day. These are stored in a MongoDB instance as a simple time series, each with a value, a timestamp and an identifier.
Example data could look like this:
{
  "observed_at": {
    "$date": "2024-04-03T16:52:56.217Z"
  },
  "meta": {
    "snid": "xyz",
    "type": "total_charge_power"
  },
  "_id": {
    "$oid": "....."
  },
  "value": 100
}
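For context, a collection holding documents like this could be declared as a MongoDB time series collection along the lines of the sketch below (the collection name "measurements" and the granularity are only placeholders):

// Hypothetical setup; timeField/metaField match the document shape above
db.createCollection("measurements", {
  timeseries: {
    timeField: "observed_at",
    metaField: "meta",
    granularity: "seconds"
  }
})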
We need to be able to produce a report for a given day at one-second resolution, showing the summed consumption across all devices for each second.
Some examples of the data:
DeviceA:
16:52:56 : 100watt
17:01:15 : 0watt
DeviceB:
16:56:30 : 50watt
16:58:38 : 100watt
17:00:04 : 0watt
This is interpreted as “DeviceA used 100watt each second between 16:52:56 and 17:01:15, and DeviceB used 50watt each second from 16:56:30 until 16:58:38, where it ramps up to 100watt each second until 17:00:04”.
The calculation for these two devices would be:
16:52:56 : 100watt
16:52:57 : 100watt
...
16:56:30 : 150watt
16:56:31 : 150watt
...
16:58:38 : 200watt
16:58:39 : 200watt
...
17:00:04 : 100watt
17:00:05 : 100watt
...
17:01:14 : 100watt
17:01:15 : 0watt
Our initial thought was to use $densify and $fill, but you quickly realize that the number of documents gets too big for this to be performant. Let's say we have 5000 devices; when running $densify, this would give us 5000 * 60 * 60 * 24 = 432,000,000 documents in the calculation each day, and that only grows as more devices are added. $densify bails out at 500,000 documents with default settings. That limit could maybe be raised, or the day could be processed in smaller chunks, but with 5000 devices a chunk can only cover roughly 100 seconds (5000 * 100 = 500,000) before hitting the limit, so it's not really viable as a scalable solution.
The other solution we're thinking about would be to use $shift, turning each document into a span from one timestamp to the next, with the value that applies between those two timestamps. Given the above example, we would end up with something like this:
// timestamps slimmed down to only hour, minute and seconds
{
  from: null,
  until: "16:56:30",
  value: 0
},
{
  from: "16:56:30",
  until: "16:58:38",
  value: 50
},
{
  from: "16:58:38",
  until: "17:00:04",
  value: 100
},
{
  from: "17:00:04",
  until: null,
  value: 0
}
From this, we then need to calculate each second in a loop (preferably in MongoDB), picking out the measurements with some sort of “between timestampA and timestampB” lookup.
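To illustrate what that lookup means for a single second t, a minimal sketch could look like this (assuming the spans are stored with full ISODate values in a collection called "spans", both of which are placeholders):

// Hypothetical: sum the power of every span that covers the second t.
const t = ISODate("2024-04-03T16:57:00Z");

db.spans.aggregate([
  // keep the spans whose interval contains t (open-ended spans use null)
  { $match: {
      $and: [
        { $or: [{ from: null }, { from: { $lte: t } }] },
        { $or: [{ until: null }, { until: { $gt: t } }] }
      ]
  } },
  // add up the matching devices' values
  { $group: { _id: t, sum_power_w: { $sum: "$value" } } }
])

Running something like this 86,400 times per day is obviously not attractive, which is why we're hoping the whole thing can be expressed in a single pipeline.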
The first solution would be something like this:
[
  {
    $densify: {
      /*
       * gap-fill one document per second within the period
       */
      field: "from",
      partitionByFields: ["snid"],
      range: {
        step: 1,
        unit: "second",
        bounds: [
          ISODate("2024-05-23T01:00:00Z"),
          ISODate("2024-05-23T01:01:00Z") // Notice 1 minute gap only
        ]
      }
    }
  },
  {
    $fill: {
      /*
       * fill the missing values using locf
       */
      partitionByFields: ["snid"],
      sortBy: {
        from: 1
      },
      output: {
        value: {
          method: "locf"
        }
      }
    }
  },
  {
    $group: {
      /*
       * sum the per-device values for each second
       */
      _id: "$from",
      sum_power_w: {
        $sum: "$value"
      }
    }
  }
]
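To make the chunking workaround mentioned above concrete, this is roughly what we imagine: run the pipeline once per small time window and collect the results afterwards (a rough sketch; "measurements" is a placeholder name and the field names assume the same pre-processing as the pipeline above):

// With ~5000 devices, roughly 100 seconds per window is already the
// ceiling before $densify generates 500,000 documents.
const dayStart = ISODate("2024-05-23T00:00:00Z");
const windowSeconds = 60;

const perSecondTotals = [];
for (let offset = 0; offset < 24 * 60 * 60; offset += windowSeconds) {
  const from = new Date(dayStart.getTime() + offset * 1000);
  const until = new Date(from.getTime() + windowSeconds * 1000);

  // same $densify / $fill / $group pipeline as above,
  // just with this window's bounds swapped in
  perSecondTotals.push(...db.measurements.aggregate([
    { $densify: {
        field: "from",
        partitionByFields: ["snid"],
        range: { step: 1, unit: "second", bounds: [from, until] }
    } },
    { $fill: {
        partitionByFields: ["snid"],
        sortBy: { from: 1 },
        output: { value: { method: "locf" } }
    } },
    { $group: { _id: "$from", sum_power_w: { $sum: "$value" } } }
  ]).toArray());
}

Even then, each window would also need the last known reading per device from before the window for locf to have anything to carry forward, so the moving parts add up quickly.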
The second solution would use something like this, and the result would then either be handled in the application layer, or in MongoDB, if someone knows how:
[
  ...,
  {
    $setWindowFields: {
      /*
       * calculate the end timestamp of each reading
       */
      partitionBy: "$_id.snid",
      sortBy: {
        "_id.from": 1
      },
      output: {
        until: {
          $shift: {
            output: "$_id.from",
            by: 1
          }
        }
      }
    }
  }
]
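If it cannot be done in MongoDB alone, the application-layer part we have in mind would be something like this rough sketch in plain JavaScript (assuming "spans" is the pipeline output as an array of { from, until, value } with Date values, and that a null until is clamped to the end of the reporting day):

// Hypothetical post-processing: expand each span into its seconds and
// accumulate the per-second totals across all devices.
function sumPerSecond(spans, dayEnd) {
  const totals = new Map(); // epoch second -> summed watt

  for (const { from, until, value } of spans) {
    if (from === null) continue; // span before the device's first reading
    const start = Math.floor(from.getTime() / 1000);
    const end = Math.floor((until ?? dayEnd).getTime() / 1000);

    for (let sec = start; sec < end; sec++) {
      totals.set(sec, (totals.get(sec) ?? 0) + value);
    }
  }
  return totals;
}

This works, but it effectively moves the 432,000,000 iterations into the application instead of the database, which is why we would prefer to keep it in MongoDB.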
So the question is:
How do you handle interpolation at this scale in MongoDB?
And… Is it possible to calculate solution 2 in MongoDB alone (without the application layer)?
Hope this makes sense – if not, please let me know so I can provide the needed information.