I have a data frame (Table-1) with close to 56 million rows that records comments made on different videos. The columns are `videoID` (an identifier for the video), `Time` (the comment time, to the nearest minute), `Time_5` (5 minutes before `Time`) and `Emotion` (the emotional score of the comment). For each video and each minute, I want the mean emotion score of the comments from the preceding 5 minutes, i.e. the comments for which `data$Time >= Time_5 & data$Time < Time`. It is effectively a rolling window, and the result should look like Table-2, with 0 where the window contains no comments. The code I found in other Stack Overflow questions is quite slow on 56 million rows.

Table-1

videoID | Time | Time_5 | Emotion |
---|---|---|---|
1 | 2022-06-15 01:39:00 | 2022-06-15 01:34:00 | 0.5 |
1 | 2022-06-15 01:39:00 | 2022-06-15 01:34:00 | 1 |
1 | 2022-06-15 01:40:00 | 2022-06-15 01:35:00 | 0.6 |
1 | 2022-06-15 01:40:00 | 2022-06-15 01:35:00 | 0.4 |
1 | 2022-06-15 01:40:00 | 2022-06-15 01:35:00 | 0.2 |
2 | 2022-07-17 03:20:00 | 2022-07-17 03:15:00 | 0.2 |
2 | 2022-07-17 03:20:00 | 2022-07-17 03:15:00 | 0.4 |
2 | 2022-07-17 03:20:00 | 2022-07-17 03:15:00 | 0.9 |

Table-2

videoID | Time | Time_5 | Emotion_mean |
---|---|---|---|
1 | 2022-06-15 01:39:00 | 2022-06-15 01:34:00 | 0 |
1 | 2022-06-15 01:40:00 | 2022-06-15 01:35:00 | 0.75 |
1 | 2022-06-15 01:41:00 | 2022-06-15 01:36:00 | 0.54 |
1 | 2022-06-15 01:42:00 | 2022-06-15 01:37:00 | 0.54 |
1 | 2022-06-15 01:43:00 | 2022-06-15 01:38:00 | 0.54 |
1 | 2022-06-15 01:44:00 | 2022-06-15 01:39:00 | 0.54 |
1 | 2022-06-15 01:45:00 | 2022-06-15 01:40:00 | 0.4 |
1 | 2022-06-15 01:46:00 | 2022-06-15 01:41:00 | 0 |
2 | 2022-07-17 03:20:00 | 2022-07-17 03:15:00 | 0 |
2 | 2022-07-17 03:21:00 | 2022-07-17 03:16:00 | 0 |
2 | 2022-07-17 03:22:00 | 2022-07-17 03:17:00 | 0 |
2 | 2022-07-17 03:23:00 | 2022-07-17 03:18:00 | 0 |
2 | 2022-07-17 03:24:00 | 2022-07-17 03:19:00 | 0 |
2 | 2022-07-17 03:25:00 | 2022-07-17 03:20:00 | 0.5 |
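
For reference, the window condition described above (`Time_5 <= comment time < Time`) could be expressed as a `data.table` non-equi join against a per-minute grid. This is only a minimal sketch under stated assumptions, not a benchmarked solution: `cTime`, `grid`, `result` and `tail_minutes` are hypothetical names, the grid is assumed to run from each video's first comment to 5 minutes after its last, and the timezone is assumed to be UTC.

```r
library(data.table)

# Table-1 from the question; cTime is a hypothetical name for the comment time.
comments <- data.table(
  videoID = c(1, 1, 1, 1, 1, 2, 2, 2),
  cTime = as.POSIXct(c(
    "2022-06-15 01:39:00", "2022-06-15 01:39:00",
    "2022-06-15 01:40:00", "2022-06-15 01:40:00", "2022-06-15 01:40:00",
    "2022-07-17 03:20:00", "2022-07-17 03:20:00", "2022-07-17 03:20:00"
  ), tz = "UTC"),
  Emotion = c(0.5, 1, 0.6, 0.4, 0.2, 0.2, 0.4, 0.9)
)

# Assumption: report each video until 5 minutes after its last comment.
tail_minutes <- 5

# One row per video and minute, plus the start of its 5-minute window.
grid <- comments[
  , .(Time = seq(min(cTime), max(cTime) + tail_minutes * 60, by = "1 min")),
  by = videoID
]
grid[, Time_5 := Time - 5 * 60]

# Non-equi join: for every grid row, average the comments of the same video
# whose time falls in [Time_5, Time).
result <- comments[
  grid,
  .(Emotion_mean = mean(Emotion)),
  on = .(videoID, cTime >= Time_5, cTime < Time),
  by = .EACHI
]

# The two join columns come back named cTime but hold the grid's Time_5 and
# Time values, in that order, so rename them by position.
setnames(result, c("videoID", "Time_5", "Time", "Emotion_mean"))

# Empty windows produce NA; Table-2 reports them as 0.
result[is.na(Emotion_mean), Emotion_mean := 0]
setcolorder(result, c("videoID", "Time", "Time_5", "Emotion_mean"))

print(result)
```

On the sample data this reproduces the video 1 rows of Table-2 up to 01:45. Applied literally, the condition gives 0.5 for video 2 from 03:21 through 03:25, whereas Table-2 lists 0 for 03:21 to 03:24, so one of the two may need adjusting.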