I have a table of start and stop times for jobs, that I would like to plot. The input table looks like this:
job_started_at | job_finished_at | position |
---|---|---|
2024-04-29 18:35:09 | 2024-04-29 19:36:25 | 1 |
2024-04-29 18:33:19 | 2024-04-29 20:34:40 | 2 |
2024-04-29 19:37:17 | 2024-04-29 19:44:51 | 1 |
2024-04-29 21:12:35 | 2024-04-29 21:31:31 | 2 |
2024-04-29 20:31:52 | 2024-04-29 20:53:23 | 1 |
with multiple entries per day. There are 2 machines that handle jobs (indicated by position 1 and 2), thus the time intervals can not overlap for a given position.
I want to show in a histogram the workload of each machine per hour for each day in the week (Mon, Tue, Wed, Thu, Fri, Sat, Sun). I was thinking about 7 different plots (1 per day) in which each hour gets one bar of the histogram. So if a job starts at 12:30 and finishes at 13:15 it should count 50% towards the 12h bar and 25% towards the 13h bar for example.
Since the time intervals can overlap the hour mark, I don’t know how to handle this problem.
I was thinking about the following workflow:
- add a column that indicates the weekday of the job (for grouping later)
- split all jobs on the hour mark (so 12:30 – 13:15 turns into 2 rows -> 12:30-13:00 and 13:00-13:15), this also has to work for multiple splits in one job
- calculate usage for each job (percentage of workload (for example 12:30-13:00 -> 50%)
- summarize by weekday and hour and take the mean of all workloads
This also has the problem that the machine might not work at all for multiple hours, but it should still count as 0% for the mean calculation above.
Do you have better ideas or can help me implement my idea?
Thanks a lot!