I have sleep data from Apple HealthKit which looks similar to this:
       type                                   startDate                  endDate                    creationDate               value                                   unit  sourceName               sourceVersion
32195  HKCategoryTypeIdentifierSleepAnalysis  2024-06-03 22:40:16+02:00  2024-06-03 22:48:16+02:00  2024-06-04 05:00:42+02:00  HKCategoryValueSleepAnalysisAsleepCore  None  Christian’s Apple Watch  10.5
32197  HKCategoryTypeIdentifierSleepAnalysis  2024-06-03 22:48:16+02:00  2024-06-03 22:49:46+02:00  2024-06-04 05:00:42+02:00  HKCategoryValueSleepAnalysisAwake       None  Christian’s Apple Watch  10.5
32199  HKCategoryTypeIdentifierSleepAnalysis  2024-06-03 22:49:46+02:00  2024-06-03 22:56:16+02:00  2024-06-04 05:00:42+02:00  HKCategoryValueSleepAnalysisInBed       None  Christian’s Apple Watch  10.5
32198  HKCategoryTypeIdentifierSleepAnalysis  2024-06-03 22:49:46+02:00  2024-06-03 22:56:16+02:00  2024-06-04 05:00:42+02:00  HKCategoryValueSleepAnalysisAsleepCore  None  Christian’s Apple Watch  10.5
32200  HKCategoryTypeIdentifierSleepAnalysis  2024-06-03 22:56:16+02:00  2024-06-03 22:57:16+02:00  2024-06-04 05:00:42+02:00  HKCategoryValueSleepAnalysisAwake       None  Christian’s Apple Watch  10.5
I would like to get a list of sleep sessions (start, end) with all the stages in between. The example above is a subset of one sleep session: it starts at 22:40 and ends at 22:57.
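To make the target concrete, here is a minimal sketch of the output shape I am after. The names `Stage`, `SleepSession` and `build_session` are hypothetical, and the sketch assumes the records passed in already belong to one session; it only shows the (start, end, stages) structure, not how sessions are found.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Stage:
    value: str
    start: datetime
    end: datetime


@dataclass
class SleepSession:
    start: datetime
    end: datetime
    stages: list = field(default_factory=list)


def build_session(records):
    """Collapse the records of ONE session into a SleepSession (hypothetical helper)."""
    stages = sorted(
        (Stage(r["value"], r["startDate"], r["endDate"]) for r in records),
        key=lambda s: s.start,
    )
    return SleepSession(
        start=min(s.start for s in stages),  # earliest stage start
        end=max(s.end for s in stages),      # latest stage end
        stages=stages,
    )


t = datetime.fromisoformat
# First and last rows of the example above.
records = [
    {"value": "HKCategoryValueSleepAnalysisAsleepCore",
     "startDate": t("2024-06-03 22:40:16+02:00"),
     "endDate": t("2024-06-03 22:48:16+02:00")},
    {"value": "HKCategoryValueSleepAnalysisAwake",
     "startDate": t("2024-06-03 22:56:16+02:00"),
     "endDate": t("2024-06-03 22:57:16+02:00")},
]
session = build_session(records)
print(session.start.time(), session.end.time())  # 22:40:16 22:57:16
```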
The problem is how to find the start and end of each session without specifying a fixed time range, e.g. a bedtime between 9 pm and midnight. I would rather find clusters of sleep stages and group them. Usually there is a gap of 12+ hours or so between sleeps.
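A minimal sketch of the clustering idea, in plain Python so it can run on a list of dicts: sort the records by start, track the latest end seen so far, and open a new session whenever the next record starts more than `max_gap` after it. The function name `split_sessions`, the helper `rec`, the 4-hour threshold, and the last two records are assumptions made up for illustration; only the first three records come from the example data.

```python
from datetime import datetime, timedelta


def split_sessions(records, max_gap=timedelta(hours=4)):
    """Group records into sessions (islands) separated by gaps longer than max_gap."""
    sessions = []
    latest_end = None
    for r in sorted(records, key=lambda r: r["startDate"]):
        if latest_end is None or r["startDate"] - latest_end > max_gap:
            sessions.append([])            # gap too large: start a new session
            latest_end = r["endDate"]
        else:
            # Overlapping records (e.g. InBed and AsleepCore) must not move the end backwards.
            latest_end = max(latest_end, r["endDate"])
        sessions[-1].append(r)
    return sessions


def rec(start, end, value):
    t = datetime.fromisoformat
    return {"startDate": t(start), "endDate": t(end), "value": value}


records = [
    rec("2024-06-03 22:40:16+02:00", "2024-06-03 22:48:16+02:00", "AsleepCore"),
    rec("2024-06-03 22:49:46+02:00", "2024-06-03 22:56:16+02:00", "InBed"),
    rec("2024-06-03 22:49:46+02:00", "2024-06-03 22:56:16+02:00", "AsleepCore"),
    # Hypothetical: back asleep later the same night (gap under 4 hours).
    rec("2024-06-04 00:30:00+02:00", "2024-06-04 06:30:00+02:00", "AsleepCore"),
    # Hypothetical: the following night (gap of more than 15 hours).
    rec("2024-06-04 22:15:00+02:00", "2024-06-04 23:00:00+02:00", "AsleepCore"),
]
sessions = split_sessions(records)
print([len(s) for s in sessions])  # [4, 1]
```

The threshold only has to sit between the longest awake period I want to tolerate inside a night and the shortest daytime gap, so the exact value is a judgment call.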
The second issue is that there are gaps between sleep stages. An example would be waking up at 3 am and falling back asleep at 5 am. This 2-hour gap should be treated as HKCategoryValueSleepAnalysisAwake, but within the same sleep session.
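The gap filling could be sketched like this, again in plain Python and assuming the records already belong to one session. The function name `fill_gaps`, the `synthetic` flag, and the two records are hypothetical; the records mirror the 3 am / 5 am example.

```python
from datetime import datetime

AWAKE = "HKCategoryValueSleepAnalysisAwake"


def fill_gaps(session):
    """Return the session's records sorted by start, with a synthetic Awake
    record inserted wherever no existing record covers the time."""
    filled = []
    covered_until = None
    for r in sorted(session, key=lambda r: r["startDate"]):
        if covered_until is not None and r["startDate"] > covered_until:
            filled.append({
                "startDate": covered_until,
                "endDate": r["startDate"],
                "value": AWAKE,
                "synthetic": True,   # marks records that were not in the export
            })
        filled.append(r)
        if covered_until is None or r["endDate"] > covered_until:
            covered_until = r["endDate"]
    return filled


t = datetime.fromisoformat
# Hypothetical session: asleep until 3 am, asleep again from 5 am.
session = [
    {"startDate": t("2024-06-03 23:00:00+02:00"),
     "endDate": t("2024-06-04 03:00:00+02:00"),
     "value": "HKCategoryValueSleepAnalysisAsleepCore"},
    {"startDate": t("2024-06-04 05:00:00+02:00"),
     "endDate": t("2024-06-04 07:00:00+02:00"),
     "value": "HKCategoryValueSleepAnalysisAsleepCore"},
]
filled = fill_gaps(session)
print([r["value"] for r in filled])
```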
I did try this before, but the results were wrong: at times it produces too many sleep sessions, some with very short sleep times. I am sure there are more efficient ways to find each sleep session; I am thinking of this as a gaps-and-islands problem. My attempt:
from datetime import timedelta


def group_sleep_sessions(sleep_df, max_gap=timedelta(hours=20), min_session_duration=timedelta(hours=2)):
    sleep_df = sleep_df.sort_values('startDate')
    sessions = []
    current_session = None

    def merge_sessions(s1, s2):
        return {
            'startDate': min(s1['startDate'], s2['startDate']),
            'endDate': max(s1['endDate'], s2['endDate']),
            'sourceName': s1['sourceName'],
            'stages': sorted(s1['stages'] + s2['stages'], key=lambda x: x['start'])
        }

    for _, row in sleep_df.iterrows():
        new_segment = {
            'startDate': row['startDate'],
            'endDate': row['endDate'],
            'sourceName': row['sourceName'],
            'stages': [{'stage': row['value'], 'start': row['startDate'], 'end': row['endDate']}]
        }
        if current_session is None:
            current_session = new_segment
        else:
            time_diff = row['startDate'] - current_session['endDate']
            same_day_or_next = (row['startDate'].date() - current_session['endDate'].date()) <= timedelta(days=1)
            if time_diff <= max_gap and same_day_or_next:
                # Merge if within max gap and on the same or next day
                if row['startDate'] > current_session['endDate']:
                    # Fill the uncovered time with a synthetic Awake stage
                    current_session['stages'].append({
                        'stage': 'HKCategoryValueSleepAnalysisAwake',
                        'start': current_session['endDate'],
                        'end': row['startDate']
                    })
                current_session = merge_sessions(current_session, new_segment)
            else:
                # Finalize current session and start a new one
                if (current_session['endDate'] - current_session['startDate']) >= min_session_duration:
                    sessions.append(current_session)
                current_session = new_segment

    # Add the last session if it meets the minimum duration
    if current_session and (current_session['endDate'] - current_session['startDate']) >= min_session_duration:
        sessions.append(current_session)
    return sessions
To simplify debugging, here is a dict version of the test data:
data = [{'type': 'HKCategoryTypeIdentifierSleepAnalysis',
  'startDate': Timestamp('2024-06-03 22:40:16+0200', tz='pytz.FixedOffset(120)'),
  'endDate': Timestamp('2024-06-03 22:48:16+0200', tz='pytz.FixedOffset(120)'),
  'creationDate': Timestamp('2024-06-04 05:00:42+0200', tz='pytz.FixedOffset(120)'),
  'value': 'HKCategoryValueSleepAnalysisAsleepCore',
  'unit': None,
  'sourceName': 'Christian’s Apple\xa0Watch',
  'sourceVersion': '10.5'},
 {'type': 'HKCategoryTypeIdentifierSleepAnalysis',
  'startDate': Timestamp('2024-06-03 22:48:16+0200', tz='pytz.FixedOffset(120)'),
  'endDate': Timestamp('2024-06-03 22:49:46+0200', tz='pytz.FixedOffset(120)'),
  'creationDate': Timestamp('2024-06-04 05:00:42+0200', tz='pytz.FixedOffset(120)'),
  'value': 'HKCategoryValueSleepAnalysisAwake',
  'unit': None,
  'sourceName': 'Christian’s Apple\xa0Watch',
  'sourceVersion': '10.5'},
 {'type': 'HKCategoryTypeIdentifierSleepAnalysis',
  'startDate': Timestamp('2024-06-03 22:49:46+0200', tz='pytz.FixedOffset(120)'),
  'endDate': Timestamp('2024-06-03 22:56:16+0200', tz='pytz.FixedOffset(120)'),
  'creationDate': Timestamp('2024-06-04 05:00:42+0200', tz='pytz.FixedOffset(120)'),
  'value': 'HKCategoryValueSleepAnalysisInBed',
  'unit': None,
  'sourceName': 'Christian’s Apple\xa0Watch',
  'sourceVersion': '10.5'},
 {'type': 'HKCategoryTypeIdentifierSleepAnalysis',
  ...
  'creationDate': Timestamp('2024-06-04 05:00:42+0200', tz='pytz.FixedOffset(120)'),
  'value': 'HKCategoryValueSleepAnalysisAwake',
  'unit': None,
  'sourceName': 'Christian’s Apple\xa0Watch',
  'sourceVersion': '10.5'}]