I have the following code using exported data from Apple Health. The data is obtained by exporting the Apple Health data to an export.zip file, and then you’ll see in the code I’m extracting the apple_health_export/export.xml file and importing it as a DataFrame.
import zipfile
import pandas
import matplotlib.pyplot as plt
import numpy

with zipfile.ZipFile('/Users/steven/Downloads/export.zip') as myzip:
    with myzip.open('apple_health_export/export.xml') as myfile:
        x = pandas.read_xml(myfile, xpath='//Record', attrs_only=True,parse_dates=["creationDate","startDate","endDate"])

x.value=pandas.to_numeric(x.value,errors='coerce')
x = x[x.value.notnull()]
data = x[x.type == 'HKQuantityTypeIdentifierStepCount']
plt.figure()
plt.rcParams.update({'font.family':'Avenir'})
data.plot(title='Daily Steps',grid=True,x="endDate",y="value",kind="scatter",fontsize="8",figsize=(11,5),xlim=(pandas.to_datetime('2024-08-01'),pandas.to_datetime('today')))
It’s easy enough to plot the individual data points as shown in the last line above, but I’m running into a problem when trying to plot them grouped by day:
df = data.groupby(pandas.Grouper(key='endDate', axis=0, freq='D')).sum('value')
                             value
endDate
2023-11-01 00:00:00-04:00   6284.0
2023-11-02 00:00:00-04:00   3477.0
2023-11-03 00:00:00-04:00    522.0
2023-11-04 00:00:00-04:00    760.0
2023-11-05 00:00:00-04:00  14220.0
...                            ...
2024-09-07 00:00:00-04:00    916.0
2024-09-08 00:00:00-04:00   5981.0
2024-09-09 00:00:00-04:00   1012.0
2024-09-10 00:00:00-04:00  14018.0
2024-09-11 00:00:00-04:00    298.0

[316 rows x 1 columns]
If I try data.groupby(pandas.Grouper(key='endDate', axis=0, freq='D')).plot('value'), I get a chart for each day, rather than a single chart with a data point for each day along the x-axis.
Count me a dunce, but how do I get this grouped data into a single chart (short of pulling all this into a database and querying it with SQL GROUP BY, as I’m trying to avoid additional steps)?
You still need to aggregate, then you can plot:
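# The question imports pandas under its full name, so it is used here instead of pd.
# Selecting 'value' before summing makes the aggregated column explicit.
(data.groupby(pandas.Grouper(key='endDate', freq='D'))['value'].sum()
     .plot()
)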
Output:
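For anyone without an Apple Health export at hand, here is a minimal self-contained sketch of the same aggregate-then-plot idea. The synthetic `data` frame, the time zone, and the reading interval are assumptions made up for illustration; only the `endDate` and `value` column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop it in a notebook

import numpy as np
import pandas as pd

# Hypothetical stand-in for the step records: one reading every 6 hours for 24 days.
rng = np.random.default_rng(0)
start = pd.Timestamp("2024-08-01", tz="America/New_York")
stamps = start + pd.to_timedelta(np.arange(96) * 6, unit="h")
data = pd.DataFrame({
    "endDate": stamps,
    "value": rng.integers(0, 500, size=len(stamps)).astype(float),
})

# Aggregate first: one total per calendar day.
daily = data.groupby(pd.Grouper(key="endDate", freq="D"))["value"].sum()

# Then plot the aggregated Series: a single chart with one point per day.
ax = daily.plot(title="Daily Steps", grid=True)
```

Calling `.plot()` directly on the groupby object plots every group separately, which is why the aggregation has to come first.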
If you want to have multiple lines based on groups (for example one line for each month and days of the month as X-axis), you first need to rework your dataset (e.g. with pivot):
(data.assign(period=data['endDate'].dt.to_period('M'),
             day=data['endDate'].dt.day,
            )
     .pivot_table(index='day', columns='period', values='value',
                  aggfunc='sum')
     .plot()
)
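To make the reshaping step concrete, here is a small sketch on made-up data (two readings of 100 steps per day across July and August 2024; these numbers are assumptions chosen so the result is easy to check by eye). It shows the wide table that `pivot_table` produces before plotting: one row per day of the month and one column per month.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import numpy as np
import pandas as pd

# Hypothetical data: a reading every 12 hours from 2024-07-01 through 2024-08-31.
start = pd.Timestamp("2024-07-01")
stamps = start + pd.to_timedelta(np.arange(62 * 2) * 12, unit="h")
data = pd.DataFrame({"endDate": stamps, "value": 100.0})

# Rows are days of the month, columns are months, cells are daily totals.
monthly = (
    data.assign(period=data["endDate"].dt.to_period("M"),
                day=data["endDate"].dt.day)
        .pivot_table(index="day", columns="period", values="value", aggfunc="sum")
)

# DataFrame.plot draws one line per column, so each month gets its own line.
ax = monthly.plot(title="Steps by day of month")
```

Because every column becomes its own line, the number of lines in the chart equals the number of months present in the data.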
Or using seaborn:
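import seaborn as sns
# lineplot aggregates with the mean by default; estimator='sum' matches the
# pivot_table version above, and errorbar=None hides the confidence band.
sns.lineplot(data.assign(period=data['endDate'].dt.to_period('M'),
                         day=data['endDate'].dt.day),
             x='day', hue='period', y='value',
             estimator='sum', errorbar=None)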
Output: