I have a table named “daily_activity” with 4 columns: PersonalID, Date, TotalSteps, Calories
The table contains data for 35 PersonalIDs. There are duplicate PersonalID rows because the table shows each PersonalID’s TotalSteps and Calories over several Dates. The earliest Date is 3/12, and the latest date is 5/12.
Image of an Example Table
I want to count the average number of days where steps were not tracked, and do the same for calories.
I tried: nrow(daily_activity)
nrow(daily_activity[daily_activity$TotalSteps==0,])
to compare the “total number of rows” to the “total number of rows where there were 0 total steps”. I thought this would let me see the ratio of “total days” to “days not tracked”.
However, I realized that this only works if there is a row for every Date per PersonalID. Unfortunately, some of the PersonalID’s in my table have a row for each date from 3/12-5/12, and others only have a row for some of those dates. I did not realize this at first since there are rows where TotalSteps and Calories are both 0, so I assumed all Dates were in there and a 0 was populated for any days not tracked.
I’m thinking to solve this, I could populate rows with TotalSteps = 0 and Calories = 0 for all the Dates I am missing, but I don’t know how to do that.
How can I do that, or is there a different method I can use to count the average number of days where steps were not tracked?
Aliyah Hall is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.