I have imported, sorted, and combined 7 files, one per day of the week. I have identified some duplicates, and I am now trying to answer specific questions about a program that scraped a daily listing of songs, where there may be a commercial break (every 5 minutes) or a gap between files/records longer than 13 hours. On each attempt, I cannot work out where my datetime column or its format prevents `.dt` from working (it complains that it needs datetimelike values) for some of the questions below. I am pretty much lost on how to start a few of these, as I am still a bit of a newbie learning Python.
- Find missing records – gap of 13 or more hours
- Using the average song length in minutes – found as 3.18 minutes, find the average commercial duration by subtracting the average length from the delta to the next song start.
- How many songs are played in a 24-hour period? Any difference between day and night? Between weekdays and weekends?
- How many hours of commercials (or conversely, music) are played in a 24-hour period? Any difference: day vs. night, weekdays vs. weekends?
- Describe the patterns of commercial breaks, e.g., how many breaks, usually at what times, etc.
- During the 1-week period, list
a. top 10 songs and their number of air plays
b. top 10 artists and their number of air plays
c. top 10 artists in terms of number of distinct songs (i.e., for each artist, determine number of distinct songs)
1 Started with:
`
df['time_interval_hr'] = df['ts'].diff().fillna(timedelta(0)).apply(lambda x: x.total_seconds() // 3600)
`
From here, how do I show only the rows with a difference of 13 hours or more? I have tried several versions, but I either get an error that `.dt` needs datetimelike values or other errors. Otherwise, I get a full datetime instead of hours alone.
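The `.dt` error usually means the column is still stored as strings (object dtype) rather than as datetimes. Below is a minimal sketch of one way to filter the gaps, assuming the timestamp column is named `ts` as in the snippet above. The toy data, the `song` column, and the `gap_hr` name are hypothetical and only there to make the example runnable.

```python
import pandas as pd

# Hypothetical toy data; the real frame comes from the seven combined files.
df = pd.DataFrame({
    "ts": ["2021-03-01 08:00:00", "2021-03-01 08:04:00",
           "2021-03-02 01:30:00", "2021-03-02 01:33:00"],
    "song": ["A", "B", "C", "D"],
})

# .dt only works on datetime64/timedelta64 columns, so convert first.
# errors="coerce" turns unparseable strings into NaT instead of raising.
df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
df = df.dropna(subset=["ts"]).sort_values("ts").reset_index(drop=True)

# Gap since the previous record, expressed as fractional hours.
df["gap_hr"] = df["ts"].diff().dt.total_seconds() / 3600

# A boolean mask keeps only the rows that follow a gap of 13 hours or more.
missing = df[df["gap_hr"] >= 13]
print(missing)
```

Each row in `missing` is the first record after a long gap, so the missing stretch lies between that row and the one before it.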
2
3 Started with:
`
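from datetime import timedelta
# For each row, count the distinct artists in the 24 hours up to that row's timestamp.
# apply(axis=1) passes one row at a time; transform expects a same-shaped result.
df['24-HourCount'] = df.apply(lambda x:
    len(df[df['ts'].between(x['ts'] - timedelta(days=1),
                            x['ts'])]['artist'].unique()), axis=1)
df = df.set_index('ts')
# Largest 24-hour count within each 30-minute bin ('30T' is the older alias for '30min').
df = df[['24-HourCount']].resample('30min').max()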
`
How can I count through each file to get an average of how many songs are played?
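One possible approach is to group on parts of the timestamp rather than looping through each file. This is only a sketch under assumptions: the toy data is hypothetical, "day" is arbitrarily taken as 06:00 to 17:59, and each row is assumed to be one play after duplicates have been removed.

```python
import pandas as pd

# Hypothetical toy data: three plays on a Friday, two on a Saturday.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-03-05 07:00", "2021-03-05 13:00", "2021-03-05 22:00",
        "2021-03-06 09:00", "2021-03-06 23:30",
    ]),
    "song": ["A", "B", "C", "A", "D"],
})

# Plays per calendar day, then the average across days.
per_day = df.groupby(df["ts"].dt.date).size()
avg_per_day = per_day.mean()

# Assumption: hours 6 through 17 count as "day", everything else as "night".
df["period"] = df["ts"].dt.hour.between(6, 17).map({True: "day", False: "night"})
by_period = df.groupby("period").size()

# dayofweek is Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
df["day_type"] = (df["ts"].dt.dayofweek >= 5).map({True: "weekend", False: "weekday"})

# Average plays per day within each day type, so that five weekdays
# and two weekend days are compared on the same footing.
avg_by_day_type = (
    df.groupby(["day_type", df["ts"].dt.date]).size().groupby(level=0).mean()
)

print(avg_per_day, by_period, avg_by_day_type, sep="\n")
```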
4 Started with the same as #3 above, but I figure I would need some aggregation to sum the commercial time, derived from the average song length.
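A rough sketch of that aggregation, assuming commercial time can be estimated as the time to the next song start minus the 3.18-minute average song length. The toy data and the `delta_min` and `commercial_min` names are hypothetical, and treating gaps of 13 hours or more as missing data rather than commercials is an assumption carried over from question 1.

```python
import pandas as pd

AVG_SONG_MIN = 3.18  # average song length in minutes, from the question

# Hypothetical toy data: song start times.
df = pd.DataFrame({"ts": pd.to_datetime([
    "2021-03-01 08:00:00", "2021-03-01 08:03:00", "2021-03-01 08:10:00",
    "2021-03-01 08:13:30", "2021-03-02 09:00:00",
])})

# Minutes from each song start to the next song start.
df["delta_min"] = df["ts"].shift(-1).sub(df["ts"]).dt.total_seconds() / 60

# Assumption: gaps of 13+ hours are missing records, so mask them out (NaN).
valid = df["delta_min"].where(df["delta_min"] < 13 * 60)

# Time beyond the average song length is treated as commercials, never negative.
df["commercial_min"] = (valid - AVG_SONG_MIN).clip(lower=0)

# Hours of estimated commercial time per calendar day (NaN rows are skipped).
commercial_hr_per_day = df.groupby(df["ts"].dt.date)["commercial_min"].sum() / 60
print(commercial_hr_per_day)
```

Because a single average stands in for every song length, the per-song estimates are noisy; the daily totals are likely more meaningful than any individual row.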
5 Not sure how to start, but I would be interested in seeing this figure.
6 Over the one-week period, same as #5: I would be interested in seeing this billboard-type figure.
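For the three top-10 lists, a sketch along these lines might be a starting point. It assumes the frame has an `artist` column (as in the #3 snippet) and a `song` column; the `song` name and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row per air play.
df = pd.DataFrame({
    "artist": ["X", "X", "Y", "X", "Z", "Y"],
    "song":   ["s1", "s1", "s2", "s3", "s4", "s2"],
})

# a. Plays per (artist, song) pair, so same-titled songs by
#    different artists are not merged together.
top_songs = df.groupby(["artist", "song"]).size().nlargest(10)

# b. Plays per artist.
top_artists = df["artist"].value_counts().head(10)

# c. Number of distinct songs per artist.
top_distinct = df.groupby("artist")["song"].nunique().nlargest(10)

print(top_songs, top_artists, top_distinct, sep="\n\n")
```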