I have to analyze a number of audio recordings of bird singing activity in various urban parks. Each recording was run through BirdNET, and the software returned a dataset with the start and end times (in seconds) of every song it recognized from a given species (e.g. Blackbird). Each recording starts at 0:00 and ends at 9:00. The data look like this:
Start.sec | End.sec |
---|---|
0.1 | 3.5 |
56.4 | 67.2 |
105.5 | 111.3 |
….. | ….. |
5403.2 | 5408.7 |
….. | ….. |
10007.2 | 10013.4 |
And so on, until the end of the recording. There is one dataset for each park/site.
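Before choosing variables, it can help to compute the simplest "how much singing" quantities for each site. The sketch below assumes each site's BirdNET table has been read into a list of `(start, end)` pairs in seconds and that the recording length is known. Both the function name `basic_summary` and the 9-hour length (0:00 to 9:00) are illustrative assumptions, not something BirdNET provides.

```python
from typing import Dict, List, Tuple

# One detected song: (start_sec, end_sec), as in the BirdNET table.
Detection = Tuple[float, float]


def basic_summary(detections: List[Detection], recording_sec: float) -> Dict[str, float]:
    """Simple 'how much singing' summaries for one site (hypothetical helper)."""
    n_songs = len(detections)
    # Sum of song durations; assumes detections do not overlap in time.
    total_song_sec = sum(end - start for start, end in detections)
    return {
        "n_songs": n_songs,
        "total_song_sec": total_song_sec,
        # Normalise by recording length so sites with different lengths are comparable.
        "songs_per_hour": n_songs / (recording_sec / 3600),
        "fraction_singing": total_song_sec / recording_sec,
    }


# Usage with only the first three rows shown in the table (not a full dataset).
# recording_sec assumes a 9-hour recording; replace it with the real length.
rows = [(0.1, 3.5), (56.4, 67.2), (105.5, 111.3)]
print(basic_summary(rows, recording_sec=9 * 3600))
```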
Now I need to study the possible correlation between the temporal patterns of the singing activity and some independent variables associated with each park (traffic noise, artificial light…). So I believe I'm supposed to choose and calculate some dependent variables that summarize the recording for each site, so that I can put them into the correlation models.
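A single summary number discards the shape of the activity over time. One way to keep the temporal pattern itself is an activity profile: the number of song starts in each consecutive hour of the recording, which could then be compared across sites. This is a minimal sketch under the same assumptions as before (a list of `(start, end)` pairs in seconds, a known recording length); `hourly_profile` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def hourly_profile(detections: List[Tuple[float, float]], recording_sec: float) -> List[int]:
    """Count song starts in each consecutive hour of the recording (hour 0, 1, 2, ...)."""
    n_hours = int(-(-recording_sec // 3600))  # ceiling division: keep a final partial hour
    counts = Counter(int(start // 3600) for start, _ in detections)
    return [counts.get(hour, 0) for hour in range(n_hours)]


# Usage with only the rows visible in the table (the elided rows are not included),
# assuming a 9-hour recording.
rows = [(0.1, 3.5), (56.4, 67.2), (105.5, 111.3), (5403.2, 5408.7), (10007.2, 10013.4)]
print(hourly_profile(rows, recording_sec=9 * 3600))  # [3, 1, 1, 0, 0, 0, 0, 0, 0]
```

Songs are assigned to an hour by their start time only, so a song crossing an hour boundary is counted once, in the hour where it began.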
However, someone told me that “the difficult part is figuring out which variables to use”. I'm confused: I thought this would be quite simple.
I was going to take the distribution of the song start times (in seconds) and simply calculate the mean, the median and maybe some quartiles. Is that not right?
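The mean/median/quartile approach can be sketched with Python's standard `statistics` module, assuming the detections are a list of `(start, end)` pairs in seconds. The helper name `start_time_stats` is hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def start_time_stats(detections: List[Tuple[float, float]]) -> Dict[str, float]:
    """Location and spread of song start times (seconds from the start of the recording)."""
    starts = [start for start, _ in detections]
    # quantiles(n=4) returns the three cut points Q1, Q2 (= median), Q3; needs >= 2 values.
    q1, q2, q3 = statistics.quantiles(starts, n=4)
    return {"mean": statistics.mean(starts), "q1": q1, "median": q2, "q3": q3}


# Usage with only the rows visible in the table (the elided rows are not included).
rows = [(0.1, 3.5), (56.4, 67.2), (105.5, 111.3), (5403.2, 5408.7), (10007.2, 10013.4)]
print(start_time_stats(rows))
```

These numbers describe *when* singing tends to happen within the recording, not *how much*. Two sites with very different numbers of songs can share the same median start time, so a count or rate is probably needed alongside them.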
I also considered using the “most active hour”, i.e. the 3,600-second interval containing the largest number of songs, but I can't tell whether it would be any more accurate.
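The “most active hour” can be found with a sliding window over the sorted start times, using two pointers. This is a minimal sketch, assuming songs are counted by their start time and the window is the half-open interval `[t, t + 3600)`. The function name `most_active_window` is hypothetical.

```python
from typing import List, Tuple


def most_active_window(
    detections: List[Tuple[float, float]], width_sec: float = 3600.0
) -> Tuple[float, int]:
    """Return (window_start_sec, n_songs) for the window [t, t + width_sec) holding the
    most song starts. Only windows beginning at a song start are checked; that is enough,
    because any window can be slid right to its first song without losing songs."""
    starts = sorted(start for start, _ in detections)
    if not starts:
        return (0.0, 0)
    best_start, best_count = starts[0], 0
    j = 0  # first index whose start lies outside the current window
    for i, t in enumerate(starts):
        # t only increases, so j only moves forward: O(n) after sorting.
        while j < len(starts) and starts[j] < t + width_sec:
            j += 1
        if j - i > best_count:
            best_start, best_count = t, j - i
    return (best_start, best_count)


# Usage with only the rows visible in the table (the elided rows are not included).
rows = [(0.1, 3.5), (56.4, 67.2), (105.5, 111.3), (5403.2, 5408.7), (10007.2, 10013.4)]
print(most_active_window(rows))  # (0.1, 3)
```

The window is anchored at a song start rather than at a clock hour, so the result is an offset from the start of the recording; adding the recording's start time would turn it into a time of day.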