Hoping that the title makes some sense. I’m essentially trying to pull data via the GA4 Reporting API in Python, but since this API has a 10k rows limit, I need to select smaller time ranges (1 month at a time versus a 12 months, for example) and concatenate the resulting 1 month dataframe together to create the final dataset.
Currently the (probably inappropriate) solution I’ve found is to create a dataframe with 2 columns, “starting_date” and “ending_date”. I then built the API request to loop over each row of this dataframe to fetch the starting_date and ending_date of the API request (always 1 month’s worth of time) dynamically, and append the result of each loop to a new dataframe which I can use.
I keep reading that iterating through dataframes is a big nono. What’s the best alternative to achieve what I’m trying to do more efficiently?
Here’s the code example:
#Create the date range dataframe
def create_date_dataframe(start_date='2022-11-01'):
start_date = datetime.strptime(start_date, '%Y-%m-%d')
end_date = datetime.now()
date_range = pd.date_range(start=start_date, end=end_date, freq='MS')
df = pd.DataFrame({
'starting_date': date_range,
'ending_date': date_range + pd.offsets.MonthEnd(0)
})
return df
date_df = create_date_dataframe().reset_index()
#Start the API request function
pre_final_data= pd.DataFrame()
final_data = pd.DataFrame()
def core_kpi_pipeline():
global pre_final_data, final_data
for index, row in date_df.iterrows():
property_id = "#########"
starting_date = row['starting_date'].strftime("%Y-%m-%d")
ending_date = row['ending_date'].strftime("%Y-%m-%d")
#etc....