I have a DataFrame with raw client data. I need to count the number of sessions for every client, where a new session starts under either of the following conditions:
- 30 minutes have passed since the client's previous event
- one of the UTM parameters has changed since the client's previous event
The raw data table is attached as an image (Dataframe). I do not yet know how to insert a DataFrame as text here 🙂
I’ve started like this:
import pandas as pd
df['date'] = pd.to_datetime(df['eventTimestamp'], format='mixed')
df = df.sort_values(['client_id', 'date'])
# Seconds since the previous event of the same client (NaN for each client's first event)
df['diff'] = df.groupby('client_id')['date'].diff().dt.total_seconds()
Now I would like to know the most efficient way to count sessions. I’m new to Python/Pandas, but I need a solution as fast as possible. Should I write the algorithm in a separate function and call it with the ‘apply’ method, or does Pandas have built-in solutions for such cases?
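To make the question concrete, here is a minimal sketch of what a vectorized (no ‘apply’) version might look like. It is not a confirmed solution. The toy data and the UTM column names (utm_source, utm_medium) are assumptions and may not match the real CSV, and "30 minutes have passed" is read here as a gap strictly greater than 30 minutes. The idea is to flag every row that starts a new session and then sum the flags per client.

```python
import pandas as pd

# Toy data for illustration only; the utm_* column names are assumptions,
# not taken from the real CSV.
df = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 2, 2],
    "eventTimestamp": [
        "2024-01-01 10:00:00",
        "2024-01-01 10:10:00",  # same session (10 min gap, same UTM)
        "2024-01-01 10:20:00",  # new session: utm_source changed
        "2024-01-01 11:30:00",  # new session: 70 min gap
        "2024-01-01 09:00:00",
        "2024-01-01 09:05:00",  # same session
    ],
    "utm_source": ["google", "google", "facebook", "facebook", "google", "google"],
    "utm_medium": ["cpc"] * 6,
})
utm_cols = ["utm_source", "utm_medium"]  # assumed list of UTM columns

df["date"] = pd.to_datetime(df["eventTimestamp"])
df = df.sort_values(["client_id", "date"]).reset_index(drop=True)

# Time since the previous event of the same client (NaT on each client's first row).
gap = df.groupby("client_id")["date"].diff()

# UTM values of the previous event of the same client.
prev_utm = df.groupby("client_id")[utm_cols].shift()
# Fill missing values with a placeholder so that two missing UTMs compare as equal.
utm_changed = (df[utm_cols].fillna("") != prev_utm.fillna("")).any(axis=1)

# A row starts a new session if it is the client's first event,
# the gap exceeds 30 minutes (assumed threshold rule), or any UTM value changed.
new_session = gap.isna() | (gap > pd.Timedelta(minutes=30)) | utm_changed

# Running session number within each client, and the session count per client.
df["session_id"] = new_session.groupby(df["client_id"]).cumsum()
sessions_per_client = new_session.groupby(df["client_id"]).sum()

print(sessions_per_client)  # client 1 -> 3 sessions, client 2 -> 1 session
```

Everything here works on whole columns (groupby, diff, shift, cumsum), so no Python-level function is called per row or per group, which is usually where ‘apply’ becomes slow on large data.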
The data link (csv file) – https://drive.google.com/file/d/1zbn2Q_TcGPOLykSHC6gmXnIdz_ATBHrX/view?usp=sharing