I have a DataFrame that looks roughly like this, although it has more columns:
```
   Client ID Fulfilled date
0     309032     2017-05-04
1     309032     2017-06-29
3     331793     2017-07-19
5     319659     2017-05-11
6     321682     2017-05-16
```
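For anyone who wants to reproduce this, here is a minimal sketch of the sample frame. It assumes 'Fulfilled date' holds dates (parsed here with pd.to_datetime) and that the gaps in the index (labels 2 and 4 are missing) are real rather than a copy-paste artifact. The extra columns are left out.

```python
import pandas as pd

# Rebuild only the sample rows shown above; the extra columns are omitted.
# The non-contiguous index labels (0, 1, 3, 5, 6) are kept on purpose.
df = pd.DataFrame(
    {
        "Client ID": [309032, 309032, 331793, 319659, 321682],
        "Fulfilled date": pd.to_datetime(
            ["2017-05-04", "2017-06-29", "2017-07-19", "2017-05-11", "2017-05-16"]
        ),
    },
    index=[0, 1, 3, 5, 6],
)
print(df)
```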
I want to create a flag column, is_first_time, that marks each client's first visit.
I thought the logic was simple:
```python
def first_time_visitor(self):
    # Create a copy of 'self.df' and sort it by 'Client ID' and 'Fulfilled date'
    temp_df = self.df.copy()
    temp_df.reset_index(inplace=True)  # I thought this would create a column called 'index' that I could use later.
    temp_df.sort_values(['Client ID', 'Fulfilled date'], inplace=True)
    # Create a new column in 'temp_df' that marks the first occurrence of each client ID
    temp_df['is_first_time'] = ~temp_df['Client ID'].duplicated(keep='first')
    # Convert the boolean values to integers
    temp_df['is_first_time'] = temp_df['is_first_time'].astype(int)
    temp_df = temp_df.sort_values("index")
```
I then thought I could do something like:
```python
self.df['is_first_time'] = temp_df['is_first_time']
```
But I get a KeyError:
```
KeyError: 'index'
```
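A sketch of two things worth checking, tested only against the sample rows above. First, reset_index() names the new column after the index's name; it falls back to 'index' (or 'level_0' if an 'index' column already exists) only when the index is unnamed. So if self.df has a named index (an assumption; the question doesn't say), sort_values("index") would raise exactly this KeyError. Second, even when that sort succeeds, temp_df ends up with a fresh 0..n-1 index, so assigning temp_df['is_first_time'] to self.df aligns on those new labels rather than the original ones (0, 1, 3, 5, 6). The alternative below skips reset_index and lets pandas align the flag back onto the original index. It assumes the index labels are unique, and the index name 'row_id' is hypothetical.

```python
import pandas as pd

# Sample rows from the question, keeping the original non-contiguous index.
df = pd.DataFrame(
    {
        "Client ID": [309032, 309032, 331793, 319659, 321682],
        "Fulfilled date": pd.to_datetime(
            ["2017-05-04", "2017-06-29", "2017-07-19", "2017-05-11", "2017-05-16"]
        ),
    },
    index=[0, 1, 3, 5, 6],
)

# With a named index (hypothetical name 'row_id'), reset_index() creates a
# column with that name instead of 'index'.
named = df.rename_axis("row_id")
print(named.reset_index().columns.tolist())  # ['row_id', 'Client ID', 'Fulfilled date']

# Alternative without reset_index: sort, flag the first row per client, then
# assign. Series assignment aligns on index labels, so the flags land back on
# the original rows regardless of the sorted order.
first = ~df.sort_values(["Client ID", "Fulfilled date"])["Client ID"].duplicated(keep="first")
df["is_first_time"] = first.astype(int)
print(df)  # is_first_time is 1, 0, 1, 1, 1 for index 0, 1, 3, 5, 6
```

Inside the method, the same two lines would read against self.df instead of df, with no temporary copy or re-sort needed.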