This is my dataframe:
id url
25 google.com/main
25 google.com/buy
25 google.com/videos
144 google.com/buy
144 google.com/videos
144 google.com/pictures
992 google.com/main
992 google.com/buy
992 google.com/videos
I would like to delete all rows after url “google.com/buy” for each id. I want this:
id url
25 google.com/main
25 google.com/buy
144 google.com/buy
992 google.com/main
992 google.com/buy
I think to use “for” or “def”, but i can not undestand how
2
You could use groupby.transform
with a reverse cummax
and boolean indexing:
out = df[df.groupby('id')['url']
.transform(lambda x: x[::-1].eq('google.com/buy').cummax())]
Output:
id url
0 25 google.com/main
1 25 google.com/buy
3 144 google.com/buy
6 992 google.com/main
7 992 google.com/buy
Intermediates:
id url .eq('google.com/buy') reverse cummax
0 25 google.com/main False True
1 25 google.com/buy True True
2 25 google.com/videos False False
3 144 google.com/buy True True
4 144 google.com/videos False False
5 144 google.com/pictures False False
6 992 google.com/main False True
7 992 google.com/buy True True
8 992 google.com/videos False False
Note that if you have several “google.com/buy” in the same id, you’ll get all the rows up to the last match. If you want to delete after the first match you could use a forward cummax
and shift
:
out = df[df.groupby('id')['url']
.transform(lambda x: ~x.eq('google.com/buy')
.cummax().shift(fill_value=False))]
Intermediates (with a more complex input example):
id url .eq('google.com/buy') cummax shift ~
0 25 google.com/main False False False True
1 25 google.com/buy True True False True
2 25 google.com/videos False True True False
3 144 google.com/buy True True False True
4 144 google.com/videos False True True False
5 144 google.com/pictures False True True False
6 992 google.com/main False False False True
7 992 google.com/buy True True False True
8 992 google.com/videos False True True False
9 992 google.com/buy True True True False
10 992 google.com/pictures False True True False
1