I would like to group some data, and:
- sort the groups by size
- drop the groups whose size is less than 3.
In the sample below, I would like to get the groups in the order z, y, since the size of z is 4 and the size of y is 3, but drop x since its size is 2.
import pandas as pd
data = [
    ['x', 1],
    ['y', 5],
    ['z', 1],
    ['x', 43],
    ['y', 22],
    ['z', 2],
    ['y', 31],
    ['z', 3],
    ['z', 4],
]
df = pd.DataFrame(data, columns=['name', 'value'])
print(df)
dfg = df.groupby('name')
# Use a separate loop variable so the original df is not overwritten.
for name, group in dfg:
    print(name)
    print(group)
Current output:
name value
0 x 1
1 y 5
2 z 1
3 x 43
4 y 22
5 z 2
6 y 31
7 z 3
8 z 4
x
name value
0 x 1
3 x 43
y
name value
1 y 5
4 y 22
6 y 31
z
name value
2 z 1
5 z 2
7 z 3
8 z 4
Expected output:
name value
2 z 1
5 z 2
7 z 3
8 z 4
name value
1 y 5
4 y 22
6 y 31
z
A possible solution:
g = df.groupby('name')
[x for _, x in g if len(x) >= 3]
Output:
[ name value
1 y 5
4 y 22
6 y 31,
name value
2 z 1
5 z 2
7 z 3
8 z 4]
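If a single DataFrame is acceptable instead of a list of groups, `GroupBy.filter` can drop the small groups in one step. This is only a sketch, assuming the same `df` as in the question; note that the rows keep their original order, so the result is not sorted by group size.

```python
import pandas as pd

data = [['x', 1], ['y', 5], ['z', 1], ['x', 43], ['y', 22],
        ['z', 2], ['y', 31], ['z', 3], ['z', 4]]
df = pd.DataFrame(data, columns=['name', 'value'])

# Keep only the rows that belong to a group with at least 3 members.
filtered = df.groupby('name').filter(lambda g: len(g) >= 3)
print(filtered)
```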
In case you want the resulting groups sorted by size, largest first:
g = df.groupby('name')
sorted([x for _, x in g if len(x) >= 3], key=lambda y: -len(y))
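If the group names should be printed along with each group, as in the question's loop, one alternative is to take the order from `value_counts()`, which returns the sizes in descending order by default. A minimal sketch, assuming the same `df` as in the question; the names `sizes` and `keep` are just illustrative:

```python
import pandas as pd

data = [['x', 1], ['y', 5], ['z', 1], ['x', 43], ['y', 22],
        ['z', 2], ['y', 31], ['z', 3], ['z', 4]]
df = pd.DataFrame(data, columns=['name', 'value'])

# Group sizes, largest first (value_counts sorts in descending order).
sizes = df['name'].value_counts()

# Names of the groups with at least 3 rows, still ordered by size.
keep = sizes[sizes >= 3].index

g = df.groupby('name')
for name in keep:
    print(name)
    print(g.get_group(name))
```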