Example
import pandas as pd
import numpy as np
t=pd.DataFrame({'c1': {(1, 1, 1): 1, (1, 2, 9): 2, (2, 1, 3): np.nan, (2, 1, 7): 4, (4, 2, 2): 6}, 'c2': {(1, 1, 1): 3, (1, 2, 9): 3, (2, 1, 3): 3, (2, 1, 7): 1, (4, 2, 2): 2}})
print(t["c1"].groupby(level=[0,1]).first())
Output:
1 1 1.0
2 2.0
2 1 4.0
4 2 6.0
Name: c1, dtype: float64
Desired:
1 1 1 1.0
2 9 2.0
2 1 7 4.0
4 2 2 6.0
Name: c1, dtype: float64
The answer doesn't need to be a groupby; I feel like there has to be a way to use an iloc slice on the index that respects NaNs.
Use dropna + groupby.nth:
t.dropna(subset='c1').groupby(level=[0, 1])['c1'].nth(0)
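A quick sanity check of this option, reusing t from the question (a minimal sketch assuming pandas >= 2.0, where groupby.nth acts as a filter and returns the selected rows with their original index rather than the group keys):
# Assumes pandas >= 2.0: nth(0) keeps the original 3-level row labels,
# which is exactly what the question asks for.
res = t.dropna(subset='c1').groupby(level=[0, 1])['c1'].nth(0)
print(res)
# Expected: the desired 3-level output shown in the question.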
Another option: modify your code to use groupby.apply and slice with .iloc[:1] or .iloc[[0]]:
(t['c1'].groupby(level=[0, 1], group_keys=False)
        .apply(lambda x: x.dropna().iloc[:1])
)
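The group_keys=False is what keeps the original 3-level index here. A small sketch of the difference, assuming pandas >= 2.0 (where apply with the default group_keys=True prepends the group keys as extra index levels):
# With the default group_keys=True, apply prepends the (level 0, level 1)
# group keys, giving a 5-level index instead of the original 3.
with_keys = t['c1'].groupby(level=[0, 1]).apply(lambda x: x.dropna().iloc[:1])
print(with_keys.index.nlevels)   # expected: 5
no_keys = (t['c1'].groupby(level=[0, 1], group_keys=False)
                  .apply(lambda x: x.dropna().iloc[:1]))
print(no_keys.index.nlevels)     # expected: 3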
Or, if you want to keep a (NaN) row even when a group contains only NaNs, use groupby.idxmax:
t.loc[t['c1'].notna().groupby(level=[0, 1]).idxmax(), 'c1']
Output:
1 1 1 1.0
2 9 2.0
2 1 7 4.0
4 2 2 6.0
Name: c1, dtype: float64
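To see why the idxmax version works, the intermediate result gives, for every (level 0, level 1) group, the full label of the first non-NaN row (a small sketch reusing t from the question):
# idxmax on the boolean mask returns, per group, the label of the first True,
# i.e. the full 3-level label of the first non-NaN row (or the first row of
# the group if it is all NaN, which is what keeps such groups around).
labels = t['c1'].notna().groupby(level=[0, 1]).idxmax()
print(labels.tolist())
# Expected: [(1, 1, 1), (1, 2, 9), (2, 1, 7), (4, 2, 2)]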
Using groupby with reset_index and set_index could be a way to do it (but it's not as concise).
Code:
n = t.index.nlevels # or set manually n = 3
out = (t.reset_index(level=2)
        .groupby(level=[0, 1]).first()
        .set_index('level_2', append=True)
        .rename_axis([None] * n)['c1']
)
out
1 1 1 1.0
2 9 2.0
2 1 3 4.0
4 2 2 6.0
Name: c1, dtype: float64
Note that groupby.first picks the first non-NaN value of each column independently, so for group (2, 1) the kept level-2 label is 3 (from the NaN row) while the 4.0 comes from the row labelled 7; that is why the index here differs from the desired (2, 1, 7).
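If you want this variant to line up with the desired output, one hedged sketch (building on the code above, not part of the original answer) is to drop the NaN rows first, so the label and the value come from the same row:
# Drop NaN rows first so first() takes the level-2 label and the c1 value
# from the same (first non-NaN) row of each (level 0, level 1) group.
out2 = (t.dropna(subset='c1')
         .reset_index(level=2)
         .groupby(level=[0, 1]).first()
         .set_index('level_2', append=True)
         .rename_axis([None] * 3)['c1'])
print(out2)
# Expected: matches the desired output, e.g. (2, 1, 7) -> 4.0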
You can simply group by all 3 levels and add dropna at the end:
print(t["c1"].groupby(level=[0,1,2]).first().dropna())
1 1 1 1.0
2 9 2.0
2 1 7 4.0
4 2 2 6.0
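For reference, before the trailing dropna the NaN at (2, 1, 3) is still present, which is why it is needed (a small sketch reusing t from the question; note that grouping by all 3 levels keeps every non-NaN row rather than only the first per (level 0, level 1) group, which happens to coincide with the desired output here because each such group has at most one non-NaN row):
# Without dropna, the single-row group (2, 1, 3) keeps its NaN.
print(t["c1"].groupby(level=[0, 1, 2]).first())
# Expected (roughly):
# 1 1 1 1.0
# 2 9 2.0
# 2 1 3 NaN
# 7 4.0
# 4 2 2 6.0
# Name: c1, dtype: float64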