I have a multi-index pandas dataframe, and I need assign the value to a slice of the dataframe (based on one index), with a calculation on another slice of the same dataframe (based on the same index).
I have tried to assign the values using loc, but the entire slice ends up NaN.
I have written this simple example code to make clear the problem I’m having.
Sample dataframe:
iterables = [ [1, 2, 3, 4], ["A", "B"]]
idx = pd.MultiIndex.from_product(iterables, names=["tstep", "trial"])
df = pd.DataFrame(index=idx)
df['Value'] = 0.0
df.loc[(1,'A'), 'Value'] = 1.0
df.loc[(1,'B'), 'Value'] = 2.0
:
Value
tstep trial
1 A 1.0
B 2.0
2 A 0.0
B 0.0
3 A 0.0
B 0.0
4 A 0.0
B 0.0
Now let’s say I want to set all elements with index tstep = 2 equal to those with tstep = 1, incremented by 1. I wrote:
df.loc[2]['Value'] = df.loc[1]['Value'] + 1.0
This is the desired output:
Value
tstep trial
1 A 1.0
B 2.0
2 A 2.0
B 3.0
3 A 0.0
B 0.0
4 A 0.0
B 0.0
This is the actual output I get:
Value
tstep trial
1 A 1.0
B 2.0
2 A NaN
B NaN
3 A 0.0
B 0.0
4 A 0.0
B 0.0
Note that if I assign the slice to a single scalar, it works fine (as when I create this sample dataframe). I have tried to swap the [‘Value’] and the .loc[2] with the same result.
What I am doing wrong?
You could slice with xs
, rename
to change the level and update
the DataFrame in place:
df.update(df.xs(1, level='tstep', drop_level=False).rename({1: 2})+1)
Generic code
source = 1
target = 2
offset = 1
df.update(df.xs(source, level='tstep', drop_level=False)
.rename({source: target})+offset)
Output:
Value
tstep trial
1 A 1.0
B 2.0
2 A 2.0
B 3.0
3 A 0.0
B 0.0
4 A 0.0
B 0.0
The reason your code is not working is due to pandas’ intrinsic alignment:
In [14]: df.loc[1, 'Value']
Out[14]:
trial
A 1.0
B 2.0
Name: Value, dtype: float64
In [15]: df.loc[2, 'Value']
Out[15]:
trial
A 0.0
B 0.0
Name: Value, dtype: float64
In [16]: df.loc[[2], 'Value']
Out[16]:
tstep trial
2 A 0.0
B 0.0
Name: Value, dtype: float64
In [17]: df.loc[1, 'Value'].array
Out[17]:
<NumpyExtensionArray>
[1.0, 2.0]
Length: 2, dtype: float64
a MultiIndex for correctness needs to be aligned on all indexes – when assigning back tstep 1 to tstep 2, pandas notices that the indices are obviously different and slots a null. In this case, you know both 1 and 2 tsteps share the same trial indices, with the same length, and the same order – remove the indices from the right hand side assignment, and pandas will duly pass it as is:
In [28]: df.loc[2, 'Value'] = df.loc[1, 'Value'].array + 1
In [29]: df
Out[29]:
Value
tstep trial
1 A 1.0
B 2.0
2 A 2.0
B 3.0
3 A 0.0
B 0.0
4 A 0.0
B 0.0