Question
I am updating the values in a slice of a pandas.DataFrame
by row, such that each row of the slice gets a single value that is unique to that row. I am using pandas version 2.2.3.
I have found an approach that seems to work by using a dictionary of the form {row_index: row_value}
(see Minimal working example 1).
I would like to understand:
- whether, why, and how the approach in MWE 1 works, and whether it is a reasonable way of solving the problem.
- whether this approach could lead to any unexpected behaviour.
- why the approach in MWE 2 does not work and raises an error.
- whether there are other, more appropriate methods to achieve this.
I can’t find any relevant information about this approach on SO or elsewhere.
My exact use case is slightly different from MWE 1; the differences are listed at the bottom, in case they are relevant.
Minimal working example 1 (works)
Here it seems that each dictionary value is indeed assigned to the row whose index label matches its key:
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict({'a': [None, None], 'b': [None, None], 'c': [None, None]})
>>> df.loc[df.index, df.columns] = {0: 1, 1: 2}
>>> df
   a  b  c
0  1  1  1
1  2  2  2
Minimal working example 2 (raises error)
Here, the entire dataframe is selected in a different way, and that change alone causes an error to be raised:
>>> import pandas as pd
>>> df2 = pd.DataFrame.from_dict({'a': [None, None], 'b': [None, None], 'c': [None, None]})
>>> df2.loc[:, :] = {0: 1, 1: 2}
ValueError: setting an array element with a sequence.
Similar question on SO
This answer to a similar question explains why a dictionary’s keys (rather than its values) appear in a row when doing df.iloc[0] = {'a': 1, 'b': 2, 'c': 3},
which is even more confusing.
Details of what I’m doing
My exact use case differs slightly from MWE 1 in that I have a larger dataframe with some contiguous NaN
values, so I am updating that slice/subset of the dataframe with df.loc[df_index_subset, df_columns_subset]
instead of selecting the entire dataframe with df.loc[df.index, df.columns].
However, the behaviour is the same as in Minimal working example 1: it fills the slice of the dataframe by row according to the dictionary’s key-value pairs.
1
Pandas aligns on index and column labels when you assign a DataFrame or Series. You should either provide an array/list with exactly the shape of the slice, or a DataFrame/Series whose labels match the slice.
In your case you could use:
df2.loc[:, :] = pd.DataFrame({0: 1, 1: 2}, index=df2.columns).T
Output:
   a  b  c
0  1  1  1
1  2  2  2
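For the subset case described in the question, the "array of the exact size of the slice" option can be sketched as follows. This is only an illustration under assumptions: the frame, the names row_values, rows, cols and block, and the numbers are all made up for the example and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 4 rows x 4 columns, all NaN.
df = pd.DataFrame(np.nan, index=range(4), columns=list("abcd"))

# Hypothetical mapping {row_label: value} and the sub-block to fill.
row_values = {1: 10.0, 2: 20.0}
rows = list(row_values)   # row labels of the slice
cols = ["b", "c"]         # column labels of the slice

# Build a 2D array with exactly the shape of the slice:
# one row per selected label, the value repeated across the columns.
values = np.array([row_values[r] for r in rows])
block = np.repeat(values[:, None], len(cols), axis=1)

# The array shape matches the selection, so no alignment or guessing is needed.
df.loc[rows, cols] = block
```

Because the array has exactly the shape of the selection, the result does not depend on how pandas interprets a dict on the right-hand side.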
1
From your description in “Details of what I’m doing”, I guess you’re looking to map the index through a dict and write the result into several columns.
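import numpy as np
import pandas as pd

cols = list('abcde')
df = pd.DataFrame(np.zeros((3, len(cols))), columns=cols)
map_dict = {0: 1, 1: 2, 2: 3}
# Map each index label to its value, then reshape to a column vector
# so that it broadcasts across the selected columns 'b'..'e'.
df.loc[:, 'b':'e'] = np.array(df.index.map(map_dict)).reshape(-1, 1)
df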
     a    b    c    d    e
0  0.0  1.0  1.0  1.0  1.0
1  0.0  2.0  2.0  2.0  2.0
2  0.0  3.0  3.0  3.0  3.0
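If the goal is specifically to fill NaN cells row by row, as the question's details suggest, another option might be Series.fillna with a Series of per-row values, applied to each selected column. This is a rough sketch, not part of the answer above; the frame, the column subset and the row_values name are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a contiguous NaN block in rows 1-2, columns b-c.
df = pd.DataFrame(
    {
        "a": [0.0, 0.0, 0.0],
        "b": [5.0, np.nan, np.nan],
        "c": [6.0, np.nan, np.nan],
    }
)

# Hypothetical per-row fill values, indexed by row label.
row_values = pd.Series({1: 10.0, 2: 20.0})

cols = ["b", "c"]
# For each selected column, fill NaN cells with the value whose
# index label matches the row; non-NaN cells are left untouched.
df[cols] = df[cols].apply(lambda col: col.fillna(row_values))
```

Unlike the broadcast assignment, this only touches cells that are NaN, so existing values inside the selected columns are preserved.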