I have a pandas dataframe that looks like so
Index Key 2010-01 2010-02 2010-03 ... 2020-12
A/B/C foo 0.23 0.44 0 2.1
A/B/C bar 0.43 0.12 0.23 1.2
A/B/C baz 0.25 0.23 0.2 2.5
P/Q/R foo 0.31 0.41 0 2.4
P/Q/R foo 0.33 0.54 0.5 4.2
P/Q/R foo 0.93 0.64 0.99 6.5
The index
is a multi-column index. “foo”, “bar”, “baz” are present for every index.
How do I convert this group of data into individual dataframes that looks like
# dataframe for A/B/C
index foo bar baz
2010-01 0.23 0.43 0.25
2010-02 0.44 0.12 0.23
...
2020-12 2.1 1.2 2.5
Fairly new to pandas, so I tried converting data into dictionaries and working with it. So I have a solution that involves collecting the necessary values and then in the 2nd pass, converting them to their individual dataframes. The pseudo code was like this
# loop over the converted dictionary (as per keys)
For each key, create 'foo', 'bar', 'baz' with empty dicts;
when encountering a row for 'foo', collect all values from col 2010-01 to 2020-12 as a list
do the same for 'bar' and 'baz'. Add to the nested dict that is held by the given key
For the second pass, loop through each key
take the nested dict and create a dataframe using the entire dict and the dates 2010-01 to 2020-12 as the index.
Is there a more panda-ish way to do this.
Is it possible to convert the group obtained by the index A/B/C and transpose it without incurring the hit of performance that comes with transpose?
The actual data in question may have upwards of 10000 such indices (>30k rows).
3
Assuming “Index” the index, use groupby
and transpose
in a dictionary comprehension to create a dictionary of DataFrames:
out = {k: g.set_index('Key').T for k, g in df.groupby(level='Index')}
Output:
{'A/B/C': Key foo bar baz
2010-01 0.23 0.43 0.25
2010-02 0.44 0.12 0.23
2010-03 0.00 0.23 0.20
2020-12 2.10 1.20 2.50,
'P/Q/R': Key foo foo foo
2010-01 0.31 0.33 0.93
2010-02 0.41 0.54 0.64
2010-03 0.00 0.50 0.99
2020-12 2.40 4.20 6.50}
Then you can access each DataFrame by key:
# out['A/B/C']
Key foo bar baz
2010-01 0.23 0.43 0.25
2010-02 0.44 0.12 0.23
2010-03 0.00 0.23 0.20
2020-12 2.10 1.20 2.50
2
Assuming that df
is the initial DataFrame I think you are looking for this:
import pandas as pd
for idx in df["Index"].unique():
print(pd.pivot_table(df[df["Index"] == idx],
values=["2010-01", "2010-02", "2010-03", "2020-12"],
columns=["Key"]))
Of course values list can be derived from the df.columns
.
2