I have a complex dataframe with multiple columns. All of them being MultiIndex
based. At some point I wanted to be quite specific when it comes to estimating some metrics so I started experimenting with the .groupby
method. I can manage to do the basics: 1) computing the aggregation method on the whole dataframe or 2) computing it for one specific column. However, I am interested in computing the aggreagtion method by indicating some of the names within the first column levels. This is quite easy to do when there is just a single level within the columns. In order to be understood, I created the following MRO that reproduces my idea and the errors I am getting:
import numpy as np
import pandas as pd
columns = pd.MultiIndex.from_tuples(
[
("Dimensions", "x"),
("Dimensions", "y"),
("Dimensions", "z"),
("Coefficient", ""),
("Comments", ""),
],
names=["Category", "Details"],
)
df = pd.DataFrame(index=range(11), columns=columns)
df[("Dimensions", "x")] = np.random.randint(1, 100, size=11)
df[("Dimensions", "y")] = np.random.randint(1, 100, size=11)
df[("Dimensions", "z")] = np.random.randint(1, 100, size=11)
df[("Coefficient", "")] = np.random.randint(1, 50, size=11) # Coefficient como entero aleatorio
df[("Comments", "")] = np.random.choice(["Good", "Average", "Bad"], size=11)
df["Comments"] = df["Comments"].astype("category")
# Basic metrics
print(df.groupby("Comments").mean()) # It works
print(df.groupby("Comments")["Dimensions"].mean()) # It works
# Selecting multiple columns within a MultiIndex based one. Different ideas I tried:
df.groupby("Comments")["Dimensions", "Coefficient"].mean() # It does not work
df.groupby("Comments")[["Dimensions", "Coefficient"]].mean() # It does not work
df.groupby("Comments").agg({"Dimensions": "mean", "Coefficient": "mean"}) # It does not work
If you use print(df.columns)
you’ll see the true column names are tuples rather than single strings.
Try this:
df.groupby("Comments")[[('Dimensions', 'x'), ('Dimensions', 'y'), ('Dimensions', 'z'), ('Coefficient', '')]].mean()
Category Dimensions Coefficient
Details x y z
Comments
Average 35.00 55.166667 59.333333 21.833333
Bad 81.75 24.250000 45.750000 35.750000
Good 36.00 1.000000 42.000000 20.000000
2