I am getting to know the pandas group-by functionality, currently focusing on applying it to a DataFrame rather than a Series. I am also starting simple, grouping by a single DataFrame column. I found the group-by functions here.
I noticed that some aggregation functions, like `size()`, return a single result column for the whole DataFrame, while others, like `count()`, return one column per DataFrame column. This makes sense, since the count of non-null values can differ from one column to the next.
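To make the difference concrete, here is a minimal sketch. The DataFrame `df` and its columns `key`, `x` and `y` are made-up example data, not from any real dataset; `x` and `y` deliberately contain different numbers of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: "key" is the grouping column; "x" and "y"
# have different numbers of missing values.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "x": [1.0, np.nan, 3.0, 4.0, 5.0],
    "y": [10.0, 20.0, np.nan, np.nan, 50.0],
})
g = df.groupby("key")

# size(): number of rows in each group, one value per group -> a Series.
sizes = g.size()    # a: 2, b: 3

# count(): number of non-null values in each group, computed separately
# for every non-grouping column -> a DataFrame with columns "x" and "y".
counts = g.count()  # x: a=1, b=3; y: a=2, b=1

print(sizes)
print(counts)
```

Because `count()` skips NaN, `x` and `y` give different counts for both groups, so the result cannot collapse into one column the way the row count from `size()` does.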
The top part of the documentation for each aggregation function doesn't say whether it returns one column for the whole DataFrame or one column for each DataFrame column. One has to infer this from the description of the function's purpose, look through the examples, and/or experiment.
Is there a quick way to glance at a specification and know right away how many columns are returned?
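Absent a documented rule, the experimental route can at least be made quick. The sketch below calls several zero-argument GroupBy methods by name on a small grouped frame and reports whether each returns a Series or a DataFrame. The helper `describe_result` and the sample data are hypothetical, and this only shows what the installed pandas version does; it is not a substitute for a documented guarantee.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one grouping column and two value columns.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "x": [1.0, np.nan, 3.0, 4.0, 5.0],
    "y": [10.0, 20.0, np.nan, np.nan, 50.0],
})
g = df.groupby("key")


def describe_result(groupby, method_name):
    """Hypothetical helper: call a zero-argument GroupBy method by name
    and describe the shape of what it returns."""
    result = getattr(groupby, method_name)()
    if isinstance(result, pd.Series):
        return "Series: one column for the whole DataFrame"
    return f"DataFrame with columns {list(result.columns)}"


for name in ["size", "count", "sum", "mean", "max", "first"]:
    print(f"{name:>6}: {describe_result(g, name)}")
```

Running the loop over the method names you care about gives a one-screen summary, which is often faster than reading each API page.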
Afternote: I'm running into an ambiguous situation. If I invoke `.agg(['size','count'])` on a DataFrame, `count()` behaves differently than described above and does not return one column per DataFrame column. Instead, it returns one column for the entire DataFrame, just like `size()`. I'm hoping there is a clear and unambiguous way to determine from the documentation how the functions behave under the various circumstances they support. In this case, I don't even know which column `count()` is being applied to.
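One way to find out which column a function in an `.agg([...])` list was applied to is to look at the column labels of the result. The sketch below assumes the call is made on a `groupby` object (the exact call isn't shown above), and the data and names are hypothetical. When a single column is selected first, the result has one column per function, and both functions operate on that column only; for the full grouped frame, printing the result's column labels shows how your installed pandas version pairs functions with input columns.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: "key" is the grouping column.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "x": [1.0, np.nan, 3.0, 4.0, 5.0],
    "y": [10.0, 20.0, np.nan, np.nan, 50.0],
})
g = df.groupby("key")

# Selecting one column first gives a SeriesGroupBy: each function in the
# list becomes one output column, and both are applied to "x" only.
one_col = g["x"].agg(["size", "count"])  # size: a=2, b=3; count: a=1, b=3
print(one_col)

# On the whole DataFrameGroupBy, the result's column labels show which
# input column each function was applied to in the installed version.
whole = g.agg(["size", "count"])
print(whole.columns.tolist())
```

Selecting the column explicitly also removes the ambiguity in the code itself: a reader can see which column `count()` works on without consulting the documentation.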