I have a very large dataset called bin_df.
Using pandas and the following code I’ve assigned sub-total “Total” to each group:
bin_df = df[df["category"].isin(model.BINARY_CATEGORY_VALUES)]
bin_category_mime_type_count_df = (
bin_df.groupby(["category", "mime_type"])["mime_type"]
.count()
.reset_index(name="Count")
)
The output from bin_category_mime_type_count_df :
category mime_type Count
1 application/x-executable 19
1 application/x-pie-executable 395
1 application/x-sharedlib 1
2 application/x-sharedlib 755
3 application/x-sharedlib 1
6 application/x-object 129
Then:
bin_category_total_count_df = (
bin_category_mime_type_count_df.groupby(["category", "mime_type"])[
"Count"
]
.sum()
.unstack()
)
bin_category_total_count_df = (
bin_category_total_count_df.assign(
Total=bin_category_total_count_df.sum(1)
)
.stack()
.to_frame("Count")
)
bin_category_total_count_df["Count"] = (
bin_category_total_count_df["Count"].astype("Int64").fillna(0)
)
This produces the following (it is sorted by category by default):
Count
category mime_type
1 application/x-executable 19
application/x-pie-executable 395
application/x-sharedlib 1
Total 415
2 application/x-sharedlib 755
Total 755
3 application/x-sharedlib 1
Total 1
6 application/x-object 129
Total 129
I would like it to be sorted by “Total” and then within a category I would like it to be sorted by mime_type Count:
Count
category mime_type
2 application/x-sharedlib 755
Total 755
1 application/x-pie-executable 395
application/x-executable 19
application/x-sharedlib 1
Total 415
6 application/x-object 129
Total 129
3 application/x-sharedlib 1
Total 1
What kind of function should I look at to get the desired result?