I have the following DataFrame and want to calculate the “share”: each row's “amount” divided by the “end_amount” value of the same “col1” group.
import pandas as pd
d = {"col1":["A", "A", "A", "B", "B", "B"], "col2":["start_amount", "mid_amount", "end_amount", "start_amount", "mid_amount", "end_amount"], "amount":[0, 2, 8, 1, 2, 3]}
df_test = pd.DataFrame(d)
df_test["share"] = 0
for i in range(len(df_test)):
    df_test.loc[i, "share"] = df_test.loc[i, "amount"] / df_test.loc[(df_test["col1"] == df_test.loc[i, "col1"]) & (df_test["col2"] == "end_amount"), "amount"].values
This works but is far from efficient. Is there a better way to do my calculation?
This is equivalent to selecting the rows where “col2” is “end_amount”, mapping their “amount” onto every row by “col1”, and then dividing “amount” by the mapped value:
s = df_test.loc[df_test['col2'].eq('end_amount')].set_index('col1')['amount']
df_test['share'] = df_test['amount']/df_test['col1'].map(s)
Output:
  col1          col2  amount     share
0    A  start_amount       0  0.000000
1    A    mid_amount       2  0.250000
2    A    end_amount       8  1.000000
3    B  start_amount       1  0.333333
4    B    mid_amount       2  0.666667
5    B    end_amount       3  1.000000
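To make the steps above easier to follow, here is a self-contained sketch that splits the lookup into named intermediate results. The names `end_by_group`, `denominator`, and `alt` are only illustrative. It assumes each “col1” value has exactly one “end_amount” row; with duplicates, the lookup Series would have a non-unique index and `map` would be expected to fail. The last part shows one possible alternative using `groupby(...).transform("first")`, which is a different technique from the `map` approach above.

```python
import pandas as pd

df_test = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": ["start_amount", "mid_amount", "end_amount"] * 2,
    "amount": [0, 2, 8, 1, 2, 3],
})

# Step 1: keep only the "end_amount" rows and index them by col1.
# Assumption: exactly one "end_amount" row per col1 value.
end_by_group = (
    df_test.loc[df_test["col2"].eq("end_amount")]
    .set_index("col1")["amount"]
)

# Step 2: look up the group's end amount for every row.
denominator = df_test["col1"].map(end_by_group)

# Step 3: one vectorized division instead of a Python loop.
df_test["share"] = df_test["amount"] / denominator

# Alternative sketch: blank out non-end amounts, then broadcast the
# first non-null value of each group back to all rows of that group.
alt = df_test["amount"] / (
    df_test["amount"]
    .where(df_test["col2"].eq("end_amount"))
    .groupby(df_test["col1"])
    .transform("first")
)
```

Both variants avoid the row-by-row `.loc` lookups of the original loop. The `map` version may be easier to read, while the `transform` version needs no separate lookup Series.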