I have two Dataframes.
Dataframe1(df1): has 4 columns as shown below.
X | Y | A( in days) | B(sum) |
---|---|---|---|
a | aa | 7 | |
a | bb | 9 | |
b | aa | 36 | |
c | dd | 29 |
Column X and Column Y are strings and Column A in days and another column B that is blank and needs to be filled with sum of rows from dataframe 2,the number of rows to be summed is based on how days translates to week in column B.
Dateframe 2 (df2):
X | Y | Week | C(weekly) |
---|---|---|---|
a | aa | WK1 | 10 |
a | aa | WK2 | 23 |
a | aa | WK3 | 21 |
a | aa | WK4 | 2 |
a | aa | Wk5 | 5 |
a | bb | WK1 | 10 |
a | bb | WK2 | 7 |
a | bb | WK3 | 14 |
b | aa | WK1 | 10 |
b | aa | WK2 | 5 |
b | aa | WK3 | 4 |
b | aa | WK4 | 8 |
b | aa | Wk5 | 7 |
b | aa | Wk6 | 18 |
b | aa | Wk7 | 3 |
b | aa | WK8 | 7 |
c | dd | WK1 | 10 |
c | dd | WK2 | 5 |
c | dd | WK3 | 7 |
c | dd | WK4 | 14 |
c | dd | WK5 | 7 |
c | dd | WK4 | 21 |
Dataframe 2 also has four columns. Column X and Column Y are string and this dataframe has Week numbers and weekly which is sum of weekly units.
I want to fill the values in column B of df1 based on column C in df2 like below:
For first row in df1 when X=a and Y =aa and A=7 days then from df2 find corresponding matching values of a and aa in columns X and Y and then use just WK1 values (As 7 days is 1 week so just WK1 of C from df2) so B=10 in df1
For second row in df1 when X=a and Y =bb and A=9 days then from df2 find corresponding matching values of a and bb in columns X and Y and then use WK1 + (2/7)WK2 (As 7 days is 1 week plus 2 additional days i.e. a fraction (2/7th) of week 2 value so 10+7(2/7) giving B=12
For third row in df1 when X=b and Y =aa and A=36 days then from df2 find corresponding matching values of b and aa in columns X and Y and then use WK1+WK2+WK3+WK4+WK5+(1/7)WK6 (As 36 days is 5 weeks plus 1 additional day i.e. a fraction (1/7th) of week 6 value so 10+5+4+8+7+18(1/7) giving B=36.5
and so on.
df1 (Completed):
X | Y | A( in days) | B(sum) | Methodology |
---|---|---|---|---|
a | aa | 7 | 10 | a-aa-WK1-10 |
a | bb | 9 | 10+7*(2/7)=12 | a-bb- Wk1+(2/7)*WK2 |
b | aa | 36 | 10+5+4+8+7+18*(1/7)=36.5 | b-aa-WK1+WK2+WK3+WK4+WK5+(1/7)*WK6 |
c | dd | 29 | 10+5+7+14+(1/7)*7=37 | c-dd-WK1+WK2+Wk3+(1/7)*WK4 |
PS: I had an earlier post on this where I asked this question without the requirements for two columns X and Y in df1 to match with those in df2 –it was only after working on this for a little bit did i realize that I will need to do this with a match with the columns X and Y so reposting the additional requirement. I did gain very valuable insights from my previous post but I have to get the above solved to complete my task.
So please don’t close this as a duplicate.
I am a newbie to python and programming so am still learning all the various concepts – please help.
Based on answer I got from my previous post:
Summing rows in a Pandas DataFrame where the number of rows summed is based on columns values in a different DataFrame
I tried doing the following:
df1a= df1.groupby([‘X’,’Y’]).agg({‘A’:mean})
and df2a=df2.groupby([‘X’,’Y’]).agg({‘C’:mean}).
As these are unique values I get the same values as original when I take the mean.
Then used the function from earlier post: df1a[‘B’] = df1a[‘A’].apply(cust_sum, ref=df2a[‘C’]).
But I get an error “cannot do positional indexing on MultiIndex with these indexers [1.0] of type float”
One_more_time is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.