I have a dataframe like below –
|Business Name|Case Number|Violation|Regulation(s)|Payments|
0|NaN|NaN|NaN|30 CFR 1241.60(b)(1) 30 CFR |NaN|
1|NaN|NaN|Business knowingly or willfully|Part 1218,Subparts B, D, E and|NaN|
2|CHI OPERATING CO|CP17‐085|sales months January 2011|30 CFR 1241.50‐52|$142,310|
3|NaN|NaN|Business failed to report production|30 CFR Part 1210, Subpart C|NaN|
4|CROWHEARTENERGYLLC|CP19‐050|Reports(Forms ONRR‐4054)for22production|30CFR1241.50‐52|$5,544|
Can we apply a check on Business Name if NaN then keep it as blank and don’t count that as a row and whenever a value comes keep that as a row and combine all the values from above rows to that row.
Expected output –
|Business Name|Case Number|Violation|Regulation(s)|Payments|
0|CHI OPERATING CO|CP17‐085|Business knowingly or willfully sales months January 2011|30 CFR 1241.60(b)(1) 30 CFR Part 1218,Subparts B, D, E and 30 CFR 1241.50‐52|$142,310|
1|CROWHEARTENERGYLLC|CP19‐050|Business failed to report production Reports(Forms ONRR‐4054)for22production|30 CFR Part 1210, Subpart C 30CFR1241.50‐52|$5,544|
Use Series.bfill
for back filling missing values for groups an aggregate custom lambda function with join
and removing missing values:
out = (df.groupby(df.pop('Business Name').bfill())
.agg(lambda x: ' '.join(x.dropna()))
.reset_index())
print (out)
Business Name Case Number Violation Regulation(s) Payments Unnamed: 6
0 CHI OPERATING CO CP17‐085 Business knowingly or willfully sales months J... 30 CFR 1241.60(b)(1) 30 CFR Part 1218,Subpart... $142,310
1 CROWHEARTENERGYLLC CP19‐050 Business failed to report production Reports(F... 30 CFR Part 1210, Subpart C 30CFR1241.50‐52 $5,544
Solution should be changed if need combine each non missing value separately (rows 3,4 and rows 5,6):
print (df)
Business Name Case Number Violation Regulation(s) Payments
0 NaN NaN NaN 30 CFR 1241.60(b)(1) 30 CFR NaN
1 NaN NaN Business knowingly or willfully Part 1218,Subparts B, D, E and NaN
2 CHI OPERATING CO CP17‐085 sales months January 2011 30 CFR 1241.50‐52 $142,310
3 NaN NaN Business failed to report production 30 CFR Part 1210, Subpart C NaN
4 CROWHEARTENERGYLLC CP19‐050 Reports(Forms ONRR‐4054)for22production 30CFR1241.50‐52 $5,544
5 NaN NaN Business failed to report production 30 CFR Part 1210, Subpart C NaN
6 CROWHEARTENERGYLLC CP19‐050 Reports(Forms ONRR‐4054)for22production 30CFR1241.50‐52 $5,544
out = (df.groupby(df['Business Name'].notna().iloc[::-1].cumsum().iloc[::-1], sort=False)
.agg(lambda x: ' '.join(x.dropna()))
.reset_index(drop=True))
print (out)
Business Name Case Number Violation Regulation(s) Payments Unnamed: 6
0 CHI OPERATING CO CP17‐085 Business knowingly or willfully sales months J... 30 CFR 1241.60(b)(1) 30 CFR Part 1218,Subpart... $142,310
1 CROWHEARTENERGYLLC CP19‐050 Business failed to report production Reports(F... 30 CFR Part 1210, Subpart C 30CFR1241.50‐52 $5,544
2 CROWHEARTENERGYLLC CP19‐050 Business failed to report production Reports(F... 30 CFR Part 1210, Subpart C 30CFR1241.50‐52 $5,544
If in real data is combination strings and numeric columns is possible use:
print (df)
Business Name Case Number Violation Regulation(s) Payments col
0 NaN NaN NaN 30 CFR 1241.60(b)(1) 30 CFR NaN 4
1 NaN NaN Business knowingly or willfully Part 1218,Subparts B, D, E and NaN 1
2 CHI OPERATING CO CP17‐085 sales months January 2011 30 CFR 1241.50‐52 $142,310 0
3 NaN NaN Business failed to report production 30 CFR Part 1210, Subpart C NaN 1
4 CROWHEARTENERGYLLC CP19‐050 Reports(Forms ONRR‐4054)for22production 30CFR1241.50‐52 $5,544 7
5 NaN NaN Business failed to report production 30 CFR Part 1210, Subpart C NaN 1
6 CROWHEARTENERGYLLC CP19‐050 Reports(Forms ONRR‐4054)for22production 30CFR1241.50‐52 $5,544 2
out = (df.groupby(df['Business Name'].notna().iloc[::-1].cumsum().iloc[::-1], sort=False)
.agg(lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ' '.join(x.dropna()))
.reset_index(drop=True))
print (out)
Business Name Case Number Violation Regulation(s) Payments col
0 CHI OPERATING CO CP17‐085 Business knowingly or willfully sales months J... 30 CFR 1241.60(b)(1) 30 CFR Part 1218,Subpart... $142,310 5
1 CROWHEARTENERGYLLC CP19‐050 Business failed to report production Reports(F... 30 CFR Part 1210, Subpart C 30CFR1241.50‐52 $5,544 8
2 CROWHEARTENERGYLLC CP19‐050 Business failed to report production Reports(F... 30 CFR Part 1210, Subpart C 30CFR1241.50‐52 $5,544 3
3