Hi all, I want to get the differences between two dataframes:
Dataframe A
---------------------------------
No | Name | iD | address | mail
---------------------------------
1  | Raj  | 01 | xxxxx   |
2  | Kam  | 02 | yyyyy   |
3  | Buv  | 03 | zzzzz   |
4  | Ram  | 04 | kkkkk   |
5  | Buv  | 05 | ppppp   |

Dataframe B
---------------------------------
No | Name | iD | address | mail
---------------------------------
1  | Raj  | 01 | xxxxx   | NULL
2  | Kam  | 02 | yyyyy   | NULL
3  | Buv  | 03 | lllll   | NULL
Here I have two dataframes: Dataframe A has 5 rows and Dataframe B has 3 rows. I want to get the entries that differ between the two.
For example, the expected output here is:
3 | Buv | 03 | zzzzz
3 | Buv | 03 | lllll --- 3 is included because the address content does not match.
4 | Ram | 04 | kkkkk
5 | Buv | 05 | ppppp
I tried to concatenate both dataframes and drop the duplicated rows, but I am not able to get the expected result:
df_diff = pd.concat([df1, df2]).drop_duplicates(keep=False)
In addition, because Dataframe A has whitespace where Dataframe B has a NULL value,
I used df1 = df1.replace(r'^\s*$', 'NULL', regex=True).replace({np.nan: 'NULL'})
to fix this NULL issue.
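For reference, here is a minimal reproducible sketch of the attempt described above. It assumes the empty mail cells in Dataframe A are a single space and that NULL in Dataframe B is the literal string 'NULL'; neither is confirmed by the real data, and the dtypes of No and iD are also guesses.

```python
import numpy as np
import pandas as pd

# Dataframe A: mail is assumed to hold a single space (whitespace).
df1 = pd.DataFrame({
    "No": [1, 2, 3, 4, 5],
    "Name": ["Raj", "Kam", "Buv", "Ram", "Buv"],
    "iD": ["01", "02", "03", "04", "05"],
    "address": ["xxxxx", "yyyyy", "zzzzz", "kkkkk", "ppppp"],
    "mail": [" "] * 5,
})

# Dataframe B: mail is assumed to be the literal string 'NULL'.
df2 = pd.DataFrame({
    "No": [1, 2, 3],
    "Name": ["Raj", "Kam", "Buv"],
    "iD": ["01", "02", "03"],
    "address": ["xxxxx", "yyyyy", "lllll"],
    "mail": ["NULL"] * 3,
})

# Normalize whitespace-only cells and NaN in Dataframe A to 'NULL'.
df1 = df1.replace(r'^\s*$', 'NULL', regex=True).replace({np.nan: 'NULL'})

# Stack both frames and drop every row that appears more than once.
df_diff = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(df_diff)
```

Under these assumptions the result keeps four rows: both versions of row 3, plus rows 4 and 5 from Dataframe A. If the real data behaves differently, the two frames probably differ in some other way (dtypes, stray whitespace, or how NULL is stored).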
The output I expect is:
3 | Buv | 03 | lllll --- 3 is included because the address content does not match.
4 | Ram | 04 | kkkkk
5 | Buv | 05 | ppppp