How to get the difference between two pandas DataFrames (symmetric difference)?
import pandas as pd
a = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
b = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'z', '']})
result = pd.DataFrame({'a': [2, 2, 3], 'b': ['y', 'z', ''], 'source': ['a', 'b', 'b']})
Visual

a:
   a  b
0  1  x
1  2  y

b:
   a  b
0  1  x
1  2  z
2  3
Expected result:
   a  b source
0  2  y      a
1  2  z      b
2  3         b
My attempted solution gives the expected result, but it seems too complicated. Is there a simpler way?
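# Rows only in a: b is concatenated twice, so every row of b is a duplicate and gets dropped
diff_a = pd.concat([a, b, b]).drop_duplicates(keep=False)
diff_a['source'] = 'a'
# Rows only in b: same trick with the roles swapped
diff_b = pd.concat([b, a, a]).drop_duplicates(keep=False)
diff_b['source'] = 'b'
out = pd.concat([diff_a, diff_b]).reset_index(drop=True)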
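
One possibly simpler alternative, offered as a sketch rather than a definitive answer, is an outer merge with indicator=True. This makes pandas add a `_merge` column marking each row as `left_only`, `right_only`, or `both`. The sketch assumes the two frames share the same columns, so the merge joins on all of them. The names `merged` and `out_merge` are made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
b = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'z', '']})

# Outer merge on all shared columns; indicator=True adds a `_merge` column
# whose values are 'left_only', 'right_only', or 'both'.
merged = a.merge(b, how='outer', indicator=True)

# Keep rows found in only one frame, then translate the indicator
# into the frame name ('a' or 'b').
out_merge = (
    merged[merged['_merge'] != 'both']
    .assign(source=lambda d: d['_merge'].astype(str).map(
        {'left_only': 'a', 'right_only': 'b'}))
    .drop(columns='_merge')
    .reset_index(drop=True)
)
print(out_merge)
```

The two approaches treat repeated rows differently. drop_duplicates(keep=False) also discards rows that repeat within a single frame, while the merge keeps them, so the results can differ when a or b contains duplicate rows.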