I have two CSV files with the same column names, like this:
File1: (df1)
<code>column1 column2 column3 column4
ABC 100 020 030
DEF 200 040 050
GHI 300 001 002
</code>
File2: (df2)
<code>column1 column2 column3 column4
ABC 100 060 070
DEF 200 040 090
</code>
I am writing a comparison script to generate a file like this:
<code>column1 column2 column3 column4
ABC 100 020 | 060 030 | 070
DEF 200 . 050 | 090
GHI 300 001 002
</code>
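I match rows on the combination of column1 and column2 and highlight the differences in the other columns: “.” where the two values are equal, and “value1 | value2” where they differ. If a column1/column2 combination has no match in the other file, the row should be returned unchanged (as in the case of “GHI”).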
My code looks like this:
<code>df = pd.concat([df1,df2], sort=False)
df.set_index(['column1', 'column2'], inplace=True)
df = df.replace(np.nan, '', regex=True)

def report_diff(x):
    print(x)
    return '.' if x[0] == x[1] else '{} | {}'.format(*x)

changes = df.groupby(level=['column1', 'column2']).agg(report_diff)
display(changes)
</code>
It fails with the error “index 1 is out of bounds for axis 0 with size 1”.
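My guess is that the error comes from groups that contain only one row: “GHI” exists only in df1, so there is no second value for `x[1]` to pick up. Below is a minimal sketch of a guarded version, not a confirmed fix. It assumes each column1/column2 combination appears at most once per file, and it rebuilds the sample data inline with `dtype=str` (my assumption) so that leading zeros such as “020” survive.

```python
import io
import pandas as pd

# Sample data copied from the question; dtype=str keeps leading zeros like "020".
df1 = pd.read_csv(io.StringIO(
    "column1 column2 column3 column4\n"
    "ABC 100 020 030\n"
    "DEF 200 040 050\n"
    "GHI 300 001 002\n"), sep=" ", dtype=str)
df2 = pd.read_csv(io.StringIO(
    "column1 column2 column3 column4\n"
    "ABC 100 060 070\n"
    "DEF 200 040 090\n"), sep=" ", dtype=str)

def report_diff(values):
    """Summarise one column of one (column1, column2) group."""
    vals = values.tolist()
    if len(vals) < 2:
        # Key present in only one file: return the value unchanged.
        return vals[0] if vals else ""
    # Assumption: at most one row per file, so only the first two values matter.
    first, second = vals[0], vals[1]
    return "." if first == second else f"{first} | {second}"

combined = pd.concat([df1, df2], sort=False)
changes = (combined
           .groupby(["column1", "column2"])
           .agg(report_diff)
           .reset_index())
print(changes.to_string(index=False))
```

With the sample data this should leave the “GHI” row as it is and mark the “DEF” column3 value as “.”. If a key can occur more than twice, the function would need a different rule, since it only compares the first two values.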
Looking forward to any suggestions.