I have two PySpark dataframes, each with 6 columns and 50000 rows.
comparison = SparkSQLCompare(
    spark,
    df1,
    df2,
    join_columns=['col1', 'col2', 'col3', 'col4', 'col4', 'col5', 'col6'],
    abs_tol=0,  # Absolute tolerance
    rel_tol=0,  # Relative tolerance
    df1_name='df1',  # Name of the first dataframe
    df2_name='df2'  # Name of the second dataframe
)
Now, when I run the comparison with any one column left out of the join keys, all my rows match. If I use all 6 columns as keys, I get "Number of rows with some compared columns unequal: 50000"; otherwise I get "Number of rows with all compared columns equal: 50000". Interestingly, if I split the columns into sets of 3 and use each set as the keys, the rows still match, and when I intersect the dataframes on all 6 columns they match as well. Any idea what's happening here?
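For reference, the intersect-style check mentioned above could be reproduced independently of the comparison library with something like the sketch below. It is only an illustration under assumptions: `key_multiset_diff` and the keys of the returned dictionary are hypothetical names, and the rows are assumed to be available as mappings from column name to value.

```python
from collections import Counter


def key_multiset_diff(rows1, rows2, keys):
    """Compare two row collections as multisets of key tuples.

    Hypothetical helper for illustration; each row is assumed to
    support lookup by column name, e.g. row["col1"].
    """
    # dict.fromkeys drops repeated key names while preserving order
    keys = list(dict.fromkeys(keys))
    c1 = Counter(tuple(r[k] for k in keys) for r in rows1)
    c2 = Counter(tuple(r[k] for k in keys) for r in rows2)
    return {
        # key tuples (counted with multiplicity) present on one side only
        "only_in_1": sum((c1 - c2).values()),
        "only_in_2": sum((c2 - c1).values()),
        # multiset intersection of the two sides
        "in_both": sum((c1 & c2).values()),
        # extra occurrences of a key tuple that appears more than once
        "duplicate_keys_1": sum(n - 1 for n in c1.values() if n > 1),
        "duplicate_keys_2": sum(n - 1 for n in c2.values() if n > 1),
    }
```

With PySpark, the rows could presumably be passed in as `df1.collect()` and `df2.collect()`, since `Row` objects support lookup by column name and 50000 rows should fit on the driver. Note that this sketch treats two `None` values as equal, which may differ from how a join handles nulls.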