I have two CSV files that I want to compare their contents and print the contents that doesn’t have the same data. Most contents will have the same name but not all their data are accurate. I want to print those whose data are not accurate. Example; the file have the same name but their amount may be different or the same. I want to highlight those ones whose amount are not the same.
I tried using python but it is throwing this error.
C:UsersuserPycharmProjectsLearning.venvScriptspython.exe C:UsersuserPycharmProjectsLearningmain.py
Columns of df1: Index(['INSTITUTION: ', 'MY INSTITUE', 'Unnamed: 2',
'Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7',
'Unnamed: 8'],
dtype='object')
Columns of df2: Index(['Account No', 'Customer Name', 'Category Code', 'Category', 'Currency',
'Online Balance', 'Account Officer'],
dtype='object')
Traceback (most recent call last):
File "C:UsersuserPycharmProjectsLearningmain.py", line 35, in <module>
compare_csv_files(file1, file2)
File "C:UsersuserPycharmProjectsLearningmain.py", line 14, in compare_csv_files
merged_df = pd.merge(df1, df2, on='Name', suffixes=('_File1', '_File2'), how='outer', indicator=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:UsersuserPycharmProjectsLearning.venvLibsite-packagespandascorereshapemerge.py", line 170, in merge
op = _MergeOperation(
^^^^^^^^^^^^^^^^
File "C:UsersuserPycharmProjectsLearning.venvLibsite-packagespandascorereshapemerge.py", line 794, in __init__
) = self._get_merge_keys()
^^^^^^^^^^^^^^^^^^^^^^
File "C:UsersuserPycharmProjectsLearning.venvLibsite-packagespandascorereshapemerge.py", line 1297, in _get_merge_keys
right_keys.append(right._get_label_or_level_values(rk))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:UsersuserPycharmProjectsLearning.venvLibsite-packagespandascoregeneric.py", line 1911, in _get_label_or_level_values
raise KeyError(key)
KeyError: 'Name'
Process finished with exit code 1
This is my code.
import pandas as pd
def compare_csv_files(file1, file2):
try:
# Load CSV files into pandas DataFrames
df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)
# Print columns to verify their names and contents
print("Columns of df1:", df1.columns)
print("Columns of df2:", df2.columns)
# Merge DataFrames on common columns ('Name' assumed as the merge key)
merged_df = pd.merge(df1, df2, on='Name', suffixes=('_File1', '_File2'), how='outer', indicator=True)
# Filter rows where amounts are different
differences = merged_df[merged_df['_merge'] == 'both']
differences = differences[differences['Amount_File1'] != differences['Amount_File2']]
# Print or highlight the differences
if not differences.empty:
print("Differences found:")
print(differences[['Name', 'Amount_File1', 'Amount_File2']])
else:
print("No differences found.")
except FileNotFoundError:
print("One or both files not found. Please check the file paths.")
# Example usage:
if __name__ == "__main__":
# Update file paths with the actual paths on your computer
file1 = r'C:UsersuserDesktopSavings_Current_Accounts.csv' # Example on Windows
file2 = r'C:UsersuserDesktopEM_GIC_LIST_ACCOUNT_ICL_ALL(1).csv' # Example on Windows
compare_csv_files(file1, file2)
2