I’m comparing to excel sheets and I created dataframes from an identical column from each file. I’m trying to output the compared dataframes to a new excel file however I only want the unique values from each column(a last week vs what changed this week kind of deal). The columns themselves will never have any duplicate data and .drop_duplicates isn’t giving the desired outcome. Is there a method or arguement I’m missing to make this work?
Here’s my Script:
import pandas as pd
import numpy as np
df1 = pd.read_excel('weeklyreport1.xlsx', index_col=None, na_values=['NA'], usecols="AH")
df2 = pd.read_excel('weeklyreport2.xlsx', index_col=None, na_values=['NA'], usecols="AH")
df_merged = pd.merge(df1, df2, how='inner', left_index=True, right_index=True, suffixes=('_previous_week', '_current_week'))
final = df_merged.drop_duplicates(subset=['hostname_previous_week', 'hostname_current_week'], keep=False)
final.to_excel('weeklychanges.xlsx')
This is what I get
vs
What I want
user25179799 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.