I have a DataFrame to find the percentage of absolute difference between two sources in python. But when I use the below code, few of the columns are giving the -% (minus percentages)
I have checked the columns showing minus percentage datatypes are same in both the sources.
Can anyone help me to find out why?
# Define the columns you want to process
columns = ['a', 'b', 'c', 'd']
# Create the results DataFrame
results_df = pd.DataFrame()
results_df['date'] = c_df['date']
results_df['id'] = c_df['id']
for col in columns:
# calculating the absolute difference
diff = np.abs(c_df[f'{col}_s1'] - c_df[f'{col}_s2'])
# calculating mean for snowflake columns
norm_diff = c_df[f'{col}_s1'].mean()
# Avoid division by zero
if norm_diff == 0:
percentage = np.zeros(len(diff))
else:
# calculating percentage difference b/w the columns in both datasets
percentage = (diff / norm_diff) * 100
# Round the percentage_difference and add % symbol
results_df[col] = [f"{round(diff)}%" for diff in percentage]
display(results_df)
And the results are something like this.
a b c d
0% 5% 10% -5%
10% 100% -50% 200%