I am trying to write Python code that works across two CSV files. The objective is to find all rows in the source file that have a threshold value greater than or equal to 50% but are missing from the destination file, so that they can then be manually copied over to the destination file.
Here is the code:
import pandas as pd
# Define file paths (replace with your actual paths)
source_file = "C:/Users/sharsa07/Desktop/pipeline/gso_orb_ticket_list.csv"
dest_file = "C:/Users/sharsa07/Desktop/pipeline/gso_pipeline.csv"
# Read CSV files into DataFrames
df_source = pd.read_csv(C:/Users/sharsa07/Desktop/pipeline/gso_orb_ticket_list.csv)
df_dest = pd.read_csv(C:/Users/sharsa07/Desktop/pipeline/gso_pipeline.csv)
# Find the common column name (assuming the same column name in both files, case-insensitive)
common_col = "a" # Assuming column names are the same (case-insensitive)
# Merge DataFrames based on the common column (outer join to keep unmatched rows)
merged_df = df_source.merge(df_dest[[common_col]], how="outer", on=common_col.lower())
# Calculate threshold value based on 'T' column mean in the source DataFrame
threshold_value = df_source['T'].mean() * 0.5
# Filter merged DataFrame to rows where 'T' is greater than or equal to the threshold and source column is missing in destination
filtered_df = merged_df[(merged_df['T'] >= threshold_value) & (merged_df[common_col.lower()].isna())]
# Get source column names from the filtered DataFrame (excluding the common column)
source_cols = set(filtered_df.columns) - {common_col.lower()}
# Print the column names that meet the criteria
print("Columns to be checked:", source_cols)
I get this error message. Can someone please help me debug this? Thanks.
Cell In[2], line 8
df_source = pd.read_csv(C:/Users/sharsa07/Desktop/pipeline/gso_orb_ticket_list.csv)
^
SyntaxError: invalid syntax
Can someone please help me debug this piece of code?
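For reference, here is a minimal sketch of what a working version might look like. The SyntaxError most likely comes from passing the paths to pd.read_csv without quotes; a path has to be a string literal or a variable such as source_file. The rest of the sketch rests on assumptions that are not confirmed by the post: that "a" is the shared key column, that "T" holds percentages on a 0 to 100 scale so that 50 is the cutoff, and that a left join with indicator=True is an acceptable way to spot rows missing from the destination. The helper names find_missing_rows and find_missing_rows_from_files are hypothetical.

```python
import pandas as pd

# Paths are ordinary Python strings, so they must be quoted or held in variables.
source_file = "C:/Users/sharsa07/Desktop/pipeline/gso_orb_ticket_list.csv"
dest_file = "C:/Users/sharsa07/Desktop/pipeline/gso_pipeline.csv"


def find_missing_rows(df_source, df_dest, key_col="a", value_col="T", threshold=50):
    """Return source rows with value_col >= threshold whose key is absent from df_dest.

    Assumptions: key_col exists in both frames, and value_col is a percentage
    on a 0-100 scale (adjust threshold if the column uses a different scale).
    """
    # Keep only the destination keys, de-duplicated so the join cannot
    # multiply source rows.
    dest_keys = df_dest[[key_col]].drop_duplicates()

    # Left join: every source row is kept; the "_merge" column added by
    # indicator=True records whether the key was also found in the destination.
    merged = df_source.merge(dest_keys, how="left", on=key_col, indicator=True)

    missing_in_dest = merged["_merge"] == "left_only"
    meets_threshold = merged[value_col] >= threshold

    return merged.loc[missing_in_dest & meets_threshold].drop(columns="_merge")


def find_missing_rows_from_files(source_path, dest_path):
    """Read both CSV files and return the rows to copy over manually."""
    df_source = pd.read_csv(source_path)
    df_dest = pd.read_csv(dest_path)
    return find_missing_rows(df_source, df_dest)


# Example call (commented out because it needs the real files on disk):
# print(find_missing_rows_from_files(source_file, dest_file))
```

Note that this returns the rows themselves rather than a set of column names, since the stated goal is to copy rows over to the destination file.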