I have multiple csv files having data as follows:
Sno,Country,2017-2018
1,AFGHANISTAN,287.1
2,ALGERIA,1286.4
3,AMERI SAMOA,9.09
4,ANGOLA,442.38
5,ARGENTINA,13881.75
6,AUSTRALIA,875706.78
7,AUSTRIA,10.45
8,AZERBAIJAN,147.87
.... so on
The 3rd column in each csv file denotes the data for that fiscal year. I have data across multiple years, 2014-2024 specifically, that I want to combine into 1 combined csv file.
Can someone help me figure out how to combine the csv files using Python and pandas?
I tried using pandas to concat the tables but the values do not get appended in the same row.
For example, Angola has the value 442.38 in column 2017-2018 but 0 in 2018-2019. But the value in year 2018-2019, 170.53, is added as a new row later in the file. I want the values for a specific country to be in the same row. Infact, the first year values are stored earlier and all the values in the next file are appended later in the csv.
import pandas as pd
import glob
file_path = 'file_name.csv'
csv_files = glob.glob(file_path)
# combined_df = pd.DataFrame()
combined_df = pd.read_csv("combined_data.csv")
for file in csv_files:
df = pd.read_csv(file)
if combined_df.empty:
combined_df = df
else:
combined_df = pd.concat([combined_df,df],ignore_index=True)
combined_df = combined_df.fillna(0)
combined_df.to_csv('combined_data.csv', index=False)
Nilay Kamat is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.