I have several spreadsheets with monthly tax data. I organized them into folders named after each year (2009, 2010, etc.). Each folder contains 12 spreadsheets, one per month.
There are two problems:
- Inconsistency. Spreadsheets may be organized differently.
Example: Picture 1
Picture 2
What is the best way to import spreadsheets without including unnecessary text (highlighted in red)?
- Combining. What is the best approach to combine the spreadsheets into a single dataframe that contains tax data for the whole year?
I tried using glob to import spreadsheets and append them to a single DF.
import glob
import os
import pandas as pd

path = r'C:\Users\Asus\Downloads\Data\2009'
all_files = glob.glob(os.path.join(path, "*.xls"))
li = []
for filename in all_files:
    df = pd.read_excel(filename, index_col=None, sheet_name='РК')
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
However, only the first spreadsheet is imported more or less correctly; the rows from the other spreadsheets contain nothing but NaN values.
Result
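One plausible cause of the NaN rows, not confirmed from the files themselves, is that the header row sits at a different position in each spreadsheet, so `pd.concat` sees different column names and cannot align them. A minimal sketch of one possible direction: read each sheet with `header=None`, find the row that holds the real column titles, and use that row as the header before concatenating. The helper names `trim_to_header` and `load_year`, the marker text `"Code"`, and the sample data are all hypothetical and only illustrate the idea.

```python
import glob
import os

import pandas as pd


def trim_to_header(raw: pd.DataFrame, marker: str) -> pd.DataFrame:
    """Use the first row containing `marker` as the header; drop rows above it.

    `raw` is assumed to have been read with header=None, so every row
    (titles, notes, real header) is still ordinary data.
    """
    # True for each row in which some cell equals the marker text.
    hits = raw.apply(lambda col: col.astype(str).str.strip().eq(marker)).any(axis=1)
    if not hits.any():
        raise ValueError(f"marker {marker!r} not found in sheet")
    header_pos = int(hits.to_numpy().argmax())  # position of the first match

    header = raw.iloc[header_pos]
    keep = header.notna().to_numpy()  # ignore columns that have no title
    body = raw.iloc[header_pos + 1:, keep].copy()
    body.columns = [str(h).strip() for h in header[keep]]

    # Drop fully empty rows and let pandas re-infer numeric dtypes.
    body = body.dropna(how="all").reset_index(drop=True)
    return body.infer_objects()


def load_year(folder: str, marker: str, sheet_name: str) -> pd.DataFrame:
    """Hypothetical wrapper: clean every .xls file in `folder` and stack them."""
    frames = []
    for filename in sorted(glob.glob(os.path.join(folder, "*.xls"))):
        raw = pd.read_excel(filename, sheet_name=sheet_name, header=None)
        frames.append(trim_to_header(raw, marker))
    return pd.concat(frames, axis=0, ignore_index=True)


# Made-up in-memory data standing in for two differently laid-out sheets.
raw_jan = pd.DataFrame([
    ["Report for January", None, None],  # unwanted title text
    [None, None, None],                  # blank spacer row
    ["Code", "Name", "Amount"],          # the real header
    [101, "Tax A", 10.0],
    [102, "Tax B", 20.0],
])
raw_feb = pd.DataFrame([
    ["Code", "Name", "Amount"],          # header already on the first row
    [101, "Tax A", 11.0],
    [102, "Tax B", 21.0],
])

frames = [trim_to_header(raw, "Code") for raw in (raw_jan, raw_feb)]
year = pd.concat(frames, axis=0, ignore_index=True)
print(year)
```

With real files, the call would look something like `load_year(path, marker, 'РК')`, where `marker` is a column title that appears in every month's sheet. If the titles themselves vary between months, the columns would still need to be renamed to a common set before concatenating.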