Apologies if this problem has an obvious answer that I have missed; I am new to using pandas in Python, and couldn't find an appropriate answer in the documentation.
I have written a script to load and combine time-series data from two whitespace-delimited data files that share the same base filename (passed in as a pathlib Path) but have different suffixes. A minimal working example is as follows:
```python
import pandas as pd


def load_data(filename):
    # filename is a pathlib.Path; the two data files share its stem,
    # with '_0' and '_1' appended.
    headers_0 = ['a', 'b', 'c']  # Headers for first file. May have more entries than columns in file
    headers_1 = ['d', 'e']  # Headers for second file.
    base = str(filename.with_suffix(''))
    # sep=r'\s+' is equivalent to delim_whitespace=True, which newer pandas deprecates.
    data_0 = pd.read_csv(base + '_0', header=None, sep=r'\s+')
    data_0.columns = headers_0[:data_0.shape[1]]
    data_1 = pd.read_csv(base + '_1', header=None, sep=r'\s+')
    data_1.columns = headers_1[:data_1.shape[1]]
    # join() aligns the two frames on their row index
    data = data_0.join(data_1)
    data.fillna(0, inplace=True)
    return data
```
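The behaviour can be reproduced without the files. The two frames below are made-up, in-memory stand-ins for data_0 and data_1 (only the column names are reused from load_data), with data_1 deliberately shorter:

```python
import pandas as pd

# Hypothetical stand-ins for the two files: data_0 has 5 rows,
# data_1 only 3 because its recording started later.
data_0 = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})
data_1 = pd.DataFrame({'d': [7, 8, 9]})

# join() aligns rows on the default integer index (0, 1, 2, ...),
# so data_1 lands in the first three rows and rows 3-4 get NaN.
data = data_0.join(data_1)
data = data.fillna(0)
print(data)  # column 'd' is 7.0, 8.0, 9.0, 0.0, 0.0
```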
Thus far, I have only used load_data for datasets where data_0 and data_1 have the same number of rows (i.e. time series of the same length). However, I now have a case where data_1 has fewer rows than data_0, because recording of data_1 only starts at some later time than recording of data_0.
How can I use pandas to pad the columns of data_1 with leading zeros, so that data_0 and data_1 have the same number of rows? I believe the cause is that join aligns the two frames on their default integer row index, so data_1 fills the first rows and the missing rows at the end become NaN, which the line data.fillna(0, inplace=True) then turns into trailing zeros. Is there a straightforward way to make these zeros leading instead? Note that I do not know the length of either dataset a priori, so I would appreciate a solution that works from the lengths of the data as loaded by pandas.
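One possible approach, sketched under two assumptions: data_1 stops recording at the same final time step as data_0 (so only its start is missing), and both frames keep pandas' default integer index. Under those assumptions, shifting data_1's index forward by the length difference before joining moves the NaN gap to the top. The helper name join_right_aligned is hypothetical.

```python
import pandas as pd


def join_right_aligned(data_0, data_1):
    """Hypothetical helper: join data_1 onto data_0 so their last rows coincide.

    Assumes both frames use the default integer index and that data_1
    ends at the same time step as data_0 (only its start is missing).
    """
    offset = len(data_0) - len(data_1)  # number of missing leading rows
    if offset < 0:
        raise ValueError("data_1 is longer than data_0")
    # Shift data_1's index so its rows occupy the end of data_0's index range.
    shifted = data_1.set_axis(data_1.index + offset, axis=0)
    data = data_0.join(shifted)
    # The gap is now at the start, so fillna(0) produces leading zeros.
    return data.fillna(0)


data_0 = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
data_1 = pd.DataFrame({'d': [7, 8, 9]})
data = join_right_aligned(data_0, data_1)
print(data)  # column 'd' is 0.0, 0.0, 7.0, 8.0, 9.0
```

Inside load_data this would stand in for the data_0.join(data_1) and fillna lines. If the files contain a timestamp column, joining on that column instead of on row position would avoid relying on the equal-end-time assumption.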
I have tried different options for DataFrame.fillna, such as method='backfill', but none of these attempts has produced the expected result.
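A small sketch of why backfilling cannot help here, reusing made-up in-memory frames: backward fill copies the next valid value upwards, but the NaNs created by the join sit at the end of the column, so there is no later value to copy. (In recent pandas versions the method argument of fillna is deprecated in favour of DataFrame.bfill(), which is used below.)

```python
import pandas as pd

# Hypothetical stand-ins: data_1 is two rows shorter than data_0.
data_0 = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
data_1 = pd.DataFrame({'d': [7, 8, 9]})
data = data_0.join(data_1)  # column 'd' is 7, 8, 9, NaN, NaN

# Backward fill propagates the *next* valid value upwards; the trailing
# NaNs have no valid value after them, so they are left untouched.
filled = data.bfill()
print(filled['d'].isna().sum())  # → 2
```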