Here are the files in the folder:

```text
C:\temporary_location\file_name_20240901.csv
C:\temporary_location\file_name_20240902.csv
C:\temporary_location\file_name_20240903.csv
C:\temporary_location\file_name_20240904.csv
C:\temporary_location\file_name_20240905.csv
C:\temporary_location\file_name_20240906.csv
C:\temporary_location\file_name_20240907.csv
C:\temporary_location\file_name_20240908.csv
C:\temporary_location\file_name_20240909.csv
C:\temporary_location\file_name_20240910.csv
C:\temporary_location\file_name_20240911.csv
C:\temporary_location\file_name_20240912.csv
C:\temporary_location\file_name_20240913.csv
```
Here is the code I use to select and read a single file:
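```python
import glob

import pandas as pd

# Folder holding the CSV files. A raw string cannot end with a single backslash
# (r'...\' is a syntax error), so the backslashes are escaped instead.
_path = 'C:\\temporary_location\\'
_frame = []
for loc in glob.glob(_path + '*.csv'):
    # Only load the file whose name contains this date
    if "20240905" in loc:
        print('Loading', loc)
        data = pd.read_csv(loc, low_memory=False)
```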
How can I adapt the code above to select only the files whose dates are later than a specific date? For example:
```python
for loc in glob.glob(_path + '*.csv'):
    if loc > 20240905:  # pseudocode: "the date in the file name is later than 20240905"
        print('Loading', loc)
        data = pd.read_csv(loc, low_memory=False)
```
The code should print the following files:

```text
C:\temporary_location\file_name_20240906.csv
C:\temporary_location\file_name_20240907.csv
C:\temporary_location\file_name_20240908.csv
C:\temporary_location\file_name_20240909.csv
C:\temporary_location\file_name_20240910.csv
C:\temporary_location\file_name_20240911.csv
C:\temporary_location\file_name_20240912.csv
C:\temporary_location\file_name_20240913.csv
```
Try extracting the 8 digits just before the `.csv` extension and comparing them as an integer, i.e. replace the `if` line with:

```python
if int(loc[-12:-4]) > 20240905:
```
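Put together, the slicing idea might look like the minimal sketch below. It assumes every CSV in the folder ends in an 8-digit `YYYYMMDD` date immediately before `.csv`; any other file name would make `int()` raise `ValueError`. The function `files_after` and the `names` list are hypothetical, with `names` standing in for the result of `glob.glob`.

```python
def files_after(paths, cutoff):
    """Return the paths whose YYYYMMDD date (just before '.csv') is later than cutoff."""
    selected = []
    # glob.glob returns names in arbitrary order; names sharing one prefix and a
    # fixed-width date sort chronologically, so sorting them gives date order
    for loc in sorted(paths):
        # loc[-12:-4] drops the 4-character '.csv' suffix and keeps the 8 date digits
        if int(loc[-12:-4]) > cutoff:
            selected.append(loc)
    return selected


# Hypothetical stand-in for glob.glob(...) on the folder from the question
names = [rf"C:\temporary_location\file_name_202409{d:02d}.csv" for d in range(1, 14)]

for loc in files_after(names, 20240905):
    print('Loading', loc)
```

To actually load the files, pass `glob.glob(_path + '*.csv')` instead of `names` and call `pd.read_csv(loc, low_memory=False)` inside the loop.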
You might use the `pandas.to_datetime` function in the following way:
```python
import glob
import pandas as pd
df = pd.DataFrame({"filename": glob.glob("file_name_*.csv")})
df["when"] = pd.to_datetime(df.filename, format="%Y%m%d", exact=False)
files_after_2024_09_10 = df[df.when > "2024-09-10"].filename
for fname in files_after_2024_09_10:
    print(fname)
```
which gives the output:

```text
file_name_20240912.csv
file_name_20240911.csv
file_name_20240913.csv
```
Note that you must tell `pd.to_datetime` which format is used, and that it should look for that format anywhere in the string, by passing `exact=False`. For brevity's sake I work with files in the current working directory; the whole path will be present if you change `glob.glob`'s argument.
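Here is a sketch of the same `pd.to_datetime` approach applied to the question's full paths and its cutoff date of 2024-09-05. The function `files_after_date` and the `names` list are hypothetical (in practice `names` would come from `glob.glob`). It also sorts by the parsed date, because `glob.glob` returns files in arbitrary order, which is why the output above is not in date order.

```python
import pandas as pd


def files_after_date(paths, cutoff):
    """Return the paths whose embedded YYYYMMDD date is later than cutoff."""
    df = pd.DataFrame({"filename": paths})
    # exact=False lets the %Y%m%d format match anywhere inside each path
    df["when"] = pd.to_datetime(df["filename"], format="%Y%m%d", exact=False)
    # Keep dates strictly after the cutoff, then order them chronologically
    selected = df[df["when"] > cutoff].sort_values("when")
    return selected["filename"].tolist()


# Hypothetical stand-in for glob.glob(...) on the folder from the question
names = [rf"C:\temporary_location\file_name_202409{d:02d}.csv" for d in range(1, 14)]

for fname in files_after_date(names, "2024-09-05"):
    print(fname)
```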