I am transforming strings and retrieving the start and end dates from them to build a range of years, which I will later use to build a graph. If the string contains two dates, the function returns all years between them (inclusive of the boundaries). If it contains a single date that begins with '20', it returns a list of years up to 2024. Otherwise it returns a single year.
The expected output is a list of the years from the start year to the end year.
For example:
#input '(1992—1997)'
#output [1992, 1993, 1994, 1995, 1996, 1997]
For inputs like '(декабрь 1997 — сентябрь 1998)' (that is, December 1997 to September 1998)
it works properly, output: [1997, 1998]
the_dataframe
The function does not work on some inputs, such as the row below, and returns only a single value (the end year). There are no error messages, but the logic seems to be wrong somewhere. I tried passing an individual string copied from the data in the df, and it worked. However, when I apply the function to a column, some rows return only one value instead of a range.
(1996—1997) #iloc[25] #an example of a faulty row
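Since a hand-copied string works but the same-looking cell does not, one way to compare the two is to look at the `repr()` and the individual code points of the cell. This is only a sketch: the `suspect` value below, with a leading no-break space, is a made-up assumption for illustration and not the actual content of the row, and `describe_chars` is a hypothetical helper name.

```python
import unicodedata

def describe_chars(text):
    """Return (character, code point, Unicode name) for every character in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

# Hypothetical cell value: prints like '(1996—1997)' but starts with a
# no-break space (an assumption, not taken from the real dataframe).
suspect = "\u00a0(1996—1997)"
typed = "(1996—1997)"

print(repr(suspect))        # repr() makes invisible characters visible
print(suspect == typed)     # the two strings are not equal
for row in describe_chars(suspect)[:3]:
    print(row)
```

With the real data, `suspect` would be the cell itself, for example the value at `iloc[25]` of whichever column holds the strings.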
def unpack_dates(year_range):
    years = year_range.replace('—', ' ').replace('-', ' ').replace('–', ' ').strip('(').strip(')').split()
    end = 2024
    unpacked = [int(word) for word in years if word.isdigit() and len(word)==4]
    if len(unpacked)==2:
        return(list(range(unpacked[0], unpacked[1]+1)))
    elif len(unpacked)==1 and str(unpacked[0]).startswith('202'):
        return(list(range(unpacked[0], end+1)))
    else:
        return unpacked[0]
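A possible explanation, which is an assumption and not confirmed by the data shown: `strip('(')` only removes the parenthesis when it is the very first character of the string. If a cell starts with a space or an invisible character, the first token stays as `'(1996'`, fails `isdigit()`, and only the end year survives. Below is a minimal sketch that avoids depending on the surrounding characters by extracting four-digit runs with a regular expression. The name `unpack_dates_regex` and the `end` parameter are introduced here for illustration. Unlike the function above, it always returns a list.

```python
import re

def unpack_dates_regex(year_range, end=2024):
    """Hypothetical variant of unpack_dates that ignores surrounding characters."""
    # Runs of exactly four ASCII digits; brackets, dashes and any kind of
    # whitespace around them do not affect the match.
    years = [int(y) for y in re.findall(r"(?<!\d)[0-9]{4}(?!\d)", year_range)]
    if len(years) == 2:
        return list(range(years[0], years[1] + 1))
    if len(years) == 1 and str(years[0]).startswith("202"):
        return list(range(years[0], end + 1))
    # Differs from the original: returns a list (possibly empty) instead of
    # a bare int, so every row has the same type.
    return years
```

With this version, a string with a leading no-break space such as `'\u00a0(1996—1997)'` gives `[1996, 1997]`, and `'(1995)'` gives `[1995]`.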