I’ve written a simple function to bin data into seasons based on year and month, and I’m trying to use it to update a Pandas dataframe. (The binning is arbitrary, as described in the comments, and is irrelevant to the problem.) I think the function is fine (it returns sensible results when I call it manually), but I cannot use it to update the DF in a vectorized fashion.
The function in question is as follows:
def getSeason(year, mth):
    #return a tuple containing the year and season for the given year, month combination
    #year and month should be integer values
    #this assumes that seasons are lagged by one calendar month, such that:
    #winter = Dec-Feb
    #spring = Mar-May
    #summer = June-Aug
    #fall = Sept-Nov
    sYear = year
    season = ""
    mthsDict = {1:"winter",2:"winter",3:"spring",4:"spring",5:"spring",6:"summer",7:"summer",8:"summer",9:"fall",10:"fall",11:"fall",12:"winter"}
    season = mthsDict.get(mth)
    if mth == 12:
        sYear = sYear+1 #December is attached to the next year for our purposes
    return (sYear, season)
This works fine so long as year and month are supplied as integers, which they are in my dataset. I want to use the returned values to populate two dataframe columns, “SYR” and “SEASON”. I can do it the “bad” way, with iterrows, simply and easily:
#this works:
for index, row in DF.iterrows():
    thisYr = row["YEAR"]
    thisMth = row["MONTH"]
    thisSeason = getSeason(thisYr,thisMth)
    DF.at[index,"SEASON"] = thisSeason[1]
    DF.at[index,"SYR"] = thisSeason[0]
I’d really prefer to vectorize this. For this particular dataset it makes no practical difference, but I’d like to stick with good habits. So I’ve tried apply with a lambda, but I’ve clearly got something wrong in the syntax. It raises an “unhashable type: ‘Series’” error pointing at the line in the function that looks up the season string in the dictionary:
#this fails:
DF["SYR"] = DF.apply(lambda row: getSeason(DF['YEAR'], DF['MONTH'])[0])
DF["SEASON"] = DF.apply(lambda row: getSeason(DF['YEAR'], DF['MONTH'])[1])
Any thoughts are much appreciated- thanks in advance.
There is no single, encapsulated way to vectorize pandas operations. A lambda/UDF used with apply is almost always a non-vectorized approach since the lambda/UDF has to be applied to each row/column.
If you’re manipulating individual rows/columns, you’re not vectorizing. If you create expressions that deal with series/arrays/dataframes as a whole, you’re vectorizing.
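To make the distinction concrete, here is a minimal sketch of the row-wise route, assuming a small made-up DataFrame with YEAR and MONTH columns and a simplified stand-in for getSeason (both are illustrative, not the asker's actual data or code). The original attempt most likely failed because the lambda passed the whole DF['MONTH'] Series to the dictionary lookup, and a Series is not hashable. Reading from row and passing axis=1 hands the function scalars instead, so it runs, but apply still calls the function once per row, so this is not vectorized.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
DF = pd.DataFrame({"YEAR": [2020, 2020, 2021], "MONTH": [1, 12, 7]})

MTHS = {1: "winter", 2: "winter", 3: "spring", 4: "spring", 5: "spring",
        6: "summer", 7: "summer", 8: "summer", 9: "fall", 10: "fall",
        11: "fall", 12: "winter"}

def get_season(year, mth):
    """Simplified stand-in for getSeason: returns (season year, season)."""
    return (year + 1 if mth == 12 else year, MTHS.get(mth))

# Row-wise: with axis=1, apply calls the lambda once per row and passes scalars.
# result_type="expand" splits the returned tuple into two columns.
result = DF.apply(lambda row: get_season(row["YEAR"], row["MONTH"]),
                  axis=1, result_type="expand")
result.columns = ["SYR", "SEASON"]
```

This is functionally the same loop as iterrows, just written more compactly.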
The way to vectorize depends heavily on the operation in question. So in your specific example a vectorized approach would look like this:
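import numpy as np
# mthsDict is the month-to-season dictionary from getSeason, defined outside the function
DF['SEASON'] = DF['MONTH'].map(mthsDict)
DF['SYR'] = np.where(DF['MONTH'] == 12, DF['YEAR']+1, DF['YEAR'])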
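For anyone who wants to try this end to end, here is a self-contained sketch under stated assumptions: the sample rows are made up, and the dictionary is copied out of getSeason to module level so that map can see it. Both statements operate on whole columns at once, with no per-row Python function call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
DF = pd.DataFrame({"YEAR": [2020, 2020, 2021, 2021],
                   "MONTH": [2, 12, 4, 10]})

# Month-to-season lookup, copied from the question's getSeason function.
mthsDict = {1: "winter", 2: "winter", 3: "spring", 4: "spring", 5: "spring",
            6: "summer", 7: "summer", 8: "summer", 9: "fall", 10: "fall",
            11: "fall", 12: "winter"}

# Whole-column dictionary lookup: each month is replaced by its season.
DF["SEASON"] = DF["MONTH"].map(mthsDict)

# Whole-column conditional: December rolls over into the next season year.
DF["SYR"] = np.where(DF["MONTH"] == 12, DF["YEAR"] + 1, DF["YEAR"])
```

An equivalent way to compute the year column, if you prefer to avoid np.where, is DF["YEAR"] + (DF["MONTH"] == 12), since the boolean mask is treated as 0 or 1 in the addition.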
Hope this helps!