I want to write a function that takes the average of a column, subject to two conditions:
- Include only values greater than 0
- Replace outliers* with the median
*In this case, an outlier is defined as a value that is more than one standard deviation away from the mean.
This is what I have:
def AdjustedAverage(df, head):
    NoZeros = df[df[head] > 0]
    Median = NoZeros[head].median()
    Mean = NoZeros[head].mean()
    std = NoZeros[head].std()
    Max = Mean + std
    Min = Mean - std
    sum = 0
    for i in range(len(NoZeros)):
        if NoZeros[head][i] < Min or NoZeros[head][i] > Max:
            sum += Median
        else:
            sum += NoZeros[head][i]
    avg = sum / len(NoZeros)
    return avg
The KeyError is raised on the line with the “if” statement. When I run the same code at the top level of my script it works, but when I put it in a function it raises KeyError: 0. I really want a function for this because I’ll use it many times in my code.
I don’t understand why I get a KeyError. In the preceding lines the “head” column is found, but in the “if” statement the program suddenly can’t find it anymore…
I tried rewriting the “if” statement using at or loc, but I still get a KeyError.
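For reference, here is a minimal sketch of what may be going on, assuming the first row of the column is not greater than 0. The filter keeps the original index labels, so label 0 no longer exists in NoZeros, and NoZeros[head][i] with an integer index looks values up by label rather than by position. That would also explain why at and loc fail the same way, since both are label-based. The data and the name adjusted_average below are hypothetical, and the vectorized version is only one possible way to express the two conditions without indexing row by row.

```python
import pandas as pd


def adjusted_average(df: pd.DataFrame, head: str) -> float:
    """Mean of the positive values in df[head], with outliers replaced by the median."""
    # Keep only values greater than 0 (original index labels are preserved).
    values = df.loc[df[head] > 0, head]
    median = values.median()
    mean = values.mean()
    std = values.std()
    # True for values within one standard deviation of the mean (bounds inclusive).
    in_range = values.between(mean - std, mean + std)
    # Keep in-range values, replace the rest with the median, then average.
    return values.where(in_range, median).mean()


# Hypothetical data: the first row is 0, so label 0 is dropped by the filter.
df = pd.DataFrame({"x": [0, 10, 12, 11, 100, 9]})
positive = df[df["x"] > 0]
print(list(positive.index))  # [1, 2, 3, 4, 5] -> positive["x"][0] raises KeyError: 0
print(adjusted_average(df, "x"))
```

If the loop is kept instead, positional access such as NoZeros[head].iloc[i], or calling reset_index(drop=True) on the filtered frame first, should avoid the label lookup.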