For a list of percentage data, I need to check whether the last value (90.2) is abnormally high compared with the rest of the data. It clearly is in this sequence:
delivery_pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
But in the sequence below, the last value is not:
delivery_pct = [ 63.6, 62.64, 60.36, 72.8, 70.86, 40.51, 52.06, 61.47, 51.55, 74.03 ]
How do I check if the last value is abnormally higher than the rest?
About the data: each data point lies between 0 and 100%. Because it is the percentage of delivery taken for a stock over the last 10 days, it usually stays range-bound depending on the nature of the stock (highly traded vs. less frequently traded), unless there is good news about the stock and delivery is higher that day in anticipation.
Once you’ve determined a threshold (deviation from mean) you could do this:
import statistics
t = 2 # this is the crucial value
pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
mean = statistics.mean(pct)
tsd = statistics.pstdev(pct) * t
lo = mean - tsd
hi = mean + tsd
print(*[x for x in pct if x < lo or x > hi], sep="\n")
Output:
90.2
It’s the threshold value that (effectively) determines what counts as “abnormal”.
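To see how much the result depends on t, here is a minimal sketch that wraps the code above in a function and runs it with two thresholds. The function name `outliers` is my own choice, not something from the statistics module.

```python
import statistics

def outliers(values, t):
    """Return the values lying more than t population standard deviations from the mean."""
    mean = statistics.mean(values)
    tsd = statistics.pstdev(values) * t
    lo, hi = mean - tsd, mean + tsd
    return [x for x in values if x < lo or x > hi]

pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
print(outliers(pct, 2))  # [90.2]
print(outliers(pct, 3))  # []
```

In this list 90.2 sits roughly 2.3 standard deviations above the mean, so it is flagged at t = 2 but not at t = 3.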
The interquartile range (IQR) method produces the same result:
import statistics
pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
spct = sorted(pct)
m = len(spct) // 2
Q1 = statistics.median(spct[:m])
m += len(spct) % 2 # increment m if list length is odd
Q3 = statistics.median(spct[m:])
IQR = Q3 - Q1
lo = Q1 - 1.5 * IQR
hi = Q3 + 1.5 * IQR
print(*[x for x in pct if x < lo or x > hi], sep="\n")
Output:
90.2
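Since the question only asks about the last value, the same IQR rule can be wrapped in a function and tried on both lists. This is only a sketch: the name `last_is_high_iqr` and the parameter `k` are made up here, and the quartiles are computed the same way as above.

```python
import statistics

def last_is_high_iqr(data, k=1.5):
    """True if the last value lies above Q3 + k * IQR of the whole list."""
    s = sorted(data)
    m = len(s) // 2
    q1 = statistics.median(s[:m])
    # Skip the middle element when the list length is odd
    q3 = statistics.median(s[m + len(s) % 2:])
    return data[-1] > q3 + k * (q3 - q1)

first = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
second = [63.6, 62.64, 60.36, 72.8, 70.86, 40.51, 52.06, 61.47, 51.55, 74.03]
print(last_is_high_iqr(first))   # True
print(last_is_high_iqr(second))  # False
```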
You can separate the values by slicing, but you’ll need to decide on the method that defines “abnormal”.
For example, to compare the last value to the simple average of the earlier ones:
def test(data):
    # opportunity to verify length
    earlier = data[:-1]
    last = data[-1]
    avg = sum(earlier) / len(earlier)  # exchange as needed
    return last > avg
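NOTE: I doubt this simple comparison is sufficient, as it fails for your example: the last value of your second list (74.03) is also above the average of the earlier values, so it gets flagged too.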
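One way to swap out the plain average is to require the last value to exceed the average of the earlier values by some multiple of their standard deviation. A minimal sketch, assuming a threshold of 2 sample standard deviations (the name `last_is_abnormally_high` and the default `t=2.0` are my choices, not something your data dictates):

```python
import statistics

def last_is_abnormally_high(data, t=2.0):
    """True if the last value exceeds mean + t * stdev of the earlier values."""
    if len(data) < 3:
        raise ValueError("need at least two earlier values plus the last one")
    earlier, last = data[:-1], data[-1]
    # The limit is computed from the earlier values only (assumed threshold t)
    limit = statistics.mean(earlier) + t * statistics.stdev(earlier)
    return last > limit

first = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
second = [63.6, 62.64, 60.36, 72.8, 70.86, 40.51, 52.06, 61.47, 51.55, 74.03]
print(last_is_abnormally_high(first))   # True
print(last_is_abnormally_high(second))  # False
```

Leaving the last value out of the mean and standard deviation keeps a large spike from inflating the limit it is compared against.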
The built-in statistics library has a variety of useful methods.
The NIST “div898” textbook is an excellent resource for learning about this area too: https://www.itl.nist.gov/div898/handbook/
Or, if this is a homework problem, you were likely learning about and/or expressly given a way to work with the data.