Let us say I have an n-element vector of measurements containing spikes that need to be located (n is small, say 5-7). My task is to locate all elements in the vector that are "much greater than the rest". A method based on standard deviations (or z-scores) would work well if n were large and the number of "large" elements were small. However, n is small, and the number of "large" elements can even equal n. So a statistical method is not going to work.
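For reference, the z-score approach I am ruling out would look something like this sketch (the 1.0 cutoff here is just as arbitrary as the 0.6 below):

import numpy as np

p = np.array([13.45, 0.3, 1.4, 0.8, 11.1])
z = (p - p.mean()) / p.std()     # z-scores; shaky when n is only 5
zflag = np.where(z > 1.0, 1, 0)  # the 1.0 cutoff is itself a guess

When several elements spike at once, the mean shifts up and the spikes' z-scores shrink, so with small n this misses exactly the cases I care about.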
This is akin to finding the bright spots in an image, where the entire image can be bright. One could threshold the vector at a fixed value, for instance, but that is risky since the actual contents of the vector could change. For now, I am going with the following (p is the measurement vector):
import numpy as np

p = np.array([13.45, 0.3, 1.4, 0.8, 11.1])     # example measurement vector
pnormalize = p / np.max(p)                     # scale so the largest element is 1
pthreshold = np.where(pnormalize > 0.6, 1, 0)  # 1 = spike, 0 = background
The unease I have with this is that the choice of 0.6 is arbitrary. In practice it works: a vector like [13.45 0.3 1.4 0.8 11.1] correctly yields [1 0 0 0 1]. But it could easily have failed: for [13.45 0.3 1.4 0.8 7.1] it is "obvious" that two elements are much larger than the rest, yet the code produces a false negative. I cannot lower the threshold arbitrarily, since noise does exist and I don't want any false positives either.
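For concreteness, here is that failing case run through the same code (pfail is just a name for the second example vector):

pfail = np.array([13.45, 0.3, 1.4, 0.8, 7.1])
pfail / np.max(pfail)                        # -> [1.0, 0.022, 0.104, 0.059, 0.528]
np.where(pfail / np.max(pfail) > 0.6, 1, 0)  # -> [1, 0, 0, 0, 0]: 7.1 is missed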
Any suggestions?