I am currently working in R on a project that aims to identify Dragon King events (massive outliers) in large datasets. Such outliers appear, for example, in the distribution of city sizes in England, where London is the outlier.
I am using the test statistics described in the paper “Multiple Outlier Detection in Samples with Exponential & Pareto Tails: Redeeming the Inward Approach & Detecting Dragon Kings” by Spencer Wheatley and Didier Sornette. The problem is that these test statistics do not follow any standard distribution under the null hypothesis (no outliers present in the dataset). The paper refers to other work that seemingly should clarify the null distribution, but those references don't seem very applicable: they are either quite dated or don't discuss the test statistics used in the paper.
So far I have run simulations: I generate many datasets under the null hypothesis, apply the test statistic to each one, and estimate its distribution function empirically with the ecdf function in R. I believe this gives me a reliable empirical distribution function. Unfortunately, the p-values I compute don't seem plausible: for samples that clearly contain outliers, the p-value is not small enough to reject the null. I therefore assume I am misunderstanding something or doing something wrong.
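For reference, here is a minimal sketch of the Monte Carlo procedure described above, written in Python rather than R. It assumes an Exp(1) null and uses max/sum (largest observation divided by the sample total) as a hypothetical stand-in statistic; it is not one of the Wheatley and Sornette statistics. Two details that can affect the p-values: the null samples must have the same size n as the data being tested, because the null distribution of such statistics depends on n; and since large values of the statistic signal outliers, the p-value is the upper tail P(T >= t_obs). In R, with Fn <- ecdf(null_stats), the upper tail is 1 - Fn(t_obs), not Fn(t_obs).

```python
import numpy as np

def max_over_sum(x):
    """Hypothetical stand-in statistic: largest observation / sample total.
    It is scale invariant, so its null distribution does not depend on the rate."""
    x = np.asarray(x, dtype=float)
    return x.max() / x.sum()

def simulate_null(statistic, n, n_sims=10_000, seed=0):
    """Draw the statistic n_sims times under H0: n i.i.d. Exp(1) observations.
    n must match the size of the sample being tested."""
    rng = np.random.default_rng(seed)
    return np.array([statistic(rng.exponential(1.0, size=n)) for _ in range(n_sims)])

def upper_tail_p_value(t_obs, null_stats):
    """Monte Carlo estimate of P(T >= t_obs) under H0.
    Large values indicate outliers, so the upper tail is used; the +1 terms
    keep the estimate strictly positive."""
    null_stats = np.asarray(null_stats)
    return (1 + np.count_nonzero(null_stats >= t_obs)) / (1 + null_stats.size)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 50
    sample = rng.exponential(1.0, size=n)
    sample[0] = 40.0  # plant one artificial "dragon king"
    null_stats = simulate_null(max_over_sum, n)
    print(upper_tail_p_value(max_over_sum(sample), null_stats))
```

For Pareto-tailed data above a known threshold x_min, log(x / x_min) is exponentially distributed, so the same exponential null can be reused after that transformation.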
I had expected to be able to replicate the graphs in the original paper (for example, those on page 10), but so far I have not managed to.
I would be grateful for any help or pointers in the right direction! Thanks in advance!