I am trying to calculate the 95th percentile from the data I have populated in the ConcurrentHashMap below.
I am interested in finding out how many calls came back within the 95th percentile of time.
My map will look like this, and it will always be sorted in ascending order by key, where:
key - the number of milliseconds
value - the number of calls that took that many milliseconds
Milliseconds Number
0 1702
1 15036
2 14262
3 13190
4 9137
5 5635
6 3742
7 2628
8 1899
9 1298
10 963
11 727
12 503
13 415
14 311
15 235
16 204
17 140
18 109
19 83
20 72
For example, the above data set means:
1702 calls came back in 0 milliseconds
15036 calls came back in 1 millisecond
Now I can calculate the 95th percentile by plugging the above data into an Excel sheet, but I would like to calculate the percentile in Java code.
I know the algorithm will look something like this:
Sum all values from the map, calculate 95% of the sum, then iterate the map keys in ascending order while keeping a running total of the values. When the running total equals or exceeds the previously calculated 95% of the total sum, that key should be the 95th percentile, I guess.
But I am not able to turn this algorithm into Java code. Below is the map that will hold the above data set.
Map<Long, Long> histogram = new ConcurrentHashMap<Long, Long>();
I am not sure what the best way to calculate the percentile in Java is, and I am not sure whether my algorithm is correct either. I am just trying to find out how many calls came back within the 95th percentile of time.
private static void calculatePercentile() {
    for (Long time : CassandraTimer.histogram.keySet()) {
    }
}
Can anyone provide an example of how to do that?
Any help will be appreciated.
Updated code:
Below is the code I have so far. Let me know if I got everything correct in calculating the 95th percentile:
/**
 * A simple method to log 95th percentile information.
 * Needs java.util.Map, java.util.SortedSet and java.util.TreeSet.
 */
private static void logPercentileInfo() {
    // Total number of calls: sum the counts (values), not key * value.
    double total = 0;
    for (Map.Entry<Long, Long> entry : CassandraTimer.histogram.entrySet()) {
        total += entry.getValue();
    }

    double sum = 0.95 * total;
    double totalSum = 0;

    // Iterate the keys in ascending order.
    SortedSet<Long> keys = new TreeSet<Long>(CassandraTimer.histogram.keySet());
    for (long key : keys) {
        totalSum += CassandraTimer.histogram.get(key);
        if (totalSum >= sum) {
            // this is the 95th percentile I guess
            System.out.println(key);
            break; // stop at the first key that reaches the threshold
        }
    }
}
8
I’m not sure what you’re trying to accomplish here, but it’s easy to get a percentile. Suppose you have 100 numbers. You sort them and extract the 95th one (if you want the 95th percentile). If you don’t have a multiple of 100 numbers you may have to do some interpolation. I assume you know how to do that.
EDIT: OK, you already have the numbers in order. First get the total of the column called “Number”. Call that Tot. Then enumerate through them, keeping a running sum of the column and call that RS. When RS passes 0.95 * Tot, you’ve found it. As I said, you might want to do some interpolation so you get a fractional number of milliseconds.
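The interpolation step mentioned above could be sketched in Java roughly as follows. This is only an illustration: the class and method names are made up, and it assumes the calls counted under key k are spread evenly over [k, k + 1) milliseconds, which the original data does not guarantee.

```java
import java.util.Map;
import java.util.TreeMap;

public class InterpolatedPercentile {

    /**
     * Estimates a fractional percentile from a millisecond histogram.
     * Assumption (not from the original post): the calls counted under key k
     * are spread evenly over [k, k + 1) milliseconds.
     */
    static double interpolatedPercentile(Map<Long, Long> histogram, double fraction) {
        // Sorted snapshot, so iteration is in ascending key order.
        TreeMap<Long, Long> sorted = new TreeMap<>(histogram);

        long total = 0;                       // Tot: total number of calls
        for (long count : sorted.values()) {
            total += count;
        }
        if (total == 0) {
            return Double.NaN;                // empty histogram: nothing to report
        }

        double threshold = fraction * total;  // e.g. 0.95 * Tot
        long before = 0;                      // RS: running sum before this bucket
        for (Map.Entry<Long, Long> e : sorted.entrySet()) {
            long count = e.getValue();
            if (count > 0 && before + count >= threshold) {
                // Linear interpolation inside the bucket that crosses the threshold.
                return e.getKey() + (threshold - before) / count;
            }
            before += count;
        }
        return sorted.lastKey();              // fallback for rounding edge cases
    }
}
```

For the table in the question, the threshold is crossed in the 10 ms bucket, so under that even-spread assumption the estimate comes out at roughly 10.15 ms.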
Your question has the right idea. It’s not a big deal.
int i;
long sum = 0, tot = 0;
for (i = 0; i < n; i++) tot += Number[i];   // total number of calls
for (i = 0; i < n; i++) {
    sum += Number[i];                        // running sum
    if (sum >= 0.95 * tot) break;            // i is now about the 95th percentile
}
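Applied to the Map<Long, Long> from the question, the same running-sum idea might look like the sketch below. The class and method names are hypothetical, and returning -1 for an empty histogram is an arbitrary choice made for this example. The map is copied into a TreeMap first because ConcurrentHashMap does not guarantee any key order.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class HistogramPercentile {

    /**
     * Returns the smallest key whose cumulative count reaches the given
     * fraction of all calls, or -1 if the histogram is empty.
     */
    static long percentile(Map<Long, Long> histogram, double fraction) {
        // Sorted snapshot: ascending keys, and a consistent view of the counts.
        TreeMap<Long, Long> sorted = new TreeMap<>(histogram);

        long total = 0;
        for (long count : sorted.values()) {
            total += count;                   // total number of calls
        }
        if (total == 0) {
            return -1;
        }

        double threshold = fraction * total;
        long running = 0;
        for (Map.Entry<Long, Long> e : sorted.entrySet()) {
            running += e.getValue();
            if (running >= threshold) {
                return e.getKey();            // first key that reaches the threshold
            }
        }
        return sorted.lastKey();              // fallback for rounding edge cases
    }

    public static void main(String[] args) {
        // Counts from the question; the array index is the latency in milliseconds.
        long[] counts = {1702, 15036, 14262, 13190, 9137, 5635, 3742, 2628, 1899,
                         1298, 963, 727, 503, 415, 311, 235, 204, 140, 109, 83, 72};
        Map<Long, Long> histogram = new ConcurrentHashMap<>();
        for (int ms = 0; ms < counts.length; ms++) {
            histogram.put((long) ms, counts[ms]);
        }
        System.out.println(percentile(histogram, 0.95)); // prints 10
    }
}
```

With the sample data the total is 72291 calls, 95% of that is 68676.45, and the running sum first reaches it at key 10 (69492 calls), so the 95th percentile is 10 ms without interpolation.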
5
In Kotlin:
import kotlin.math.ceil

// The q-th quantile is the value at or below which
// a fraction q of the data falls (nearest-rank method, no interpolation).
fun numpy_quantile(data: IntArray, quantile: Double): Int {
    require(data.isNotEmpty())
    require(quantile in 0.0..1.0)
    val sortedData = data.copyOf() // or sort in place, which changes data
    sortedData.sort()
    // Rank of the smallest element covering at least quantile * n samples.
    val rank = ceil(quantile * sortedData.size).toInt().coerceIn(1, sortedData.size)
    return sortedData[rank - 1]
}