I have a metric for about 900K users, and I can't generate a synthetic distribution that roughly reproduces the observed one. I need this to quickly generate samples for A/A and A/B tests (with a guaranteed uplift).
Here are the summary statistics for the data:
count 953086.000000
mean 483.013657
std 1410.598133
min 0.000000
25% 33.000000
50% 125.000000
75% 421.000000
max 151074.000000
Here’s what a sample of 10K users looks like for this metric:
How can I determine which distribution this is, at least qualitatively? I tried the fit methods from statistical packages, but they did not give a usable fit.
I would like help reproducing the distribution accurately enough that the body, roughly 0 to 100 units, matches the observed data closely when the two are compared. The tail can be more random, which is expected.
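
To make the body-plus-tail requirement concrete, here is a minimal sketch of one possible approach, not a definitive method. It resamples the body directly from the observed values and draws the tail from a Pareto distribution whose shape is estimated with the Hill estimator. The names `fit_hybrid` and `sample_hybrid`, the 0.95 tail cutoff, the choice of a Pareto tail, and the multiplicative uplift are all assumptions made for illustration.

```python
import math
import random


def fit_hybrid(observed, tail_quantile=0.95):
    """Split the data into an empirical body and a Pareto-modelled tail.

    tail_quantile is an assumed cutoff; tune it to where the tail starts.
    """
    values = sorted(observed)
    cut = int(len(values) * tail_quantile)
    threshold = values[cut]
    body, tail = values[:cut], values[cut:]
    if not body or threshold <= 0:
        raise ValueError("need a non-empty body and a positive threshold")
    # Hill estimator of the Pareto shape from values at or above the threshold.
    log_excess = sum(math.log(v / threshold) for v in tail)
    if log_excess <= 0:
        raise ValueError("tail has no spread above the threshold")
    alpha = len(tail) / log_excess
    p_tail = len(tail) / len(values)
    return body, threshold, alpha, p_tail


def sample_hybrid(model, n, uplift=0.0, rng=None):
    """Draw n synthetic values; uplift is a relative shift (0.05 = +5%)."""
    rng = rng or random.Random()
    body, threshold, alpha, p_tail = model
    out = []
    for _ in range(n):
        if rng.random() < p_tail:
            # Tail: random draw from the fitted Pareto, always >= threshold.
            x = threshold * rng.paretovariate(alpha)
        else:
            # Body: bootstrap from observed values, so it matches closely.
            x = rng.choice(body)
        # Assumption: the uplift is applied multiplicatively to every user.
        out.append(x * (1.0 + uplift))
    return out
```

Usage would be along the lines of `model = fit_hybrid(metric_values)`, then `sample_hybrid(model, 10_000)` for an A/A group and `sample_hybrid(model, 10_000, uplift=0.05)` for a test group, where `metric_values` stands for the real per-user metric. Because the body is bootstrapped, the region below the cutoff follows the observed data, while only the tail is regenerated randomly on each call.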