I want to generate a right-skewed distribution of data in Excel. The categories are 1-2, 2-3, 3-4, 4-5, More than 5. I want more frequency in 1-2, a little less in 2-3, and so on with the least being More than 5 category.
I used the following formula using If and Randbetween, but it’s giving me a normal distribution.
Formula giving Normal Distribution:
=IF(RANDBETWEEN(1,5)>4,”More than 5″,IF(RANDBETWEEN(1,5)>3,”4-5″,IF(RANDBETWEEN(1,5)>3,”3-4″,IF(RANDBETWEEN(1,5)>2,”2-3″,”1-2″))))
Also, tried the following formula to get the skewed distribution but it’s not working on my end.
=IF(AND(RANDBETWEEN(1,5)>4, RANDBETWEEN(0,1)<0.2),”More than 5″,IF(AND(RANDBETWEEN(1,5)>3, RANDBETWEEN(0,1)<0.35),”4-5″,IF(AND(RANDBETWEEN(1,5)>3, RANDBETWEEN(0,1)<0.55),”3-4″,IF(AND(RANDBETWEEN(1,5)>2, RANDBETWEEN(0,1)<0.6),”2-3″,”1-2″))))
Expected outcome:
Any guidance will be helpful.
Nikhita Rao is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
0
I was thinking something involving logs…
=XLOOKUP(RANDARRAY(100,1)*LOG10(6),LOG10(SEQUENCE(1,5)),{"1-2","2-3","3-4","4-5",">5"},,-1)
You can also try something like this to get a highly-skewed distribution, adjusting the power as required:
=LET(criteria,SCAN(0,SEQUENCE(1,6),LAMBDA(a,c,a+1/c^3))-1,
XLOOKUP(RANDARRAY(100,1)*INDEX(criteria,6),DROP(criteria,,-1),{"1-2","2-3","3-4","4-5",">5"},,-1))
or just a hand-crafted one like this:
=XLOOKUP(RANDARRAY(100,1),{0.4,0.7,0.85,0.95,1},{"1-2","2-3","3-4","4-5",">5"},,1)
which implies that the expected percentages in the 5 groups are 40,30,15,10 and 5.
Conversely if you know that the required percentages are 40,30,15,10 and 5, you can calculate the cumulative values that you need to get the distribution:
=LET(criteria,SCAN(0,{40,30,15,10,5},SUM)/100,
XLOOKUP(RANDARRAY(100,1),criteria,{"1-2","2-3","3-4","4-5",">5"},,1))
=Choose(0.5+ABS(5.5-(RANDBETWEEN(1,5)+RANDBETWEEN(0,5))), "1–2","2–3","3–4","4–5","More Than 5")
So, how does it work? Well, as you may or may not already know, the most likely result when rolling 2d6 and summing the values is 7 — this is because it has the most possible number of ways of being made:
Result | Methods |
---|---|
2 | 1+1 |
3 | 1+2; 2+1 |
4 | 1+3; 2+2; 3+1 |
5 | 1+4; 2+3; 3+2; 4+1 |
6 | 1+5; 2+4; 3+3; 4+2; 5+1 |
7 | 1+6; 2+5; 3+4; 4+3; 5+2; 6+1 |
8 | 2+6; 3+5; 4+4; 5+3; 6+2 |
9 | 3+6; 4+5; 5+4; 6+3 |
10 | 4+6; 5+5; 6+4 |
11 | 5+6; 6+5 |
12 | 6+6 |
(Note that there are an odd number of combinations here)
Now, by changing our dice out for 0–5 and 1–5, we can get the following table:
+ |
1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 |
1 | 2 | 3 | 4 | 5 | 6 |
2 | 3 | 4 | 5 | 6 | 7 |
3 | 4 | 5 | 6 | 7 | 8 |
4 | 5 | 6 | 7 | 8 | 9 |
5 | 6 | 7 | 8 | 9 | 10 |
An important thing here is that it has an even number of possible results: 1 method each for 1 and 10, up to 5 methods each for 5 and 6.
Then, we subtract 5.5 (or, subtract it from 5.5) and take the Absolute Value (i.e. ignore Negatives). This effectively folds the table in half:
Result | Methods to get |
---|---|
0.5 | 10 |
1.5 | 8 |
2.5 | 6 |
3.5 | 4 |
4.5 | 2 |
(Note: this is why the even number of results was important. With an Odd number of results, the number of methods for our first value would be halved!)
Then we add 0.5 (so our results become 1–5), and use that in a CHOOSE
to give our output.
I was able to make a somewhat similar distribution through this formula:
=IF(AND(RANDBETWEEN(1,5)>4, RANDBETWEEN(0,1)<0.00045),"More than 5",IF(AND(RANDBETWEEN(1,5)>3, RANDBETWEEN(0,1)<0.005),"4-5",IF(AND(RANDBETWEEN(1,5)>2, RANDBETWEEN(0,1)<0.575),"3-4",IF(AND(RANDBETWEEN(1,5)>1, RANDBETWEEN(0,1)<0.85),"2-3","1-2"))))
Thought I would post it if anyone else had the same question.
Nikhita Rao is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.