Given a maximum integer value M, I want to generate N integer values that are distributed with a lower frequency when approaching M (preferably M should still have a non-zero probability).
I don’t really care much about the probability function, let’s assume (half of) a normal distribution.
How would I do that if I don’t want to keep a history?
for i := 1 to N do GetNextTestValue(M)
What could GetNextTestValue look like?
FWIW I’m doing this in Delphi
4
Building off of GrandmasterB’s comment about pick two and select the lowest, I wrote a quick perl script to plot the random numbers and see the distributions – Pick the lowest of two provides one distribution, but others provide other distributions…
First, the code to see how it works and reproduce the results if you so desire.
#!/usr/bin/perl
use strict;
my $count = shift @ARGV;
my $nth = shift @ARGV;
my @nums = ();
for(my $i = 0; $i < 10_000; $i++) {
my @rnd;
for(my $j = 0; $j < $count; $j++) {
push @rnd, int(rand(100));
}
@rnd = sort { $a <=> $b } @rnd;
$nums[$rnd[$nth]]++;
}
for(my $i = 0; $i < 100; $i++) {
print "$it" . (defined $nums[$i] ? $nums[$i] : 0) . "n";
}
The pick min(rand(100),rand(100))
produces a straight line distribution sloping down.
If you min(rand(100),rand(100), rand(100))
, you can see a bit of a curve to it.
Exploring this just a bit further…
Picking the middle number of 7 produces something that looks more like a nice bell curve.
While the 2nd number of 7 produces a skewed one.
So, from this and the application of some stats to decide what shape you want it (or throwing out samples that don’t meet the desired range to get the desired shape and skew)
7
I found a solution.
The Controul blog by Hristo Dachev describes several methods, of which I have implemented the Marsaglia polar method
.
Delphi code for a quick-and-dirty implementation with a standard normal distribution (multiplied to count integer values) is below.
The blog references the Wikipedia article which has pseudo code for any mean and standard deviation.
Sample distribution generated by the program (10000 points):
Delphi XE2 .pas file:
unit uMarsaglia;
// Based on http://blog.controul.com/2009/04/standard-normal-distribution-in-as3/
// Refers to http://en.wikipedia.org/wiki/Marsaglia_polar_method
interface
uses
Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics, System.Math,
Vcl.Controls, Vcl.Forms, Vcl.Dialogs, VCLTee.TeEngine, Vcl.ExtCtrls,
VCLTee.TeeProcs, VCLTee.Chart, Vcl.StdCtrls, VCLTee.Series;
type
TFrmMarsaglia = class(TForm)
Chart1: TChart;
Series1: TBarSeries;
procedure FormShow(Sender: TObject);
private
FReady : Boolean;
FCache : Real;
function StandardNormal: Real;
public
end;
var
FrmMarsaglia: TFrmMarsaglia;
implementation
{$R *.dfm}
function Ln(R: Real): Real;
begin
Result := Log10(R) / Log10(2.7182818);
end;
procedure TFrmMarsaglia.FormShow(Sender: TObject);
const
cTestMax = 1000000;
cScale = 1000;
cMaxScale = 5*cScale; // We don't count generated numbers higher than that
var
A : Array[1..cMaxScale] of Integer;
i: integer;
lMax,
Nr: Integer;
rScale: Real;
begin
for Nr := 1 to cMaxScale do A[Nr] := 0;
rScale := cScale;
for Nr := 1 to cTestMax do
begin
i := Trunc(rScale * StandardNormal);
if i <= cMaxScale then Inc(A[i]);
end;
lMax := 0;
for Nr := 1 to cMaxScale do if A[Nr] > lMax then lMax := A[Nr];
Series1.Clear;
for Nr := 1 to cMaxScale do Series1.Add(A[Nr]);
end;
function TFrmMarsaglia.StandardNormal: Real;
// Returns real numbers between N(0,1) that when repeated have a normal distribution
var x,y,w,l: Real;
begin
if FReady then
begin // Return a cached result from a previous call if available.
FReady := false;
Result := FCache;
Exit;
end;
// Repeat extracting uniform values in the range (-1,1) until 0 < w = x*x + y*y < 1
repeat
x := Random;
y := Random;
w := x * x + y * y;
until (w > 0) and (w < 1);
l := Ln(w);
w := sqrt (-2 * l / w);
FReady := true;
FCache := x * w; // Cache one of the outputs
Result := y * w; // and return the other.
end;
end.
Delphi .frm file using a TChart (TeeChart) showing the results:
object FrmMarsaglia: TFrmMarsaglia
Left = 0
Top = 0
Caption = 'Marsaglia polar method'
ClientHeight = 626
ClientWidth = 964
Color = clBtnFace
Font.Charset = DEFAULT_CHARSET
Font.Color = clWindowText
Font.Height = -11
Font.Name = 'Tahoma'
Font.Style = []
OldCreateOrder = False
OnShow = FormShow
PixelsPerInch = 96
TextHeight = 13
object Chart1: TChart
Left = 0
Top = 0
Width = 964
Height = 626
BackWall.Visible = False
BottomWall.Visible = False
Foot.Visible = False
LeftWall.Visible = False
Legend.Visible = False
Title.Text.Strings = (
'TChart')
Title.Visible = False
View3D = False
Align = alClient
BevelOuter = bvNone
TabOrder = 0
ExplicitLeft = 272
ExplicitWidth = 692
ColorPaletteIndex = 13
object Series1: TBarSeries
Marks.Arrow.Visible = True
Marks.Callout.Brush.Color = clBlack
Marks.Callout.Arrow.Visible = True
Marks.Visible = False
XValues.Name = 'X'
XValues.Order = loAscending
YValues.Name = 'Bar'
YValues.Order = loNone
end
end
end
Note that ‘not keeping a history’ is not satisfied: there’s one boolean and one real number to remember state. That’s good enough for me; I was afraid of having to maintain arrays of already generated numbers.
If rolling two and taking the least is too much 😉 you could scale a single uniform random so that the distribution favors small results (pseudocode):
def scaled_random(M):
r = random()
return M * r * r
-
random() produces a uniform random number from 0..1
-
When r is small, r * r will be even smaller, so there are lots of ways to get numbers close to zero.
-
As r gets closer to 1, then r * r gets closer to 1, so the return value is closer to M. But very few numbers will get that high.
Note that this is going to be a lot faster than the least of 2 random numbers, because multiplication (r * r) is a lot faster than generating a second random. Random number generation is slow compared to basic operations — even “fast” random number generators are just “fast” relatively.
…
Also, some math libraries support a “power” distribution for random numbers — I believe that would also do the trick.