I have a set S of size n whose members each have associated with them a number in the range 0.00 to 1.00 inclusive.
I want to select a subset T of size m with this property:
- the average of the numbers associated with the members of T must fall within a specified range x to y (for example, 0.65 to 0.75). Expressed differently: the sum of the numbers associated with the members of T must fall within a specified range (for example, 0.65m to 0.75m).
Further, out of all the possible Ts for a given S, x, y, I want to choose one (uniformly) randomly.
My current method is to randomly select m members of S, and then check if the sum falls in the desired range. I repeat until I get a satisfactory result. Is there an algorithm (possibly dynamic programming?) to get the desired subset without using guess and check?
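For concreteness, the guess-and-check method described above might look like the following Python sketch. The function name `sample_by_rejection`, the `max_tries` cap and the tolerance `eps` are illustrative assumptions, not part of the original description. Because every m-subset is equally likely to be drawn, the first accepted draw is uniform over the acceptable subsets.

```python
import random

def sample_by_rejection(values, m, x, y, max_tries=1_000_000, rng=random):
    """Draw m members uniformly at random until their mean lies in [x, y]."""
    eps = 1e-9  # assumed tolerance for floating-point sums at the boundaries
    for _ in range(max_tries):
        candidate = rng.sample(values, m)  # uniform over all m-subsets
        total = sum(candidate)
        if x * m - eps <= total <= y * m + eps:
            return candidate
    # The cap is an assumption added so the sketch cannot loop forever
    # when no acceptable subset exists.
    raise RuntimeError("no acceptable subset found within max_tries")
```

The expected number of draws is the reciprocal of the fraction of m-subsets that are acceptable, which is why this approach becomes slow when the target range is narrow or far from the mean of S.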
Example:
S is a set of n = 200 questions, each assigned a difficulty rating between 0 and 1 inclusive. I want to generate a test T with m = 50 questions where the average difficulty is between 0.65 and 0.75 inclusive. Furthermore I want to select a (uniformly) random T, out of all the possible T's that satisfy my conditions.
Another Example:
S = {0.1, 0.3, 0.5, 0.8, 1.0}
n = 5
m = 3
x = 0.5
y = 0.75
All possible T and their average value
{0.1, 0.3, 0.5} = 0.8 / 3 = 0.26
{0.1, 0.3, 0.8} = 1.2 / 3 = 0.40
{0.1, 0.3, 1.0} = 1.4 / 3 = 0.46
{0.1, 0.5, 0.8} = 1.4 / 3 = 0.46
{0.1, 0.5, 1.0} = 1.6 / 3 = 0.53
{0.1, 0.8, 1.0} = 1.9 / 3 = 0.63
{0.3, 0.5, 0.8} = 1.6 / 3 = 0.53
{0.3, 0.5, 1.0} = 1.8 / 3 = 0.60
{0.3, 0.8, 1.0} = 2.1 / 3 = 0.70
{0.5, 0.8, 1.0} = 2.3 / 3 = 0.76
Subsets of S with size m with average values between x and y
{0.1, 0.5, 1.0}
{0.1, 0.8, 1.0}
{0.3, 0.5, 0.8}
{0.3, 0.5, 1.0}
{0.3, 0.8, 1.0}
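The small example above can be reproduced by brute-force enumeration. This is only a check for tiny inputs, not the algorithm being asked for; the helper name `valid_subsets` and the tolerance `eps` are assumptions made for this sketch.

```python
from itertools import combinations

def valid_subsets(values, m, x, y):
    """List every m-subset whose mean lies in [x, y] (exponential; tiny inputs only)."""
    eps = 1e-9  # assumed tolerance for floating-point sums at the boundaries
    return [c for c in combinations(values, m)
            if x * m - eps <= sum(c) <= y * m + eps]

S = [0.1, 0.3, 0.5, 0.8, 1.0]
for subset in valid_subsets(S, 3, 0.5, 0.75):
    # Print each acceptable subset with its mean rounded to two decimals.
    print(subset, round(sum(subset) / 3, 2))
```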
I am trying to come up with an algorithm to produce one of these 5 subsets at random, without first calculating every subset of S with size m. It seems that guess and check is the best method.
I do not think it is possible to deterministically build such a random set in “one go”, whatever that means exactly. However, here is what I think is the closest alternative:
Start with m randomly chosen numbers. As long as the average is outside the target range, randomly remove a number and replace it with a smaller or larger one, depending on whether the average is too high or too low. To quickly find the numbers to remove and their substitutes, you could either split both your input and output sets in two (numbers below the wanted average versus numbers above it), or you could sort the input set and use binary searches.
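A minimal Python sketch of this swap-based repair follows. The function name `sample_by_swapping`, the `max_steps` cap and the tolerance `eps` are assumptions for illustration, and a plain linear scan stands in for the split-set or binary-search lookup suggested above. Note that nothing here guarantees the result is uniformly distributed over the acceptable subsets.

```python
import random

def sample_by_swapping(values, m, x, y, max_steps=10_000, rng=random):
    """Start from a random m-subset and swap members until the mean is in [x, y]."""
    eps = 1e-9  # assumed tolerance for floating-point sums at the boundaries
    idx = list(range(len(values)))
    rng.shuffle(idx)
    chosen, rest = idx[:m], idx[m:]  # indices inside and outside the subset
    total = sum(values[i] for i in chosen)
    for _ in range(max_steps):
        if x * m - eps <= total <= y * m + eps:
            return [values[i] for i in chosen]
        c = rng.randrange(len(chosen))  # position of the member to remove
        old = values[chosen[c]]
        if total < x * m:
            # Sum too low: only a strictly larger replacement helps.
            better = [r for r in range(len(rest)) if values[rest[r]] > old]
        else:
            # Sum too high: only a strictly smaller replacement helps.
            better = [r for r in range(len(rest)) if values[rest[r]] < old]
        if not better:
            continue  # this member cannot be improved; try another one
        r = rng.choice(better)
        total += values[rest[r]] - old
        chosen[c], rest[r] = rest[r], chosen[c]
    # The cap is an assumption: swaps can overshoot, or no subset may exist.
    raise RuntimeError("no acceptable subset reached within max_steps")
```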
An alternative yet similar idea is to start with an empty set and add “small” or “big” numbers depending on your current average. However, you could end up with an invalid set, in which case you would have to fall back on method #1, so I think it is more elegant to use only #1.
Finally, regarding redundant values, I suggest using a pre-processing step: sort your input set, then scan it to find redundant items and randomly pick one of them.
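One possible reading of this pre-processing step, sketched in Python: items with the same value are grouped after sorting, and one representative per group is kept at random. The function name `one_per_value` and the `key` parameter are assumptions for this sketch, and dropping duplicates this way changes which subsets are reachable, so it may not suit the uniform-sampling requirement.

```python
import random
from itertools import groupby

def one_per_value(items, key, rng=random):
    """Sort items by key, then keep one randomly chosen item per distinct key."""
    ordered = sorted(items, key=key)
    # groupby only merges adjacent equal keys, hence the sort above.
    return [rng.choice(list(group)) for _, group in groupby(ordered, key=key)]
```

For example, calling it on `[("q1", 0.5), ("q2", 0.5), ("q3", 0.8)]` with `key=lambda q: q[1]` keeps either `q1` or `q2`, plus `q3`.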
Rephrased to clarify things up.