I have a list of entities. Every entity contains a number that holds how many times the entity has been selected. I need to make a function that selects n
(say 25%) of the entities, randomly. What I want to do is increase the odds for entities that have been selected the least times. Suppose after 5 runs the times the entities have been selected can be anywhere from 0 to 5. But I don’t want to have such a spread. I want the times the entities are selected more or less to be equal.
How can I write a function that increases the odds for entities that haven’t been selected as often as others?
One way I could think of is to make a list that has more occurrences of lesser selected entities, increasing the chance the randomizer selects that entity. Any hints, tips, ideas?
EDIT:
Wow. Closed as not being a real question and reopened again. For not being a real question it did get a lot of answers and comments. Thanks for that. I got exactly what i wanted from them. I got some nice ideas to work out and to test.
4
Sort By Random Range
You can sort the items by an expression, where a random number is selected between the items current number of views and the max number of views. This ensures that items with a high view count are more likely to be given a random number that is large. While items with a low view count are more likely to be randomized.
Sort the list by the following expression random(item.views,max_viewx+1)
I gave the max_views+1
so that a new list of items with 0
views will still produce a random sort order.
Tested via SQL
I think this is something I could one day use on a blog site. So here is an attempt to verify this approach as a MySQL table.
First create a basic articles
table.
CREATE TABLE `articles` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`title` VARCHAR(45) NOT NULL ,
`views` INT NOT NULL DEFAULT 0 ,
PRIMARY KEY (`id`) );
Now insert some articles with different view counters.
INSERT INTO `articles` (`title`, `views`) VALUES ('Home Page', 20);
INSERT INTO `articles` (`title`, `views`) VALUES ('Portfolio', 10);
INSERT INTO `articles` (`title`, `views`) VALUES ('Contact Us', 5);
INSERT INTO `articles` (`title`, `views`) VALUES ('Product - ABC', 2);
INSERT INTO `articles` (`title`, `views`) VALUES ('Product - SHR', 1);
INSERT INTO `articles` (`title`, `views`) VALUES ('Product - DBS', 0);
INSERT INTO `articles` (`title`, `views`) VALUES ('Product - ZZZ', 100);
Now I could sort by the views
column to put Product - ZZZ
at the bottom, but we want this to feel random.
SELECT * FROM `articles`
ORDER BY `views`+(RAND()*(100-`views`+1));
That works for me, placing Product - ZZZ
at the bottom 100% of the time, but low view items are more likely to be sorted higher in the list.
SELECT * FROM `articles`
ORDER BY `views`+(RAND()*(100-`views`+25));
By adding +25
to the ORDER BY
, then I’m able to raise Product - ZZZ
slightly higher in the list and make the rest of the items more random. So you can control the randomizing effect.
1
First, it’s worth pointing out that if your selection function is truly random (or even convincingly pseudorandom), all items will have about the same number of selections over the long run.
I can think of a number of ways to accomplish what you want. Here’s one:
- Give each item an initial weight of, say, 1.0.
- Arrange all the items in a list. Sort the list according to the items’ weights.
- When you want to choose new item, use a random function to choose a number between 0 and the sum of the weights of all items. Next, add the weights of the items starting from the beginning of the list until you reach the number you just chose. Choose that item.
- Finally, you’ll need to adjust the weights and ensure that the list stays sorted (either sort it again, or adjust in a way that maintains the sort).
How you adjust the weights will have a big impact on your algorithm, but I’d suggest something like: subtract 1.0 from the chosen item’s weight and add 1/(n-1) to the weights of all the other items.
2
The times entities are selected will average out to be equal… given time and enough selections. There is indeed a chance that one entity could get lucky and win the first 5 gambles. But that’s rare. In general, I’d say that you don’t have to worry about it.
But if you want it to be impossible, simply exclude any entity from the lotto which has more than a standard deviation of wins. At low numbers, this would exclude anyone who has won a single gamble.
Note, that is this is for a game, gamers are particularly keen about when the odds have been tampered with. If someone can never win twice in a row, they’ll notice it.
You’ve got the entries A, B, C, D, and E that have been selected 3, 1, 1, 0, 0 times.
The max of this set is 3. Subtract the currently selected value from the max + 1 (an entry that has the max values should have some chance of still winning, though a one as that entry will only have one representative in the next raffle).
You now have the association of: A => 1, B => 3, C => 3, D => 4, E => 4.
Build an array based on this, for example: A,B,B,B,C,C,C,D,D,D,D,E,E,E,E
and select a random winner. The entries that have won the fewest times in the past will have an increased chance of winning this time.
Your question can be interpreted as the “selects n of the entries” being with or without replacement. If it is without replacement, then recompute, removing the winner from the previous round until you select the required number of winners.
Variations:
- Instead of using max of the set, one might instead use the sum of the set.
- Change the value of N in max + N value to even out the distribution
I would suggest running a number of Monte-carlo simulations on the data set to identify what values work best for your situation.
If you an want to limit the difference in occurences to a maximum of 1:
- Shuffle the list of entities (using a Fisher-Yates shuffle)
- Pick your random values according to the order of the shuffled list
- Once you reached the end of the shuffled list goto 1
You can also allow higher differences in occurences by adding each entity n
-times to the list when you want a maximum difference of n
.
1
I once implemented something like this:
Initially assign every item in your pool the same number of chances. When making a random choice add up the total number of chances and then go through your list to find which one you got. (Yes, that’s O(n) rather than O(1). You’ll need something fancier if your list is gigantic or not in memory.)
When an item is picked you remove a chance from it.
When you have picked as many items as your list contains go through and add one to every chance.
(I didn’t have the last step as I actually wanted the successful picks to go away and never return.)