I need an algorithm that distributes identical items across a list so that the distance between their occurrences is maximized.
E.g.
I have a list of 15 items:
{a,b,c,c,c,d,e,f,f,f,g,h,h,i,j}
The algorithm should reorder these in such a way that all the duplicates are spread as uniformly as possible.
The mentioned list should result in something like this:
{c,f,a,h,b,c,d,e,f,g,c,h,i,j,f}
Preferably I’d like pseudocode, and even better would be T-SQL (since that is the platform it needs to run on). It needs to process hundreds of these lists in one go.
I also tested a proposed method called ‘Weighted shuffle’ but this will still allow two of the same items in the list to appear next to each other even when this is not needed.
First, make sure a solution that meets your requirements exists: no single letter may occur more than n/2 times, rounded up, where n is the total number of elements.
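A minimal T-SQL sketch of that feasibility check follows. The table variable @items is a name chosen here purely for illustration. For an odd n the most frequent letter may occur (n + 1) / 2 times, because it can take every other slot starting with the first, so the sketch compares against n/2 rounded up.

```sql
-- Assumption: @items is a hypothetical table variable holding one list.
DECLARE @items TABLE (letter char(1) NOT NULL);
INSERT INTO @items (letter)
VALUES ('a'),('b'),('c'),('c'),('c'),('d'),('e'),('f'),('f'),('f'),
       ('g'),('h'),('h'),('i'),('j');

-- An ordering without equal neighbours exists only if the most frequent
-- letter occurs at most n/2 times, rounded up: (n + 1) / 2 in integer math.
SELECT CASE WHEN MAX(g.cnt) <= (SUM(g.cnt) + 1) / 2
            THEN 'solvable' ELSE 'not solvable' END AS verdict
FROM (SELECT COUNT(*) AS cnt FROM @items GROUP BY letter) AS g;
```

For the 15-item example the largest group has 3 members and the bound is 8, so the check reports 'solvable'.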
Then I suggest you try the following:
- Start with a random shuffle or weighted shuffle.
- Afterwards, for each remaining pair of identical neighbours, pick one of the two items, pick another randomly chosen item among those with different neighbours, and swap their places.
- Repeat the last step until all pairs are removed.
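The steps above could be sketched in T-SQL roughly as follows. This is only one reading of the procedure: the names @list, @bad, @other and @pass are invented for the example, a plain random shuffle is used as the starting point, "different neighbours" is taken to mean an item that differs from both of its own neighbours, and the cap of 1000 passes is an arbitrary safety limit, since random swaps are not guaranteed to finish quickly.

```sql
-- Assumption: all names and the 1000-pass cap are illustrative.
DECLARE @list TABLE (pos int PRIMARY KEY, letter char(1) NOT NULL);

-- Step 1: start from a plain random shuffle.
INSERT INTO @list (pos, letter)
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()), v.letter
FROM (VALUES ('a'),('b'),('c'),('c'),('c'),('d'),('e'),('f'),('f'),('f'),
             ('g'),('h'),('h'),('i'),('j')) AS v(letter);

DECLARE @bad int, @other int,
        @badLetter char(1), @otherLetter char(1),
        @pass int = 0;

WHILE @pass < 1000
BEGIN
    SET @pass += 1;

    -- Step 2a: pick one position whose right-hand neighbour holds the same letter.
    SET @bad = NULL;
    SELECT TOP (1) @bad = l.pos
    FROM @list AS l
    JOIN @list AS n ON n.pos = l.pos + 1 AND n.letter = l.letter
    ORDER BY NEWID();

    IF @bad IS NULL BREAK;   -- Step 3: no equal neighbours are left

    -- Step 2b: pick a random item that differs from both of its own neighbours.
    SET @other = NULL;
    SELECT TOP (1) @other = c.pos
    FROM @list AS c
    LEFT JOIN @list AS p ON p.pos = c.pos - 1
    LEFT JOIN @list AS n ON n.pos = c.pos + 1
    WHERE (p.letter IS NULL OR p.letter <> c.letter)
      AND (n.letter IS NULL OR n.letter <> c.letter)
    ORDER BY NEWID();

    IF @other IS NULL BREAK; -- nothing suitable to swap with

    -- Step 2c: swap the two letters.
    SELECT @badLetter = letter FROM @list WHERE pos = @bad;
    SELECT @otherLetter = letter FROM @list WHERE pos = @other;
    UPDATE @list SET letter = @otherLetter WHERE pos = @bad;
    UPDATE @list SET letter = @badLetter WHERE pos = @other;
END;

SELECT letter FROM @list ORDER BY pos;
```

A swap can occasionally create a new pair elsewhere or exchange two equal letters; the loop simply tries again on the next pass. If the cap is reached, the result may still contain equal neighbours, so it is worth re-checking before using it.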
This approach only makes sure you end up with no adjacent identical pairs; it does not maximize the distances between equal letters. If you want to achieve the latter (which is not clear from your question), I suggest you introduce a score function for your list, for example like this:
Score(list) := Sum over (a,b) of 1/(abs(a-b) - 0.999)
where the sum goes over all pairs (a,b) of positions of equal letters. The “-0.999” in the denominator makes sure the whole expression becomes very big when there are 2 equal neighbours (abs(a-b) = 1 gives 1/0.001 = 1000). Now you can apply random swaps to your list and try to minimize the score function, for example by hill climbing or simulated annealing.
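As a rough illustration, the score of one particular ordering can be computed in T-SQL with a self-join. The table variable @list, its pos column and the sample ordering are assumptions made up for this sketch; only the formula comes from the text above.

```sql
-- Assumption: @list holds one candidate ordering, pos = position in the list.
DECLARE @list TABLE (pos int PRIMARY KEY, letter char(1) NOT NULL);
INSERT INTO @list (pos, letter)
VALUES (1,'c'),(2,'f'),(3,'a'),(4,'h'),(5,'b'),(6,'c'),(7,'d'),(8,'e'),
       (9,'f'),(10,'g'),(11,'c'),(12,'h'),(13,'i'),(14,'j'),(15,'f');

-- Each pair of equal letters is counted once (y.pos > x.pos).
-- Lower is better; a pair of equal neighbours alone adds 1000.
SELECT SUM(1.0 / (ABS(x.pos - y.pos) - 0.999)) AS score
FROM @list AS x
JOIN @list AS y ON y.letter = x.letter AND y.pos > x.pos;
```

Note that SUM returns NULL when the list contains no repeated letters at all, so a caller comparing scores may want to wrap it in COALESCE(..., 0). A hill-climbing loop would compute this score before and after a trial swap and keep the swap only if the score went down.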
If you’re only worried about spreading identical rows apart, and not as bothered about making sure they appear at regular intervals, you can use something like the following:
It determines a weight for each group of letters, then uses the ROW_NUMBER function to calculate a distribution of sorts. By tweaking the weighting and/or the sorting in the final select, you may get the results you need.
CREATE TABLE #items (letter char(1))
INSERT INTO #items VALUES ('a')
INSERT INTO #items VALUES ('b')
INSERT INTO #items VALUES ('c')
INSERT INTO #items VALUES ('c')
INSERT INTO #items VALUES ('c')
INSERT INTO #items VALUES ('d')
INSERT INTO #items VALUES ('e')
INSERT INTO #items VALUES ('f')
INSERT INTO #items VALUES ('f')
INSERT INTO #items VALUES ('f')
INSERT INTO #items VALUES ('g')
INSERT INTO #items VALUES ('h')
INSERT INTO #items VALUES ('h')
INSERT INTO #items VALUES ('i')
INSERT INTO #items VALUES ('j')
ALTER TABLE #items ADD weight numeric(4,2)
--Add a weight for each letter: total item count / size of that letter's group
--(numeric(4,2) tops out at 99.99, so widen it for lists of 100 or more items)
DECLARE @itemcount numeric(4,2) = (SELECT COUNT(*) FROM #items)
UPDATE #items SET weight = @itemcount / (SELECT COUNT(*) FROM #items i WHERE i.letter = #items.letter)
--Sort items by weight, using ROW_NUMBER to space out letter groups
;WITH cteNumbered AS (SELECT letter, weight, ROW_NUMBER() OVER (PARTITION BY letter ORDER BY letter) AS rownum FROM #items)
SELECT letter FROM cteNumbered ORDER BY rownum * weight, weight DESC, letter
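Since the question mentions processing many lists in one go, the same idea might be extended with a list identifier, along the lines of the sketch below. The list_id column, the @items table variable and the sample rows are assumptions for illustration; the weighting and sort order are the ones used above, with the weight computed by windowed counts instead of a separate UPDATE.

```sql
-- Assumption: list_id is a hypothetical column telling the lists apart.
DECLARE @items TABLE (list_id int NOT NULL, letter char(1) NOT NULL);
INSERT INTO @items (list_id, letter)
VALUES (1,'a'),(1,'b'),(1,'c'),(1,'c'),(1,'d'),
       (2,'x'),(2,'x'),(2,'y'),(2,'z');

WITH cteWeighted AS (
    SELECT list_id, letter,
           -- weight = size of the list / size of this letter's group, per list
           CAST(COUNT(*) OVER (PARTITION BY list_id) AS numeric(9,2))
             / COUNT(*) OVER (PARTITION BY list_id, letter) AS weight,
           ROW_NUMBER() OVER (PARTITION BY list_id, letter ORDER BY letter) AS rownum
    FROM @items
)
SELECT list_id, letter
FROM cteWeighted
ORDER BY list_id, rownum * weight, weight DESC, letter;
```

Every window is partitioned by list_id, so each list is weighted and ordered independently in a single statement. As with the single-list version, this only spreads groups apart heuristically and does not guarantee that equal letters never end up next to each other.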