Here is a problem I’m trying to solve for a personal project. I’m not exactly sure how best to even approach it.
Problem:
I have a single stack already populated with several thousand unknown items. (The items are not completely unique; there are maybe 300 different values, so it’s a stack with plenty of duplicates shuffled together in a random order.) I want to sort these items by moving them between a limited number of other stacks. (Maybe 10-30 stacks at most.) An algorithm with fewer moves is ideal.
Limitations:
- The initial stack is full of unknown values (except for the top item).
- I can read the top item in a stack.
- I can move the top item of a stack directly to the top of another stack.
- I can check if a stack is empty.
- I can remember and store all the items I have read and moved.
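To make the constraints concrete, the operations I’m assuming are available look roughly like this (a hypothetical Python sketch; the class and method names are just mine, not an existing API):

```python
class Stack:
    """Minimal model of the only operations I can perform."""

    def __init__(self, items=None):
        self._items = list(items or [])

    def is_empty(self):
        """Check whether the stack holds no items."""
        return not self._items

    def peek(self):
        """Read the top item without removing it (call is_empty first)."""
        return self._items[-1]

    def move_top_to(self, other):
        """Move my top item directly onto the top of another stack."""
        other._items.append(self._items.pop())
```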
Based on the above limitations, all the items need to be moved once to another stack just so they can be read/scanned. Then I would know where and what everything is.
Once this is done I can do the real sorting. (This is the part I mainly need help with.) After I’ve read and moved all the items once, I can then calculate and queue up all the stack moves needed to actually sort the items.
Presorting and optimizing:
It might be too early to think about this, but I can potentially do some pre-sorting onto different stacks as the items are first read and moved.
I do have an overall idea about the distribution of all the items. For instance, if the items were sorted alphabetically, I might know that item values starting with “S” are five times more common than items that start with “V”, and that “S” is also twice as common as items starting with “A”. Is knowing this even useful here?
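To illustrate what I mean, here is roughly how I imagine turning that distribution knowledge into bin boundaries of equal proportions (a hypothetical Python sketch; the function name and any frequency numbers are made up):

```python
def equal_proportion_boundaries(frequencies, num_bins):
    """Pick boundary values so each bin should receive roughly the same
    number of items, given the expected relative frequency of each value.

    frequencies: dict mapping every possible value to its expected share,
    e.g. {"Apple": 1, "Sand": 10, "Violet": 2} (made-up numbers).
    Returns num_bins - 1 boundaries; an item belongs to bin i if
    boundaries[i] is the first boundary >= the item (items above all
    boundaries go to the last bin).
    """
    total = sum(frequencies.values())
    target = total / num_bins
    boundaries, running = [], 0.0
    for value in sorted(frequencies):
        running += frequencies[value]
        # Close a bin each time the cumulative share passes the next target.
        if running >= target * (len(boundaries) + 1) and len(boundaries) < num_bins - 1:
            boundaries.append(value)
    return boundaries
```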
I’m not sure whether this would help with pre-sorting, but in some instances I might also know all the possible values of the items. So I could calculate whether the value of an item is “odd” or “even”. Other similar calculations could also be done (like the remainder of a larger modulo operation). Maybe separating the items into “odd” and “even” stacks would make it easier to weave them into the final sorted stacks. I’m just speculating here, since I haven’t run any tests yet.
Desired Result:
Items don’t need to be sorted back into a single stack; it would probably be more useful if they ended up in several smaller stacks of roughly equal size. For example, I might end up with alphabetically sorted stacks such as: A-D, D-K, K-R, R-S, S-Z.
What kind of algorithms would be best given these limitations? Pseudo code?
Do a “binning” of the initial black-box stack into N groups of equal proportions (your A-D, D-K…). At that point you can use any O(n log n) algorithm to sort each group in turn.
Actually, you could pop items from the black-box stack directly into a group of N red-black trees.
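A rough sketch of both ideas, assuming Python (plain lists stand in for the physical stacks, and `bisect.insort` on a list stands in for a red-black tree; any balanced tree or sorted container would serve the same role):

```python
import bisect

def first_pass_binning(source, boundaries):
    """One pass over the unknown stack: read the top item, decide which
    range bin it belongs to, and move it there, keeping every bin sorted
    as it grows.

    source: a list used as a stack (append = push, pop = pop).
    boundaries: sorted cut points, e.g. from the distribution-based
    boundaries sketched in the question.
    """
    bins = [[] for _ in range(len(boundaries) + 1)]
    while source:                                  # "check if the stack is empty"
        item = source[-1]                          # "read the top item"
        i = bisect.bisect_left(boundaries, item)   # first boundary >= item -> bin index
        bisect.insort(bins[i], source.pop())       # "move the top item", kept sorted
    return bins
```

Each bins[i] then comes out already sorted, so it can be dealt onto an output stack in whichever order you need the pops to come out.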
But if you have lots of identical elements… could you not just store a counter per value in a single tree? If you only have 300 distinct values, you could then store your several thousand items in a single tree of comparatively small height. When extracting values you’d just decrement the counter and delete the element when the counter eventually reached 0. And you could use the tree to fill the output stacks if you found that you still needed them.
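With only ~300 distinct values that is essentially a counting sort. Python has no red-black tree in its standard library, so here is a sketch of the same idea with a `collections.Counter` sorted once at the end standing in for the counter tree:

```python
from collections import Counter

def count_then_deal(source, num_output_stacks):
    """Scan the stack once, counting occurrences of each value, then deal
    the values out in sorted order across the output stacks in roughly
    equal-sized runs.  `source` is a list used as a stack."""
    counts = Counter()
    while source:                     # "check if the stack is empty"
        counts[source.pop()] += 1     # read, move aside, and remember the item

    total = sum(counts.values())
    per_stack = -(-total // num_output_stacks)       # ceiling division
    outputs = [[] for _ in range(num_output_stacks)]
    i = 0
    for value in sorted(counts):                     # each distinct value, ascending
        for _ in range(counts[value]):
            if len(outputs[i]) >= per_stack and i < num_output_stacks - 1:
                i += 1                               # current output group is full
            outputs[i].append(value)
    return outputs

# Example: count_then_deal(list("THEQUICKBROWNFOX"), 4)
```

The returned list describes the A-D, D-K, … style groups of roughly equal size; turning it into actual stack moves still depends on where each physical item was parked during the first scan.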