Please allow me to use this example/metaphor to describe an algorithm I need.
Objects
- There are 5 thousand pennies.
- There are 50 cups.
- There is a tracking history (passport “stamp”, etc.) that is associated with each penny as it moves between cups.
Definition
I’ll define a “highly diffused” penny as one that passes through many cups.
A “poorly diffused” penny is one that only passes back and forth between 2 cups.
Question
How can I objectively measure the diffusion of a penny as a function of:
- The number of moves the penny has gone through
- The number of cups the penny has been in
- A unit of time (day, week, month)
Why am I doing this? I want to detect if a cup is hoarding pennies.
Resistance from bad actors
Since hoarding is bad, the “bad cup” may simply solicit a partner and move pennies back and forth with it. This would reduce the amount of time a penny spends sitting idle in a cup, and would skew hoarding detection.
A solution might be to detect whether a cup (or set of cups) is a common “partner” of another, though I’m not sure how to think through this problem.
Broad applicability
Any assistance would be helpful, since I would think that this algorithm is common to
- Economics
- The study of migration patterns of animals or of a country's citizens
- Other naturally occurring phenomena
… and probably exists as a term or concept I’m unfamiliar with.
Simplest is often the best. Or at least a really good start.
You already have the history of which cups each penny has visited, hence you know the number of cups visited by each penny. We might expect to see a normal distribution of the number of cups visited vs the number of visits per penny.
A similar approach can be used to observe the distribution of pennies in each cup.
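As a rough illustration of these two distributions, here is a minimal Python sketch. The data layout is an assumption on my part: `histories` is a hypothetical dictionary mapping each penny to the ordered list of cups it has sat in, and the function names are made up for this example.

```python
from collections import Counter

# Hypothetical data: each penny's history is the ordered list of cups it has sat in.
histories = {
    "p1": ["A", "B", "C", "D"],  # visited four different cups
    "p2": ["A", "B", "A", "B"],  # bounced between two cups
    "p3": ["C"],                 # never moved
}

def cups_visited_distribution(histories):
    """Map 'number of distinct cups visited' to 'how many pennies have that count'."""
    return Counter(len(set(cups)) for cups in histories.values())

def pennies_per_cup(histories):
    """Map each cup to the number of pennies currently in it (last stamp of each history)."""
    return Counter(cups[-1] for cups in histories.values() if cups)

print(cups_visited_distribution(histories))
print(pennies_per_cup(histories))
```

With real data, either counter could be plotted as a histogram to see whether a few pennies or cups sit far from the bulk of the distribution.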
A penny that is poorly diffused has visited few cups, but one that is being hoarded (in your definition) has also made many visits. Patterns of hoarding would skew the distributions.
So I would suggest looking for pennies where
number-of-visits / number-of-cups-visited
is greater than one, as a starting point.
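The ratio above can be sketched in a few lines of Python. This assumes that a "visit" is one entry in the penny's history and that the history is an ordered list of cups; `revisit_ratio`, `flag_pennies` and the sample data are hypothetical names chosen for this example.

```python
# Hypothetical data: each penny's history is the ordered list of cups it has sat in.
histories = {
    "p1": ["A", "B", "C", "D"],
    "p2": ["A", "B", "A", "B"],
    "p3": ["C"],
}

def revisit_ratio(cups):
    """Number of visits divided by number of distinct cups visited.

    A value of 1.0 means no cup was ever revisited; an empty history gives 0.0.
    """
    if not cups:
        return 0.0
    return len(cups) / len(set(cups))

def flag_pennies(histories, cutoff=1.0):
    """Return {penny: ratio} for pennies whose ratio exceeds the adjustable cutoff."""
    flagged = {}
    for penny, cups in histories.items():
        ratio = revisit_ratio(cups)
        if ratio > cutoff:
            flagged[penny] = ratio
    return flagged

print(flag_pennies(histories))
```

Here only `p2`, which bounced between two cups, is flagged. Raising `cutoff` makes the check more tolerant of occasional revisits.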
This analysis can be performed for each time period you are interested in.
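One way to split the analysis by period, sketched here for calendar months, is to bucket a penny's stamps by date before computing the ratio. The stamp format (a date paired with a cup) and the name `ratio_by_month` are assumptions for illustration; days or weeks would work the same way with a different bucket key.

```python
from collections import defaultdict
from datetime import date

# Hypothetical stamped history for ONE penny: (date of arrival, cup).
stamps = [
    (date(2024, 1, 3), "A"),
    (date(2024, 1, 10), "B"),
    (date(2024, 1, 17), "A"),
    (date(2024, 1, 24), "B"),
    (date(2024, 2, 2), "C"),
    (date(2024, 2, 9), "D"),
]

def ratio_by_month(stamps):
    """Return {(year, month): visits / distinct cups} for one penny."""
    buckets = defaultdict(list)
    for day, cup in stamps:
        buckets[(day.year, day.month)].append(cup)
    return {period: len(cups) / len(set(cups)) for period, cups in buckets.items()}

print(ratio_by_month(stamps))
```

In this made-up history the penny bounces between two cups in January (ratio 2.0) and moves to fresh cups in February (ratio 1.0).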
The cutoffs you choose (I suggested one, above) should be adjustable; the warnings posted by @Cort Ammon should be taken seriously.