I am working on a problem where I need to simulate whether an entity remains “active in a system” as time passes. The following is a simplified setting that describes my issue.
I have estimated the probabilities of being active in the system, and they look reasonable. For instance, the chance of being active in the system is 94.29% (or 0.9429 as used in the model), 93.57%, 92.77%, and 91.88% for the first, second, third, and fourth periods respectively. The probabilities decrease as expected, with the chance of being active in the system after 10 years being 84.11% (similarly, 60% after 20 years). With such high probabilities of remaining active, I would expect entities to remain active for a long time, as is the case in the empirical data.
Since I want to assess whether entities are active over the next 60 years, I generate 60 random numbers in R using random_numbers <- runif(60). The entity is then active in a given year under two conditions: first, it must have been active in the previous year (no re-entry), and second, the respective random number must be smaller than the probability of remaining active. For instance, an entity would be active in the third year only if 1) it was active in the second year, and 2) the third randomly generated number is smaller than 0.9277. I find this methodology reasonable, and the results seem to be as expected in a single simulation.
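To make the procedure concrete, here is a minimal sketch of the rule described above, restricted to the four periods whose probabilities are quoted. The names p_active and simulate_entity are hypothetical and chosen only for illustration, and the seed is arbitrary.

```r
# Probabilities quoted in the question for periods 1 to 4.
p_active <- c(0.9429, 0.9357, 0.9277, 0.9188)

# Hypothetical helper: simulate one entity over length(p) periods.
simulate_entity <- function(p) {
  u <- runif(length(p))   # one uniform draw per period
  passed <- u < p         # TRUE where the draw is below that period's probability
  cumprod(passed) == 1    # active only while every period so far has passed (no re-entry)
}

set.seed(1)               # arbitrary seed, only for reproducibility
active <- simulate_entity(p_active)
years_active <- sum(active)   # number of periods the entity stayed active
```

Because of the no-re-entry rule, the entity drops out at the first period whose draw is not below the stated probability, so years_active counts the periods before that first failure.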
However, when I run 1000 such simulations, I notice that entities do not remain active for as long as I would expect given the high probabilities I have computed. Indeed, the average number of years active is only about 8, even though the probability of being active after ten years is still very high (84.11%, as mentioned previously). When I check the matrix of random numbers generated (which I have saved), I see that, more often than not, there are very high numbers (0.90+) in the first 10 positions. How is that possible? Doesn’t that mean that high numbers are appearing much more often than expected? Is there something that I may be doing wrong?
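One way to check whether high numbers really are over-represented is to compare the observed share of simulations with at least one draw above 0.90 in the first 10 positions against the share expected from independent Uniform(0, 1) draws, which is 1 - 0.90^10, or about 0.65. The sketch below regenerates a matrix of draws rather than using my saved one; the names n_sims, n_years, draws, observed, and expected are illustrative, and the seed is arbitrary.

```r
n_sims  <- 1000   # number of simulations, as in the question
n_years <- 60     # draws per simulation, as in the question

set.seed(42)      # arbitrary seed, only for reproducibility
draws <- matrix(runif(n_sims * n_years), nrow = n_sims)  # one row per simulation

# Share of simulations with at least one draw above 0.90 in positions 1 to 10.
observed <- mean(apply(draws[, 1:10] > 0.90, 1, any))

# Share expected if the draws are independent Uniform(0, 1).
expected <- 1 - 0.90^10
```

If observed is close to expected, then seeing a value above 0.90 among the first ten positions more often than not is what uniform draws should produce, rather than a sign that runif is generating too many high numbers.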
I appreciate any help.
I have increased the number of simulations to 2000, 3000, and even 4000 (which takes a long time and often runs out of memory), and I always get an average of about 8 years in the system.