Suppose I have a dataframe called “Main” in this format:
Region Type Value
A 1 600
A 2 700
B 1 700
B 2 900
I also have another dataframe that consists of:
Region Type Probability
A 1 50%
A 2 30%
B 1 50%
B 2 30%
For each row in the top table, I look up the probability and get Python to roll a dice, which tells me randomly, based on that probability, whether I’ll add an extra row to the “Main” dataframe. However, the extra row I add will be taken at random from another dataframe, “Extras”, which has the same format as “Main”:
Region Type Value
A 1 600
A 1 300
A 2 700
A 2 950
B 1 700
B 1 50
B 2 900
B 2 300
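To make the probability lookup and dice roll concrete, here is a minimal sketch of what I have in mind. The variable names (`main`, `probs`, `looked_up`), the `Lucky` column and the seed are my own placeholders, and I am assuming the probabilities arrive as strings such as "50%".

```python
import numpy as np
import pandas as pd

main = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Type": [1, 2, 1, 2],
    "Value": [600, 700, 700, 900],
})
probs = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Type": [1, 2, 1, 2],
    "Probability": ["50%", "30%", "50%", "30%"],  # assumed to be strings
})

# Convert "50%"-style strings to floats in [0, 1].
probs["Probability"] = probs["Probability"].str.rstrip("%").astype(float) / 100

# Look up each Main row's probability by (Region, Type).
looked_up = main.merge(probs, on=["Region", "Type"], how="left")

# One uniform draw per row; a row is "lucky" when its draw falls below its probability.
rng = np.random.default_rng(seed=0)  # the seed is arbitrary, only for repeatability
looked_up["Lucky"] = rng.random(len(looked_up)) < looked_up["Probability"]
```

The rows where `Lucky` is True are the ones that would then receive an extra row.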
But I can only pull a row from “Extras” that has the same Region and Type as the current row in the “Main” dataframe. Once I’ve taken that row, I remove it from “Extras” before moving on to the next row in “Main”, as I don’t want to pull the same row twice (in reality these dataframes will be much larger).
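The "pull one matching row, then remove it" step could look roughly like the sketch below. `pull_extra` is a hypothetical helper name of my own, not anything built into pandas, and returning `None` when no matching row is left is just one possible way to handle an empty pool.

```python
import pandas as pd

extras = pd.DataFrame({
    "Region": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Type": [1, 1, 2, 2, 1, 1, 2, 2],
    "Value": [600, 300, 700, 950, 700, 50, 900, 300],
})

def pull_extra(extras, region, type_, random_state=None):
    """Return (row, remaining): one random matching row and Extras without it.

    `row` is None when no row with that Region and Type is left.
    """
    candidates = extras[(extras["Region"] == region) & (extras["Type"] == type_)]
    if candidates.empty:
        return None, extras
    picked = candidates.sample(n=1, random_state=random_state)
    # Drop by index label so the same row cannot be pulled again.
    return picked, extras.drop(index=picked.index)

row, extras = pull_extra(extras, "A", 2, random_state=0)
```

After the call, `row` holds a single A / 2 row and `extras` has seven rows left.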
I am trying to get my head around what I assume is a multi-step process, possibly in a loop, that can do this fairly simply. In the end I’ll have a dataframe that looks something like this:
Region Type Value
A 1 600
A 2 700
B 1 700
B 2 900
A 2 950
…where, of the four rows in the “Main” dataframe, Region A, Type 2 was “lucky”, so we perform the add-a-row step above: pull the A / 2 / 950 row from Extras and add it to the dataframe (whilst removing it from the Extras dataframe).
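For reference, the whole process as I picture it might be sketched like this. It is only one possible approach under my own assumptions: `add_extras` is a hypothetical function name, probabilities are assumed to be strings such as "50%", and Extras is shuffled once up front so that popping from each (Region, Type) pool gives a random row that cannot be pulled twice.

```python
import numpy as np
import pandas as pd

def add_extras(main, probs, extras, seed=None):
    """Return (new_main, remaining_extras) after one pass over Main."""
    rng = np.random.default_rng(seed)

    # (Region, Type) -> probability as a float in [0, 1].
    pct = probs["Probability"].str.rstrip("%").astype(float) / 100
    prob_of = dict(zip(zip(probs["Region"], probs["Type"]), pct))

    # Shuffle Extras once, then keep one list of row labels per (Region, Type).
    shuffled = extras.iloc[rng.permutation(len(extras))]
    pools = {key: list(grp.index)
             for key, grp in shuffled.groupby(["Region", "Type"], sort=False)}

    picked = []
    for key in zip(main["Region"], main["Type"]):
        pool = pools.get(key, [])
        # Roll the dice only if a matching Extras row is still available.
        if pool and rng.random() < prob_of.get(key, 0.0):
            picked.append(pool.pop())  # removed from the pool, so never reused

    new_main = pd.concat([main, extras.loc[picked]], ignore_index=True)
    return new_main, extras.drop(index=picked)

main = pd.DataFrame({"Region": ["A", "A", "B", "B"], "Type": [1, 2, 1, 2],
                     "Value": [600, 700, 700, 900]})
probs = pd.DataFrame({"Region": ["A", "A", "B", "B"], "Type": [1, 2, 1, 2],
                      "Probability": ["50%", "30%", "50%", "30%"]})
extras = pd.DataFrame({"Region": ["A", "A", "A", "A", "B", "B", "B", "B"],
                       "Type": [1, 1, 2, 2, 1, 1, 2, 2],
                       "Value": [600, 300, 700, 950, 700, 50, 900, 300]})

new_main, remaining = add_extras(main, probs, extras, seed=42)  # seed is arbitrary
```

Because the rows are picked by index label, the original `extras` is left untouched and `remaining` is what should be carried forward to any later pass.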