I have created a pandas dataframe as follows:
import pandas as pd

ds = {'col1': ["A", "B"], 'probability': [0.3, 0.6]}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
  col1  probability
0    A          0.3
1    B          0.6
I need to create a new dataframe in which each row is duplicated, and the duplicated record is assigned the complementary probability, so that the two probabilities for each value of col1 sum to 1.
From the example above:
- I need to duplicate record 0 such that A gets a probability of 0.3 (so it keeps what’s already in there) and the duplicated record gets a probability of 0.7 (0.3 + 0.7 = 1)
- I need to duplicate record 1 such that B gets a probability of 0.6 (so it keeps what’s already in there) and the duplicated record gets a probability of 0.4 (0.6 + 0.4 = 1)
The resulting dataframe should look like this:
  col1  probability
0    A          0.3
1    A          0.7
2    B          0.6
3    B          0.4
Can anyone help me do this in pandas, please?
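One possible approach, offered as a minimal sketch rather than a definitive solution: build a copy of the dataframe whose probability column holds 1 minus the original value, concatenate it with the original, and use a stable sort on the index so that each duplicated row lands directly after its source row. The names complement and result are illustrative only, and the sketch assumes every value in probability lies between 0 and 1.

```python
import pandas as pd

ds = {'col1': ["A", "B"], 'probability': [0.3, 0.6]}
df = pd.DataFrame(data=ds)

# Copy of the original rows, with the complementary probability (1 - p).
complement = df.assign(probability=1 - df['probability'])

# Stack originals on top of complements; the index is now 0, 1, 0, 1.
# A stable sort (mergesort) on the index keeps each original row ahead
# of its complement, then the index is renumbered 0..n-1.
result = (
    pd.concat([df, complement])
      .sort_index(kind='mergesort')
      .reset_index(drop=True)
)

print(result)
```

The stable sort matters here: with the default sort kind, the relative order of the two rows sharing an index label is not guaranteed, so the original row might not come first.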