So I have data with the following format
Name | Duration | y_dur1 | y_dur2 | y_dur3 |
---|---|---|---|---|
John | dur1 | .01 | .02 | .02 |
Mike | dur2 | .05 | .03 | .04 |
Chris | dur3 | .03 | .04 | .08 |
I want to create a column y that takes from y1 y2 or y3 depending on the entry in the table
Name | Duration | y_dur1 | y_dur2 | y_dur3 | y |
---|---|---|---|---|---|
John | dur1 | .01 | .02 | .02 | .01 |
Mike | dur2 | .05 | .03 | .04 | .03 |
Chris | dur3 | .03 | .04 | .08 | .08 |
I want to do this in the cleanest way possible. I have to actually do this for about 30 columns. 5 groups of 6. This is more simplified. My current code looks like this:
df = pd.read_csv(path)
cols = df.columns
dur_cols = pd.Series([col for col in df.columns if "_Dur" in col])
val_cols = dur_cols.replace('d',"", regex=True)
val_cols.drop_duplicates(inplace=True)
for col in val_cols:
df[col] = np.where(df["Duration_to_use"]=="dur1",df[col+"1"],np.where(df["Duration_to_use"]=="dur2",df[col+"2"],df[col+"3"]))
I think this works, but I want to find a better way.