I have the following “covar” dataframe (it’s a covariance matrix), where I have the same items both as index and as column names.
covar_data = {
'a': [0.04, np.nan, 0.03, np.nan, -0.04],
'XY': [np.nan, np.nan, np.nan, np.nan, np.nan],
'b': [0.03, np.nan, 0.09, np.nan, 0.00],
'YZ': [np.nan, np.nan, np.nan, np.nan, np.nan],
'c': [-0.04, np.nan, 0.00, np.nan, 0.16]
}
covar_index = ['a', 'XY', 'b', 'YZ', 'c']
covar = pd.DataFrame(covar_data, index=covar_index)
a XY b YZ c
a 0.04 NaN 0.03 NaN -0.04
XY NaN NaN NaN NaN NaN
b 0.03 NaN 0.09 NaN 0.00
YZ NaN NaN NaN NaN NaN
c -0.04 NaN 0.00 NaN 0.16
Some items (“XY” and “YZ” in this example, but many more in the real dataset) are clones of other items (“XY” is clone of “a” and “YZ” is clone of “b”).
I need to:
- fill each clone column with the column of the cloned item
- fill each clone row with the row of the cloned item.
The fill can be with the same or the opposite sign.
The missing diagonal values should be the same as the diagonal cell of the corresponding cloned item, always with the same sign. So [“XY”:”XY”] = [“a”:”a”] and [“YZ”:”YZ”] = [“b”:”b”]
I have another dataframe (“df”) where I have the clone, the item it clones and the sign (“1” means same sign, “-1” means opposite sign).
clone_data = {
'cloned_item': ['a', 'b'],
'sign': [1, -1]
}
clone_index = ['XY', 'YZ']
df = pd.DataFrame(clone_data, index=clone_index)
cloned_item sign
clone
XY a 1
YZ b -1
This is the expected output:
a XY b YZ c
a 0.04 0.04 0.03 -0.03 -0.04
XY 0.04 0.04 0.03 -0.03 -0.04
b 0.03 0.03 0.09 -0.09 0.00
YZ -0.03 -0.03 -0.09 0.09 0.00
c -0.04 -0.04 0.00 0.00 0.16
As you can see, “XY” column/row is the same as “a” column/row, with the same sign.
“YZ” column/row is the same as “b” column/row, but with the opposite sign.
Diagonal value for “XY” and “YZ” are the same as those for “a” and “b”.
Any ideas? Thanks
You can do this in two parts: replicate the rows (multiplied by sign), and then replicate the columns (multiplied by sign). This returns the output you are looking for.
# set rows to be multiple of cloned rows
covar.loc[df.index] = (
covar.loc[df.cloned_item]
# multiply row-wise
.mul(df.set_index("cloned_item")["sign"], axis=0)
# rename index
.set_axis(df.index, axis=0)
)
# set columns to be multiple of cloned columns
covar.loc[:, df.index] = (
covar.loc[:, df.cloned_item]
# multiply column-wise
.mul(df.set_index("cloned_item")["sign"], axis=1)
# rename columns
.set_axis(df.index, axis=1)
)
As an alternative you .set_axis(...)
, you could use .values
.
You need to identify and fill the clones rows and columns and then update the diagonal:
import pandas as pd
import numpy as np
covar_data = {
'a': [0.04, np.nan, 0.03, np.nan, -0.04],
'XY': [np.nan, np.nan, np.nan, np.nan, np.nan],
'b': [0.03, np.nan, 0.09, np.nan, 0.00],
'YZ': [np.nan, np.nan, np.nan, np.nan, np.nan],
'c': [-0.04, np.nan, 0.00, np.nan, 0.16]
}
covar_index = ['a', 'XY', 'b', 'YZ', 'c']
covar = pd.DataFrame(covar_data, index=covar_index)
clone_data = {
'cloned_item': ['a', 'b'],
'sign': [1, -1]
}
clone_index = ['XY', 'YZ']
df = pd.DataFrame(clone_data, index=clone_index)
for clone, row in df.iterrows():
cloned_item = row['cloned_item']
sign = row['sign']
covar[clone] = covar[cloned_item] * sign
covar.loc[clone] = covar.loc[cloned_item] * sign
covar.loc[clone, clone] = covar.loc[cloned_item, cloned_item] * np.sign(sign)
print(covar)
which gives
a XY b YZ c
a 0.04 0.04 0.03 -0.03 -0.04
XY 0.04 0.04 0.03 -0.03 -0.04
b 0.03 0.03 0.09 -0.09 0.00
YZ -0.03 -0.03 -0.09 -0.09 -0.00
c -0.04 -0.04 0.00 -0.00 0.16