Data:
# Sample data
data = {
'Start Date': ['2022-10-18', '2022-10-25', '2023-04-17'],
'End Date': ['2022-10-20', '2023-04-06', '2023-07-04'],
'Close1': [17486.95, 17656.35, 17706.85],
'Close2': [17563.95, 17599.15, 19389.00],
'NF_BEES1': [0.58, 0.19, 0.12],
'NF_BEES2': [0.63, 0.75, 0.73],
'Difference': [77.00, -57.20, 1682.15],
'Days Difference': [2, 163, 78]
}
# Create DataFrame
df = pd.DataFrame(data)
df
# Convert date columns to datetime
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])
Output:
Start Date End Date Close1 Close2 NF_BEES1 NF_BEES2 Difference Days Difference
0 2022-10-18 2022-10-20 17486.95 17563.95 0.58 0.63 77.00 2
1 2022-10-25 2023-04-06 17656.35 17599.15 0.19 0.75 -57.20 163
2 2023-04-17 2023-07-04 17706.85 19389.00 0.12 0.73 1682.15 78
If Days Difference
column > 8 , then create new row in df & End Date
of Earlier row should be Start Date
in new row & Start Date
of current row (Days Difference >8) should be End Date
.
Close1
, Close2
, NF_BEES1
, NF_BEES2
values should also be changed like Start Date
& End Date
. Based on that Days Difference
needs to be calculated for new row
My Code (Not working as Expected):
# Iterate through the DataFrame
for idx, row in df.iterrows():
rows.append(row)
if row['Days Difference'] > 8:
# Create a new row
new_row = row.copy()
new_row['Start Date'] = rows[-2]['End Date']
new_row['Close1'] = rows[-2]['Close2']
new_row['NF_BEES1'] = rows[-2]['NF_BEES2']
new_row['End Date'] = row['Start Date']
new_row['Close2'] = row['Close1']
new_row['NF_BEES2'] = row['NF_BEES1']
new_row['Difference'] = (new_row['Close2'] - new_row['Close1'])
new_row['Days Difference'] = (new_row['End Date'] - new_row['Start Date']).days
print(new_row)
print(type(new_row))
# Create a new DataFrame with the split rows
new_df = pd.DataFrame(rows)
new_df
My Output:
Start Date End Date Close1 Close2 NF_BEES1 NF_BEES2 Difference Days Difference
0 2022-10-18 2022-10-20 17486.95 17563.95 0.58 0.63 77.00 2
1 2022-10-25 2023-04-06 17656.35 17599.15 0.19 0.75 -57.20 163
2 2023-04-17 2023-07-04 17706.85 19389.00 0.12 0.73 1682.15 78
Expected Output:
Start Date End Date Close1 Close2 NF_BEES1 NF_BEES2 Difference Days Difference
18 2022-10-18 2022-10-20 17486.95 17563.95 0.58 0.63 77.00 2
19 2022-10-20 2022-10-25 17563.95 17656.35 0.63 0.19 93 5
20 2023-04-06 2023-04-17 17599.15 17706.85 0.75 0.12 107 11
2
You can use shift
conditionally:
out = df.copy()
mask = df["End Date"] - df["Start Date"] > pd.Timedelta(days=8)
out.loc[mask, ["Start Date", "Close1", "NF_BEES1"]] = df.shift().shift(-1, axis=1)
out.loc[mask, ["End Date", "Close2", "NF_BEES2"]] = df.shift(axis=1)
out["Difference"] = out["Close2"] - out["Close1"]
out["Days Difference"] = (out["End Date"] - out["Start Date"]).dt.days
Start Date End Date Close1 Close2 NF_BEES1 NF_BEES2 Difference Days Difference
0 2022-10-18 2022-10-20 17486.95 17563.95 0.58 0.63 77.0 2
1 2022-10-20 2022-10-25 17563.95 17656.35 0.63 0.19 92.4 5
2 2023-04-06 2023-04-17 17599.15 17706.85 0.75 0.12 107.7 11
3
IIUC, use a mask, a copy of End and shift
:
# identify rows to change
m = df['Days Difference'].gt(8)
# keep a copy of the previous End
prev_end = df['End Date'].shift()
# update the rows
df.loc[m, 'End Date'] = df['Start Date']
df.loc[m, 'Start Date'] = prev_end
Output:
Start Date End Date Close1 Close2 NF_BEES1 NF_BEES2 Difference Days Difference
0 2022-10-18 2022-10-20 17486.95 17563.95 0.58 0.63 77.00 2
1 2022-10-20 2022-10-25 17656.35 17599.15 0.19 0.75 -57.20 163
2 2023-04-06 2023-04-17 17706.85 19389.00 0.12 0.73 1682.15 78
generalization
If you want a new DataFrame as output, just do the same on a copy, and if you want to update multiple columns apply the shift
to all at once, and update the differences then perform a subtraction (and convert to days
when needed):
# make a copy
new_df = df.copy()
# shift the columns
m = df['Days Difference'].gt(8)
new_df.loc[m, ['End Date', 'Close2', 'NF_BEES2']] = df[['Start Date', 'Close1', 'NF_BEES1']][m].values
new_df.loc[m, ['Start Date', 'Close1', 'NF_BEES1']] = df[['End Date', 'Close2', 'NF_BEES2']].shift()[m].values
# update the differences
new_df['Difference'] = new_df['Close2'].sub(new_df['Close1'])
new_df['Days Difference'] = new_df['End Date'].sub(new_df['Start Date']).dt.days
Output:
Start Date End Date Close1 Close2 NF_BEES1 NF_BEES2 Difference Days Difference
0 2022-10-18 2022-10-20 17486.95 17563.95 0.58 0.63 77.0 2
1 2022-10-20 2022-10-25 17563.95 17656.35 0.63 0.19 92.4 5
2 2023-04-06 2023-04-17 17599.15 17706.85 0.75 0.12 107.7 11
4