I was wondering what the expected behavior of pandas 2.2.2 and later with active copy-on-write (CoW) actually is.
I understand that changing a DataFrame (or a list, etc.) inside a function generally propagates the change to the caller as well, because Python passes a reference to the same object.
The pandas documentation states “Only one pandas object is updated at once” (https://pandas.pydata.org/docs/user_guide/copy_on_write.html) and also recommends enabling CoW and adapting to it, since it changes behavior and will be mandatory in pandas 3.
I have some functions that process a number of pandas DataFrames, and these functions are required NOT to change their input parameters. So far, I have called .copy() on those DataFrames at the top of all my functions to ensure that changes to the inputs remain local.
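For reference, the defensive-copy pattern described above can be sketched as follows. The function name increment_b and the sample data are hypothetical and only serve as illustration; this is a minimal sketch, not the actual functions in question.

```python
import pandas as pd


def increment_b(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a frame with column "b" incremented."""
    df = df.copy()  # rebind the local name to a private copy of the input
    df["b"] += 1    # modifies only the copy, not the caller's object
    return df


caller_df = pd.DataFrame({"a": [0], "b": [1]})
result = increment_b(caller_df)

# The caller's frame keeps its original value; only the returned frame changed.
print(caller_df["b"].tolist(), result["b"].tolist())  # prints [1] [2]
```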
The documentation statement “Only one pandas object is updated at once” made me believe I no longer need the .copy() calls with CoW, but all examples there deal with just one scope. So I tested the behavior, and the following code example shows that a change is visible in the caller as well:
import pandas as pd

pd.options.mode.copy_on_write = True


def change_df(df):
    # df = df.copy()
    df["b"] += 1


def main():
    df = pd.DataFrame({"a": [0], "b": [1]})
    change_df(df)
    print(df)


main()
Output:
   a  b
0  0  2
Is that the expected behavior and am I misinterpreting the pandas documentation (which might very well be possible)?
Does that mean I still need to call .copy() at the top of my functions with CoW, and that the statement “Only one pandas object is updated at once” only applies within one scope, i.e. it doesn't apply to caller and callee?
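To make the distinction being asked about concrete, here is a minimal sketch that contrasts editing the passed-in object itself with editing a new object derived from it. It assumes pandas 2.2.x with CoW enabled as above; the function names edit_argument and edit_derived are hypothetical, and df.copy(deep=False) is used here only as one convenient way to obtain a second pandas object that initially shares its data with the first.

```python
import pandas as pd

pd.options.mode.copy_on_write = True  # same setting as in the question


def edit_argument(df):
    # df is the very same object the caller holds, so this edits it directly.
    df["b"] += 1


def edit_derived(df):
    # A shallow copy is a second pandas object sharing data with df.
    derived = df.copy(deep=False)
    derived["b"] += 1  # with CoW enabled, only `derived` should be updated
    return derived


frame = pd.DataFrame({"a": [0], "b": [1]})

derived = edit_derived(frame)
after_derived = frame["b"].tolist()   # caller's values after editing the derived object

edit_argument(frame)
after_argument = frame["b"].tolist()  # caller's values after editing the argument itself

print(after_derived, derived["b"].tolist(), after_argument)
```

If the quoted statement is about distinct pandas objects rather than about scopes, the first call should leave frame untouched while the second one changes it, which would match the output shown above.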
Thanks!