I am having trouble setting new column values that are produced by a pandas apply
function. If you scroll to the bottom, I’ve provided an MRE that you should be able to copy, paste, and run.
Explanation
Let’s assume we have a DataFrame df as follows:
df = pd.DataFrame({"a":["ABC", "DEF", "GHI"], "b":[1, 2, 3]})
a | b |
---|---|
ABC | 1 |
DEF | 2 |
GHI | 3 |
The specific data is irrelevant. Let’s assume I have a function some_func() that operates on both columns, performs some logic, and returns a tuple of two values that I wish to make into columns.
The logic inside some_func() is also irrelevant; in this example it appends the value of b to the a value as-is and to a lowercased copy of the a value.
def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)
    return output_a, output_b
If you use [] indexing to insert the columns, like so:
df[["output_a", "output_b"]] = df.apply(
    lambda row: some_func(row["a"], row["b"]),
    axis=1,
    result_type="expand",
)
Then print(df) gives this (the desired output):
a | b | output_a | output_b |
---|---|---|---|
ABC | 1 | ABC 1 | abc 1 |
DEF | 2 | DEF 2 | def 2 |
GHI | 3 | GHI 3 | ghi 3 |
However, if you use the .loc indexer instead, like so:
df.loc[:, ["output_a", "output_b"]] = df.apply(
    lambda row: some_func(row["a"], row["b"]),
    axis=1,
    result_type="expand",
)
Then print(df) gives:
a | b | output_a | output_b |
---|---|---|---|
ABC | 1 | nan | nan |
DEF | 2 | nan | nan |
GHI | 3 | nan | nan |
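One thing that may be worth checking (a sketch, not a confirmed diagnosis): the frame returned by apply with result_type="expand" carries the integer column labels 0 and 1, not output_a and output_b. My assumption is that .loc tries to match the right-hand side by label while [] assigns the columns by position, which would be consistent with the NaN values above. The names expanded, labelled, and result below are made up for illustration, and joining a relabelled frame is only one possible way to sidestep the question of alignment.

```python
import pandas as pd

df = pd.DataFrame({"a": ["ABC", "DEF", "GHI"], "b": [1, 2, 3]})


def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)
    return output_a, output_b


# Intermediate result of the apply call, before any assignment happens.
expanded = df.apply(
    lambda row: some_func(row["a"], row["b"]),
    axis=1,
    result_type="expand",
)

# The expanded columns are labelled with integers, not the target names.
print(expanded.columns.tolist())  # [0, 1]

# Hypothetical workaround: give the columns their target labels first,
# then join on the shared row index so labels match explicitly.
labelled = expanded.set_axis(["output_a", "output_b"], axis=1)
result = df.join(labelled)
print(result)
```

Here result holds the original a and b columns plus output_a and output_b, without relying on how either indexer treats a DataFrame on the right-hand side.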
Question
Is this behaviour expected, or is this a bug? Alternatively, is there something that I am doing wrong?
MRE
You should be able to copy this and run it locally. You will need pandas==2.2.1.
import pandas as pd
df = pd.DataFrame({"a":["ABC", "DEF", "GHI"], "b":[1, 2, 3]})
df2 = df.copy()
def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)
    return output_a, output_b
df[["output_a", "output_b"]] = df.apply(lambda row: some_func(row["a"], row["b"]), axis=1, result_type="expand")
df2.loc[:, ["output_a", "output_b"]] = df.apply(lambda row: some_func(row["a"], row["b"]), axis=1, result_type="expand")
print("DF 1 with [ ]")
print(df)
print()
print("DF 2 with .loc")
print(df2)
Has anyone experienced this issue before, or has some way around it? Alternatively, what is the correct way of accomplishing what I want to accomplish?