Pandas apply with “expand result” results in NaN columns when using loc but not [ ]

Question

I am having trouble assigning new columns from the values produced by a pandas apply call. I’ve provided an MRE at the bottom that you should be able to copy, paste, and run.

Explanation

Let’s assume we have a DataFrame df as follows:

df = pd.DataFrame({"a":["ABC", "DEF", "GHI"], "b":[1, 2, 3]})

a	b
ABC	1
DEF	2
GHI	3

The specific data is irrelevant. Let’s assume I have a function some_func() that operates on both columns, performs some logic, and returns a tuple of two values that I wish to turn into columns.

The logic inside some_func() is also irrelevant; in this example it appends the value of b to a as given and to a lowercased copy of a.

def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)

    return output_a, output_b
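
As a quick sanity check, calling some_func directly with the values from the first row shows the tuple it returns. This is only an illustration; the literal arguments are taken from the example frame above, and the name first_result is introduced here.

```python
def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)
    return output_a, output_b


# Values from the first row of the example DataFrame
first_result = some_func("ABC", 1)
print(first_result)  # ('ABC 1', 'abc 1')
```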

If you use [] indexing to insert the columns, like so:

df[["output_a", "output_b"]] = df.apply(
    lambda row: some_func(row["a"], row["b"]),
    axis=1,
    result_type="expand",
)

Which results in print(df) giving this (the desired output):

a	b	output_a	output_b
ABC	1	ABC 1	abc 1
DEF	2	DEF 2	def 2
GHI	3	GHI 3	ghi 3
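
It may help to look at what apply returns before it is assigned to anything. A minimal sketch, assuming the same df and some_func as above (the name expanded is introduced here for illustration): with result_type="expand", each returned tuple is spread into a DataFrame whose columns are labelled 0 and 1, not output_a and output_b.

```python
import pandas as pd

df = pd.DataFrame({"a": ["ABC", "DEF", "GHI"], "b": [1, 2, 3]})


def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)
    return output_a, output_b


# Keep the intermediate result instead of assigning it straight away
expanded = df.apply(
    lambda row: some_func(row["a"], row["b"]),
    axis=1,
    result_type="expand",
)

print(list(expanded.columns))  # [0, 1]
print(expanded.shape)  # (3, 2)
```

Since [] assignment with a list of new names still yields the desired output, it presumably matches these two columns to the new names by position rather than by label.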

However, if you use the .loc operator instead like so:

df.loc[:, ["output_a", "output_b"]] = df.apply(
    lambda row: some_func(row["a"], row["b"]),
    axis=1,
    result_type="expand",
)

Then print(df) gives:

a	b	output_a	output_b
ABC	1	nan	nan
DEF	2	nan	nan
GHI	3	nan	nan
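
One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that .loc aligns a DataFrame on the right-hand side by label, while the frame returned by apply has columns labelled 0 and 1. The sketch below does not call .loc at all; it uses reindex as a stand-in to show what label-based alignment against output_a and output_b would produce. The names expanded and aligned are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": ["ABC", "DEF", "GHI"], "b": [1, 2, 3]})


def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)
    return output_a, output_b


expanded = df.apply(
    lambda row: some_func(row["a"], row["b"]),
    axis=1,
    result_type="expand",
)

# Label-based lookup: neither requested label exists in `expanded`
# (its columns are 0 and 1), so every cell comes back missing.
aligned = expanded.reindex(columns=["output_a", "output_b"])
print(aligned.isna().all().all())  # True
```

If this reading is right, the NaN values would be the result of alignment on mismatched labels rather than a failure inside apply itself.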

Question

Is this behaviour expected, or is this a bug? Alternatively, is there something that I am doing wrong?

MRE

You should be able to copy this and run it locally. You will need pandas==2.2.1.

import pandas as pd

df = pd.DataFrame({"a":["ABC", "DEF", "GHI"], "b":[1, 2, 3]})
df2 = df.copy()

def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)

    return output_a, output_b


df[["output_a", "output_b"]] = df.apply(lambda row: some_func(row["a"], row["b"]), axis=1, result_type="expand")

df2.loc[:, ["output_a", "output_b"]] = df.apply(lambda row: some_func(row["a"], row["b"]), axis=1, result_type="expand")

print("DF 1 with [ ]")
print(df)
print()
print("DF 2 with .loc")
print(df2)

Has anyone experienced this issue before, or does anyone know a way around it? Alternatively, what is the correct way to accomplish this?
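
One way around this, sketched under the assumption that the mismatch between the labels 0 and 1 and the target names is the cause, is to give the expanded frame its intended column names before combining it with the original. The names expanded and result are illustrative, and join is only one of several ways to attach the columns.

```python
import pandas as pd

df = pd.DataFrame({"a": ["ABC", "DEF", "GHI"], "b": [1, 2, 3]})


def some_func(param_a: str, param_b: int) -> tuple[str, str]:
    output_a = param_a + " " + str(param_b)
    output_b = param_a.lower() + " " + str(param_b)
    return output_a, output_b


# Rename the default 0/1 columns to the intended names up front
expanded = df.apply(
    lambda row: some_func(row["a"], row["b"]),
    axis=1,
    result_type="expand",
).set_axis(["output_a", "output_b"], axis=1)

# join matches rows on the index, which both frames share
result = df.join(expanded)
print(result)
```

With the labels set explicitly, the outcome should no longer depend on whether an assignment matches columns by position or by label.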
