This is 50% a question and 50% an observation that baffles me a bit. Maybe someone can enlighten me.
I would also like to hear opinions on using lists as cell values: yes or no, and why?
Here is a trivial example:
import pandas as pd
data = [[['apple', 'banana'], 1], [['grape', 'orange'], 2], [['banana', 'lemon'], 4]]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])
which results in:
             Fruit  Count
0  [apple, banana]      1
1  [grape, orange]      2
2  [banana, lemon]      4
Given a new list:
input_list = ['melon', 'kiwi']
Using the ‘loc’ approach:
(A) Doesn’t work at all.
df.loc[df['Count'] == 2, 'Fruit'] = [input_list] # fails with or without the wrapping brackets
(B) Using a Series also doesn’t work.
ser = pd.Series(input_list) # no wrapping, so the Series has the wrong length; fair enough
df.loc[df['Count'] == 2, 'Fruit'] = ser
# wrong result --->
             Fruit  Count
0  [apple, banana]      1
1             kiwi      2
2  [banana, lemon]      4
(C) Series, take 2
ser = pd.Series([input_list]) # with wrapping: a one-element Series, 0 -> [melon, kiwi]
df.loc[df['Count'] == 2, 'Fruit'] = ser
# wrong result ---> NaN??? HUH?
             Fruit  Count
0  [apple, banana]      1
1              NaN      2
2  [banana, lemon]      4
Using the ‘at’ approach:
(D)
mask = df['Count'] == 2
mask_match_idx = df.index[mask][0] # index label of the first matching row
df.at[mask_match_idx, 'Fruit'] = input_list
# finally gives the correct result
             Fruit  Count
0  [apple, banana]      1
1    [melon, kiwi]      2
2  [banana, lemon]      4
I understand that (B) fails because the Series has the wrong length.
But why do (A), in either form, and (C) fail, and how could they be made to work? The NaN result is especially confusing: why does that happen?
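One possible explanation for (C) is index alignment rather than anything specific to lists: when the right-hand side of a ‘loc’ assignment is a Series, pandas matches it to the target rows by index label, not by position. The wrapped Series only has label 0, while the selected row has label 1, so nothing lines up and the cell becomes NaN. That would also account for the 'kiwi' in (B), which sits at label 1 of the unwrapped Series. Below is a minimal sketch, assuming a recent pandas version; the names `aligned` and `fixed` are made up for illustration.

```python
import pandas as pd

data = [[['apple', 'banana'], 1], [['grape', 'orange'], 2], [['banana', 'lemon'], 4]]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])
input_list = ['melon', 'kiwi']
mask = df['Count'] == 2

# The wrapped Series carries label 0, but the row selected by the mask has label 1.
ser = pd.Series([input_list])
aligned = ser.reindex(df.index[mask])  # roughly what alignment does to `ser`
print(aligned)                         # label 1 has no partner in `ser`

# Assumption: giving the Series the labels of the target rows lets alignment succeed.
# Note that [input_list] * n repeats the same list object for every matched row.
fixed = pd.Series([input_list] * int(mask.sum()), index=df.index[mask])
df.loc[mask, 'Fruit'] = fixed
print(df)
```

If this reading is right, (C) only needs matching index labels on the Series to work.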
Is the conclusion to always use ‘at’ in these kinds of cases?
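For reference, ‘at’ addresses exactly one cell, so with several matching rows it would have to be called once per row. Here is a rough sketch of that idea, assuming a unique index and an object-dtype column; `set_list_cells` is a hypothetical helper, and the third data row is changed from the example above so that two rows match.

```python
import pandas as pd

# Illustration data: two rows share Count == 2 (differs from the example above).
data = [[['apple', 'banana'], 1], [['grape', 'orange'], 2], [['fig', 'pear'], 2]]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])
input_list = ['melon', 'kiwi']


def set_list_cells(frame, mask, column, value):
    """Hypothetical helper: write a copy of `value` into every masked cell."""
    for label in frame.index[mask]:
        # `at` targets a single cell, so the list is stored as one object
        # instead of being spread across rows.
        frame.at[label, column] = list(value)


set_list_cells(df, df['Count'] == 2, 'Fruit', input_list)
print(df)
```

Copying with `list(value)` gives each row its own list, so a later in-place change to one cell does not silently change the others.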
And again: what is the general view on using lists as cell values, given pitfalls like this? I would love some input, and alternative suggestions if lists are a no-go.
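One alternative I am aware of is a "long" layout with one fruit per row, which avoids list cells altogether. This is only a sketch, assuming `DataFrame.explode` is available (pandas 0.25 or later); `long_df`, `banana_counts` and `rebuilt` are illustrative names.

```python
import pandas as pd

data = [[['apple', 'banana'], 1], [['grape', 'orange'], 2], [['banana', 'lemon'], 4]]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])

# One row per fruit; the original row label is repeated in the index.
long_df = df.explode('Fruit')
print(long_df)

# Membership queries become ordinary scalar comparisons.
banana_counts = long_df.loc[long_df['Fruit'] == 'banana', 'Count'].tolist()
print(banana_counts)

# The list layout can be rebuilt by grouping on the original row label.
rebuilt = long_df.groupby(level=0)['Fruit'].agg(list)
print(rebuilt)
```

In this layout every cell is a scalar, so ordinary ‘loc’ assignments do not run into the list-unpacking behaviour described above.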
Thank you!