In pandas, is there a way to work with one row of a DataFrame at a time and, for each row, index into the columns by name without indexing into the rows? My current approach, for example when modifying values in column `A` based on values in columns `A` and `B`, is:
```python
import pandas as pd

# Represents a complex operation on elements from multiple columns
def ComplexFunc(A, B):
    return A + B

# Apply ComplexFunc to the DataFrame one row at a time
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
for idx in df.index:
    df.loc[idx, 'A'] = ComplexFunc(df.loc[idx, 'A'], df.loc[idx, 'B'])
```
All the repeated two-axis indexing makes the code noisy for a simple calculation (or rather, a simple invocation of a function, which may itself contain complex calculations). Since the calculation involves only the data in one row at a time, I would like to avoid inflating the punctuation and code volume with repeated row indexing.
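For comparison, a minimal sketch of one common alternative: `DataFrame.apply` with `axis=1` hands each row to a function as a Series labelled by column name, so columns are read by name with no row index, and the results are assigned back to `A` in one statement. `ComplexFunc` here is just the stand-in from the question.

```python
import pandas as pd

def ComplexFunc(A, B):
    # Stand-in for an arbitrary scalar computation
    return A + B

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# apply(axis=1) passes each row to the lambda as a Series indexed by column name;
# the per-row results come back as a Series aligned on df.index and are
# assigned to column A in a single step.
df['A'] = df.apply(lambda row: ComplexFunc(row['A'], row['B']), axis=1)
print(df)
```

One caveat worth checking in real data: when columns have mixed dtypes, the row Series passed to the function may be upcast to a common dtype (for example int to float, or to `object`).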
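I don't use `iterrows` because, although each row is yielded as a Series whose scalar elements can be read by column name, that Series may be a copy rather than a view, so assigning to it does not reliably update the DataFrame. Therefore, I still need row indexing to write the results back.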
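To make the `iterrows` behaviour concrete, a small sketch: reads go through the yielded Series by column name, but the write-back still uses the row label, here via `df.at`, which addresses a single cell.

```python
import pandas as pd

def ComplexFunc(A, B):
    return A + B

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

for idx, row in df.iterrows():
    # row is a Series indexed by column name, so reading needs no row index
    new_a = ComplexFunc(row['A'], row['B'])
    # row is not guaranteed to be a view of df, so assigning to row['A']
    # is not a reliable way to update df; write back with the row label instead
    df.at[idx, 'A'] = new_a
```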
The `itertuples` method seems reader-friendly, but because each row is an immutable (named) tuple, I can't use it to modify the contents of the corresponding cells in the table.
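A sketch of how `itertuples` is often combined with a single write-back: fields are read as attributes, and the tuple's `Index` field supplies the row label for `df.at`, so row indexing appears once per row instead of three times.

```python
import pandas as pd

def ComplexFunc(A, B):
    return A + B

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

for row in df.itertuples():
    # row.A and row.B are read as attributes; row.Index is the row label
    df.at[row.Index, 'A'] = ComplexFunc(row.A, row.B)
```

Note that `itertuples` renames columns whose names are not valid Python identifiers (or are repeated, or start with an underscore) to positional names such as `_1`, so attribute access only works cleanly for identifier-like column names.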
Ideally, there would be something like a (hypothetical, non-existent) `iterstruct` method that could be used like this:

```python
for rowstruc in df.iterstruct():
    rowstruc.A = ComplexFunc(rowstruc.A, rowstruc.B)
```
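pandas has no such method, but a rough user-side approximation is possible. The sketch below is an assumption, not a pandas API: `RowProxy` and `iterstruct` are hypothetical names, each proxy forwards attribute and item reads/writes to `df.at[label, column]`, and it assumes the row labels are unique so that each label identifies a single cell per column.

```python
import pandas as pd


class RowProxy:
    """Hypothetical mutable view of one DataFrame row (not a pandas API).

    Attribute and item access are forwarded to df.at[label, column],
    so reads and writes go straight to the underlying DataFrame.
    """

    __slots__ = ('_df', '_label')

    def __init__(self, df, label):
        # object.__setattr__ bypasses the column-writing __setattr__ below
        object.__setattr__(self, '_df', df)
        object.__setattr__(self, '_label', label)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for column names
        if not name.startswith('_') and name in self._df.columns:
            return self._df.at[self._label, name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self._df.columns:
            self._df.at[self._label, name] = value
        else:
            raise AttributeError(f'no column named {name!r}')

    # Item access for column names that are not valid Python identifiers
    def __getitem__(self, column):
        return self._df.at[self._label, column]

    def __setitem__(self, column, value):
        self._df.at[self._label, column] = value


def iterstruct(df):
    """Hypothetical helper: yield one RowProxy per row label of df."""
    for label in df.index:
        yield RowProxy(df, label)


def ComplexFunc(A, B):
    return A + B


df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
for rowstruc in iterstruct(df):
    rowstruc.A = ComplexFunc(rowstruc.A, rowstruc.B)
```

Because every access goes through `df.at`, this reads cleanly but is slow on large frames; it trades speed for the readability asked about here.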
A dict-like counterpart might help too, at the cost of more punctuation (and code noise), but it would allow field names containing non-alphanumeric characters, which cannot be accessed as attributes:

```python
for rowdict in df.iterdict():
    rowdict['A'] = ComplexFunc(rowdict['A'], rowdict['B'])
```
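Since `iterdict` is likewise hypothetical, a minimal sketch of a dict-based round trip using the real `DataFrame.to_dict('records')`: each row becomes a plain dict keyed by column name (any column name works as a key), the dicts are edited, and the frame is rebuilt with the original index and column order.

```python
import pandas as pd

def ComplexFunc(A, B):
    return A + B

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# One plain dict per row, keyed by column name; these are copies of the data
records = df.to_dict('records')
for rowdict in records:
    rowdict['A'] = ComplexFunc(rowdict['A'], rowdict['B'])

# Rebuild the frame, keeping the original row labels and column order
df = pd.DataFrame(records, index=df.index, columns=df.columns)
```

This rebinds `df` to a new object, so any other variable still referring to the old DataFrame will not see the changes.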
P.S. This is a non-vectorized use case, e.g., where `ComplexFunc` works only on scalars. I am hoping to find more readable, lower-volume code. Performance is not my driving consideration in posting this question; for cases where performance matters, I would weigh other alternatives on that basis.
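For a scalar-only function, one compact pattern is to `zip` the needed columns and build the new column with a list comprehension; no row indexing appears at all. The branching `ComplexFunc` below is an illustrative assumption, standing in for logic that does not vectorize directly.

```python
import pandas as pd

def ComplexFunc(A, B):
    # Scalar-only logic (illustrative): a branch evaluated per pair of values
    return A + B if A < B else A - B

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# zip walks the two columns in parallel, yielding one (A, B) pair per row;
# the resulting list is assigned back to column A in row order
df['A'] = [ComplexFunc(a, b) for a, b in zip(df['A'], df['B'])]
```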