I have a Pandas DataFrame, as defined here:
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': ['Seattle', 'New York', 'Kona']},
                  index=[10, 20, 30])
However, when I index into this DataFrame, I can’t accurately predict what type of object is going to result from the indexing:
# (1) str
df.iloc[0, df.columns.get_loc('Name')]
# (2) Series
df.iloc[0:1, df.columns.get_loc('Name')]
# (3) Series
df.iloc[0:2, df.columns.get_loc('Name')]
# (4) DataFrame
df.iloc[0:2, df.columns.get_loc('Name'):df.columns.get_loc('Age')]
# (5) Series
df.iloc[0, df.columns.get_loc('Name'):df.columns.get_loc('Location')]
# (6) DataFrame
df.iloc[0:1, df.columns.get_loc('Name'):df.columns.get_loc('Location')]
Note that each consecutive pair above ((1)/(2), (3)/(4), (5)/(6)) contains the same data (e.g. (2) is a Series that contains a single string, (4) is a DataFrame that contains a single column, etc.).
Why do they output different types of objects? How can I predict what type of object will be output?
Given the data, it looks like the rule is based on how many slices (colons) you have in the index:
- 0 slices ((1)): scalar value
- 1 slice ((2), (3), (5)): Series
- 2 slices ((4), (6)): DataFrame
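To check that guess against the six cases above, here is a minimal sketch that counts the slices in each indexer and compares the count with the number of dimensions of the result. The names `cases`, `count_slices`, and `results` are mine, purely for illustration, and it only covers these six cases, so it doesn't establish the rule in general.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': ['Seattle', 'New York', 'Kona']},
                  index=[10, 20, 30])

name = df.columns.get_loc('Name')
age = df.columns.get_loc('Age')
location = df.columns.get_loc('Location')

# The six indexers from the question, written as explicit tuples.
# df.iloc[a, b] is the same as df.iloc[(a, b)].
cases = {
    1: (0, name),
    2: (slice(0, 1), name),
    3: (slice(0, 2), name),
    4: (slice(0, 2), slice(name, age)),
    5: (0, slice(name, location)),
    6: (slice(0, 1), slice(name, location)),
}

def count_slices(key):
    """Number of slice objects in a (row, column) indexer tuple."""
    return sum(isinstance(part, slice) for part in key)

results = {}
for label, key in cases.items():
    out = df.iloc[key]
    # A plain Python scalar such as str has no ndim attribute; treat it as 0.
    ndim = getattr(out, 'ndim', 0)
    results[label] = (count_slices(key), type(out).__name__, ndim)
    print(label, results[label])
```

In each of these six cases the slice count and the result's `ndim` agree, which is consistent with the guess, though it says nothing yet about why.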
However, I’m not sure if this is always true, and even if it is always true, I want to know the underlying mechanism as to why it is like that.
I’ve spent a while looking at the indexing documentation, but it doesn’t seem to describe this behavior clearly. The documentation for the iloc function also doesn’t describe the return types.
I’m also interested in the same question for loc instead of iloc, but, since loc slices include their end label, the results aren’t quite as bewildering. (That is, you can’t get pairs of indexers with different result types where both indexers should pull out exactly the same data.)
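For comparison, here is a rough sketch of the same kind of lookups with loc, using the DataFrame from the top of the question (re-created so the snippet runs on its own). The dictionary `loc_results` and its keys are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': ['Seattle', 'New York', 'Kona']},
                  index=[10, 20, 30])

# Label-based lookups; unlike iloc, a loc slice includes its end label,
# so 10:20 selects both rows 10 and 20, and 'Name':'Age' selects both columns.
loc_results = {
    'no_slices': df.loc[10, 'Name'],
    'row_slice': df.loc[10:20, 'Name'],
    'col_slice': df.loc[10, 'Name':'Age'],
    'both_slices': df.loc[10:20, 'Name':'Age'],
}

for label, out in loc_results.items():
    # Scalars have no shape attribute; report an empty tuple for them.
    print(label, type(out).__name__, getattr(out, 'shape', ()))
```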