Consider the following piece of Python code, which is essentially copied from the first code example in the Transformation section of the "Group by: split-apply-combine" chapter of the pandas user guide.
import pandas as pd
import numpy as np
speeds = pd.DataFrame(
    data={'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
          'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
          'max_speed': [389.0, 24.0, 80.2, np.nan, 58.0]},
    index=['falcon', 'parrot', 'lion', 'monkey', 'leopard']
)
grouped = speeds.groupby('class')['max_speed']
grouped.diff()
When executed in Google Colab, the output is:
falcon       NaN
parrot    -365.0
lion         NaN
monkey       NaN
leopard      NaN
Name: max_speed, dtype: float64
This is the same output as shown in the user guide.
Why is the value corresponding to the parrot index element -365.0 rather than NaN, like the rest of the values in this Series?
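The output is correct and expected. On a grouped Series, diff() subtracts the previous value within the same group from the current value. The first row of each group has no previous value, so it becomes NaN, and any subtraction involving NaN also yields NaN. Only parrot has a non-NaN value and a non-NaN predecessor (falcon) in its group. Here is a breakdown of what it does: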
falcon       NaN  # NaN since first of the group
parrot    -365.0  # 24 - 389 = -365.0
lion         NaN  # NaN since first of the group
monkey       NaN  # NaN - 80.2 = NaN
leopard      NaN  # 58 - NaN = NaN
Name: max_speed, dtype: float64