Pandas groupby month and year

I have the following dataframe:

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80

I need to group the data by year and month, i.e., group by Jan 2013, Feb 2013, Mar 2013, etc.

I will be using the newly grouped data to create a plot showing abc vs xyz per year/month.

I’ve tried various combinations of groupby and sum, but I just can’t seem to get anything to work. How can I do it?

You can use either resample or Grouper (which resamples under the hood).

First make sure that the Date column actually holds datetimes (hit it with pd.to_datetime). It’s easier if it’s a DatetimeIndex:

In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80
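For reference, a frame like df1 above could be built from the question's data roughly as follows. This is a minimal sketch, assuming the dates arrive as dd-Mon-yy strings with English month abbreviations:

```python
import pandas as pd

# Raw data as given in the question; the dates are still plain strings here.
df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse the strings (assumed dd-Mon-yy) into real datetimes,
# then move them into the index so resample/Grouper can use them directly.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")
df1 = df.set_index("Date")
```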

In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

In [14]: df1.resample("M").sum()  # the same (older pandas: df1.resample("M", how='sum'))
Out[14]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

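Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21 and was removed in pandas 1.0. The outputs above come from an older pandas: since 0.22, sum() returns 0 rather than NaN for months with no rows unless you pass min_count=1. From pandas 2.2 on, the month-end alias for resample and Grouper is "ME"; "M" still works there but raises a deprecation warning.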
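If you are on a recent pandas, here is a hedged sketch of the same monthly sum. It passes a pd.offsets.MonthEnd() object instead of a string alias (my choice, to sidestep alias differences between versions), and it uses min_count=1 to keep NaN for the empty months:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
)

# An offset object works as the resampling frequency, like a string alias would.
month_end = pd.offsets.MonthEnd()

# Default behaviour in recent pandas: months without rows sum to 0.
monthly_zero = df1.resample(month_end).sum()

# min_count=1 requires at least one value per bin, so empty months become NaN.
monthly_nan = df1.resample(month_end).sum(min_count=1)
```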

I had thought the following would work, but it doesn’t (due to as_index not being respected? I’m not sure.). I’m including this for interest’s sake.

If it’s a column (it has to be a datetime64 column! as I say, hit it with to_datetime), you can use the PeriodIndex:

In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80

To get the desired result we have to reindex…
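One possible way to do that reindexing is sketched below. It is not necessarily what the answer had in mind: it assumes you want every month between the first and last observed period, and it selects the numeric columns explicitly so the Date column is not summed.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Monthly period for each row, then sum per period (gaps are not filled yet).
per = df["Date"].dt.to_period("M")
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Build the full run of months and reindex; missing months show up as NaN.
full_range = pd.period_range(per.min(), per.max(), freq="M")
filled = monthly.reindex(full_range)
```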

Keep it simple:

GB = DF.groupby([(DF.index.year), (DF.index.month)]).sum()

giving you,

print(GB)
        abc  xyz
2013 6   80  250
     8   40   -5
2014 1   25   15
     2   60   80

and then you can plot like asked using,

GB.plot('abc', 'xyz', kind='scatter')
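If you would rather have named levels and a flat label per month (handy for axis ticks), something like this should work. The names year, month and label are my own choices, not part of the answer above:

```python
import pandas as pd

DF = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
)

# Same grouping as above, then give the two index levels readable names.
GB = DF.groupby([DF.index.year, DF.index.month]).sum()
GB = GB.rename_axis(["year", "month"])

# Flatten the MultiIndex and build a zero-padded "YYYY-MM" label per row.
flat = GB.reset_index()
flat["label"] = (
    flat["year"].astype(str) + "-" + flat["month"].astype(str).str.zfill(2)
)

# With matplotlib installed, one option would then be:
# flat.plot(x="label", y=["abc", "xyz"], kind="bar")
```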

There are different ways to do that.

I created the data frame to showcase the different techniques to filter your data.

  df = pd.DataFrame({'Date': ['01-Jun-13', '03-Jun-13', '15-Aug-13', '20-Jan-14', '21-Feb-14'],
                     'abc': [100, -20, 40, 25, 60], 'xyz': [200, 50,-5, 15, 80] })

I wrote helpers that split out the day, month and year, plus a combined month-year key, as you described.

  def getMonth(s):
      return s.split("-")[1]

  def getDay(s):
      return s.split("-")[0]

  def getYear(s):
      return s.split("-")[2]

  def getYearMonth(s):
      return s.split("-")[1] + "-" + s.split("-")[2]

I created new columns: year, month, day and YearMonth. In your case, you need only one of the two options: group by the two columns 'year' and 'month', or by the single column 'YearMonth'.

  df['year'] = df['Date'].apply(getYear)
  df['month'] = df['Date'].apply(getMonth)
  df['day'] = df['Date'].apply(getDay)
  df['YearMonth'] = df['Date'].apply(getYearMonth)

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
3  20-Jan-14   25   15   14   Jan  20    Jan-14
4  21-Feb-14   60   80   14   Feb  21    Feb-14
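As an aside, the same kind of columns can be derived without manual string splitting by parsing the dates first. This is a rough sketch, assuming an English locale for the %b month abbreviation. Note that year, month and day come out as integers here (2013, 6, 1) rather than the strings shown above:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse once, then read the components off the .dt accessor.
parsed = pd.to_datetime(df["Date"], format="%d-%b-%y")
df["year"] = parsed.dt.year
df["month"] = parsed.dt.month
df["day"] = parsed.dt.day

# Same "Mon-yy" key as getYearMonth produces.
df["YearMonth"] = parsed.dt.strftime("%b-%y")
```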

You can iterate over the groups that groupby(...) produces.

In this case, we are grouping by two columns:

  for key, g in df.groupby(['year', 'month']):
      print(key, g)

Output:

('13', 'Jun')         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
('13', 'Aug')         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
('14', 'Jan')         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
('14', 'Feb')         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14

In this case, we are grouping by one column:

  for key, g in df.groupby('YearMonth'):
      print(key, g)

Output:

Jun-13         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
Aug-13         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
Jan-14         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
Feb-14         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14
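One thing to watch: by default groupby sorts the group keys, and string keys like these sort alphabetically rather than by date. A small sketch of the difference follows. Slicing the date string with str[3:] is just a shortcut I use here to build the same 'Mon-yy' key:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})
# Characters from position 3 onward are "Mon-yy", e.g. "Jun-13".
df["YearMonth"] = df["Date"].str[3:]

# Default (sort=True): keys come back in alphabetical order.
sorted_keys = [key for key, _ in df.groupby("YearMonth")]

# sort=False: keys come back in order of first appearance in the frame.
appearance_keys = [key for key, _ in df.groupby("YearMonth", sort=False)]

print(sorted_keys)
print(appearance_keys)
```

Since the rows here are already in date order, sort=False happens to give chronological groups.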

In case you want to access a specific group, you can use get_group:

  print(df.groupby('YearMonth').get_group('Jun-13'))

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13

Boolean filtering is similar to get_group and gives the same result:

  print(df[df['YearMonth']=='Jun-13'])

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13

You can select the abc or xyz values for Jun-13:

  print(df[df['YearMonth']=='Jun-13'].abc.values)
  print(df[df['YearMonth']=='Jun-13'].xyz.values)

Output:

[100 -20]  #abc values
[200  50]  #xyz values

You can use this to loop over the distinct "year-month" keys and filter on each one to get the related data. Note that a set has no defined order.

  for x in set(df.YearMonth):
      print(df[df['YearMonth']==x].abc.values)
      print(df[df['YearMonth']==x].xyz.values)
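If you want those per-month values collected in one pass instead of filtering the frame once per key, a dict comprehension over the groups is one option. This is only a sketch, and values_by_month is a name I made up:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})
df["YearMonth"] = df["Date"].str[3:]  # "Mon-yy" part of the date string

# One pass over the groups; sort=False keeps first-appearance order.
values_by_month = {
    key: {"abc": g["abc"].tolist(), "xyz": g["xyz"].tolist()}
    for key, g in df.groupby("YearMonth", sort=False)
}

print(values_by_month["Jun-13"])
```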

I also recommend checking this answer.

You can also do it by creating a string column with the year and month as follows:

df['date'] = df.index
df['year-month'] = df['date'].apply(lambda x: str(x.year) + ' ' + str(x.month))
grouped = df.groupby('year-month')

However this doesn’t preserve the order when you loop over the groups, e.g.

for name, group in grouped:
    print(name)

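Will give the group names in string (lexicographic) order rather than chronological order, so for example '2008 10' comes before '2008 2'.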

So then, if you want to preserve the order, you must do as suggested by @Q-man above:

grouped = df.groupby([df.index.year, df.index.month])

This will preserve the order in the above loop:

(2007, 11)
(2007, 12)
(2008, 1)
(2008, 2)
(2008, 3)
(2008, 4)
(2008, 5)
(2008, 6)
(2008, 7)
(2008, 8)
(2008, 9)
(2008, 10)
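If you do want a single string key, zero-padding the month makes the string order match the date order. Here is a small sketch with made-up daily data covering the same range of months. The frame and its value column are hypothetical:

```python
import pandas as pd

# Hypothetical daily data from Nov 2007 through Oct 2008.
idx = pd.date_range("2007-11-01", "2008-10-31", freq="D")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# "%Y-%m" pads the month to two digits, e.g. "2008-02", "2008-10".
key = df.index.strftime("%Y-%m")
grouped = df.groupby(key)

# groupby sorts the string keys, and padded keys sort chronologically.
names = [name for name, _ in grouped]
print(names)
```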

Some of the answers are using Date as an index instead of a column (and there’s nothing wrong with doing that).

However, for anyone who has the dates stored as a column (instead of an index), remember to access the column’s dt attribute. That is:

# First make sure `Date` is a datetime column
df['Date'] = pd.to_datetime(
  arg=df['Date'],
  format='%d-%b-%y' # Assuming dd-Mon-yy format
)

# Group by year and month
df.groupby(
  [
    df['Date'].dt.year,
    df['Date'].dt.month
  ]
)[['abc', 'xyz']].sum()  # sum only the numeric columns, not `Date` itself
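Both index levels of that result are named Date, which can get confusing. A possible tweak is to rename the two key Series before grouping. The level names year and month are just my choice:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# The Series names become the names of the index levels in the result.
year = df["Date"].dt.year.rename("year")
month = df["Date"].dt.month.rename("month")

result = df.groupby([year, month])[["abc", "xyz"]].sum()
print(result)
```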

A simple and more generic method is to create a column that gives the bins you want the data in. For this problem, the bins will be years and months. Note that since you are plotting the result, you don’t need to include months with no data.

The index will be converted to a datetime index, and will be used to create the bins. The groupby method removes the column when processing the bins, which become the rows in the index. Optionally the index can be converted to a datetime index again at the end, allowing proper sorts.

import pandas as pd

idx = ['01-Jun-13', '03-Jun-13', '15-Aug-13', '20-Jan-14', '21-Feb-14']
data = {'abc': [100, -20, 40, 25, 60],
        'xyz': [200, 50, -5, 15, 80]}

df = pd.DataFrame(index=idx, data=data)
df.index = pd.to_datetime(df.index, format='%d-%b-%y')
df['bin'] = df.index.strftime('%Y-%m-01')
result = df.groupby('bin').sum()
result.index = pd.to_datetime(result.index)

df and result are now:

In [1]: df
Out[1]: 
            abc  xyz         bin
2013-06-01  100  200  2013-06-01
2013-06-03  -20   50  2013-06-01
2013-08-15   40   -5  2013-08-01
2014-01-20   25   15  2014-01-01
2014-02-21   60   80  2014-02-01

In [2]: result
Out[2]: 
            abc  xyz
bin                 
2013-06-01   80  250
2013-08-01   40   -5
2014-01-01   25   15
2014-02-01   60   80
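A variation on the same idea, in case you prefer to skip the string round trip: the bins can be built as real timestamps via to_period and to_timestamp, so the result index is already a datetime index. This is a sketch of an alternative, not a correction of the code above:

```python
import pandas as pd

idx = ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"]
data = {"abc": [100, -20, 40, 25, 60],
        "xyz": [200, 50, -5, 15, 80]}

df = pd.DataFrame(index=idx, data=data)
df.index = pd.to_datetime(df.index, format="%d-%b-%y")

# Monthly period -> timestamp at the start of that month (e.g. 2013-06-01).
df["bin"] = df.index.to_period("M").to_timestamp()

# The bin column becomes the (datetime) index of the result.
result = df.groupby("bin").sum()
print(result)
```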
