Group a pandas dataframe and select the latest row in each group

Question

How do I group the rows of a pandas dataframe and select the latest one (by date) from each group?

For example, given a dataframe sorted by date within each id:

    id     product   date
0   220    6647     2014-09-01 
1   220    6647     2014-09-03 
2   220    6647     2014-10-16
3   826    3380     2014-11-11
4   826    3380     2014-12-09
5   826    3380     2015-05-19
6   901    4555     2014-09-01
7   901    4555     2014-10-05
8   901    4555     2014-11-01

grouping by id or product and selecting the latest row gives:

    id     product   date
2   220    6647     2014-10-16
5   826    3380     2015-05-19
8   901    4555     2014-11-01
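To make the example reproducible, here is a minimal sketch that rebuilds the sample frame. It assumes the date column should be parsed as datetime64 (the question does not say what type it is). A plain groupby().max() gives the latest date per id, but only the dates, not the full rows; the answers below recover the whole rows.

```python
import pandas as pd

# Rebuild the example frame from the question; dates are parsed to datetime64
# (an assumption) so that max/idxmax compare them chronologically.
df = pd.DataFrame({
    "id":      [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# Latest date per group: a Series indexed by id, holding only the dates.
latest_per_id = df.groupby("id")["date"].max()
print(latest_per_id)
```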

Tags: pandas, python | 2017-01-07 21:01

Answers (2)

  1. 2017-01-07 21:01

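    Use idxmax in the groupby to get the index label of each group's latest row, then slice df with loc. This assumes date is a datetime column (parse it with pd.to_datetime first if it was read as strings):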

    df.loc[df.groupby('id').date.idxmax()]
    
        id  product       date
    2  220     6647 2014-10-16
    5  826     3380 2015-05-19
    8  901     4555 2014-11-01
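A self-contained sketch of the idxmax approach above, rebuilding the sample data (parsing the dates with pd.to_datetime is an assumption about the original column type; the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "id":      [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# idxmax returns, for each id, the index label of the row with the largest date.
labels = df.groupby("id")["date"].idxmax()

# loc then pulls those complete rows (all columns) out of the original frame.
latest = df.loc[labels]
print(latest)
```

If two rows in a group share the same maximum date, idxmax returns only the first of them.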
    
  2. 2017-01-08 10:01

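    You can also use tail with groupby to get the last n rows of each group. Sort by date first so that the last row in each group is the latest; the result keeps the date-sorted order rather than the original row order: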

    df.sort_values('date').groupby('id').tail(1)
    
        id  product        date
    2  220     6647  2014-10-16
    8  901     4555  2014-11-01
    5  826     3380  2015-05-19
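A self-contained sketch of the tail approach on the same sample data (dates parsed with pd.to_datetime, as an assumption). It also shows tail(2) for the last two rows per id, and sort_index() to restore the original row order, since tail keeps the original index labels:

```python
import pandas as pd

df = pd.DataFrame({
    "id":      [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# Sort chronologically, then keep the last row of each id group.
latest_sorted = df.sort_values("date").groupby("id").tail(1)

# tail preserves index labels, so sort_index() puts rows back in original order.
latest = latest_sorted.sort_index()

# The same pattern generalizes to the last n rows per group, here n = 2.
last_two = df.sort_values("date").groupby("id").tail(2).sort_index()
print(latest)
print(last_two)
```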
    