Pandas dataframe reverse sort on subsets using groupby

Question

I have a dataframe with names = ['id','t','metric_1','metric_2','metric_3']. I am running some signal processing on each grp in groupby('id'). I need to reverse the time for another process that is taking in the entire dataframe and doing it's processing under the hood. Simply put, given a grp, I need to reverse the time column only, leaving all other columns intact, and all rows not in grp intact.

Input dataframe:

    id  t   metric_1    metric_2    metric_3
0   0   86  13.333  61.989  0.017444
1   0   87  13.333  61.993  0.017569
2   0   88  13.333  61.992  0.017711
3   0   89  13.333  61.998  0.017746
4   0   90  13.333  61.993  0.017871
5   1   32  13.333  61.964  0.018511
6   1   33  20.000  61.913  0.020058
7   1   34  20.000  61.864  0.022475
8   1   35  26.667  61.802  0.025995
9   1   36  33.123  61.563  0.032345
10  1   37  33.763  61.836  0.060189
11  2   2   13.333  61.964  0.018511
12  2   3   20.000  61.613  0.020058
13  2   4   20.000  61.164  0.027475
14  2   5   26.667  61.802  0.024995
15  2   6   33.333  61.736  0.030689

I would like to use an operation that yields a dataframe like this:

    id  t   metric_1    metric_2    metric_3
0   0   90  13.333  61.989  0.017444
1   0   89  13.333  61.993  0.017569
2   0   88  13.333  61.992  0.017711
3   0   87  13.333  61.998  0.017746
4   0   86  13.333  61.993  0.017871
5   1   37  13.333  61.964  0.018511
6   1   36  20.000  61.913  0.020058
7   1   35  20.000  61.864  0.022475
8   1   34  26.667  61.802  0.025995
9   1   33  33.333  61.736  0.030689
10  1   32  33.763  61.836  0.060189
11  2   6   13.333  61.964  0.018511
12  2   5   20.000  61.613  0.020058
13  2   4   20.000  61.164  0.027475
14  2   3   26.667  61.802  0.024995
15  2   2   33.333  61.736  0.030689

Show source
| sorting   | pandas   | python   | group-by   2017-01-04 23:01 2 Answers

Answers to Pandas dataframe reverse sort on subsets using groupby ( 2 )

  1. 2017-01-04 23:01

    UPDATE2: sort / replace values in the t column, but only for those rows where id == 0 (as described in this comment):

    In [373]: df
    Out[373]:
       id   t  metric_1  metric_2  metric_3
    0   0  86    13.333    61.989  0.017444
    1   0  87    13.333    61.993  0.017569
    2   0  88    13.333    61.992  0.017711
    3   0  89    13.333    61.998  0.017746
    4   0  90    13.333    61.993  0.017871
    5   1  86    13.333    61.964  0.018511
    6   1  87    20.000    61.913  0.020058
    7   1  88    20.000    61.864  0.022475
    8   1  89    26.667    61.802  0.025995
    9   1  90    33.333    61.736  0.030689
    
    In [374]: df.loc[df.id == 0, 't'] = df.loc[df.id == 0, 't'].sort_values(ascending=0).values
    
    In [375]: df
    Out[375]:
       id   t  metric_1  metric_2  metric_3
    0   0  90    13.333    61.989  0.017444
    1   0  89    13.333    61.993  0.017569
    2   0  88    13.333    61.992  0.017711
    3   0  87    13.333    61.998  0.017746
    4   0  86    13.333    61.993  0.017871
    5   1  86    13.333    61.964  0.018511
    6   1  87    20.000    61.913  0.020058
    7   1  88    20.000    61.864  0.022475
    8   1  89    26.667    61.802  0.025995
    9   1  90    33.333    61.736  0.030689
    

    UPDATE: for updated data sets

    original DF:

    In [363]: df
    Out[363]:
       id   t  metric_1  metric_2  metric_3
    0   0  86    13.333    61.989  0.017444
    1   0  87    13.333    61.993  0.017569
    2   0  88    13.333    61.992  0.017711
    3   0  89    13.333    61.998  0.017746
    4   0  90    13.333    61.993  0.017871
    5   1  86    13.333    61.964  0.018511
    6   1  87    20.000    61.913  0.020058
    7   1  88    20.000    61.864  0.022475
    8   1  89    26.667    61.802  0.025995
    9   1  90    33.333    61.736  0.030689
    

    sorting complete rows:

    In [364]: df.sort_values(['id','t'], ascending=[1,0])
    Out[364]:
       id   t  metric_1  metric_2  metric_3
    4   0  90    13.333    61.993  0.017871
    3   0  89    13.333    61.998  0.017746
    2   0  88    13.333    61.992  0.017711
    1   0  87    13.333    61.993  0.017569
    0   0  86    13.333    61.989  0.017444
    9   1  90    33.333    61.736  0.030689
    8   1  89    26.667    61.802  0.025995
    7   1  88    20.000    61.864  0.022475
    6   1  87    20.000    61.913  0.020058
    5   1  86    13.333    61.964  0.018511   # <--
    

    sorting values for two columns (['id','t']), replacing their values:

    In [366]: df[['id','t']] = df[['id','t']].sort_values(['id','t'], ascending=[1,0]).values
    
    In [367]: df
    Out[367]:
       id   t  metric_1  metric_2  metric_3
    0   0  90    13.333    61.989  0.017444
    1   0  89    13.333    61.993  0.017569
    2   0  88    13.333    61.992  0.017711
    3   0  87    13.333    61.998  0.017746
    4   0  86    13.333    61.993  0.017871
    5   1  90    13.333    61.964  0.018511
    6   1  89    20.000    61.913  0.020058
    7   1  88    20.000    61.864  0.022475
    8   1  87    26.667    61.802  0.025995
    9   1  86    33.333    61.736  0.030689  # <--
    

    OLD answer:

    IIUC you can simply sort your data by two columns:

    In [349]: df.sort_values(['id','t'], ascending=[1,1])
    Out[349]:
       id   t  metric_1  metric_2  metric_3
    4   0  86    13.333    61.993  0.017871
    3   0  87    13.333    61.998  0.017746
    2   0  88    13.333    61.992  0.017711
    1   0  89    13.333    61.993  0.017569
    0   0  90    13.333    61.989  0.017444
    9   1  86    33.333    61.736  0.030689
    8   1  87    26.667    61.802  0.025995
    7   1  88    20.000    61.864  0.022475
    6   1  89    20.000    61.913  0.020058
    5   1  90    13.333    61.964  0.018511
    

    if you want to sort it as in your desired data set (replacing t column values):

    In [357]: df[['id','t']] = df[['id','t']].sort_values(['id','t']).values
    
    In [358]: df
    Out[358]:
       id   t  metric_1  metric_2  metric_3
    0   0  86    13.333    61.989  0.017444
    1   0  87    13.333    61.993  0.017569
    2   0  88    13.333    61.992  0.017711
    3   0  89    13.333    61.998  0.017746
    4   0  90    13.333    61.993  0.017871
    5   1  86    13.333    61.964  0.018511
    6   1  87    20.000    61.913  0.020058
    7   1  88    20.000    61.864  0.022475
    8   1  89    26.667    61.802  0.025995
    9   1  90    33.333    61.736  0.030689   # 1 90 33.333 61.736 0.030689 as in your desired DF
    
  2. 2017-01-05 07:01

    if you want to reverse the 't' column leaving all other columns intact, you can try following code:

    df.t=df['t'].sort_values(ascending=False)
    

Leave a reply to - Pandas dataframe reverse sort on subsets using groupby

◀ Go back