Answers to Pandas - Find longest stretch without Nan values ( 4 )

  1. 2017-01-05 21:01

    So you can get the index values of the NaN's in the following way:

    import numpy as np
    
    index = df['x'].index[df['x'].apply(np.isnan)]
    df_index = df.index.values.tolist()
    [df_index.index(indexValue) for indexValue in index]
    
    >>> [0, 1, 3, 7]
    

    Then one solution would be to see the largest difference between subsequent index values and that would give you the longest stretch of non NaN values.

  2. 2017-01-05 22:01

    Maybe a faster way would be the following (given that you say you have a long dataframe, speed matters):

    In [19]: df = pd.DataFrame({'time':[1,2,3,4,5,6,7,8],'x':[1,np.NAN,3,np.NAN,8,7,5,np.NAN]})
    
    In [20]: index = df['x'].isnull()
    
    In [21]: df[index].index.values
    Out[21]: array([1, 3, 7])
    
  3. 2017-01-05 22:01

    Here's a vectorized approach with NumPy tools -

    a = df.x.values  # Extract out relevant column from dataframe as array
    m = np.concatenate(( [True], np.isnan(a), [True] ))  # Mask
    ss = np.flatnonzero(m[1:] != m[:-1]).reshape(-1,2)   # Start-stop limits
    start,stop = ss[(ss[:,1] - ss[:,0]).argmax()]  # Get max interval, interval limits
    

    Sample run -

    In [474]: a
    Out[474]: 
    array([  1.,  nan,   3.,  nan,  nan,  nan,  nan,   8.,   7.,   5.,   2.,
             5.,  nan,  nan])
    
    In [475]: start, stop
    Out[475]: (7, 12)
    

    The intervals are set such that the difference between each start and stop would give us the length of each interval. So, by ending index if you meant to get the last index of non-zero element, we need to subtract one from stop.

  4. 2017-01-05 22:01

    pandas

    f = dict(
        Start=pd.Series.first_valid_index,
        Stop=pd.Series.last_valid_index,
        Stretch='count'
    )
    
    agged = df.x.groupby(df.x.isnull().cumsum()).agg(f)
    agged.loc[agged.Stretch.idxmax(), ['Start', 'Stop']].values
    
    array([ 4.,  6.])
    

    numpy

    def pir(x):
        # pad with np.nan
        x = np.append(np.nan, np.append(x, np.nan))
        # find where null
        w = np.where(np.isnan(x))[0]
        # diff to find length of stretch
        # argmax to find where largest stretch
        a = np.diff(w).argmax()
        # return original positions of boundary nulls
        return w[[a, a + 1]] + np.array([0, -2])
    

    demo

    pir(df.x.values)
    
    array([4, 6])
    

    a = np.array([1, np.nan, 3, np.nan, np.nan, np.nan, np.nan, 8, 7, 5, 2, 5, np.nan, np.nan])
    pir(a)
    
    array([ 7, 11])
    

Leave a reply to - Pandas - Find longest stretch without Nan values

◀ Go back