How to find all possible daughters in a sequence of numbers stored in dataframe

Question

I have a python dataframe which one of its column such as column1 contains series of numbers. I have to mention that each these numbers are the result of cell mutation so cell with number n deviates to two cells with following numbers: 2*n and 2*n+1. I want to search in this column to find all rows corresponds to daughters of specific number k. I mean the rows which contains all possible {2*k, 2*k+1, 2*(2*k), 2*(2*k+1), ... } in their column1. I don't want to use tree structure, how can I approach the solution ? thanks


Show source
| numpy   | python   | dataframe   | search   2016-12-15 17:12 2 Answers

Answers to How to find all possible daughters in a sequence of numbers stored in dataframe ( 2 )

  1. 2016-12-15 18:12

    Ugly but seems to work alright. What I think you might have needed to know is the newer yield from construction. Used twice in this code. Never thought I would.

    from fractions import Fraction
    from itertools import count
    
    def daughters(k):
        print ('daughters of cell', k)
        if k<=0:
            return
        if k==1:
            yield from count(1)
    
        def locateK():
            cells = 1
            newCells = 2
            generation = 1
            while True:
                generation += 1
                previousCells = cells
                cells += newCells 
                newCells *= 2
                if k > previousCells and k <= cells :
                    break
            return ( generation, k - previousCells )
    
        parentGeneration, parentCell = locateK()
    
        cells = 1
        newCells = 2
        generation = 1
        while True:
            generation += 1
            previousCells = cells
            if generation > parentGeneration:
                if parentCell%2:
                    firstChildCell=previousCells+int(Fraction(parentCell-1, 2**parentGeneration)*newCells)+1
                else:
                    firstChildCell=previousCells+int(Fraction(parentCell, 2**parentGeneration)*newCells)+1
                yield from range(firstChildCell, firstChildCell+int(newCells*Fraction(1,2)))
            cells += newCells
            newCells *= 2
    
    for n, d in enumerate(daughters(2)):
        print (d)
        if n > 15:
            break
    

    Couple of representative results:

    daughters of cell 2
    4
    5
    8
    9
    10
    11
    16
    17
    18
    19
    20
    21
    22
    23
    32
    33
    34
    
    
    daughters of cell 3
    6
    7
    12
    13
    14
    15
    24
    25
    26
    27
    28
    29
    30
    31
    48
    49
    50
    
  2. 2016-12-16 02:12

    The two sequences look like the numbers who's binary expansion starts with 10 and the numbers for which the binary expansion starts with 11.

    Both sequences can be found directly:

    import math
    
    def f(n=2):
        while True:
            yield int(n + 2**math.floor(math.log(n,2)))
            n += 1
    
    def g(n=2):
        while True:
            yield int(n + 2 * 2**math.floor(math.log(n,2)))
            n += 1
    
    a, b = f(), g()
    print [a.next() for i in range(15)]
    print [b.next() for i in range(15)]
    >>> [4, 5, 8, 9, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23, 32]
    >>> [6, 7, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31, 48]
    

    EDIT:

    For an arbitrary starting point, you can do the following, which I think meets your criteria.

    import Queue
    
    def f(k):
        q = Queue.Queue()
        q.put(k)
    
        while not q.empty():
            p = q.get()
            a, b = 2*p, 2*p+1
            q.put(a)
            q.put(b)
            yield a
            yield b
    
    a = f(4)
    print [a.next() for i in range(16)]
    >>> [8, 9, 16, 17, 18, 19, 32, 33, 34, 35, 36, 37, 38, 39, 64, 65] # ...
    
    a = f(5)
    print [a.next() for i in range(16)]
    >>> [10, 11, 20, 21, 22, 23, 40, 41, 42, 43, 44, 45, 46, 47, 80, 81] # ...
    

    Checking those sequences against OEIS:

    f(2) - Starting 10  - A004754
    f(3) - Starting 11  - A004755
    f(4) - Starting 100 - A004756
    f(5) - Starting 101 - A004756
    f(6) - Starting 110 - A004758
    f(7) - Starting 111 - A004759
    ...
    

    Which means you can simply do:

    import math
    
    def f(k, n=2):
        while True:
            yield int(n + (k-1) * 2**math.floor(math.log(n, 2)))
            n+=1
    
    for i in range(2,8):
        a = f(i)
        print i, [a.next() for j in range(16)]
    
    >>> 2 [4, 5, 8, 9, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23, 32]
    >>> 3 [6, 7, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31, 48]
    >>> 4 [8, 9, 16, 17, 18, 19, 32, 33, 34, 35, 36, 37, 38, 39, 64]
    >>> 5 [10, 11, 20, 21, 22, 23, 40, 41, 42, 43, 44, 45, 46, 47, 80]
    >>> 6 [12, 13, 24, 25, 26, 27, 48, 49, 50, 51, 52, 53, 54, 55, 96]
    >>> 7 [14, 15, 28, 29, 30, 31, 56, 57, 58, 59, 60, 61, 62, 63, 112]
    # ... where the first number is shown for clarity.
    

Leave a reply to - How to find all possible daughters in a sequence of numbers stored in dataframe

◀ Go back