How to optimize data generation for numpy call

Question

I'd like to know how to make the following code shorter and/or more efficient. Could I (or should I) get rid of the for loop by using a functional approach, or is there a numpy method I should be using instead?

The code calculates the expected value of an array of integers.

vals = np.arange(self.n+1)

# array of probability of each value in vals
parr = np.ones(len(vals))
for i in range(len(vals)):
    parr[i] *= self.prob(vals[i])

return np.dot(vals,parr)

As requested in comments, the implementation of the method prob():

def prob(self, x):

    """Computes probability of removing x items

    :param x: number of items to remove
    :returns: probability of removing x items
    """

    # p is the probability of removing an item
    # sl.choose computes n choose x
    return sl.choose(self.n, x) * (self.p**x) * \
           (1-self.p)**(self.n-x)

Tags: numpy, functional-programming, python, performance | 2017-01-01 20:01 | 2 Answers

Answers to How to optimize data generation for numpy call ( 2 )

  1. 2017-01-01 21:01

    The loop can be reduced to a list comprehension:

    vals = np.arange(self.n+1)
    
    # array of probability of each value in vals
    parr = [self.prob(v) for v in vals]
    
    return np.dot(vals, parr)
    
  2. 2017-01-01 21:01

    I think the fastest approach is to pass the whole array to prob() at once:

    vals = np.arange(self.n+1)
    
    # array of probability of each value in vals
    parr = self.prob(vals)     
    
    return np.dot(vals,parr)
    

    with prob() rewritten to accept an array:

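    def prob(self, list_of_x):

        """Computes probability of removing x items, for each x in list_of_x

        :param list_of_x: numpy array of numbers of items to remove
        :returns: numpy array of probabilities, one per entry of list_of_x
        """

        # p is the probability of removing an item
        # sl.choose computes n choose x; it is still called once per element,
        # while the power terms are evaluated on the whole array at once
        choose = np.asarray([sl.choose(self.n, e) for e in list_of_x])
        return choose * (self.p ** list_of_x) * \
               (1 - self.p) ** (self.n - list_of_x)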
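
The prob() above still calls sl.choose once per element inside a list comprehension. A minimal sketch of how that remaining loop could be removed is shown here, under stated assumptions: the function names binomial_pmf and expected_removed are hypothetical, n and p are passed as plain arguments instead of attributes, and sl.choose (whose implementation is not shown in the question) is swapped for a cumulative-product recurrence over the binomial coefficients.

```python
import numpy as np


def binomial_pmf(n, p):
    """Return P(X = x) for x = 0..n as one array, without a Python-level loop."""
    k = np.arange(1, n + 1)
    # Recurrence: C(n, k) = C(n, k-1) * (n - k + 1) / k, with C(n, 0) = 1.
    # This replaces the per-element sl.choose call (an assumption, not the
    # original helper).
    coeffs = np.concatenate(([1.0], np.cumprod((n - k + 1) / k)))
    x = np.arange(n + 1)
    return coeffs * p ** x * (1 - p) ** (n - x)


def expected_removed(n, p):
    """Expected number of removed items: dot product of values and probabilities."""
    x = np.arange(n + 1)
    return np.dot(x, binomial_pmf(n, p))
```

The coefficients are held as floats, so for very large n the intermediate values can overflow; for moderate n this is likely fine, but it has not been benchmarked against the original.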
    

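    This is faster because numpy applies arithmetic to whole arrays in compiled code instead of looping over elements in Python. A quick timing comparison of element-wise multiplication: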

    import timeit
    
    import numpy as np
    
    list_a = [1, 2, 3] * 1000
    list_b = [4, 5, 6] * 1000
    
    np_list_a = np.asarray(list_a)
    np_list_b = np.asarray(list_b)
    
    print(timeit.timeit('[a * b for a, b in zip(list_a, list_b)]', 'from __main__ import list_a, list_b', number=1000))
    print(timeit.timeit('np_list_a * np_list_b', 'from __main__ import np_list_a, np_list_b', number=1000))
    

    Result (total seconds for 1000 runs; list comprehension first, then numpy):

    0.19378583212707723
    0.004333830584755033
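
Separately from the timing comparison, note that prob() in the question is the binomial probability mass function, and the expected value of a binomial distribution is n * p. If only the expected value is needed, the whole sum may be unnecessary. The sketch below checks this numerically; the function names are hypothetical, and math.comb (Python 3.8+) stands in for sl.choose, which is an assumption about what that helper computes.

```python
import math


def expected_by_sum(n, p):
    """Expected value computed term by term, mirroring the question's formula."""
    return sum(
        x * math.comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )


def expected_closed_form(n, p):
    """Mean of a binomial distribution with n trials and success probability p."""
    return n * p


# The two agree up to floating-point rounding.
print(math.isclose(expected_by_sum(20, 0.25), expected_closed_form(20, 0.25)))
```

This only applies when prob() is exactly the binomial formula shown in the question; any change to prob() would require going back to the explicit sum.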
    
