Numba corrupts data when assigning in place

Question

Numba and NumPy don't execute the following foo function in the same way:

from numba import jit
import numpy as np

@jit
def foo(a):
    a[:] = a[::-1] # reverse the array

a = np.array([0, 1, 2])
foo(a)
print(a)

With NumPy (without @jit) it prints [2, 1, 0], while with Numba (with @jit) it prints [2, 1, 2]. It looks like Numba writes to the array in place while it is still reading from it, which leads to data corruption. It is easy to work around by making a copy of the array:

a[:] = a[::-1].copy()

But is this the desired behavior? Shouldn't Numba and NumPy give the same result?

I am using Numba v0.26.0 in Python 3.5.2.


numpy | python | numba | 2017-01-07 00:01

Answers ( 2 )

  1. 2017-01-07 01:01

    Your jitted function has the same sort of in-place problem as this Python loop:

    In [718]: x=list(range(3))
    In [719]: for i in range(3):
         ...:     x[i] = x[2-i]
    In [720]: x
    Out[720]: [2, 1, 2]
    

    The assignment x[:] = x[::-1] is buffered, not because numpy recognizes that something special is happening, but because it always uses some sort of buffering when doing assignments.

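    The Python interpreter translates [] notation into calls to __setitem__ and __getitem__. So In [681] and In [682] do the same thing (and since both run, x is reversed twice, which is why Out[683] shows the original order):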

    In [680]: x=np.arange(3)
    In [681]: x[:] = x[::-1]
    In [682]: x.__setitem__(slice(None), x.__getitem__(slice(None,None,-1)))
    In [683]: x
    Out[683]: array([0, 1, 2])
    

    That means that x[::-1] is evaluated in full before it is assigned to x[:]. But x[::-1] is a view, not a copy, so the setitem step itself must do some sort of buffered copy.
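To make the point about views concrete, here is a minimal sketch (the variable names are only for illustration). It checks that the reversed slice reads from the same buffer the assignment writes to, and that NumPy still reverses the array cleanly.

```python
import numpy as np

x = np.arange(3)
rev = x[::-1]  # reversed view; no data is copied here

# The view reads from the same buffer that x[:] = ... writes to.
overlaps = np.shares_memory(x, rev)
view_base_is_x = rev.base is x

x[:] = rev  # NumPy still produces a clean reversal
result = x.tolist()
print(overlaps, view_base_is_x, result)
```

Source and destination overlap here, so the clean result has to come from how the assignment is carried out, not from x[::-1] being a separate array.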

    Another way to do this copy is with

    np.copyto(x, x[::-1])
    

    Checking x.__array_interface__, I see that the data buffer address remains the same. So it is copying values, not just pointing x at a different data buffer. But that happens in low-level compiled code.
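The address check described above can be sketched like this (a rough illustration; the variable names are mine). The first element of the "data" entry in __array_interface__ is the address of the buffer.

```python
import numpy as np

x = np.arange(3)
addr_before = x.__array_interface__["data"][0]  # address of x's data buffer

np.copyto(x, x[::-1])  # same effect as x[:] = x[::-1]

addr_after = x.__array_interface__["data"][0]
same_buffer = addr_before == addr_after
result = x.tolist()
print(same_buffer, result)
```

If the address is unchanged while the contents are reversed, the values were copied into the existing buffer.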

    Usually buffering is just an implementation issue that users don't need to worry about. ufunc.at is designed to deal with cases where that buffering creates problems. This topic comes up periodically; search for add.at.
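For readers who have not run into the add.at case, here is a small sketch of the kind of problem it addresses (the array names are only illustrative). With a repeated index, the buffered `+=` applies the increment once per distinct index, while np.add.at applies it once per occurrence.

```python
import numpy as np

idx = np.array([0, 0, 1])  # index 0 appears twice

buffered = np.zeros(3, dtype=int)
buffered[idx] += 1  # the repeated index 0 is only incremented once

unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, idx, 1)  # every occurrence of an index is applied

print(buffered.tolist(), unbuffered.tolist())
```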

    =============

    Note that Python lists behave the same way. The translation to 'get/setitem' is the same.

    In [699]: x=list(range(3))
    In [700]: x[:] = x[::-1]
    In [701]: x
    Out[701]: [2, 1, 0]
    

    ======================

    I'm not sure whether this is relevant, but since I tested these ideas I'll document them. https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html suggests using np.nditer as a stepping stone for implementing iterative tasks in cython.

    A first stab at using nditer is:

    In [769]: x=np.arange(5)
    In [770]: it = np.nditer((x,x[::-1]), op_flags=[['readwrite'], ['readonly']])
    In [771]: for i,j in it:
         ...:     print(i,j)
         ...:     i[...] = j
         ...:     
    0 4
    1 3
    2 2
    3 3
    4 4
    In [772]: x
    Out[772]: array([4, 3, 2, 3, 4])
    

    This produces the same sort of overlapping result as numba.

    Adding a copy makes for a clean reversal.

    it = np.nditer((x,x[::-1].copy()), op_flags=[['readwrite'], ['readonly']])
    

    If I add the external_loop flag I also get a clean reversal:

    In [781]: x=np.arange(5)
    In [782]: it = np.nditer((x,x[::-1]), op_flags=[['readwrite'], ['readonly']],
         ...:                flags=['external_loop'])
    In [783]: for i,j in it:
         ...:     print(i,j)
         ...:     i[...] = j
         ...:     
    [0 1 2 3 4] [4 3 2 1 0]
    In [784]: x
    Out[784]: array([4, 3, 2, 1, 0])
    
  2. 2017-01-09 18:01

    This is a known issue (https://github.com/numba/numba/issues/1960) and it was fixed in numba 0.27. Following NumPy behavior, the fix detects overlap and makes temporary copies to avoid corrupting the data.
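The idea of "detect overlap, then copy" can be sketched in plain Python. This is not Numba's actual implementation; the function names naive_assign and assign_with_overlap_check are hypothetical, and the sketch assumes 1-D arrays of equal length.

```python
import numpy as np

def naive_assign(dst, src):
    # Element-by-element copy with no overlap check (hypothetical helper).
    for i in range(dst.shape[0]):
        dst[i] = src[i]

def assign_with_overlap_check(dst, src):
    # Same loop, but copy src first if it shares memory with dst
    # (hypothetical helper, not Numba's real code).
    if np.shares_memory(dst, src):
        src = src.copy()
    for i in range(dst.shape[0]):
        dst[i] = src[i]

a = np.array([0, 1, 2])
naive_assign(a, a[::-1])  # reads elements that were already overwritten
corrupted = a.tolist()

b = np.array([0, 1, 2])
assign_with_overlap_check(b, b[::-1])  # temporary copy avoids the overlap
reversed_ok = b.tolist()

print(corrupted, reversed_ok)
```

The naive loop reproduces the [2, 1, 2] result from the question, while the checked version gives the same answer as NumPy.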
