Numba 3x Slower Than Numpy
Solution 1:
Try moving the call to np.bitwise_and outside of the loop, since numba can't do anything to speed it up:
import numpy as np
from numba import jit

@jit(nopython=True)
def numba_get_pos_neg_bitwise(df, mask):
    posneg = np.zeros((df.shape[0], 2))
    # Hoisted out of the loop: one vectorized call over all rows
    vandmask = np.bitwise_and(df[:, 1:], mask)
    for idx in range(df.shape[0]):
        # numba fails with: if np.all(vandmask == mask):
        vandm_equal_m = 1
        for i, val in enumerate(vandmask[idx]):
            if val != mask[i]:
                vandm_equal_m = 0
                break
        if vandm_equal_m == 1:
            if df[idx, 0] == 1:
                posneg[idx, 0] = 1
            else:
                posneg[idx, 1] = 1
    pos = np.nonzero(posneg[:, 0])[0]
    neg = np.nonzero(posneg[:, 1])[0]
    return (pos, neg)
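The question's original loop isn't reproduced here, so assume it called np.bitwise_and on one row at a time. A quick numpy check on made-up data (the df and mask values below are arbitrary) shows that a single broadcasted call produces the same masked rows, which is what makes it safe to hoist the call out of the loop:

```python
import numpy as np

# Hypothetical test data: column 0 is a label, the remaining columns hold bit flags.
rng = np.random.default_rng(0)
df = rng.integers(0, 4, size=(1000, 5))
mask = np.array([1, 2, 0, 3])

# Inside the loop: one small np.bitwise_and call per row.
per_row = np.empty_like(df[:, 1:])
for idx in range(df.shape[0]):
    per_row[idx] = np.bitwise_and(df[idx, 1:], mask)

# Hoisted: a single call; mask (shape (4,)) broadcasts across every row of df[:, 1:].
hoisted = np.bitwise_and(df[:, 1:], mask)

print(np.array_equal(per_row, hoisted))  # True
```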
Then I get timings of (the first numba run presumably includes the JIT compilation time):
==> pos, neg made; p=3920, n=4023 in [0.02352 s] numpy
==> pos, neg made; p=3920, n=4023 in [0.2896 s] numba
==> pos, neg made; p=3920, n=4023 in [0.01539 s] numba
So now numba is a bit faster than numpy.
Also, it didn't make a huge difference, but in your original function you return numpy arrays, while in the numba version you were converting pos and neg to lists.
In general, though, I would guess that the run time is dominated by calls to numpy functions, which numba can't speed up, and that the numpy version of the code is already using fast vectorized routines.
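For comparison, here is a sketch of what a fully vectorized numpy version of the same selection could look like. It is hypothetical (the asker's get_pos_neg_bitwise is not shown in this answer), and np.all(..., axis=1) does the per-row check that the inner loop does by hand. A plain-Python loop with the same logic is included only to confirm that both agree on random test data:

```python
import numpy as np


def get_pos_neg_vectorized(df, mask):
    """Hypothetical all-numpy version of the selection logic (not the asker's code)."""
    vandmask = np.bitwise_and(df[:, 1:], mask)
    # A row matches when every masked column equals the mask.
    matches = np.all(vandmask == mask, axis=1)
    is_pos = df[:, 0] == 1
    pos = np.nonzero(matches & is_pos)[0]
    neg = np.nonzero(matches & ~is_pos)[0]
    return pos, neg


def get_pos_neg_loop(df, mask):
    """Plain-Python reference with the same logic as the numba loop, used for checking."""
    vandmask = np.bitwise_and(df[:, 1:], mask)
    pos, neg = [], []
    for idx in range(df.shape[0]):
        if all(vandmask[idx, i] == mask[i] for i in range(vandmask.shape[1])):
            if df[idx, 0] == 1:
                pos.append(idx)
            else:
                neg.append(idx)
    return np.array(pos, dtype=np.intp), np.array(neg, dtype=np.intp)


rng = np.random.default_rng(1)  # made-up test data
df = rng.integers(0, 4, size=(2000, 4))
mask = np.array([1, 1, 2])

p1, n1 = get_pos_neg_vectorized(df, mask)
p2, n2 = get_pos_neg_loop(df, mask)
print(np.array_equal(p1, p2), np.array_equal(n1, n2))  # True True
```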
Update:
You can make it faster by removing the enumerate call and indexing directly into the array instead of grabbing a slice. Splitting pos and neg into separate arrays also helps, since it avoids slicing along a non-contiguous axis in memory:
import numpy as np
from numba import jit

@jit(nopython=True)
def numba_get_pos_neg_bitwise(df, mask):
    # Separate 1-D arrays instead of two columns of one 2-D array
    pos = np.zeros(df.shape[0])
    neg = np.zeros(df.shape[0])
    vandmask = np.bitwise_and(df[:, 1:], mask)
    for idx in range(df.shape[0]):
        # numba fails with: if np.all(vandmask == mask):
        vandm_equal_m = 1
        # Index directly into vandmask instead of enumerating a row slice
        # (range instead of the Python 2-only xrange)
        for i in range(vandmask.shape[1]):
            if vandmask[idx, i] != mask[i]:
                vandm_equal_m = 0
                break
        if vandm_equal_m == 1:
            if df[idx, 0] == 1:
                pos[idx] = 1
            else:
                neg[idx] = 1
    pos = np.nonzero(pos)[0]
    neg = np.nonzero(neg)[0]
    return pos, neg
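The contiguity point is easy to check in numpy itself: a column such as posneg[:, 0] from the first version is a strided view, while a separate 1-D array stores its values back to back. A minimal sketch, assuming the default float64 dtype that np.zeros produces:

```python
import numpy as np

n = 10
posneg = np.zeros((n, 2))
# A column of a C-ordered 2-D array is a strided view: neighbours sit 2 * 8 bytes apart.
col = posneg[:, 0]
print(col.flags['C_CONTIGUOUS'], col.strides)  # False (16,)

# A separate 1-D array keeps its float64 elements 8 bytes apart.
pos = np.zeros(n)
print(pos.flags['C_CONTIGUOUS'], pos.strides)  # True (8,)
```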
And the timings in an IPython notebook (numpy version first, then numba):
%timeit pos1, neg1 = get_pos_neg_bitwise(df, mask)
%timeit pos2, neg2 = numba_get_pos_neg_bitwise(df, mask)
100 loops, best of 3: 18.2 ms per loop
100 loops, best of 3: 7.89 ms per loop
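Outside IPython, the standard-library timeit module gives a comparable best-of-N measurement. This sketch assumes numba may not be installed and falls back to a do-nothing decorator in that case (the function then simply runs as slow plain Python). The test data is made up, and the flag arrays are renamed so that no variable changes type inside the jitted function. The warm-up call keeps JIT compilation out of the measured time:

```python
import timeit

import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: without numba, use a no-op decorator so the sketch still runs.
    def jit(*args, **kwargs):
        def decorate(func):
            return func
        return decorate


@jit(nopython=True)
def numba_get_pos_neg_bitwise(df, mask):
    # Same logic as the updated answer; flag arrays renamed so each variable keeps one type.
    pos_flags = np.zeros(df.shape[0])
    neg_flags = np.zeros(df.shape[0])
    vandmask = np.bitwise_and(df[:, 1:], mask)
    for idx in range(df.shape[0]):
        vandm_equal_m = 1
        for i in range(vandmask.shape[1]):
            if vandmask[idx, i] != mask[i]:
                vandm_equal_m = 0
                break
        if vandm_equal_m == 1:
            if df[idx, 0] == 1:
                pos_flags[idx] = 1
            else:
                neg_flags[idx] = 1
    return np.nonzero(pos_flags)[0], np.nonzero(neg_flags)[0]


rng = np.random.default_rng(0)  # hypothetical test data
df = rng.integers(0, 4, size=(2000, 6))
mask = np.array([1, 0, 2, 0, 1])

# Warm-up: with numba, the first call compiles the function, so keep it out of the timing.
pos, neg = numba_get_pos_neg_bitwise(df, mask)

# Best of 3 repeats of 10 calls each, reported per call.
runs = timeit.repeat(lambda: numba_get_pos_neg_bitwise(df, mask), repeat=3, number=10)
best = min(runs) / 10
print(f"{best * 1e3:.3f} ms per loop")
```

Taking the minimum of the repeats mirrors the "best of 3" figure in the %timeit output above.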