Skip to content Skip to sidebar Skip to footer

Numpy: Check If 1-d Array Is Sub-array Of Another

given two general numpy 1-d arrays (no guarantees about values whatsoever), I need to check if one is a sub-array of the other. It's short and easy to do by casting to strings, bu

Solution 1:

EDIT: You can also make things a lot faster (like 1000x) with less memory using Numba:

import numpy as np
import numba as nb

defis_sub_arr_np(a1, a2):
    l1, = a1.shape
    s1, = a1.strides
    l2, = a2.shape
    a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
    return np.any(np.all(a1_win == a2, axis=1))

@nb.jit(parallel=True)defis_sub_arr_nb(a1, a2):
    for i in nb.prange(len(a1) - len(a2) + 1):
        for j inrange(len(a2)):
            if a1[i + j] != a2[j]:
                breakelse:
            returnTruereturnFalse# Test
np.random.seed(0)
arr1 = np.random.randint(100, size=100_000)
arr2 = np.random.randint(100, size=1_000)
print(is_sub_arr_np(arr1, arr2))
# Falseprint(is_sub_arr_nb(arr1, arr2))
# False# Now enforce a match at the end
arr1[-len(arr2):] = arr2
print(is_sub_arr_np(arr1, arr2))
# Trueprint(is_sub_arr_nb(arr1, arr2))
# True# Timing
%timeit is_sub_arr_np(arr1, arr2)
# 99.4 ms ± 567 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit is_sub_arr_nb(arr1, arr2)
# 124 µs ± 863 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Not sure this is the most efficient answer but it is one possible solution:

import numpy as np

defis_sub_arr(a1, a2):
    l1, = a1.shape
    s1, = a1.strides
    l2, = a2.shape
    a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
    return np.any(np.all(a1_win == a2, axis=1))

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])
print(is_sub_arr(arr1, arr2))
# Trueprint(is_sub_arr(arr1, arr3))
# False

Solution 2:

Although there is already an accepted answer, I'd like to throw in my quick and dirty solution:

memoryview(arr2).tobytes() in memoryview(arr1).tobytes()

This of course only works if the arrays use contiguous memory, so the full function is:

defis_sub_arr_mem(a, sub):
    if a.flags['FORC'] and sub.flags['FORC']: 
        returnmemoryview(sub).tobytes() inmemoryview(a).tobytes()
    returnNone

This is way faster than the short and easy string conversion in the OP and also faster than the (non-numba accelerated) stride and the cut solutions for different array sizes:

Original data (n1 = 10, n2 = 4)
str:     0.142 msstride:  0.034 mscut:     0.014 msmem:     0.003 ms

n1 = 1000, n2 = 4
str:     3.072 msstride:  0.100 mscut:     0.017 msmem:     0.008 ms

n1 = 1000, n2 = 500
str:     4.543 msstride:  0.339 mscut:     0.018 msmem:     0.009 ms

(n1 and n2 are the sizes of arr1 and arr2, respectively)

Solution 3:

You can try something like this:

import numpy as np

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 20, 67, -1])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])


def is_sub_arr(original, sub):
    first_match = np.where(original == sub[0])
    iflen(first_match) == 0:
        print("no matches")
        returnelse:
        formatchin first_match[0]:

            cut_original = original[match:match + len(sub)]
            if np.all(cut_original == sub):
                print("Got a match at index ", match)
            else:
                print("no match")
        return


is_sub_arr(arr1, arr2)
is_sub_arr(arr1, arr3)

First, we check if the first element of the subarray is in the original array and get its index with np.where. Then, for every match, we cut the original array starting at that index and ending at that index plus the length of the sub, and check if this matches the subarray itself.

Post a Comment for "Numpy: Check If 1-d Array Is Sub-array Of Another"