I have a list of sorted numpy arrays. What is the most efficient way to compute the sorted intersection of these arrays?
In my application, I expect fewer than 10^4 arrays, each of length less than 10^7, and I expect the length of the intersection to be close to p*N, where N is the length of the largest array and 0.99 < p <= 1.0. The arrays are loaded from disk and can be loaded in batches if they won't all fit in memory at once.
A quick and dirty approach is to repeatedly invoke numpy.intersect1d(). That seems inefficient, though, as intersect1d() does not take advantage of the fact that the arrays are already sorted.
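For concreteness, here is a minimal sketch of that quick and dirty baseline. The helper name intersect_all and the small sample arrays are placeholders of my own, standing in for the real arrays loaded from disk; it assumes every array is one-dimensional and fits in memory.

```python
from functools import reduce

import numpy as np


def intersect_all(arrays):
    """Intersect 1-D arrays by folding np.intersect1d over them pairwise."""
    # np.intersect1d returns the sorted, unique values common to two arrays.
    # It is applied left to right: ((a0 & a1) & a2) & ...
    # The fact that each input is already sorted is not used here.
    return reduce(np.intersect1d, arrays)


# Hypothetical sample data standing in for the arrays loaded from disk.
arrays = [
    np.array([1, 2, 3, 5, 8, 13]),
    np.array([2, 3, 5, 7, 8, 13]),
    np.array([0, 2, 3, 5, 8, 13, 21]),
]
result = intersect_all(arrays)
```

Because the running intersection is the only state carried between calls, the same fold could be applied batch by batch if the arrays do not all fit in memory at once.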