I have a pair of ordered lists. I want to generate a new ordered list (using the same ordering) of length n by applying a binary operator to pairs of elements, one from each list, along with the index of each list's elements.

For example, if I have two lists of real numbers in increasing order (sorted via Less),

By the way, welcome to Mathematica.SE! For some reason I thought you had already joined, early on, but can't find the account now in the user list.
– Verbeia♦ Sep 11 '12 at 10:34


Hey Paul, nice to see you here. We met last year at the WRI tech conference (though you probably won't remember that).
– Sjoerd C. de Vries Sep 11 '12 at 10:40


I notice that all responders so far have solved the problem, but I wonder about the efficiency aspect. The solutions generally generate all tuples, apply the operator, then sort and pick the first few elements. This is fine for short lists, but with n=10^6 it will be difficult, whereas it should be possible to pick out the first 5 elements without brute force, given that the lists are sorted.
– Sjoerd C. de Vries Sep 11 '12 at 11:38


@SjoerdC.deVries For a generic binary operator (i.e. non-monotonic), how do we use the information about the input lists being sorted?
– b.gatessucks Sep 11 '12 at 13:00

The algorithm

The main idea is that for many common binary operators, the order of the list resulting from applying the operator to a list and a single element of another list can be deduced from the order of the original list and the element in question, often without explicit sorting of the resulting list.

The algorithm itself will be based on the lazy merging of lazy lists resulting from the application of the binary operator in question to one list (say l1) and each of the elements of the other list l2. What we will do is create a binary search tree, which holds nodes of the following form:

{{currentNumber, index1, index2}, iterator}

where an iterator is a construct which supplies the next element of a resulting list on demand. What is important is that an iterator encodes the information about the ordering of the new sub-list.

What we will do is merge the resulting sub-lists lazily, much as the merge function does for 2 lists in the mergesort algorithm, except that here we have Length[l2] sub-lists (represented by their iterators). Here is the outline of the general algorithm to get a single element with indices:

1. Create Length[l2] iterators, each encoding a lazy version of the result of applying our binary operator to the list l1 and the i-th element of l2.

2. Construct an initial binary search tree by requesting from each iterator its first element and building nodes of the form outlined above. The ordering used for the binary search tree construction can also be specified.

3. If we have already picked enough elements, exit.

4. Pick the node with the smallest (according to our ordering) element, and save it in a temporary variable.

5. Delete this node from the tree.

6. Construct a new node by requesting the next element from the iterator of the smallest (deleted) node, if that iterator still has elements, and add this node to the tree.

7. Return the first part of the deleted node, {currentNumber, index1, index2}.

8. Go to step 3.
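For concreteness, the loop above can be sketched in Java (a minimal illustration, not the implementation from this post: a PriorityQueue stands in for the binary search tree, Plus is the operator, and all names are invented for the sketch):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class LazyMerge {
    // A node holds the current value of the sub-list for l2[index2],
    // with index1 being that iterator's current position in l1.
    static final class Node {
        final double value;
        final int index1, index2;
        Node(double value, int index1, int index2) {
            this.value = value; this.index1 = index1; this.index2 = index2;
        }
    }

    // First n results {value, index1, index2} of l1[i] + l2[j] over two sorted
    // arrays, via a lazy k-way merge; assumes n <= l1.length * l2.length.
    static double[][] firstN(double[] l1, double[] l2, int n) {
        PriorityQueue<Node> tree =
            new PriorityQueue<>(Comparator.comparingDouble((Node node) -> node.value));
        for (int j = 0; j < l2.length; j++)           // steps 1-2: one "iterator" per l2 element
            tree.add(new Node(l1[0] + l2[j], 0, j));
        double[][] out = new double[n][];
        for (int k = 0; k < n; k++) {
            Node smallest = tree.poll();              // steps 4-5: pick and delete the minimum
            out[k] = new double[]{smallest.value, smallest.index1, smallest.index2};
            if (smallest.index1 + 1 < l1.length)      // step 6: advance that iterator
                tree.add(new Node(l1[smallest.index1 + 1] + l2[smallest.index2],
                                  smallest.index1 + 1, smallest.index2));
        }
        return out;
    }
}
```

Note that only one element per live iterator is materialized at any time, which is what makes the scheme lazy.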

Since insertion and deletion times for a (balanced) binary search tree are O(Log[m]), where m is the number of nodes (equal to the current number of iterators, and at most Length[l2]), the complexity of the search alone will be of the order of n Log[m], where n is the number of elements requested.

This can be very fast for small enough n and large enough lists, but the main question is whether or not we can build the iterators efficiently. In general, this would involve copying and sorting all sub-lists, and therefore has complexity O(Length[l1] Log[Length[l1]] Length[l2]). However, for some common cases (such as Times and Plus), the resulting ordering can be deduced from the initial ordering and the element of the second list in just constant time, and then we win big both speed-wise and memory-wise.
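For Plus, for instance, the sub-list l1 + c inherits the order of l1, so an iterator is nothing more than a cursor with a fixed offset. A minimal Java sketch of this idea (hypothetical names, not code from the post):

```java
// For Plus, the sub-list {l1[i] + c} is already in the same order as l1,
// so the iterator needs no copying or sorting: O(1) setup, O(1) per element.
final class PlusIterator {
    private final double[] l1;
    private final double c;   // the fixed element of l2
    private int pos = 0;

    PlusIterator(double[] l1, double c) { this.l1 = l1; this.c = c; }

    boolean hasNext() { return pos < l1.length; }

    double next() { return l1[pos++] + c; }
}
```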

Mathematica implementation

Binary search tree

For simplicity, I will use an unbalanced binary search tree implementation, which is a slightly generalized (to allow a custom ordering function) version of an excellent implementation due to Roman Maeder ("Computer Science with Mathematica", chapter 6). Here it is:

The iterators constructed this way are closures, embedding the addition as part of producing an element on demand. Here we encode our knowledge of the resulting ordering (which, in the case of Plus, is the same as the original one).

Here I had to randomize the second sample, because the binary search tree implementation I use is for unbalanced trees, which would degenerate into a linear chain for an ordered second list. This somewhat violates the initial condition that both lists are ordered, but I only use this for benchmarks, and the resulting list won't change in any case.

We see that the timing depends linearly on the number of elements, but our code is really slow.

Java implementation

Now that we have our slow Mathematica prototype, we can port it to Java in the hopes that this will run (much) faster. To work with this part, you will need the Java reloader (which I modified to recognize interfaces, so if you have the old one you should change to use the recent version).

While what follows is conceptually a direct port of the Mathematica part, there will be significant differences in details. In particular, I will use the Java TreeSet library class, so that I don't need to implement the binary search tree from scratch.

So, run the Java reloader code first.

General interface for customization

The following interface should be implemented for each binary operator and ordering, to customize the algorithm (execute this code):
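That interface is not reproduced in this excerpt; a plausible minimal shape, with hypothetical names that may differ from the original post, would be along these lines:

```java
// Hypothetical reconstruction of the customizer contract described in the text;
// the actual interface in the post may use different names and signatures.
interface MergeCustomizer {
    // apply the binary operator to one element from each list
    double apply(double fromFirst, double fromSecond);

    // compare two result values under the target ordering
    // (negative / zero / positive, as for java.util.Comparator)
    int compare(double a, double b);
}

// Example customizer: Plus with the natural (ascending) ordering.
final class PlusAscending implements MergeCustomizer {
    public double apply(double x, double y) { return x + y; }
    public int compare(double a, double b) { return Double.compare(a, b); }
}
```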

The methods of this class and the way it works require some explanation. The class takes a customizer and two arrays of doubles in its constructor, and also creates the iterators and the tree in the constructor. It has the method getNext, which is overloaded to have two forms: you can either request a single element (with indices), or do so in bulk. The latter possibility is particularly relevant for use from Mathematica, to avoid the killing overhead of many individual method calls in JLink.

which is quite fast for lazy lists, to my mind, and this timing includes the data transfer (it may even be dominated by it).

Conclusions

I presented the abstract algorithm which promises good overall performance and very good performance for practically important special cases. I have also presented two implementations of it, using Mathematica and Java.

While the Mathematica implementation has the right algorithmic behavior, it is very slow. This is one of the cases where the symbolic overhead of the full Mathematica evaluator really gets in the way. The data structures involved, and the structuring of the code that separates customizable parts from general parts (iterators, lazy lists, binary search tree), did not make it easy to Compile any part of the algorithm, so there is little chance of improving its speed within Mathematica.

The Java implementation allowed the use of powerful built-in data structures and idioms which, in Java, do not carry the overhead they would in Mathematica. As a result, the Java version is thousands of times faster.

In both cases, I could also include the customizations for other common binary operators such as Times, but did not do so simply because this post is already excessively long.

@RonaldMonson Thanks for the upvote. As to your suggestion, I will think about it, but my first impression is that it is hard to do such optimizations generally. One can probably construct a more complex merging scheme where, when we know that we should take further values from the same node/iterator and the node/iterator remains in the same place in the tree after we extract one or more elements, we can skip the repeated tree searches/insertions and take values from the iterator directly. But even now, the algorithm is already quite fast, and it is lazy as well. So, basically, you ...
– Leonid Shifrin Sep 12 '12 at 22:52

@RonaldMonson ... get a single extra element on demand in quasi-constant time (times the log of the length of the smaller of the two lists) for many operators; what can be much better than that? Of course, some time is spent because I use some additional objects to make things flexible. I can imagine a C version with ahead-of-time compilation, similar in spirit to what R.M does in his answer, with yet much better performance than the current one in my answer.
– Leonid Shifrin Sep 12 '12 at 22:54

Yes, it is perhaps unlikely to be materially different in this particular case, but I'm wondering about the general case; Mma is great for testing optimizations, but is it a case of diminishing efficiency gains as it becomes increasingly difficult to implement the refinements in the corresponding Java version?
– Ronald Monson Sep 12 '12 at 23:06

@RonaldMonson I would think that in a case like this, there is a high chance that efficiency gains can be hidden in the mma version by a huge symbolic overhead. OTOH, since my mma implementation has (hopefully) correct algorithmic behavior, and the overhead is just in the huge constant multiplicative factor, one should be able to comparatively analyze modifications in mma. As to Java, the current algorithm is quite simple (you can see that the algorithm itself is only a dozen lines or so), and it should not be hard to modify that one as well. Java development is harder for a different ...
– Leonid Shifrin Sep 12 '12 at 23:14

The solutions using Outer and Tuples will all blow up for large lists. The following is a compiled function that exploits the ordered nature of the lists and should be very fast, with a low memory footprint, even on very large lists.

You will have to create a different compiled function for a different binary operator. You can probably even write a wrapper around this to generate compiled code on the fly (assuming your binary operator is compilable).

@J.M. Lol, no particular reason other than it doesn't matter in compiled code and this might be simpler to follow for someone reading procedural code :) Boy, I've gotten a few comments about this already here and in chat... ;)
– rm -rf♦ Sep 12 '12 at 2:12


Gives errors in the following case: l1 = Sort@RandomReal[{-100, 100}, 100]; l2 = Sort@RandomReal[{-100, 100}, 10]; cf[large1, large2, 100]. Generally, your code seems to assume monotonic behavior of the total resulting ordering, which is true for Plus, but not true in general, e.g. for Times when lists contain negative numbers (or maybe I misunderstood). Very nice code still; I voted for it shortly after you posted.
– Leonid Shifrin Sep 12 '12 at 15:01


@LeonidShifrin Ah, in that case you must have used the smaller list as the first argument... This happens when it tries to index the smaller list with a number larger than its length (which happens when n is comparable to the length or larger). I'm aware of this, and I should've added a few additional checks and a more robust way of getting the bound and doing the ordering. I thought it would be as simple as flipping the order of the lists, but I now see that isn't the case (at least, not yet). I'll get back to this later today and fix some of the issues.
– rm -rf♦ Sep 12 '12 at 16:32

In the specific case that l1 and l2 are sorted and the operator is monotonic, the wanted first n results must come from the first n elements of the input lists. In this case the functions can be used more efficiently as:
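The original code is not shown in this excerpt; a rough Java sketch of the same shortcut (with Plus as the monotonic operator, names invented for illustration) would be:

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

public class MonotonicShortcut {
    // With sorted inputs and a monotonic operator, only the first n elements
    // of each list can contribute to the first n results, so we only form and
    // sort at most n*n candidates instead of all |l1|*|l2| pairs.
    static double[] firstN(double[] l1, double[] l2, int n, DoubleBinaryOperator op) {
        int a = Math.min(n, l1.length), b = Math.min(n, l2.length);
        double[] candidates = new double[a * b];
        int k = 0;
        for (int i = 0; i < a; i++)
            for (int j = 0; j < b; j++)
                candidates[k++] = op.applyAsDouble(l1[i], l2[j]);
        Arrays.sort(candidates);                       // sort only the candidates
        return Arrays.copyOf(candidates, Math.min(n, candidates.length));
    }
}
```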

Interestingly, the 2nd and last numbers in the output list have a different last decimal than the ones in the given example. Apparently MachinePrecision is rather different on both computers or so?
– Sjoerd C. de Vries Sep 11 '12 at 10:48

@SjoerdC.deVries Wow, sharp eye! I don't know; it would be useful to know how the output was produced.
– b.gatessucks Sep 11 '12 at 10:52

Naturally, there will not be a single solution that is the most "elegant, efficient, or general", so one could either produce three champions, one in each of these categories, or a single solution that has the right trade-offs (in which case perhaps further criteria would be needed). Personally, I'd also include "comprehensibility", since I think this is sufficiently distinct from "elegant" to be useful.

None of the provided solutions to date offers a completely general solution, because although arbitrary lists and binary operators are specified in the question, the key factor in the complexity and implementation of this problem is the Sort function; in particular, the relationship between the OrderedQ functions used to sort the two initial lists (OrderedQ1, OrderedQ2) and the output of the binary operator (OrderedQ3).

If these OrderedQ functions obey a type of transitivity in relation to the operator op (essentially we can use the original sorting in the final sort):

then only the first n elements of both lists need be considered, with the resulting function having complexity $O(n^2)$. Clearly, in this example T1 is satisfied with Plus and Less, so that n up to a thousand should be manageable (with no restriction on list size, since elements beyond the nth position can be thrown away).

On the other hand, if T1 is not satisfied, then a worst-case scenario would involve comparing all pairs from both lists (of size, say, $m$) before sorting the result and taking the first n elements, a function with complexity $O(m^4)$, and only lists in the hundreds/low thousands could be managed.

In this non-T1 scenario, improvements in the worst-case complexity might come from considering the relationship between the three OrderedQ functions, and the structure/distribution within and between l1 and l2. Any regularities could probably be tackled in a general way by recursing down both l1 and l2 and/or using the problem's inherent parallelizability.

In summary:

With T1: R.M's compiled function will be quickest (but not as readable), and all the other answers would get a galactic speed-up by migrating the "1;;5" specification inside Sort, thereby only considering the first n elements (Artes's solution only sorts the first n of the binary-operator output by using RankedMin, but still applies the binary operator to all pairs).

Without T1: R.M’s compiled function is no longer applicable (even if the binary operator is compilable) and all the other solutions are easily modifiable by adding the appropriate OrderedQ3 to Sort but with a significant performance hit (in comparison with the aforementioned migration).

I don't really follow what you're trying to say here... The problem with the other answers is not in where the 1;;5 portion lies, but rather that Outer and Tuples will generate gigantic lists that'll quickly blow up memory. If you're saying take only the first 5, then you ought to consider that 5 here is just an example... it might as well be 10^5. Secondly, the compiled solution can be easily modified to use a different ordering, although it must be recompiled and the ordering function must be written such that it is compilable.
– rm -rf♦ Sep 12 '12 at 6:46

Granted, the compiled solution comes with a small loss in readability (especially if you're alien to reading procedural code), and for reasonably sized lists I would probably use Outer myself.
– rm -rf♦ Sep 12 '12 at 6:48

My point was that most of the work in Outer and Tuples is not needed if the size of the two lists is >> n, since you only need to worry about the first n elements (a fact that actually only the compiled version seems to have taken into account, provided T1 is accepted). Without T1, however, the whole lists need to be accounted for, unlike in the compiled version (if I understand its algorithm correctly; yes, it is a little alien to me).
– Ronald Monson Sep 12 '12 at 6:51

Could you explain what you mean by your T1 – maybe I didn't interpret it properly... In the problem presented, OrderedQ1 = OrderedQ2 = OrderedQ3, i.e. the same ordering is used for both lists and n elements of the output are desired following the same ordering. I took his "arbitrary list" to mean any list (albeit sorted) of unknown length, whereas I think you take it to mean arbitrary sorting. Is that a correct interpretation?
– rm -rf♦ Sep 12 '12 at 6:57

Yes. I actually edited out the following from the original answer, as I may be being a little pedantic here: "(OK, it helps to read the question - I think T1 is assumed from "using the same ordering", but then the binary operator is implicitly restricted to have the same domain and range, and all solutions except R.M's have much higher complexity than necessary by considering elements beyond the nth position) (No, "using the same ordering" doesn't necessarily imply T1, so the question was read properly :))" - Is this right, does your compiled version only use the first n values and ignore the rest?
– Ronald Monson Sep 12 '12 at 7:03

because, due to the lists being sorted, $a_{n+1}\geq a_i$ and $b_{n+1}\geq b_i$. Hence, there must be at least $n$ results ($f(a_1,b_1), f(a_2,b_1),\ldots,f(a_n,b_1)$ or $f(a_1,b_1), f(a_1,b_2),\ldots,f(a_1,b_n)$) that are smaller than anything that can be obtained using either of the $(n+1)$-th elements.

I realize that this explanation overlaps with that of Ronald, but he sort of sidesteps the proof by simply asserting that "then only the first n elements of both lists need be considered".
