In a sorted array, I am trying to find pairs that sum up to a certain value. I was wondering if anyone could help me improve my code in performance or memory. I believe my code is O(n). If there are other method for finding pairs the sum up to a certain value that is much more efficient, I am open to these methods!

The i++; produces the wrong result. I'd recommend you remove it from your question, since the code under review out to run and produce correct results. Why is it there?
–
Wayne ConradJan 10 '14 at 13:22

3 Answers
3

There are two solutions which come to mind for your problem, but first, let's look through your code:

you have the variables pointer1 and pointer2. These are very descriptive names, which is great, but unfortunately they are not pointers. They are the values at the i and k positions in the array. In other words, i and k are the actual pointers. I would call pointer1 and pointer2 something like val1 and val2. But, in fact, I would drop the pointers entirely, and just use an index in to the data (why copy the data when you don't need to....? )

In most languages the variable name i is 'reserved' for indexed looping through data, just like you have for (int i = 0; i < len; i++) {...}. It is standard to use j and k as the variable names for any inner/nested loops you need. i is the top level, j is the first nested level, k is the subsequent level, and, if you need more than that, you should be calling a function/method (Never use l as a variable name, it is too easy to confuse with 1). You are breaking this convention by using j as an array, and going immediately to k inside your loop. My suggestion is to rename j to be data, and to rename k to be j.

as a general rule, when you have a loop constrained by an index, like you have for (int i = 0; i < len; i++) { ..., you should never modify the index inside the loop. I cannot think of a reason why it should ever be necessary, and, certainly, it should never be conditionally modified (inside an if-statement inside another loop). This will lead to bugs. In fact, the simple fact that you have that inner i++ tells me you do not understand your own code's behaviour.

Your method should return the interface type List<String> instead of the concrete class ArrayList<String> since there is no reason for outsiders to know the physical implementation.

storage is not a great name for a variable.... everything could be called storage since all variables store things... I would choose something like: results.

Your inner if condition makes sure that the sums are a match, but also checks pointer1 != pointer2. This second check is a problem, because it should not be testing the values, but the indexes.... For example what if the data is [4,4] and the target sum is 8?

To get your result pair as a String you do: String pair = Integer.toString(pointer1) + " and " + Integer.toString(pointer2).... this is serious overkill, try this instead: String pair = pointer1 + " and " + pointer2. In Java the + String operator will implicitly convert the integers to a String... no need to do it yourself.

So, you have some style problems, and some concerns about your inner workings.

Now, let's talk about your algorithm..... It loops through each value. For each value, it then compares it against every other value (except itself). So, you loop n times, and, for each n you do m checks. Your complexity is O(n x m), but, in reality, n and m are the same, so your complexity is O(n2)

Now, about that i++ inside the k loop.... the reason you need it is because your k loop is starting from the wrong place. your algorithm is wrong. Let me explain.

If you have the data [ 0, 1, 2, 3, 4, 5 ] , you are using your i index to loop through all the data. Your first data will be 0. You sum this with 5 (Note, you have summed 0 + 5 now), and check the result, then with 4, and so on. When you are done, you move i to the value 1, and you keep moving i until you get to the value 5 for i. You then compare **this value 5 to all the other values, including 0. This is *the second time you sum the items 0 + 5 (but it is now 5 + 0) *.

You need to change the algorithm to only scan those values that have not yet been scanned. The easiest way to do this is to modify your inner/nested loop. Consider your code:

(note, I have renamed k to be j). We start our j index at i + 1, because we don't need to sum values before i since that sum was done already when i was smaller.....

Now, our loops only calculate the sum value once for each pair of values.

Unfortunately, our complexity is still O(n2) because even though our inner loop is only testing half (on average) the values, we have not actually changed the way the algorithm scales (if you double the input data, you with quadruple the execution time).

Still, even with this algorithm change, and some variable renaming, and some bug fixing, your code will look like:

Much more comprehensive than my answer. I hope the OP gives this one the checkmark.
–
Wayne ConradJan 10 '14 at 14:08

Yes thank you so much for taking the time to write this ! I really appreciate it! I learn a lot!! @WayneConrad thank you for your answer as well!
–
LiondancerJan 10 '14 at 14:10

Thanks! I was 2-thirds through the answer when I saw your's pop up .... and I figured I covered things you had not, so I posted.... That's why there's a lot of duplication between our answers, sorry ;-)
–
rolfl♦Jan 10 '14 at 14:10

@rolfi, One of the things I like about codereview is that the fastest gun doesn't always win. I've been out of the Java game for a long time, so I'm glad someone who knows it well was able to cover what I missed.
–
Wayne ConradJan 10 '14 at 14:16

A few things I saw, the results declaration should be List<String>, int more can be directly set and there's a stray space at more --.
–
BobbyJan 10 '14 at 14:30

Code improvements

Before addressing efficiency, let's see if there are any improvements that could be made to the code.

I suspect some of the names could be improved.

pointer1 -> value1

pointer2 -> value2

j -> values

storage -> value_pairs

pairSum -> pairsThatAddUpToASum

I would consider getting rid of this temporary:

int len = j.length;

as it's not carrying its weight. Using j.length inline is not burdensome, and the compiler/runtime are likely to produce adequately fast code without it.

Lastly, I wonder about the decision to return the pairs as strings. This depends upon how the function is being used, but I would often rather return the pairs of numbers, and let the caller worry about formatting. This decouples the function from I/O.

A bug?

This term in the if:

pointer1 != pointer2

Might be a bug. If the list is guaranteed to contain no duplicates, it will work fine (but might be more communicative as i != k). If, however, the list may contain duplicates, and it is the intent of the code to return the resulting duplicate pairs, then this must be i != k.

Efficiency.

In order to be O(n), the algorithm must visit each element a constant number of times. We typically think of a O(n) algorithm as one that visits each element once. In this algorithm, both the outer and inner loop visit each element. That makes this an O(n^2) algorithm.

The algorithm below loops through the list only once. It keeps track of the numbers it's seen using a HashSet. For each number, it works out its required complement. If it's seen the complement already then it's found a match. If not, it adds the number to the set - we may see its complement later.

This actually works for unordered as well as ordered data, with the same performance characteristics. It also works for non-unique values (though we get n-1 repeated pairs of x for n repeated values of x - possibly not what you want, but a final stage could be to remove the duplicates if you so wished).

Note that the order of the pairs returned, and indeed the order of two numbers in a pair, may not be the same as your original algorithm. It wasn't specified if this was important, but I'm guessing not.

This reduced complexity comes at the expense of memory, as it must keep track of the numbers its seen in a separate data structure.

One thing I would point out is that the algorithm is more flexible if it returns raw data, rather than formatted strings. Pretty-printing can be done by the caller.

I've done this in C# because that's what I'm familiar with. I guess it should be trivial to convert to Java.