Just because a problem is NP-complete doesn't mean it can't usually be solved quickly.

The best example of this is probably the traveling salesman problem, for which extraordinarily large instances have been solved to provable optimality using exact methods, for instance sophisticated variations of branch-and-bound guided by strong heuristics. The size of the instances that can be solved exactly this way is mind-blowing compared with what one would naively predict from the problem's NP-completeness. For instance, an optimal tour through all of the roughly 25,000 cities in Sweden has been found, as has one through a VLSI instance of 85,900 points (see here for info on both).
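To make the exact-method idea concrete, here is a minimal branch-and-bound sketch for TSP in Python. This is a toy illustration of the pruning principle only: the function name is my own, and real solvers such as Concorde use vastly stronger bounds from LP relaxations and cutting planes, which is what makes instances of tens of thousands of cities feasible.

```python
def tsp_branch_and_bound(dist):
    """Exact TSP by depth-first branch-and-bound (toy sketch).

    dist: square matrix of pairwise distances.
    Returns (best_cost, best_tour). The only bound used here is the
    cost of the best complete tour found so far; production solvers
    use much tighter LP-based lower bounds.
    """
    n = len(dist)
    best = [float("inf"), None]

    def extend(tour, cost, remaining):
        if cost >= best[0]:               # bound: prune partial tours
            return                        # already as costly as the best
        if not remaining:
            total = cost + dist[tour[-1]][tour[0]]  # close the cycle
            if total < best[0]:
                best[0], best[1] = total, tour[:]
            return
        # Branch: try nearest unvisited cities first to find good
        # tours (and hence strong bounds) early.
        for city in sorted(remaining, key=lambda c: dist[tour[-1]][c]):
            tour.append(city)
            extend(tour, cost + dist[tour[-2]][city], remaining - {city})
            tour.pop()

    extend([0], 0, set(range(1, n)))
    return best[0], best[1]
```

Even this naive bound prunes most of the (n-1)! tours on small instances; the gap between it and what modern solvers achieve is exactly the "mind-blowing" part.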

Now I have a few questions:

1) Are there special cases of reasonably small size where these methods either cannot find the optimal tour at all, or are extremely slow to do so?

2) In the average case (of uniformly distributed points, let's say), is it known whether the time to find the optimal tour using these methods is asymptotically exponential in n, despite the success in solving surprisingly large instances? Or is it asymptotically polynomial, or is such an analysis too difficult to perform?

3) Is it correct to say that the existence of an algorithm that solves NP problems in average-case polynomial but worst-case exponential time has no bearing on P = NP?

4) What can be said about the structure of problems that allow surprisingly large cases to be solved exactly through such methods, versus ones that don't?

6 Answers

This phenomenon extends beyond the traveling salesman problem, and even beyond NP, for there are even some undecidable problems with the feature that most instances can be solved very quickly.

There is an emerging subfield of complexity theory called generic-case complexity, which is concerned with decision problems in the generic case: the problem of solving most or nearly all instances of a given problem. This contrasts with the situation in classical complexity theory, where it is in effect the worst-case complexity that drives many complexity classifications. (And even for approximate solutions to NP-hard problems, the worst-case phenomenon is still present.)

Particularly interesting is the black-hole phenomenon, the phenomenon by which the difficulty of an infeasible or even undecidable problem is concentrated in a very tiny region, outside of which it is easy. (Here, tiny means tiny with respect to some natural measure, such as asymptotic density.) For example, many of the classical decision problems from combinatorial group theory, such as the word problem and the conjugacy problem, are linear-time solvable in the generic case. This phenomenon provides a negative answer to the analogue of your question 1 for these problems. The fact that the problems are easily solved outside the black hole provides a negative answer to the analogue of question 2. And I think that the fact that these problems are actually undecidable as total problems suggests that this manner of solving almost all cases of a problem will not help us with P vs. NP, in your question 3.

For question 4, let me mention that an extreme version of the black-hole phenomenon is provided even by the classical halting problem. Of course, this is the most famous of undecidable problems. Nevertheless, Alexei Miasnikov and I proved that for one of the standard Turing machine models with a one-way infinite tape, there is an algorithm that solves the halting problem on a set of asymptotic measure one. That is, there is a set A of Turing machine programs, such that (1) almost every program is in A, in the sense of asymptotic density, (2) A is linear-time decidable, and (3) the halting problem is linear-time decidable for programs in A. This result appears in J. D. Hamkins and A. Miasnikov, "The halting problem is decidable on a set of asymptotic probability one," Notre Dame J. Formal Logic 47, 2006, http://arxiv.org/abs/math/0504351. Inside the black hole, the complement of A, of course, the problem is intractable. The proof, unfortunately, does not fully generalize to all the other implementations of Turing machines, since for other models one finds a black hole of some measure intermediate between 0 and 1, rather than measure 0.

Hmm, it's not an NP-complete problem, but hopefully it's still relevant to (4) and to a question I think is implicit in (2).

It's well-known that linear programming is in P, but in practice the simplex algorithm (which is exponential in the worst case) is usually the fastest method to solve LP problems, and it's virtually always competitive with the polynomial-time interior-point methods. However, sampling uniformly from some space of problems is an unrealistic model, so average-case analysis isn't convincing. Spielman and Teng introduced the notion of "smoothed analysis" to remedy this, and showed that the simplex algorithm has polynomial smoothed time complexity. Glancing at Spielman's page, it looks like this has been applied to the knapsack problem, although the link to the paper is broken.
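As a small illustration of the simplex vs. interior-point choice, SciPy's `linprog` exposes both through its `method` parameter (in recent SciPy versions via the HiGHS solvers: `highs-ds` is a dual simplex, `highs-ipm` an interior-point method). A minimal sketch, assuming SciPy is installed; the specific LP is made up for illustration:

```python
from scipy.optimize import linprog

# Maximize x + 2y subject to x + y <= 4, x <= 2, x >= 0, y >= 0.
# linprog minimizes, so negate the objective coefficients.
c = [-1, -2]
A_ub = [[1, 1], [1, 0]]
b_ub = [4, 2]

for method in ("highs-ds", "highs-ipm"):  # dual simplex vs. interior point
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None), (0, None)], method=method)
    print(method, res.x, -res.fun)  # both reach the optimum (0, 4), value 8
```

On a tiny instance like this the two methods are indistinguishable; the interesting comparisons, and the smoothed-analysis story above, concern large instances.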

Re (1): What do you mean by "small?" :) I suspect that the heuristics would fail to help much if you took a random instance of, say, 3SAT with the right number of clauses and variables, and reduced this to an instance of TSP. But you'd get some polynomial-size blowup, so...

Re (3): It's correct that the existence of such an algorithm, on its own, would imply neither P = NP nor P != NP. But for "practical considerations" it might be hugely important, depending on what the constants were, and it would certainly spur investigation into whether there was a worst-case polynomial algorithm along the same lines.

ETA: Actually, here's a construction of an NP-complete problem with an algorithm which unconditionally runs in average-case polynomial time. The problem is the union of a language in P and an NP-complete language (solvable in exp(n) time), such that the number of instances of size n of the first language is something like exp(n^3), while the number of instances of the second is exp(n); the hard instances are then too rare to affect the average.

So the interesting thing about (3) is what the existence of an average-case polynomial algorithm for every problem in NP would tell us. And there the answer is still "nothing," but it's conceivable that we could prove P = NP under this assumption.

I don't know what makes you think that the simplex method is usually the fastest. Modern interior-point solvers beat the simplex method quite handily. What is right, though, is that some variations of the simplex method are very easy to restart after adding an extra constraint, while interior-point methods cannot be restarted easily.
– Dima Pasechnik, Mar 3 '12 at 4:49

Another approach to NP-complete problems is via what has come to be called parameterized complexity.

The idea is that a problem may be "hard" when expressed in terms of the input size alone, but one can reformulate it with a parameter k so that it is solvable in time polynomial in the input size, with the exponential blow-up confined to the parameter k (running times of the form f(k) · n^O(1)). In some situations this allows one to find algorithms that are "tractable" for small values of k.
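The textbook example is vertex cover parameterized by the solution size k. A minimal bounded-search-tree sketch (the function name is my own):

```python
def vertex_cover_at_most_k(edges, k):
    """Decide whether the graph given by its edge list has a vertex
    cover of size at most k.

    Bounded search tree: pick any remaining edge (u, v); every cover
    must contain u or v, so branch on the two choices. The recursion
    depth is at most k, giving O(2^k * m) time -- polynomial in the
    input size for each fixed value of the parameter k.
    """
    if not edges:
        return True           # every edge is covered
    if k == 0:
        return False          # edges remain but no budget left
    u, v = edges[0]
    for pick in (u, v):
        # Remove all edges covered by the picked vertex, spend one
        # unit of the budget, and recurse.
        rest = [(a, b) for (a, b) in edges if pick not in (a, b)]
        if vertex_cover_at_most_k(rest, k - 1):
            return True
    return False
```

Note the exponential factor 2^k is independent of the number of vertices, which is exactly the "tractable for small k" phenomenon described above.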

Re: 1), NP-complete problems generally exhibit a phase transition in computational complexity. See R. Monasson et al., "Determining computational complexity from characteristic 'phase transitions'," Nature 400, 133 (1999) (available near the bottom of the page at http://www.math.ucla.edu/~percus/Teaching/RIPS/refs.html ), and also the answer to 3). Picking problems on or very near the phase boundary does the trick. For TSP, I would reckon that having lots of very similar pairwise distances between a subset of cities would be the sort of thing that achieves this.
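For random 3-SAT the phase transition is empirically located at a clause/variable ratio of about 4.27, and that is also where many solvers' running times peak. A minimal generator for such instances (function name and DIMACS-style integer literal encoding are my own choices):

```python
import random

def random_3sat(n_vars, ratio=4.27, seed=0):
    """Random 3-CNF at a given clause/variable ratio.

    The ratio ~4.27 is the empirically observed satisfiability
    threshold for random 3-SAT, near which instances tend to be
    hardest for solvers. Literals are DIMACS-style signed integers.
    """
    rng = random.Random(seed)
    m = round(ratio * n_vars)
    clauses = []
    for _ in range(m):
        # Three distinct variables, each negated with probability 1/2.
        lits = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in lits])
    return clauses
```

Feeding such instances (at a few hundred variables) to a SAT solver, and comparing against ratios well below or above the threshold, makes the hardness peak visible.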

Re: 2), I punt. See the other references in this answer.

Re: 3), the answer I think would depend on the characteristic nature of the easy and hard problem instances. Susan Coppersmith has looked at this sort of thing in the context of the renormalization group to formulate a physics-flavored conjectural approach to the P/NP question. See http://arxiv.org/abs/cs/0608053

Re: 4), I doubt much can be said at present. For instance there are extremely good SAT solvers out there, capable of tackling (at least) thousands of variables and hundreds of thousands of clauses. See http://www.satcompetition.org/

Your post seems to assume something like $\mathsf{P} \neq \mathsf{NP}$, i.e. that an $\mathsf{NP\text{-}complete}$ problem is difficult to solve at least in theory. As you are probably aware, this is unknown, although widely believed by experts.

But let's assume a stronger form of $\mathsf{NP}\neq\mathsf{P}$, for example ETH. SAT is one of the most widely studied $\mathsf {NP\text{-}complete}$ problems (if not the most), and there are easy reductions from SAT to TSP and vice versa. So I will use SAT to answer your questions, similar things apply to TSP (e.g. if you have a small hard instance of SAT then you can convert it to a small hard instance of TSP).

Usually the instances we face in practice have considerably more structure than an arbitrary instance of the problem. Therefore researchers can design algorithms that exploit these structures. There are industrial SAT solvers that are used to solve huge instances of SAT (with several hundred thousand variables) in practice. On the other hand, we know that all these SAT solvers perform exponentially badly on natural, simple, and small formulas like PHP (the pigeonhole principle).
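To make the PHP example concrete, here is a generator for the pigeonhole formula $\mathrm{PHP}_n$ ("n+1 pigeons fit into n holes"), which is unsatisfiable yet provably requires exponential-size resolution refutations, the proof system underlying CDCL solvers. The function name and DIMACS-style encoding are my own choices:

```python
def pigeonhole_cnf(n):
    """CNF encoding of PHP_n: n+1 pigeons into n holes (unsatisfiable).

    Variable var(i, j) means "pigeon i sits in hole j". Literals use
    DIMACS-style signed integers.
    """
    def var(i, j):            # 0 <= i <= n (pigeons), 1 <= j <= n (holes)
        return i * n + j
    clauses = []
    # Every pigeon sits in some hole.
    for i in range(n + 1):
        clauses.append([var(i, j) for j in range(1, n + 1)])
    # No hole holds two pigeons.
    for j in range(1, n + 1):
        for i1 in range(n + 1):
            for i2 in range(i1 + 1, n + 1):
                clauses.append([-var(i1, j), -var(i2, j)])
    return clauses
```

Already for n around 12 to 15, these tiny formulas (a few hundred clauses) defeat solvers that routinely handle industrial instances orders of magnitude larger, which is the contrast the answer above is pointing at.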

Similar structure exists in TSP instances; e.g. the graph of the map of Sweden that you mentioned is a planar graph, and many graph problems become much simpler when the input is restricted to planar graphs. So if the instances of TSP that you are interested in in practice form a restricted subset of all graphs, then it is a different problem from the original TSP, and it is possible that the restricted problem is not $\mathsf{NP\text{-}hard}$ anymore. (For some graph problems that become easier in restricted cases, see this question.)

Yes: there are small problem instances on which the algorithms used in practice to solve huge instances fail to finish in reasonable time, e.g. PHP for SAT.

It is believed by many that this is the case, but we have no unconditional exponential lower bounds on the running time of (general) algorithms solving these problems.

If I understand correctly, you are asking something like: does $TSP \in$ $\mathsf{AveP}$ imply $\mathsf{NP} = \mathsf{P}$?
As far as I know, this is unknown.

I am not sure what you mean by solving using heuristic methods, but if you mean $\mathsf{HeurP}$, and want to compare it with $\mathsf{P}$, then we should be careful, since the problems in these classes are of different types. If you mean the original decision problem plus the uniform distribution, then the Zoo doesn't say much and the answer is probably unknown.