A Different Kind of da Vinci Code

April 6, 2011

Mathematical sfumato. A traveling salesman tour of 100,000 well-placed dots recreates the smoky atmosphere of the Mona Lisa. Finding the shortest possible tour could earn you a penny per pit stop. Courtesy of Robert Bosch.

Barry A. Cipra

Looking for a quick $1000? All you have to do is edge out a two-year-old attempt at solving an artfully posed instance of the classic traveling salesman problem---that is, find a tour of 100,000 points with total length less than 5,757,191. Be warned, though: The length of the shortest tour is known to be at least 5,757,052, so there's not a lot of room for improvement.

The challenge, along with a history of the TSP and techniques devised to tackle it, was described by William Cook of Georgia Tech in the SIAM Invited Address at the Joint Mathematics Meetings, held in January in New Orleans. Cook, one of the leading experts on the TSP, gave an efficient tour of the mathematical milestones researchers have reached over the last six decades.

Salesmen have pondered the problem of efficient travel for about as long as they've been calling on customers. For example, an 1832 German handbook for der Handlungsreisende---literally, the commercial traveler---dealt explicitly with the question of economical itineraries, including a surprisingly good solution for a 47-city circuit from Frankfurt to Dresden and back (see Figure 1). Mathematicians first took an interest in the problem in the 1930s, initially at Princeton and later at RAND Corporation. Among the theorists investigating the problem was Harold Kuhn, who served as SIAM president in 1954 and 1955---dates that coincide with a turning point for the TSP.

Figure 1. An 1832 handbook for traveling salesmen gave an example of an efficient route with stops in 47 German cities. Courtesy of William Cook.

The earliest documented appearance of the name "traveling salesman problem" is in a 1949 RAND report by Julia Robinson, where it appears in the title, although Robinson's opening sentence makes it clear that the name was already well known: "The purpose of this note is to give a method for solving a problem related to the traveling salesman problem." But the subject really took off in 1954, with the publication of the seminal paper "Solution of a large-scale traveling-salesman problem" by RAND researchers George Dantzig, Ray Fulkerson, and Selmer Johnson.

What counted as large-scale in 1954 is indicated in the paper's one-sentence abstract: "It is shown that a certain tour of 49 cities, one in each of the 48 states and Washington, DC, has the shortest road distance." (The choice of cities was partly pragmatic. "The reason for picking this particular set," the authors explained, "was that most of the road distances between them were easy to get from an atlas." The paper credited Bernice Brown, a statistician at RAND, for preparing the triangular table of inter-city distances.) In fact, the actual problem solved involved only 42 cities: The most direct route from Washington to Boston ran through 7 of the eastern cities, and the reduced problem, which simply omitted those stops, included the Washington–Boston link as part of its shortest tour; putting the omitted stops back in (with the triangle inequality lurking tacitly in the background) gave those cities for free.

The seminality of the 1954 paper lay in the approach Dantzig, Fulkerson, and Johnson took to proving that their tour could not be shortened. Their approach, now known as the "cutting-plane method," is a wrinkle on top of LP-relaxation. It's entirely straightforward to express the traveling salesman problem in terms of integer programming: All you're trying to do is minimize the inner product cx, where c is the vector of "costs" of inter-city travel and x is a 0–1 vector that specifies which roads are taken, subject to linear equalities that define a tour. The relaxation comes in allowing the entries of x to be anything between 0 and 1, not just the integer extremes. Solving the relaxed linear programming problem is a comparative breeze (although in 1954 Dantzig's famous simplex method itself was relatively novel, and the RAND trio used shortcuts tailored to their 42-city problem).
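
Written out, the relaxation described above takes roughly the following form. The notation is an assumption made for illustration: E stands for the set of roads and δ(v) for the roads that meet city v. The article does not spell out the constraints. The degree equations shown are the usual starting point; on their own they still admit collections of disconnected loops, which further inequalities have to rule out.

```latex
\begin{aligned}
\text{minimize}\quad & c^{\mathsf T} x = \sum_{e \in E} c_e x_e \\
\text{subject to}\quad & \sum_{e \in \delta(v)} x_e = 2 \quad \text{for every city } v, \\
& 0 \le x_e \le 1 \quad \text{for every road } e \in E.
\end{aligned}
```

The integer program replaces the last line with the requirement that each x_e be exactly 0 or 1.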

The downside of LP-relaxation is that it gives nonsense answers---"take half the road between Kansas City and Omaha," "two-sevenths of the road from Santa Fe to Phoenix," and so forth. At best, it seems, the relaxed solution offers only a lower bound on the length of an actual tour. But that's where cutting planes come in. The relaxed solution is a vertex of the polytope defined by the LP constraints, which contains all (n–1)!/2 tours (for an n-city TSP). If that vertex isn't a tour itself, then there is some linear inequality---a cutting plane---that separates it from the legitimate tours. If you can identify an appropriate cutting plane, you can toss that inequality into the mix, re-solve the relaxed-LP problem, and hope that the new solution improves the lower bound. For example, if an initial relaxed solution gives 123.4 for the length of its "tour" but the solution of the problem augmented by a cutting plane gives 124.3, then your lower bound for an actual tour has gone from 124 to 125. (TSP distances are restricted to integers.) If you're lucky, you already have a candidate tour of length 125, at which point your work is done.
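
The bookkeeping in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the method of the 1954 paper or of any particular solver; the function names are hypothetical. The last function finds only the simplest kind of cutting plane: if the roads used by a relaxed solution split the cities into disconnected groups, then any group S yields an inequality (a real tour must cross the boundary of S at least twice) that the relaxed solution violates.

```python
from collections import deque
from math import ceil, factorial


def count_tours(n):
    """Number of distinct tours of n cities, ignoring start and direction."""
    return factorial(n - 1) // 2


def integer_lower_bound(lp_value, eps=1e-9):
    """Round an LP bound up; valid because all distances are integers."""
    return ceil(lp_value - eps)


def disconnected_subtour(n, x, tol=1e-9):
    """Return a set S of cities cut off from the rest, or None.

    x maps a road (u, v) to its value in the relaxed solution. If S is
    returned, no road with positive value leaves S, so the inequality
    "total x on roads leaving S >= 2" is a violated cutting plane.
    """
    neighbors = {v: [] for v in range(n)}
    for (u, v), value in x.items():
        if value > tol:
            neighbors[u].append(v)
            neighbors[v].append(u)
    # Breadth-first search from city 0 over roads with positive value.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen if len(seen) < n else None


# Two separate triangles satisfy "two roads per city" but are not a tour.
two_triangles = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
                 (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0}
cut_set = disconnected_subtour(6, two_triangles)
```

Here `cut_set` comes out as the cities of the first triangle, and `integer_lower_bound` reproduces the step from 124 to 125 in the example above. Production solvers search for many more families of inequalities than this one.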

Luck, of course, seems to be an unavoidable aspect: The traveling salesman problem is known to be NP-complete, meaning that no amount of computational cleverness is likely to streamline the search for useful cutting planes. You could easily be unlucky and "improve" your lower bound from 123.4 to 123.9, then to 123.99, and so on, ad nauseam. But as Louis Pasteur might have put it, luck favors the prepared computer. Just because a problem is NP-complete doesn't mean that it's not worth developing algorithms for solving it. In fact, researchers have done just that---with considerable success.

The largest traveling salesman problem solved to date involves a whopping 85,900 "cities"---actually, points on a computer chip that need to be vaporized by a laser. It was solved in 2006 by Cook and colleagues David Applegate, Robert Bixby, and Vašek Chvátal, using the TSP solver Concorde, which they developed in the 1990s. The 85,900-city problem was the last and largest of 110 "benchmark" problems published in 1991 as TSPLIB by Gerhard Reinelt of the University of Heidelberg.

Most of TSPLIB's benchmarks are based on cities scattered across a map or on a pattern of holes to be drilled on a printed circuit board. The new $1000 problem has a distinctly different provenance: the Mona Lisa. The problem image is an example of the "Opt Art" created by optimization theorist Robert Bosch of Oberlin College.

Bosch's idea, which he developed with Oberlin undergraduate Adrianne Herman and later refined with Craig Kaplan, a computer scientist at the University of Waterloo, starts with a recognizable image, such as Leonardo's Mona Lisa. A grayscale version of the image is then "pointillized," using an algorithm that plots random points with density determined by the local grayscale values; finally, Concorde is run to find an optimal---or at least near-optimal---tour of the points. Two optimizations are actually done: The obvious one is to solve the TSP on the given set of points; the other is to find a set of points that will produce a nice-looking picture.
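
One simple way to plot random points with density tied to grayscale is rejection sampling, sketched below. This is an assumption about how such a step could be done, not a description of Bosch and Herman's code; the function name and the convention that 0 means black and 1 means white are chosen for illustration.

```python
import random


def sample_points(gray, count, seed=0):
    """Scatter `count` random points, denser where the image is darker.

    gray is a list of rows of brightness values in [0, 1]
    (0 = black, 1 = white). Returns a list of (x, y) positions.
    """
    height, width = len(gray), len(gray[0])
    if all(value >= 1.0 for row in gray for value in row):
        raise ValueError("an all-white image has nowhere to place points")
    rng = random.Random(seed)
    points = []
    while len(points) < count:
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        col = min(int(x), width - 1)
        row = min(int(y), height - 1)
        darkness = 1.0 - gray[row][col]
        # Keep the candidate with probability equal to the local darkness.
        if rng.random() < darkness:
            points.append((x, y))
    return points


# A one-row image: white pixel on the left, black pixel on the right.
points = sample_points([[1.0, 0.0]], 50)
```

In this tiny example every accepted point lands on the black pixel, since candidates over the white pixel are always rejected.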

With their original algorithm, which used a simple, grid-based method to place the points, Bosch and Herman found that their TSP results had lots of short, jagged edges---when you plot points at random, you tend to wind up with unintentional clumps of closely spaced points. Kaplan suggested a refinement called "weighted Voronoi stippling," invented in 2002 by Adrian Secord, then a graduate student at the University of British Columbia. The weighted stippling starts with an initial scattering of points (generically called a stippling), computes their Voronoi diagram---that is, assigns to each point of the stippling the polygon of all points in the plane that are closer to it than to any other point of the stippling---and then moves each stippling point to the center of mass of its Voronoi polygon (using the original image to assign a non-uniform mass density on the plane). This has the effect, in general, of pushing points apart if they start out too close together, and the procedure can be iterated.
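
A single pass of the move-to-center-of-mass step can be sketched on a pixel grid as follows. This is a discrete approximation under stated assumptions, not Secord's implementation: each pixel is assigned to its nearest stipple point in place of computing exact Voronoi polygons, and `density` (a hypothetical name) holds the mass of each pixel, for example its darkness.

```python
def lloyd_step(points, density):
    """Move each point to the weighted centroid of the pixels nearest to it.

    points is a list of (x, y); density is a list of rows of
    non-negative pixel masses. Returns the list of moved points.
    """
    # For each point, accumulate [mass, mass * x, mass * y].
    sums = [[0.0, 0.0, 0.0] for _ in points]
    for row, line in enumerate(density):
        for col, mass in enumerate(line):
            if mass <= 0:
                continue
            cx, cy = col + 0.5, row + 0.5  # pixel center
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2,
            )
            sums[nearest][0] += mass
            sums[nearest][1] += mass * cx
            sums[nearest][2] += mass * cy
    moved = []
    for point, (mass, sx, sy) in zip(points, sums):
        # A point that owns no mass stays where it is.
        moved.append((sx / mass, sy / mass) if mass > 0 else point)
    return moved


# Two points crowded together in the middle of a uniform 4 x 4 image.
uniform = [[1.0] * 4 for _ in range(4)]
spread = lloyd_step([(1.9, 2.0), (2.1, 2.0)], uniform)
```

The two crowded points end up at (1.0, 2.0) and (3.0, 2.0), the centers of the left and right halves of the image, which is the pushing-apart effect described above. Calling `lloyd_step` repeatedly on its own output iterates the procedure.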

Concorde went to work on Bosch's 100,000-city Mona Lisa in February 2009. It quickly found a lower bound of 5,756,619. A day later, a TSP solver called LKH, developed by Keld Helsgaun of Roskilde University, in Denmark, found a tour of length 5,758,831. Reports of shorter and shorter tours poured in over the next month, culminating in the current record of 5,757,191, set on March 17, 2009, by Yuichi Nagata, a computer scientist at the Tokyo Institute of Technology, using a genetic algorithm of his own design. Nagata's tour may or may not be beatable.

The only progress since then has been in the lower bound. The current lower bound of 5,757,052 was found a week after Cook's lecture in New Orleans. The Concorde computation used a refinement of the cutting-plane method called "branch and cut," which grows a search tree of subproblems, each producing an LP bound. As a corollary calculation, Concorde took the cutting planes generated during the branch-and-cut computation---there were more than 7 million of them---and recalculated the LP-relaxation result, improving that bound to 5,757,038.1. (A similar calculation in November 2009 produced an LP-relaxation bound that actually beat what the then-current branch-and-cut computation had found.)
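
The figures quoted in this article make the size of the remaining gap easy to work out. The short calculation below uses only numbers stated above; the function name is hypothetical.

```python
def optimality_gap(tour_length, lower_bound):
    """Return the absolute gap and the gap as a fraction of the bound."""
    gap = tour_length - lower_bound
    return gap, gap / lower_bound


# February 2009: first Concorde bound and first LKH tour.
early_gap, early_fraction = optimality_gap(5_758_831, 5_756_619)

# Current record tour and current lower bound.
current_gap, current_fraction = optimality_gap(5_757_191, 5_757_052)

print(early_gap, current_gap)
print(f"{100 * current_fraction:.4f} percent of the lower bound")
```

The gap has shrunk from 2212 to 139 units of length, which is less than three thousandths of one percent of the length of the tour.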

These improvements, and those from a previous round a year ago, are small compared to the remaining gap, which suggests that new techniques will be needed to polish off the problem. Addressing the large audience that turned out for his talk in New Orleans, Cook wondered whether there might be a way to crowd-source the computation of cutting planes.

Barry A. Cipra is a mathematician and writer based in Northfield, Minnesota.