Several optimization problems that are known to be NP-hard on general graphs are trivially solvable in polynomial time (some even in linear time) when the input graph is a tree. Examples include minimum vertex cover, maximum independent set, and subgraph isomorphism. Name some natural optimization problems that remain NP-hard on trees.

I'm also inclined to think that CW is not necessary.
– Suresh Venkat, Sep 12 '10 at 23:19

Not sure if CW is needed. I can't think of any problem off the top of my head. It seems like posters should be rewarded for answering this question.
– Robin Kothari, Sep 12 '10 at 23:19

This is not what you asked for, but it is worth mentioning here: there are some problems that are easy on trees but hard on graphs of bounded treewidth. For example, edge-disjoint paths (Nishizeki, Vygen, Zhou '01) and constraint matrix span (McDiarmid, Reed '03).
– Diego de Estrada, Jun 25 '11 at 4:52

(These are formulated as tree problems, but you can generalise them to arbitrary graphs. Then the above formulations are obtained as the special case when you restrict your input to trees.)

A more general recipe for generating problems that are hard on trees: Take any NP-hard problem related to supersequences, superstrings, substrings, etc. Then re-interpret a string as a labelled path graph. Then pose the analogous question for general graphs (subsequence ≈ graph minor, substring ≈ subgraph). And we know that the problem is NP-hard even on trees (and on paths).

There are also many problems that are hard on weighted stars, by reduction from the subset-sum problem. A natural example is:

TSP with two travellers: given an edge-weighted graph $G$ and a limit $W$, can we find two closed walks $C_1$ and $C_2$ in $G$ such that each walk has total weight at most $W$, and each node of $G$ is covered by at least one walk?

The Group Steiner problem is a nice example. The input to this problem is an undirected edge-weighted graph $G=(V,E)$ and $k$ groups of vertices $S_1, S_2, \ldots, S_k$. The goal is to find a minimum weight tree that contains at least one vertex from each group. It is easy to see that the Set Cover problem is a special case even when $G$ is a star. Thus the problem is hard to approximate to within an $O(\log n)$ factor unless P=NP. Moreover, it was shown by Halperin and Krauthgamer that the problem is hard to approximate to within an $O(\log^{2-\epsilon} n)$ factor for any fixed $\epsilon > 0$ unless NP has randomized quasi-polynomial time algorithms (see the paper for a precise statement). There is an $O(\log^2 n)$ approximation on trees by Garg, Konjevod and Ravi.
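To make the Set Cover connection concrete, here is a minimal Python sketch of the reduction to a star. The node names (`"r"`, `"leaf0"`, ...), the function names, and the extra group containing only the centre (which forces the tree to include the centre, as in the rooted variant) are my own assumptions for illustration, not part of the original statement. The solver is a brute force meant only for tiny instances.

```python
from itertools import combinations

def set_cover_to_group_steiner_star(universe, subsets):
    """Star with centre 'r', one unit-weight leaf per subset.

    The group of element u is the set of leaves whose subset contains u.
    Assumption: an extra group {'r'} forces the tree to contain the centre.
    """
    leaves = [f"leaf{i}" for i in range(len(subsets))]
    edges = {("r", leaf): 1 for leaf in leaves}
    groups = {("elem", u): {leaves[i] for i, s in enumerate(subsets) if u in s}
              for u in universe}
    groups[("root",)] = {"r"}
    return edges, groups

def min_group_steiner_on_star(edges, groups):
    """Brute force: a subtree containing the centre is just a set of leaves."""
    leaves = [leaf for (_, leaf) in edges]
    for size in range(len(leaves) + 1):
        for chosen in combinations(leaves, size):
            nodes = set(chosen) | {"r"}
            if all(nodes & group for group in groups.values()):
                return sum(edges[("r", leaf)] for leaf in chosen), chosen
    return None  # some group is empty, so no feasible tree exists

universe = [1, 2, 3, 4, 5]
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
edges, groups = set_cover_to_group_steiner_star(universe, subsets)
weight, chosen = min_group_steiner_on_star(edges, groups)
```

With unit weights, the weight of the optimal tree equals the size of a minimum set cover, and the chosen leaves name the sets in that cover.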

The firefighter problem has received a fair amount of attention recently, and is (somewhat surprisingly) NP-hard on trees of maximum degree 3. It is actually a fairly natural question, described as follows:

A fire breaks out at the root of the tree (or more generally, a specified vertex in a graph). At every step, the firefighter protects one non-burning vertex, after which time the fire spreads to every unprotected neighbour. The process ends when there is no unprotected vertex next to the fire. Is there a strategy for the firefighter in which at most $k$ vertices burn?
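For intuition, the process can be simulated directly. The following is a small exhaustive search in Python, assuming the tree is given as an adjacency dictionary; the function names are mine, and the search is exponential, so it is only a sketch for toy instances.

```python
def min_burned(adj, root):
    """Minimum number of burned vertices over all firefighter strategies."""

    def threatened(burning, protected):
        # Unprotected, non-burning vertices adjacent to the fire.
        return {v for u in burning for v in adj[u]} - burning - protected

    def solve(burning, protected):
        if not threatened(burning, protected):
            return len(burning)  # the fire can no longer spread
        best = None
        # The firefighter may protect any vertex that is not burning.
        for p in set(adj) - burning - protected:
            new_protected = protected | {p}
            spread = threatened(burning, new_protected)
            result = solve(burning | spread, new_protected)
            best = result if best is None else min(best, result)
        return best

    return solve(frozenset([root]), frozenset())

def can_save(adj, root, k):
    """Decision version: is there a strategy in which at most k vertices burn?"""
    return min_burned(adj, root) <= k

# Complete binary tree of depth 2 rooted at 0.
binary = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6],
          3: [1], 4: [1], 5: [2], 6: [2]}
```

On this binary tree the best strategy protects one child of the root and then one grandchild, so three vertices burn.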

The unweighted edge multicut problem is the following: Given an undirected graph $G$, a collection of pairs of vertices of $G$, and a positive integer $k$, decide whether there is a subset $S$ of at most $k$ edges in $G$ whose removal disconnects every pair of vertices in the collection.
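On a tree, every pair is joined by a unique path, so the question becomes whether at most $k$ edges can hit all of these paths. Here is a brute-force Python sketch of that check, assuming an adjacency-dictionary input; the function names are hypothetical. On a star, a pair of leaves is separated exactly when one of its two incident edges is cut, so the example behaves like vertex cover on the graph formed by the pairs.

```python
from collections import deque
from itertools import combinations

def path_edges(adj, s, t):
    """Edges on the unique s-t path of a tree, found via BFS parent pointers."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    edges = set()
    while t != s:
        edges.add(frozenset((t, parent[t])))
        t = parent[t]
    return edges

def has_multicut(adj, pairs, k):
    """True if removing at most k edges disconnects every pair (brute force)."""
    all_edges = list({frozenset((u, v)) for u in adj for v in adj[u]})
    paths = [path_edges(adj, s, t) for s, t in pairs]
    for size in range(min(k, len(all_edges)) + 1):
        for cut in combinations(all_edges, size):
            if all(path & set(cut) for path in paths):
                return True
    return False

star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
pairs = [(1, 2), (2, 3), (1, 3)]  # the pairs form a triangle on the leaves
```

Here one edge is not enough, but two are, matching the vertex cover number of a triangle.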

A problem that one might think would NOT be hard on trees, but is, is the freeze-tag problem in computational geometry: briefly, the problem of scheduling wakeups for robots starting with a single awake 'bot, where makespan is the cost measure.

It's known to be NP-hard on weighted star graphs. However, it's open whether the problem is NP-hard in the plane. One might argue that the NP-hardness comes not from 'tree-ness', but from 'arbitrary metric'-ness, but star graphs only give you a limited space of metrics.

Given a tree $T$, a partition of $V(T)$ into $k$ levels $\phi: V(T)\to \{1,\ldots, k\}$
(i.e., edges of $T$ connect vertices of neighbouring levels $i$ and $i+1$), and an
integer $K$, can you permute the vertices inside the levels such that the crossing number
is at most $K$?

Let $r$ and $s$ be fixed positive integers, and let $G$ be a graph whose vertex set is
partitioned into blocks (or empires) each containing exactly $r$ vertices. The $(s, r)$-colouring
problem $s$-$\text{COL}_r$ asks for a colouring of the vertices of the graph $G$ that
uses at most $s$ colours, never assigns the same colour to adjacent vertices in different
empires and, conversely, assigns the same colour to all vertices in the same empire,
disregarding adjacencies.

McGrae and Zito, Empires make cartography hard: The complexity of the empire colouring problem,
LNCS 6986 (2011) 179–190,
show that, for trees, $s$-$\text{COL}_r$ is NP-hard for $s \in \{3,\ldots, 2r - 1\}$
(and solvable in polynomial time for other positive values of $s$).

Maximum leaf-labeled isomorphic subtree.
The input is a set $TS$ of leaf-labeled trees (internal nodes are not labeled).
A solution is any tree $T$ such that, for each tree $T_1\in TS$, $T$ is isomorphic to a subtree of $T_1$.
The optimal solution is the one that maximizes the number of leaves of $T$.

The problem is NP-hard (actually, it is hard to approximate) only when all input trees have unbounded degree.

A harmonious coloring of a simple graph is a proper vertex coloring such that each pair of colors appears together on at most one edge. The harmonious chromatic number of a graph is the least number of colors in a harmonious coloring of the graph. The problem of finding the harmonious chromatic number was shown to be NP-complete on trees by Edwards and McDiarmid. In fact, they also show that the problem remains NP-complete for trees of radius 3.
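The definition is easy to check mechanically. Below is a short Python sketch with a verifier and a brute-force search for the harmonious chromatic number of very small graphs; the function names and the edge-list input format are my own choices for illustration.

```python
from itertools import product

def is_harmonious(edges, colour):
    """Proper colouring in which each colour pair appears on at most one edge."""
    seen = set()
    for u, v in edges:
        if colour[u] == colour[v]:
            return False  # not a proper colouring
        pair = frozenset((colour[u], colour[v]))
        if pair in seen:
            return False  # this colour pair already appears on another edge
        seen.add(pair)
    return True

def harmonious_chromatic_number(vertices, edges):
    """Smallest k admitting a harmonious colouring (exponential brute force)."""
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            if is_harmonious(edges, dict(zip(vertices, assignment))):
                return k
    return None  # only reached for an empty vertex list

path_vertices = [0, 1, 2]
path_edges = [(0, 1), (1, 2)]
```

Even the path on three vertices needs three colours, because reusing a colour on the two endpoints would repeat a colour pair.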

The Travelling Repairman Problem (TRP) is known to be NP-hard on weighted trees. In this problem, which is also sometimes called the Minimum Latency Problem, the goal is to find a tour that visits all the vertices of a graph while minimizing the average latency. The latency of a vertex $u$ is the cost of the tour from the origin until the tour visits $u$.

Note that in the related (and more famous) TSP problem, the goal is to minimize the maximum, rather than the average latency. I think the TRP is generally considered a more complicated problem (in fact TSP is in P for tree metrics).

NP-hardness on trees was shown in R.A. Sitters, "The Minimum Latency Problem Is NP-Hard for Weighted Trees", IPCO 2002.
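As a rough illustration of the objective, the sketch below computes the total latency of a given visiting order on a weighted tree and finds the best order by trying all permutations. It assumes the tree is stored as a nested dictionary of edge weights and that the traveller takes the tree path between consecutive vertices of the order; the names are hypothetical. Minimizing the total latency is the same as minimizing the average.

```python
from itertools import permutations

def tree_distances(adj, source):
    """Weighted distances from source along the unique tree paths."""
    dist = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u].items():
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist

def total_latency(adj, origin, order):
    """Sum over vertices of the time at which the tour reaches them."""
    total, clock, here = 0, 0, origin
    for v in order:
        clock += tree_distances(adj, here)[v]
        total += clock
        here = v
    return total

def best_order(adj, origin):
    """Exhaustive search over visiting orders; toy instances only."""
    others = [v for v in adj if v != origin]
    return min(permutations(others),
               key=lambda order: total_latency(adj, origin, order))

star = {"o": {"a": 1, "b": 2, "c": 3},
        "a": {"o": 1}, "b": {"o": 2}, "c": {"o": 3}}
```

On this star the best order visits the leaves from nearest to farthest, with latencies 1, 4 and 9.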

There's a (very general) problem I had a look at as part of a project: a variant of this problem remains NP-hard even on graphs with two vertices and a single edge, and a different variant is NP-hard on trees. Since the NP-hardness of the first variant obviously doesn't stem from the shape of the graph, the second is probably more interesting.

The problem is the distribution of files on and the routing of downloads in a network (for instance, the intranet of a company). Let $S$ be a set of servers and $C$ be a set of clients. Let $G=(V,E)$ be a graph such that $S \subset V$, $C \subset V$ and $S \cap C = \emptyset$ (so servers and clients are vertices, but there can be vertices that are not servers or clients, think of them as routers). Every $s \in S$ has a capacity (think the size of the harddrives), denoted as $|s|$. Let $F$ be a set of files, where every $f \in F$ has a filesize, denoted as $|f|$. Every $e \in E$ has a throughput, denoted as $t_e$. We also have a set of requests $R \subseteq C \times F$, where each $(c,f) \in R$ means that client $c$ wants to download file $f$.

The problem is to find for each server $s \in S$ a set of files $A_s$ such that $\sum_{f \in A_s} |f| \leq |s|$ (so to add files to servers without the total filesize exceeding the capacity of the server), and to find a path $P_r$ in $G$ for every request $r = (c,f) \in R$ such that the path starts at $c$ and ends at a server $s$ with $f \in A_s$, such that if for some edge $e$, $D_e$ denotes the set of requests such that for all $r = (c,f) \in D_e$, $P_r$ contains $e$, then $\sum_{(c,f) \in D_e}|f| \leq t_e$ (so to route all requests on the graph such that no edge capacity is exceeded).

If you don't require all downloads to be routed but instead try to maximize the sum of the filesizes of the downloads that are routed you can easily reduce subset-sum to this problem: you have a single server with vast amounts of space, a single client connected to the server with an edge with a capacity equal to the target value of the subset-sum instance and for every integer in the subset-sum instance you create a file with equal size; the client then wishes to download all these files.

A (much?) more interesting variant for this question is the case that you try to minimize the number of edges whose capacity is exceeded - perhaps the network we are working on models the transatlantic internet cables and replacing a cable is so costly that the difference in cost of upgrade to a factor two faster and an upgrade to a factor three faster is negligible. We also say that the placements of files on the servers are already given and cannot be modified, so we look solely at the routing issues.

The set-cover problem can easily be reduced to this variant. We are given a set $U$ called the universe and a collection $S \subseteq P(U)$ of subsets of this universe. We are asked to pick the smallest number of subsets such that their union equals the universe. For every $u \in U$ we create a file of size 1. We have a single client that wants to download all these files.

For every subset $s \in S$, we create a 'cluster' of servers: a cluster consists of a single vertex (a router) connected to a number of servers such that the servers are only connected to the router. For every $u \in s$ we add a single server to the cluster with the file on it corresponding to $u$. These clusters are then connected to the client with an edge of capacity 1 (so each edge connects the client with the router for the cluster). Furthermore, for every server cluster we add one more server to that cluster hosting a single new file (unique to that cluster) of size 1. All these files (so, in addition to the files corresponding to elements of the universe) are requested by the client.

The idea is that the client needs the files that are unique for all the server clusters, so the edges connecting the client to the server clusters are already at the limit of their capacities (their capacities are 1, the files have size 1). If the client downloads any elements of the universe from any cluster, the edge connecting to that cluster becomes overloaded. Since we only require to minimize the number of overloads (and not by how much we exceed the capacities), the client can download the rest of the elements of the universe hosted at that server cluster (so the rest of the elements of the corresponding subset) without penalty. This therefore corresponds to the subset being chosen. The client wants to download all the files in the universe once, so the universe will indeed be covered, and to minimize the number of edges that are overloaded we need to minimize the number of subsets chosen.

Note that the above construction yields a tree graph, so it's an example of an NP-hard problem on trees.
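Here is a small Python sketch of the construction, with a brute force over routings to confirm that the minimum number of overloaded edges matches the minimum set cover on a toy instance. The node and file names are made up for illustration, and I assume the router-to-server edges have unlimited capacity, which the description above leaves unspecified.

```python
from itertools import product

def build_instance(universe, subsets):
    """Tree from the reduction: client - router per subset - servers."""
    edges = {}  # (node, node) -> capacity
    hosts = {}  # server -> the single file it stores
    for i, s in enumerate(subsets):
        router = ("router", i)
        edges[("client", router)] = 1
        for u in s:
            server = ("server", i, u)
            edges[(router, server)] = float("inf")  # assumed uncapacitated
            hosts[server] = ("elem", u)
        extra = ("extra", i)  # server with the file unique to this cluster
        edges[(router, extra)] = float("inf")
        hosts[extra] = ("unique", i)
    requests = ([("elem", u) for u in universe]
                + [("unique", i) for i in range(len(subsets))])
    return edges, hosts, requests

def min_overloaded_edges(universe, subsets):
    """Try every way of choosing a cluster for each requested file."""
    edges, hosts, requests = build_instance(universe, subsets)
    options = [[srv[1] for srv, f in hosts.items() if f == req]
               for req in requests]
    best = None
    for choice in product(*options):
        load = {}
        for cluster in choice:
            load[cluster] = load.get(cluster, 0) + 1  # every file has size 1
        overloaded = sum(1 for c, l in load.items()
                         if l > edges[("client", ("router", c))])
        best = overloaded if best is None else min(best, overloaded)
    return best

universe = [1, 2, 3, 4, 5]
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
result = min_overloaded_edges(universe, subsets)
```

Only the client-to-router edges can be overloaded under the stated assumption, so the count of overloaded edges is exactly the number of clusters used for universe elements.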

A graph $G = (V, E)$ is 2-splittable if it is possible to partition its edge set into two subsets such that the induced subgraphs are isomorphic. Deciding whether a given graph is 2-splittable is NP-complete even if the input is restricted to trees.

Formally, the problem is:

PARTITIONED GRAPH ISOMORPHISM

INSTANCE: A tree $T = (V,E)$

QUESTION: Is there a partition $\{E_1, E_2\}$ of $E$ such that the two forests $T_1 = (V, E_1)$ and $T_2 = (V, E_2)$ are isomorphic?

The NP-completeness column cites the unpublished manuscript of Graham and Robinson, "Isomorphic factorization IX: even trees".

Somehow I missed the Achromatic Number problem in the last answer, but this is one of the most natural problems I know of that is NP-complete on trees.

A complete coloring of a graph is a proper coloring such that there is an edge between every pair of color classes. The coloring can be stated in contrast to Harmonious Coloring, as a proper coloring such that each pair of colors appears on at least one edge. Also, it can be stated as a complete (or full) homomorphism to a clique. The Achromatic Number problem is a maximization problem, where we look for the largest number of color classes in a complete coloring of the graph.
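A quick Python sketch of the definition, with a brute-force achromatic number for tiny graphs; the function names and the edge-list input are assumptions made for illustration.

```python
from itertools import combinations, product

def is_complete_colouring(edges, colour):
    """Proper colouring in which every pair of used colours shares an edge."""
    if any(colour[u] == colour[v] for u, v in edges):
        return False  # not a proper colouring
    used = set(colour.values())
    joined = {frozenset((colour[u], colour[v])) for u, v in edges}
    return all(frozenset(pair) in joined for pair in combinations(used, 2))

def achromatic_number(vertices, edges):
    """Largest k with a complete colouring using exactly k colours."""
    best = 0
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            colour = dict(zip(vertices, assignment))
            if len(set(assignment)) == k and is_complete_colouring(edges, colour):
                best = k
                break
    return best

p4_vertices = [0, 1, 2, 3]
p4_edges = [(0, 1), (1, 2), (2, 3)]
```

The path on four vertices has three edges, so at most three colour pairs can be realised; the colouring 0, 1, 2, 0 achieves all three.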

The $k$-Balanced Partition Problem on graphs asks one to partition the $n$ vertices into $k$ connected components of size at most $\lceil\frac{n}{k}\rceil$ each while minimizing the total cost of edges connecting vertices in different sets, called the cut cost.
This problem is actually APX-hard even on unweighted trees of constant maximum degree [1].

Umm, circuits that are trees have a name: formulas. Formula SAT is of course NP-complete, as 3-SAT or even full CNF-SAT are its special cases.
– Emil Jeřábek, Mar 17 '16 at 15:10

How so? All formulas are trees. If you want to restrict multiple occurrences of variables, that’s an additional constraint. (I also assume that when you write “inputs”, you really mean “literals”, as Circuit SAT with only AND, OR, and positive literals is trivially polynomial-time to begin with.)
– Emil Jeřábek, Mar 17 '16 at 15:30

Certainly not under the standard terminology. The formulas $((a+b)+c)+d$ and $((a+b)+c)+a$ have the same underlying tree, only with different labels at one of the nodes. A tree stays a tree no matter how its leaves are relabelled.
– Emil Jeřábek, Mar 17 '16 at 15:59

Emil is right. A formula like $(p \land q) \lor p$ is a tree. Fan-out does not apply to input gates in circuits. With your definition, a variable cannot appear both positively and negatively, and therefore the formula is trivially satisfiable.
– Kaveh, Mar 17 '16 at 17:33

It is not a toy problem. This is the standard terminology: when we say a circuit is a tree, it does not mean variables appear only once. In any case, and independent of what we call it, the problem you are proposing is trivial, as I wrote.
– Kaveh, Mar 17 '16 at 21:35