Theory and Algorithms for Partial Order Based Reduction in Planning

You Xu and Yixin Chen
Washington University in St. Louis
Qiang Lu
University of Science and Technology of China
Ruoyun Huang
Washington University in St. Louis

Abstract

Search is a major technique for planning. It amounts to exploring the state space of a planning domain, typically modeled as a directed graph. However, the prohibitively large size of the search space makes search expensive. Developing better heuristic functions has been the main technique for improving search efficiency. Nevertheless, recent studies have shown that improving heuristics alone has certain fundamental limits. Recently, a new direction of research, called partial order based reduction (POR), has been proposed as an alternative to improving heuristics. POR has shown promise in speeding up searches.

POR has been extensively studied in model checking research and is a key enabling technique for the scalability of model checking systems. However, the theory of POR has never been systematically developed for planning. In addition, the conditions for POR in the model checking theory are abstract and not directly applicable to planning. Previous works on POR algorithms for planning did not establish the connection between these algorithms and the existing theory in model checking.

In this paper, we develop a theory for POR in planning. The new theory we develop connects the stubborn set theory in model checking and POR methods in planning. We show that previous POR algorithms in planning can be explained by the new theory. Based on the new theory, we propose a new, stronger POR algorithm. Experimental results on various planning domains show further search cost reduction using the new algorithm.

1 Introduction

State space search is a fundamental and pervasive approach to artificial intelligence in general and planning in particular. It is among the most successful approaches to planning. A major concern with state space search is that it has a high time and space cost since the state space that needs to be explored is usually very large.

Much research on classical planning has focused on the design of better heuristic functions. For example, new heuristic functions have recently been developed by analyzing the domain transition graphs (DTGs) and causal graphs on top of the SAS+ formalism [Briel
et al. (2007), Helmert and
Röger (2008)]. Despite the success of using domain-independent heuristics for classic planning, heuristic planners still face scalability challenges for large-scale problems. As shown by recent work, search even with almost perfect heuristic guidance may still lead to very high search cost [Helmert and
Röger (2008)]. Therefore, it is important to improve other components of the search algorithm that are orthogonal to the development of heuristics.

Recently, partial order based reduction (POR), a new way to reduce the search cost from an orthogonal perspective, has been studied for classical planning [Chen
et al. (2009), Chen and Yao (2009)]. POR as a method to reduce search space has been extensively studied in model checking with solid theoretical investigation. However, the theoretical properties of POR in planning have still not been fully investigated. There are three key questions.

1) POR algorithms have been extensively studied in model checking. In fact, POR is an enabling technique for model checking, which would not be practical without it due to the high time complexity involved. An extensive body of theory for POR has been developed in model checking. What are the relationships between the previous POR methods designed for model checking and existing work for planning? Understanding these relationships can not only help us understand both problems better, but can also potentially lead to better POR algorithms for planning.

2) In essence, all POR based algorithms reduce the search space by restricting certain actions from expanding at each state. Although these POR algorithms all look similar, what are the differences in the quality of reduction that significantly affect search efficiency? We think it is important to investigate the reduction powers of different POR algorithms.

3) Given the fact that there is more than one POR reduction algorithm for planning, are there other, stronger POR algorithms? To answer this question, in essence, we need to find the sufficient and/or necessary conditions for partial-order based pruning. There are sufficient conditions for POR in model checking. Nevertheless, those conditions are abstract and not directly applicable in planning.

The main contribution of this work is to establish the relationship between the POR methods for model checking and those for planning. We leverage on the existing POR theory for model checking and develop a counterpart theory for planning. This new theory allows existing POR algorithms for planning to be explained in a unified framework. Moreover, based on the conditions given by this theory, we develop a new POR algorithm for planning that is stronger than previous ones. Experimental results also show that our proposed algorithm leads to more reduction.

This paper is organized as follows. We first give basic definitions in Section 2. In Section 3, we present a general theory that gives sufficient conditions for POR in planning. In Section 4, we use the new theory to explain two previous POR algorithms. Based on the theory, in Section 5, we propose a new POR algorithm for planning that is different from, and stronger than, previous ones. We report experimental results in Section 7, review related work in Section 8, and give conclusions in Section 9.

2 Background

Planning is a core area of artificial intelligence. It entails arranging a course of actions to achieve certain goals under given constraints. Classical planning is the most fundamental form of planning, which deals with only propositional logic. In this paper, we work on the SAS+ formalism [Jonsson and
Bäckström (1998)]
of classical planning.
The SAS+ formalism has recently attracted a lot of attention due to
a number of advantages it has over the traditional STRIPS formalism.
In the following, we review this formalism and introduce our notations.

Definition 1

A SAS+ planning task Π is defined as a tuple of five elements, Π={X,O,S,sI,sG}.

X={x1,⋯,xN} is a set of multi-valued state variables, each with an associated finite domain Dom(xi).

O is a set of actions and each action o∈O is a tuple (pre(o),eff(o)), where both pre(o) and eff(o) define some partial assignments of state variables in the form xi=vi,vi∈Dom(xi). sG is a partial assignment that defines the goal.

S is the set of states. A state s∈S is a full assignment to all the state variables. sI∈S is the initial state. A state s is a goal state if sG⊆s.

Definition 2

Two partial assignment sets are conflict-free if and only if they do not assign different values to the same state variable.

For a SAS+ planning task, for a given state s and an action o, when all variable assignments in pre(o) are met in state s, action o is applicable in state s. After applying o to s, the state variable assignment will be changed to a new state s′ according to eff(o): the state variables that appear in eff(o) will be changed to the assignments in eff(o) while other state variables remain the same.
We denote the resulting state after applying an applicable action o to s as s′=apply(s,o). apply(s,o) is undefined if o is not applicable
in s. The planning task is to find a path, or a
sequence of actions, that transits the initial state sI to
a goal state that includes sG.
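The transition semantics just described can be made concrete with a short sketch. This is our own illustration, not part of the paper: states and partial assignments are plain Python dictionaries, and `applicable`/`apply_action` are hypothetical helper names.

```python
def applicable(state, action):
    """An action o is applicable in s when every assignment in pre(o)
    holds in s."""
    pre, _eff = action
    return all(state[var] == val for var, val in pre.items())

def apply_action(state, action):
    """Return apply(s, o): variables in eff(o) take their new values,
    all other variables keep their current values."""
    if not applicable(state, action):
        raise ValueError("apply(s, o) is undefined: o not applicable in s")
    _pre, eff = action
    return {**state, **eff}
```

For example, with `s = {"x1": 0, "x2": 0}` and an action `({"x1": 0}, {"x2": 1})`, `apply_action` yields `{"x1": 0, "x2": 1}`.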

An important structure for a given SAS+ task is the domain transition graph defined as follows:

Definition 3

For a SAS+ planning task, each state variable xi(i=1,⋯,N) corresponds to a domain transition graph (DTG) Gi, a directed graph with a vertex set V(Gi)=Dom(xi)∪{v0}, where v0 is a special vertex, and an edge set E(Gi) determined by the following.

If there is an
action o such that (xi=vi)∈pre(o) and (xi=v′i)∈eff(o), then
(vi,v′i) belongs to E(Gi) and we say that
o is associated with the edge ei=(vi,v′i) (denoted as o⊢ei). It is conventional to call the edges in DTGs transitions.

If there is an
action o such that (xi=v′i)∈eff(o) and no assignment to xi is in pre(o), then
(v0,v′i) belongs to E(Gi) and we say that
o is associated with the transition ei=(v0,v′i) (denoted as o⊢ei).

Intuitively, a SAS+ task can be decomposed into multiple objects, each corresponding to one DTG, which models the transitions of the possible values of that object.
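The construction in Definition 3 translates directly into code. The sketch below is our own dictionary-based illustration (the name `build_dtgs` and the action-index bookkeeping are not part of the formalism):

```python
V0 = "v0"  # the special vertex of Definition 3

def build_dtgs(variables, actions):
    """Build one DTG edge set per state variable (Definition 3).

    variables: iterable of state-variable names.
    actions: list of (pre, eff) pairs of dicts.
    Returns {x: set of (edge, action_index)}, where each edge (v, v')
    is recorded together with the action associated with it (o |- e).
    """
    dtgs = {x: set() for x in variables}
    for idx, (pre, eff) in enumerate(actions):
        for x, v_new in eff.items():
            # If pre(o) assigns x, the transition starts at that value;
            # otherwise it starts at the special vertex v0.
            v_old = pre.get(x, V0)
            dtgs[x].add(((v_old, v_new), idx))
    return dtgs
```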

Definition 4

For a SAS+ planning task, an action o is associated with a DTG Gi (denoted as
o⊢Gi) if eff(o) contains an assignment to xi.

Definition 5

For a SAS+ planning task, a DTG Gi is goal-related if the partial
assignments in sG that define the goal states include an assignment xi=gi in
Gi. A goal-related DTG is unachieved in state s if xi=vi in s and
vi≠gi.

A SAS+ planning task can also specify a preference that needs to be optimized. A preference is a mapping from a path p to a numerical value.
In this paper we assume an action set invariant preference.
A preference is action set invariant if two paths have the same preference whenever they contain the same set of actions (possibly in different orders).
Most popular preferences, such as plan length and total action cost, are action set invariant.

3 Partial Order Reduction Theory for Planning

Partial order based reduction (POR) algorithms have been extensively studied
for model checking [Varpaaniemi (2005), Clarke
et al. (2000)], which also requires examining a state space in
order to prove certain properties. POR is a technique that allows a search to explore only part
of the entire search space and still maintain completeness and/or optimality.
Without POR, model checking would be too expensive to be practical [Holzmann (1997)]. However,
POR has not been studied systematically for planning.

In this section, we will first introduce the concept of search reduction. Then, we will present a
general POR theory for planning, which gives sufficient conditions that guide the design of practical POR algorithms.

3.1 Search reduction for planning

We first introduce the concept of search reduction. A standard search,
such as breadth-first search (BFS), depth-first search, or A∗ search, needs
to explore a state space graph. A reduction algorithm is an algorithm that
reduces the state space graph into a subgraph, so that a search
will be performed on the subgraph instead of the original one. We first define the state space graph. In our presentation, for any graph G, we use V(G) to denote the set of vertices and E(G) the set of edges. For a directed graph G, for any vertex s∈V(G), a vertex s′∈V(G) is its successor if and only if (s,s′)∈E(G).

For a SAS+ planning task, a state space graph for the task is a directed graph G in which each state s is a vertex and each directed edge (s,s′) represents an action that will be explored during a search process.
Most search algorithms work on the original state space graph as defined below.

Definition 6

For a SAS+ planning task, its original state space graph is a directed
graph G in which each state s is a vertex and there is a directed edge (s,s′) if and only
if there exists an action o such that apply(s,o)=s′. We say that action o marks
the edge (s,s′).

Definition 7

For a SAS+ planning task, for a state space graph G, the successor set of a
state s, denoted by
succG(s), is the set of all the successor states of s.
The expansion set of a state s, denoted by
expandG(s), is the set of actions

expandG(s)={o ∣ o marks (s,s′), (s,s′)∈E(G)}.

Intuitively, the successor set of a state s includes all the successor states that shall be
generated by a search upon expanding s, while the expansion set includes all
the actions to be expanded at s.

In general, a reduction method is a method that
maps each input state space graph G to a subgraph of
G. The POR algorithms we study remove edges from
G. More specifically, each state s is only
connected to a subset of all
its successors in the reduced subgraph.
We note that, by removing edges, a POR algorithm may also
reduce the number of vertices that are reachable from the initial state,
hence reducing the number of nodes examined by a search.
The decision whether a successor state s′ would still be
a successor in the reduced subgraph can be made locally by
checking certain conditions related to the current state and
some precomputed information. Hence, a POR algorithm can be combined with
various search algorithms.

For a SAS+ task, a solution sequence in its state space graph G is a pair (s0,p), where s0 is a non-goal state and p=(a1,…,ak) is a sequence of actions such that, letting si=apply(si−1,ai) for i=1,…,k, each (si−1,si) is an edge in G and sk is a goal state. We now define some generic properties of reduction methods.

Definition 8

For a SAS+ planning task, a reduction method is completeness-preserving if
for any solution sequence (s0,p) in the state space graph,
there also exists a solution sequence (s0,p′) in
the reduced state space graph.

Definition 9

For a SAS+ planning task, a reduction method is optimality-preserving if, for any
solution sequence (s0,p)
in the state space graph, there also exists a solution sequence (s0,p′) in the
reduced state space graph satisfying that p′ has the same preference that p does.

Definition 10

For a SAS+ planning task, a reduction method is action-preserving if, for any solution
sequence (s0,p)
in the state space graph, there also exists a solution sequence (s0,p′) in
the reduced state space graph such that the actions in p′ are a permutation of the actions
in p.

Clearly, being action-preserving is a sufficient condition for being completeness-preserving.
When the preference is action set invariant, being action-preserving is also a sufficient condition
for being optimality-preserving.

3.2 Stubborn set theory for planning

Although there are many variations of POR methods, a popular and representative POR algorithm is the stubborn set method [Valmari (1988), Valmari (1989), Valmari (1990), Valmari (1998), Valmari (1991), Valmari (1993)], used for model checking based on Petri nets. The basic idea is to form a stubborn set of applicable actions for each state and expand only the actions in the stubborn set during search. By expanding a small subset of applicable actions in each state, stubborn set methods can reduce the search space without compromising completeness.

Since planning also examines a large search space, we propose to develop a stubborn set theory for planning. To achieve this, we need to handle various subtle issues arising from the differences between model checking and planning. We first define the concept of stubborn sets for planning,
adapted from the concepts in model checking.

Definition 11 (Stubborn Set for Planning)

For a SAS+ planning task, a set of actions T(s) is a stubborn set at state s if
and only if

A1. For any action b∈T(s) and actions b1,⋯,bk∉T(s), if (b1,⋯,bk,b) is a prefix of a path from s to a goal state,
then (b,b1,⋯,bk)
is a valid path from s and leads to the same state that (b1,⋯,bk,b) does; and

A2. Any valid path from s to a goal state contains at least one action in T(s).

The above definition is schematically illustrated in Figure 1.
Once we define the stubborn set T(s) at each state s, we in effect reduce
the state space graph to a subgraph: only the edges corresponding to actions
in the stubborn sets are kept in the subgraph.

Definition 12

For a SAS+ planning task, given a stubborn set T(s) defined at each state s,
the stubborn set method reduces its state space
graph G to a subgraph Gr such that
V(Gr)=V(G) and there is an edge (s,s′) in
E(Gr) if and only if there exists an action o∈T(s)
such that s′=apply(s,o).

A stubborn set method for planning is a reduction method that
reduces the original state space
graph G to a subgraph Gr according to Definition 12.
In other words, a stubborn set method expands actions only in a stubborn set in each state. In
the sequel, we show that such a reduction method preserves actions,
hence, it also preserves completeness and optimality.

Theorem 1

Any stubborn set method for planning is action-preserving.

\proof

We prove that for any solution sequence
(s0,p) in the original state space graph G, there exists a
solution sequence (s0,p′) in the reduced state space graph
Gr resulting from the stubborn set method,
such that p′ is a permutation of actions in p.
We prove this fact by induction on k, the length of p.

When k=1, let a be the only action in p. According to condition A2 in
Definition 11, a is in T(s0).
Thus, (s0,p) is also a
solution sequence in Gr. The stubborn set method is therefore
action-preserving in the base case.

When k>1, the induction assumption is that any path in G with length less than
or equal to k−1 has a permutation in Gr that leads to the same final state. Now we consider a solution sequence (s0,p) in
G: p=(a1,…,ak). Let si=apply(si−1,ai),i=1,…,k.
If a1∈T(s0), we can invoke the induction assumption for the
state s1 and prove the claim for k.

We now consider the case where a1∉T(s0).
Let aj be the first action in p such that aj∈T(s0). Such an
action must exist because of condition A2 in
Definition 11.

Consider the sequence p∗=(aj,a1,⋯,aj−1,aj+1,⋯,ak).
According to condition A1 in Definition 11, (aj,a1,⋯,aj−1) is also a valid sequence from s0 which leads to
the same state that (a1,⋯,aj) does. Hence, we know that (s0,p∗) is also a
solution path. Therefore, let s′=apply(s0,aj), we know (a1,⋯,aj−1) is an
executable action sequence starting from s′.
Let p∗∗=(a1,⋯,aj−1,aj+1,⋯,ak), (s′,p∗∗) is a solution sequence in
G. From the induction assumption, we know there is a
sequence p′ which is a permutation of p∗∗, such that
(s′,p′) is a solution sequence in Gr. Since aj∈T(s0), we know that aj followed by p′ is a solution
sequence from s0 and is a permutation of actions in p∗, which is a
permutation of actions in p. Thus, the stubborn set method
is action-preserving.
\endproof

Since being action-preserving is a sufficient condition for being completeness-preserving
and optimality-preserving, when the preference is action set invariant, we have the following
result.

Corollary 1

A stubborn set method for planning is completeness-preserving. In addition, it is
optimality-preserving when the preference is action set invariant.

3.3 Left commutativity in SAS+ planning

Note that although Theorem 1 provides an important result for reduction, it is
not directly applicable since the conditions in Definition 11 are abstract
and not directly implementable in algorithms. We need to find sufficient conditions for
Definition 11 that can facilitate the design of reduction algorithms. In the
following, we define several concepts that can lead to sufficient conditions for
Definition 11.

Definition 13 (State-Dependent Left Commutativity)

For a SAS+ planning task, an ordered action pair (a,b),a,b∈O is
left commutative in state s, if (a,b) is
a valid path at s, and (b,a) is also a valid path at s and
results in the same state. We denote such a relationship by s:b⇒a.

Definition 14 (State-Independent Left Commutativity)

For a SAS+
planning task, an ordered action pair (a,b),a,b∈O is
left commutative if, for any state s, it is true that s:b⇒a.
We denote such a relationship by b⇒a.

Note the following. 1) Left commutativity is not a symmetric
relationship: b⇒a does not imply a⇒b.
2) The order in the notation b⇒a suggests that we should always
try only (b,a) during the search instead of trying both (a,b)
and (b,a). Also, not every state-dependent left commutative action pair
is state-independent left commutative. For instance, in a SAS+ planning task
with three state variables {x1,x2,x3}, action a with pre(a)={x1=0},
eff(a)={x2=1} and action b with pre(b)={x2=1,x3=2},
eff(b)={x3=3} are left commutative in state s1={x1=0,x2=1,x3=2}
but not in state s2={x1=0,x2=0,x3=2}, as b is not applicable in state s2.
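Definition 13 can be checked directly by executing both orders of a pair and comparing the resulting states. The sketch below is our own illustration (dictionary-based states, hypothetical function names); it tests the two actions a and b from the example in two candidate states:

```python
def run(state, seq):
    """Execute an action sequence; return the final state, or None if
    some action along the way is not applicable."""
    for pre, eff in seq:
        if any(state[x] != v for x, v in pre.items()):
            return None
        state = {**state, **eff}
    return state

def left_commutative_at(s, a, b):
    """Definition 13: s: b => a holds when both (a, b) and (b, a) are
    valid paths at s and lead to the same state."""
    s_ab, s_ba = run(s, (a, b)), run(s, (b, a))
    return s_ab is not None and s_ba is not None and s_ab == s_ba

a = ({"x1": 0}, {"x2": 1})
b = ({"x2": 1, "x3": 2}, {"x3": 3})
print(left_commutative_at({"x1": 0, "x2": 1, "x3": 2}, a, b))  # True
print(left_commutative_at({"x1": 0, "x2": 0, "x3": 2}, a, b))  # False: b needs x2=1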

We introduce state-independent left commutativity as it can be used to derive sufficient
conditions for finding stubborn sets.

Definition 15 (State-Independent Left Commutative Set)

For a SAS+
planning task, a set of actions T(s) is a left commutative set at a state s if and
only if

L1. For any action b∈T(s) and any action a∈O−T(s), if there exists a valid
path from s to a goal state that contains both a and b, then it is the case
that b⇒a; and

A2. Any valid path from s to a goal state contains at least one action in T(s).

Figure 2: Illustration of left commutative set. The left part plots condition L1 in Definition 15 and the right part plots the strategy in the proof of Theorem 2.

Theorem 2

For a SAS+ planning task, for a state s, if a set of actions T(s) is a state-independent
left commutative set, it is also a stubborn set.

\proof

We only need to prove that L1 in Definition 15 implies A1 in Definition
11. The proof strategy is schematically shown in Figure 2.

For an action b∈T(s) and actions b1,⋯,bk∉T(s), if (b1,⋯,bk,b) is a prefix of a path from s to a goal state, then
according to L1, we see that b⇒bi, for i=1,⋯,k. According to the
definition of left commutativity, we see that bk and b can be swapped and that the resulting
path (b1,⋯,b,bk) is still a valid path that leads to the same state that (b1,⋯,bk,b) does. We can subsequently swap b with bk−1, ⋯, and b1 to
obtain equivalent paths, before finally obtaining (b,b1,⋯,bk), as shown in the schematic illustration in the right part of Figure 2. Hence, we have shown that if
p=(b1,⋯,bk,b) is a prefix of a path from s to a goal state, then
p′=(b,b1,⋯,bk) is also a valid path from s that leads to the same state that p does,
which is exactly the condition A1 in Definition 11.
\endproof

From the above proof, we see that the requirement of state-independent left commutativity in
Definition 15 is unnecessarily strong. Instead, only certain state-dependent left
commutativity is necessary. In fact, when we change
(b1,⋯,bk,b) to (b1,⋯,b,bk), we only require s′:b⇒bk
where s′ is the state after bk−1 is executed. Similarly, when we change
(b1,⋯,bk,b) to (b1,⋯,b,bk−1,bk), we only require
s′′:b⇒bk−1 where s′′ is the state after bk−2 is executed. Based on the
above analysis, we can refine the sufficient conditions.

Definition 16 (State-Dependent Left Commutative Set)

For a SAS+
planning task, a set of actions T(s) is a left commutative set at a state s if and
only if

L1′. For any action b∈T(s) and actions b1,⋯,bk∉T(s),
if (b1,⋯,bk,b) is a prefix of a path from s to a goal state, then
s′:b⇒bk, where s′ is the state after (b1,⋯,bk−1) is executed; and

A2. Any valid path from s to a goal state contains at least one action in T(s).

We only need to slightly modify the proof to Theorem 2 in order to prove the
following theorem.

Theorem 3

For a SAS+ planning task, for a state s, if a set of actions T(s) is a state-dependent left
commutative set, it is also a stubborn set.

The above result gives sufficient conditions for finding stubborn sets in planning. The
concept of state-dependent left commutative set requires a less stringent condition than the
state-independent left commutative set; therefore, it can result in smaller T(s) sets and
stronger reduction. Such a nuance actually leads to different previous POR algorithms with varying performance. Next, we present our algorithm for finding, at each state, a set that satisfies these conditions.

3.4 Determining left commutativity

Theorem 3 provides a key result for POR. However, the conditions in
Definition 13 are still abstract and not directly implementable.
The key issue is to efficiently find left commutative action pairs.
Now we give necessary and sufficient conditions for Definition 13 that can
practically determine left commutativity and facilitate the design of reduction algorithms.

Theorem 4

For a SAS+ planning task, for a valid action path (a,b) in state s,
we have s:b⇒a if and only if pre(a) and eff(b), pre(b) and eff(a),
eff(a) and eff(b) are all conflict-free and b is applicable
at s.

\proof

First, from the definition of s:b⇒a, we know that
action b is applicable in state apply(s,a). This implies that pre(b) and eff(a)
are conflict-free. Symmetrically, since action a is applicable in state
apply(s,b), pre(a) and eff(b) are also conflict-free.
Now we prove eff(a) and eff(b) are conflict-free by contradiction.
If eff(a) and eff(b) are not conflict-free, without loss of generality, we can assume that eff(a) contains xi=vi and eff(b)
contains xi=v′i≠vi. Thus, the value of xi is vi for state
sab=apply(apply(s,a),b) and v′i for state sba=apply(apply(s,b),a),
i.e., sab is different from sba. This contradicts our assumption that a and b are
left commutative. Thus, eff(a) and eff(b) are conflict-free.

Second, if pre(a) and eff(b), eff(a)
and pre(b), eff(a) and eff(b) are all conflict-free,
since a is applicable in s, a is also applicable in state apply(s,b)
as pre(a) and eff(b) are conflict-free. Hence, (b,a) is a valid path at s.
Also, for any state variable xi, its value in states sab=apply(apply(s,a),b) and
sba=apply(apply(s,b),a) are the same, because
eff(a) and eff(b) are conflict-free. Therefore, we have sab=sba.
Hence, we have s:b⇒a.
\endproof

Theorem 4 gives necessary and sufficient
conditions for deciding whether two actions are left-commutative or not. Based
on this result, we later develop practical POR algorithms that find stubborn
sets using left commutativity.
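Theorem 4 is what makes left commutativity implementable: it reduces the semantic condition of Definition 13 to purely syntactic conflict-free checks plus one applicability test. A minimal sketch (our own naming, with partial assignments as dictionaries):

```python
def conflict_free(p, q):
    """Definition 2: two partial assignments are conflict-free when
    they never assign different values to the same variable."""
    return all(q[x] == v for x, v in p.items() if x in q)

def left_commutative(s, a, b):
    """Theorem 4: for a valid path (a, b) at s, s: b => a holds iff
    pre(a)/eff(b), pre(b)/eff(a), and eff(a)/eff(b) are all
    conflict-free and b is applicable at s."""
    (pre_a, eff_a), (pre_b, eff_b) = a, b
    return (conflict_free(pre_a, eff_b)
            and conflict_free(pre_b, eff_a)
            and conflict_free(eff_a, eff_b)
            and all(s[x] == v for x, v in pre_b.items()))
```

On the two actions from Section 3.3, this test agrees with the semantic check: it accepts the pair in a state where b is applicable and rejects it where b is not.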

4 Explanation of previous POR algorithms

Previously, we have proposed two POR algorithms for planning: expansion core
(EC) [Chen and Yao (2009)] and stratified planning (SP) [Chen
et al. (2009)], both of
which showed good performance in reducing the search space. However we did
not have a unified explanation for them. We now explain how these two
algorithms can be explained by our theory.
Full details of the two algorithms can be found in our
papers [Chen and Yao (2009), Chen
et al. (2009)].

4.1 Explanation of EC

The expansion core (EC) algorithm is a POR-based reduction algorithm for planning. We will see that, in essence, the EC algorithm exploits the SAS+ formalism to find a left commutative set for each state. To describe the EC algorithm, we need the following definitions.

Definition 17

For a SAS+ task, for each DTG Gi,i=1,…,N, for a vertex
v∈V(Gi), an edge e∈E(Gi) is a potential
descendant edge of v (denoted as v⊲e) if 1) Gi is
goal-related and there exists a path from v to the goal state in
Gi that contains e; or 2) Gi is not goal-related and e
is reachable from v.

Definition 18

For a SAS+ task, for each DTG Gi,i=1,…,N, for a vertex
v∈V(Gi), a vertex w∈V(Gi) is a
potential descendant vertex of v (denoted as v⊲w) if 1) Gi is goal-related and there exists a path from v
to the goal state in Gi that contains w; or 2) Gi is not
goal-related and w is reachable from v.

Figure 3: A SAS+ task with four DTGs.
The dashed arrows show preconditions (prevailing and transitional) of each edge (action). Actions are marked with letters a to f. We see that b and e are associated with more than one DTG.

Definition 19

For a SAS+ task, given a state s=(s1,⋯,sN), for any 1≤i,j≤N,i≠j, we call si a potential precondition of the DTG Gj
if there exist o∈O and ej∈E(Gj) such that

sj⊲ej, o⊢ej, and si∈pre(o). (1)

Definition 20

For a SAS+ task, given a state s=(s1,…,sN), for any 1≤i,j≤N,i≠j, we call si a potential dependent of the DTG Gj if
there exists o∈O, ei=(si,s′i)∈E(Gi) and
wj∈V(Gj) such that

sj⊲wj, o⊢ei, and wj∈pre(o). (2)

Definition 21

For a SAS+ task, for a state s=(s1,…,sN),
its potential dependency graph PDG(s) is a
directed graph in which each DTG Gi,i=1,⋯,N corresponds to a vertex, and there is an edge from Gi to
Gj, i≠j, if and only if si is a potential
precondition or potential dependent of Gj.

Figure 3 illustrates the above definitions. In
PDG(s), G1 points to G2 as s1 is a potential
precondition of G2 and G2 points to G1 as s2 is a
potential dependent of G1.

Definition 22

For a directed graph H, a subset C of V(H) is a
dependency closure if there do not exist v∈C and w∈V(H)−C such that (v,w)∈E(H).

Intuitively, a DTG in a dependency closure may depend on other DTGs in
the closure but not those DTGs outside of the closure.
In Figure 3, G1 and G2
form a dependency closure of PDG(s).
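Definition 22 is straightforward to check given the edge set of PDG(s). A small sketch of our own follows; the edge G3→G1 is a hypothetical addition used only to make the example non-trivial:

```python
def is_dependency_closure(C, edges):
    """Definition 22: C is a dependency closure of H when no edge
    leaves C, i.e. there is no (v, w) with v in C and w in V(H) - C."""
    C = set(C)
    return not any(v in C and w not in C for v, w in edges)

# G1 and G2 depend on each other; G3 (hypothetically) depends on G1.
edges = {("G1", "G2"), ("G2", "G1"), ("G3", "G1")}
print(is_dependency_closure({"G1", "G2"}, edges))  # True
print(is_dependency_closure({"G1"}, edges))        # False: G1 -> G2 leaves the set
```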

The EC algorithm is defined as follows:

Definition 23 (Expansion Core Algorithm)

For a SAS+ planning task, the EC method reduces its state space graph G to a
subgraph Gr such that V(Gr)=V(G) and for each
vertex (state) s∈V(G), it expands actions in the following set T(s)⊆O:

T(s) = ⋃i∈C(s) {o ∣ o∈exec(s) ∧ o⊢Gi}, (3)

where exec(s) is the set of executable actions in s and
C(s)⊆{1,⋯,N} is an index set
satisfying:

The DTGs {Gi,i∈C(s)} form a
dependency closure in PDG(s); and

There exists i∈C(s) such that Gi is goal-related and si is not the goal state in Gi.

Intuitively, the EC method can be described as
follows. To reduce the original state-space graph, for each state,
instead of expanding actions in all the DTGs, it only expands actions in DTGs that belong to a dependency closure of PDG(s) under the condition
that at least one DTG in the dependency closure is goal-related and not
at a goal state.

The set C(s) can always be found for any non-goal
state s since PDG(s) itself is always such a dependency
closure. If there is more than one such closure,
theoretically any dependency closure satisfying the above
conditions can be used in EC.
In practice, when there are multiple such
dependency closures, EC picks the one with fewer actions
in order to get stronger reduction. EC adopts the following scheme
to find the dependency closure for any state s.

Given PDG(s), EC first finds its strongly connected components (SCCs). If each SCC is contracted to a single vertex, the
resulting graph is a directed acyclic graph S. Note
that each vertex in S with zero out-degree
corresponds to a dependency closure. EC then topologically sorts the
vertices in S, placing sink components first so that every edge goes from a
later component to an earlier one, obtaining a sequence of SCCs S1,S2,⋯.
It picks the minimum m such that Sm includes a goal-related DTG
that is not at its goal state, and chooses all the DTGs in S1,⋯,Sm
as the dependency closure.
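The scheme above can be sketched with Tarjan's SCC algorithm, which emits components so that every condensation edge points from a later component to an earlier one; each prefix of the emitted sequence is therefore a dependency closure. This is our own sketch, not the paper's implementation; the predicate `needs_work` (DTG is goal-related and not yet at its goal value) and all other names are hypothetical:

```python
def tarjan_sccs(vertices, succ):
    """Tarjan's SCC algorithm. Components are emitted so that every
    condensation edge goes from a later SCC to an earlier one; hence
    every prefix of the returned list is closed under outgoing edges."""
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in vertices:
        if v not in index:
            visit(v)
    return sccs

def ec_dependency_closure(vertices, succ, needs_work):
    """EC's closure selection: the shortest prefix S1,...,Sm of the
    SCC sequence that contains an unachieved goal-related DTG."""
    closure = []
    for scc in tarjan_sccs(vertices, succ):
        closure.extend(scc)
        if any(needs_work(v) for v in scc):
            return closure
    return None  # every goal-related DTG is already at its goal value
```

For instance, with PDG edges G1→G2, G2→G1, G3→G1 and G2 the only unachieved goal-related DTG, the selected closure covers exactly the SCC {G1, G2}.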

Now we explain the EC algorithm using the POR theory we developed in Section 3.
We show that the EC algorithm can be viewed as an algorithm for finding a state-dependent
left-commutative set in each state.

Lemma 1

For a SAS+ planning task, the EC algorithm defines a state-dependent left commutative set for each state.

\proof

Consider the set of actions T(s) expanded by the EC algorithm in each state s, as defined in (3). We prove that T(s) satisfies conditions L1’
and A2 in Definition 16.

Consider an action b∈T(s) and actions b1,⋯,bk∉T(s) such that (b1,⋯,bk,b) is a prefix of a
path from s to a goal state, we show that s′:b⇒bk, where s′ is the state after (b1,⋯,bk−1) is applied to s.

Let C(s) be the index set of the DTGs that form a dependency closure, as used in (3).
Since b∈T(s), there must exist m∈C(s) such that b⊢Gm.
Let the state after applying (b1,⋯,bk) to s be s∗.
We must have s∗m=sm, because otherwise there must exist a bj, 1≤j≤k, that changes the assignment of state variable xm, which would imply that bj∈T(s). Since b is applicable in s∗, we see that sm=s∗m∈pre(b).

If there exists a state variable xi such that an assignment to xi is in both eff(bk) and pre(b), then Gm will point to the DTG Gi as sm is a potential dependent of Gi, forcing Gi to be included in the dependency closure, i.e. i∈C(s). However, as bk⊢Gi, it will violate our assumption that bk∉T(s).
Hence, none of the precondition assignments of b is added by bk. Therefore, since b is applicable in apply(s′,bk), it is also applicable in s′.

On the other hand, if bk has a precondition assignment in a DTG
that b is associated with, then Gm will point to that
DTG since sm is a potential precondition of it, forcing that DTG to be in C(s), which
contradicts the assumption that bk∉T(s). Hence, b does not alter any
precondition assignment of bk. Therefore, since bk is applicable in s′, it is also applicable in the state apply(s′,b).

Finally, if there exists a state variable xi such that an assignment to xi is altered by both b and bk, then we know b⊢Gi and bk⊢Gi.
In this case, Gm will point to Gi since sm is a potential precondition of Gi, forcing i∈C(s); since bk⊢Gi, this makes bk∈T(s), which contradicts our assumption.
Hence, eff(b) and eff(bk) correspond to assignments to distinct sets of state variables. Therefore, applying (bk,b) and (b,bk) to s′ will lead to the same state.

From the above, we see that b is applicable in s′, bk is applicable in apply(s′,b), and hence (b,bk) is applicable in s′. Further we see that (b,bk) leads to the same state as (bk,b) does when applied to s′. We conclude that s′:b⇒bk and T(s) satisfies L1’.

Moreover, for any goal-related DTG Gi, if in a state s, its assignment si is not the goal state in Gi, then some actions associated with Gi have to be executed in any solution
path from s. Since T(s) includes all the actions in at least one goal-related DTG Gi, any solution path must contain at least one action in T(s). Therefore,
T(s) also satisfies A2, and it is indeed a state-dependent left-commutative set.
\endproof

From Lemma 1 and Theorem 3, we obtain the following result, which
shows that EC fits our framework as a stubborn set method for planning.

Theorem 5

For any SAS+ planning task, the EC algorithm defines a stubborn set in each state.

4.2 Explanation of SP

Definition 24

Given a SAS+ planning task Π with state variable set X, the
causal graph (CG) is a directed graph CG(Π)=(X,E) with X as the vertex set. There is an edge (x,x′)∈E
if and only if x≠x′ and there exists an action o such
that x∈eff(o) and x′∈pre(o)∪eff(o).
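A minimal Python sketch of Definition 24, using a hypothetical dict encoding of SAS+ actions in which `pre` and `eff` map state variables to values:

```python
def causal_graph(actions):
    """Edge (x, x') exists iff some action o assigns x in eff(o) and
    mentions x' in pre(o) or eff(o), with x != x' (Definition 24).
    Each action is a dict with 'pre' and 'eff' mapping variables to
    values -- an illustrative encoding, not the paper's notation."""
    edges = set()
    for o in actions:
        for x in o['eff']:
            for x2 in set(o['pre']) | set(o['eff']):
                if x != x2:
                    edges.add((x, x2))
    return edges
```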

Definition 25

For a SAS+ task Π, a stratification
of the causal graph CG(Π)=(X,E) is a partition of the node set
X into subsets X1,⋯,Xk such that there exists no
edge e=(x,y)∈E where x∈Xi, y∈Xj, and i>j.

By stratification, each state variable x is assigned a level L(x),
where L(x)=i if x∈Xi, 1≤i≤k. Subsequently, each
action o is assigned a level L(o), 1≤L(o)≤k, which is the level of the state variable(s) assigned by eff(o). Note that all
state variables in the same eff(o) must be in the same level; hence, L(o) is well-defined.
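The level assignment can be sketched as follows. The dict encoding and the acyclicity assumption are ours for illustration; a full implementation would first merge each strongly connected component of the causal graph into one stratum:

```python
from graphlib import TopologicalSorter

def stratify(variables, edges):
    """Assign each variable a level L(x) so that no causal-graph edge
    (x, y) goes from a higher level to a lower one (Definition 25).
    Sketch only: assumes the causal graph is acyclic."""
    preds = {v: set() for v in variables}
    for x, y in edges:
        preds[y].add(x)              # edge (x, y) forces L(x) <= L(y)
    levels = {}
    for v in TopologicalSorter(preds).static_order():
        levels[v] = 1 + max((levels[u] for u in preds[v]), default=0)
    return levels

def action_level(action, levels):
    # L(o) is the shared level of the variables assigned by eff(o).
    return levels[next(iter(action['eff']))]
```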

Definition 26 (Follow-up Action)

For a SAS+ task Π, an action b is a follow-up action of a
(denoted as a⊳b) if eff(a)∩pre(b)≠∅ or eff(a)∩eff(b)≠∅.

The SP algorithm can be combined with standard search algorithms,
such as breadth-first search, depth-first search, and best-first
search (including A∗). During the search, for each state s
that is about to be expanded, the SP algorithm examines the action
a that leads to s. Then, for each
applicable action b in state s, SP makes the following
decision.

Definition 27 (Stratified Planning Algorithm)

For a SAS+ planning task, in any non-initial state s, assuming a is
the action that leads directly to s, and b is an applicable
action in s, then SP does not expand b
if L(b)<L(a) and b is not a follow-up action of a.
Otherwise, SP expands b. In the initial state s0, SP expands all applicable actions.
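The pruning rule of Definitions 26 and 27 can be sketched as follows, again under a hypothetical dict encoding of actions (`pre`/`eff` as variable-value maps) and with the level function L passed in:

```python
def follow_up(a, b):
    """b is a follow-up action of a iff eff(a) shares an assignment
    with pre(b) or with eff(b) (Definition 26)."""
    eff_a = set(a['eff'].items())
    return bool(eff_a & set(b['pre'].items())) or \
           bool(eff_a & set(b['eff'].items()))

def sp_expands(a, b, L):
    """The SP rule of Definition 27: prune b right after a iff
    L(b) < L(a) and b is not a follow-up action of a."""
    return not (L(b) < L(a) and not follow_up(a, b))
```

On the two-variable example discussed later in this section, `sp_expands` prunes b after a but not a after b, matching the analysis there.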

The following result shows the relationship between the
SP algorithm and our new POR theory.

Lemma 2

If an action b is not SP-expandable after an action a, and s is the state in which a is applied, then s:b⇒a.

\proof

Since b is not SP-expandable after a, following the SP algorithm, we have L(a)>L(b) and b is not a follow-up action of a.
According to Definition 26, we have eff(a)∩pre(b)=eff(a)∩eff(b)=∅. These imply that
eff(a) and pre(b) are conflict-free, and that eff(a)
and eff(b) are conflict-free. Also, since b is
applicable in apply(s,a) and eff(a) and pre(b) are conflict-free,
b must be applicable in s; otherwise, eff(a) would have to change the value of at least one variable in pre(b), contradicting the fact that
eff(a) and pre(b) are conflict-free.

Now we prove that pre(a) and eff(b) are conflict-free by showing
pre(a)∩eff(b)=∅. If their intersection is non-empty,
we assume a state variable x is assigned by both pre(a) and eff(b).
By the definition of stratification, x is in layer L(b).
Since x is assigned by pre(a), there must be an edge in the causal graph
from layer L(a) to layer L(x)=L(b), and these layers are distinct because L(a)≠L(b).
By the definition of stratification, such an edge implies L(a)<L(b),
which contradicts the assumption
that L(a)>L(b). Thus, pre(a)∩eff(b)=∅, and
pre(a) and eff(b) are conflict-free.

With all three conflict-free pairs, we have s:b⇒a
according to Theorem 2.
\endproof

Although SP reduces the search space by avoiding the expansion
of certain actions, it is in fact not a stubborn set based reduction algorithm. We have the following theorem for the SP algorithm.

Definition 28

For a SAS+ planning task S, a valid path pa=(a1,⋯,an)
is an SP-path if and only if pa is a path
in the search space of the SP algorithm applied to S.

Theorem 6

For a SAS+ planning task S, for any initial state s0 and any valid
path pa=(a1,⋯,an) from s0, there
exists a path pb=(b1,⋯,bn) from s0 such
that pb is an SP-path, both pa and pb lead to the same state
from s0, and pb is a permutation of the actions in pa.

\proof

We prove by induction on the number of actions.

When n=1, since there is no action before s0,
any valid path (a1) will also be a valid path
in the search space of the SP algorithm.

Now we assume this proposition is true for n=k, k≥1,
and prove the case n=k+1. For a valid
path p0=(a1,⋯,ak,ak+1), by our induction
hypothesis, we can rearrange the first k actions
to obtain an SP-path (a11,a12,⋯,a1k) that leads to the same state.

Now we consider a new path p1=(a11,⋯,a1k,ak+1).
There are two cases. First, if L(ak+1)≥L(a1k), or
ak+1 is a follow-up action of a1k,
then p1 is already an SP-path. Otherwise, we have
L(ak+1)<L(a1k) and ak+1 is not a follow-up action of a1k.
In this case, by Lemma 2, the path p1′=(a11,⋯,a1k−1,ak+1,a1k)
is also a valid path that leads from s0 to the same state as pa does.

By the induction hypothesis, if p1′ is still not an SP-path, we can rearrange
the first k actions in p1′ to get a new path p2=(a21,⋯,a2k,a1k);
otherwise we let p2=p1′.
Comparing p1 and p2, we know L(a1k)>L(ak+1), namely, the level
of the last action in p2 is strictly larger than that of the last action in p1. We can repeat
the above process to generate p3,⋯,pm,⋯ as long as pj (j∈Z+) is not
an SP-path. Our transformation from pj to pj+1 also ensures that every pj is
a valid path from s0 that leads to the same state as pa does.

Since the level of the last action in each pj is strictly increasing
as j increases, and levels are bounded above by the number of strata, this process must stop after a finite number of iterations. Suppose
it stops at pm=(a′1,a′2,⋯,a′k,a′k+1); then we must have L(a′k+1)≥L(a′k),
or a′k+1 is a follow-up action of a′k. Hence,
pm is an SP-path. We then assign pm to pb, and the induction step is proved.
\endproof

Theorem 6 shows that the SP algorithm cannot reduce
the number of states expanded in the search space. The reason is as follows:
for any state s in the original search space that is reachable from the initial
state s0 via a path p, there is an SP-path that also reaches s.
Therefore, every reachable state in the search space is still reachable
by the SP algorithm. In other words, SP reduces the number of generated states, but not the number of expanded states.

SP is not a stubborn set based reduction algorithm. This can be illustrated by the following example.

Consider a SAS+ planning task S that contains two state variables x1 and x2,
where both x1 and x2 have domain {0,1}, with the initial state
{x1=0,x2=0} and the goal {x1=1,x2=1}. Actions a
and b are two actions in S where pre(a) is {x1=0} and eff(a) is
{x1=1} and pre(b) is {x2=0} and eff(b) is
{x2=1}. It is easy to see that a and b are not
follow-up actions of each other, and that x1,x2 will be
in different layers after stratification. Without loss of generality, we can assume L(a)=L(x1)>L(x2)=L(b). Therefore, we know that action b
will not be expanded after action a in state s:{x1=1,x2=0}.
However, apply(s,b) is the goal state. Not expanding b in state s violates condition A2
in Definition 11, which requires that any valid path from s to a
goal state contain at least one action in the expansion set of s.

We can also see in the above example that the search space explored by SP
contains four states, namely, the initial state s0, apply(s0,a), apply(s0,b)
and the goal state. Meanwhile, under the EC algorithm, in state s0, the DTGs for
x1 and x2 are not in each other’s dependency closures.
This implies that in s0, EC expands either action a or b, but not both.
Therefore, EC expands three states while SP expands four.
This illustrates our conclusion
in Theorem 6 that the SP algorithm cannot reduce
the number of expanded states.

5 A New POR Algorithm Framework for Planning

We have developed a POR theory for planning and explained
two previous POR algorithms using the theory. Now, based on the theory, we propose
a new POR algorithm which is stronger
than the previous EC algorithm.

Our theory shows in Theorem 3 that the condition
for enabling POR reduction is strongly related to left commutativity
of actions. In fact, constructing a stubborn set can be
reduced to finding a left-commutative set. As we show in Theorem 5,
the EC algorithm follows this idea. However, the basic unit
of reduction in EC is the DTG (i.e., either all actions in a DTG are expanded or none of them are), which
is not necessary according to our theory. Based on this insight,
we propose a new algorithm that operates with the granularity of actions instead
of DTGs.

Definition 29

For a state s, an action set L is a
landmark action set if and only if any valid path starting from s to a
goal state contains at least one action in L.

Definition 30

For a SAS+ task, an action a∈O is supported by an action b
if and only if pre(a)∩eff(b)≠∅.

Definition 31

For a state s, its action support graph (ASG) at s is defined as a directed graph
in which each vertex is an action, and there is an edge from a to b if and only if a is not
applicable in s and a is supported by b.

The above definition of the ASG is a direct extension of the definition of the causal graph. Instead of using state variables and their domains as basic units, here we directly use actions as basic units.

Definition 32

For an action a and a state s, the action core of a at s, denoted
by ACs(a), is the set of actions that are in the transitive
closure of a in ASG(s). The action core for a given set of actions A is the union
of action cores of every action in A.
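Computing an action core amounts to a reachability sweep over ASG(s). A sketch under the same hypothetical dict encoding of actions, with actions referred to by index; keeping the seed in the result is our convention:

```python
def applicable(a, s):
    return all(s.get(x) == v for x, v in a['pre'].items())

def supported_by(a, b):
    """a is supported by b iff pre(a) and eff(b) share an assignment
    (Definition 30)."""
    return bool(set(a['pre'].items()) & set(b['eff'].items()))

def action_core(seed, actions, s):
    """AC_s over a seed set: follow ASG(s) edges transitively.  ASG(s)
    has an edge a -> b exactly when a is inapplicable in s and b
    supports a (Definitions 31 and 32)."""
    core = set(seed)
    frontier = list(seed)
    while frontier:
        i = frontier.pop()
        if applicable(actions[i], s):
            continue               # applicable actions have no out-edges
        for j, b in enumerate(actions):
            if j not in core and supported_by(actions[i], b):
                core.add(j)
                frontier.append(j)
    return core
```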

Lemma 3

For a state s, if an action a is not
applicable in s and there is a valid path p starting from s whose last action is a,
then p contains an action b,b≠a,b∈ACs(a).

\proof

We prove this by induction on the length of p.

In the base case where |p|=2, we assume p=(b,a).
Since a is not applicable in s, it must be supported
by b. Thus, b∈ACs(a).
Suppose the lemma is true for 2≤|p|≤k−1; we prove the case |p|=k.
For a valid path p=(o1,…,ok), again there exists an action b before a
that supports a. If b is applicable in s, then b∈ACs(a) and we are done. Otherwise,
we have a path p′=(o1,…,b) with 2≤|p′|≤k−1. Thus,
by the induction assumption, p′ contains at least one action in ACs(b), which
is a subset of ACs(a) according to Definitions 31 and 32.
\endproof

Definition 33

Given a SAS+ planning task Π with O as the set of all actions,
for a state s and a set of actions A, the action closure of A at s, denoted
by Cs(A), is a subset of O and a superset
of A such that, for any action a∈Cs(A) that is applicable in s and any action b∈O∖Cs(A), eff(a) and eff(b) are conflict-free. In addition,
if pre(b)∩s≠∅, then eff(a) and pre(b) are conflict-free.

Intuitively, actions in Cs(A) can be executed without affecting the completeness and optimality of search.
Specifically, because any applicable action in Cs(A) and any action not in Cs(A) never assign different values to the same
state variable, for an action a∈Cs(A) and an action b∈O∖Cs(A), the path (a,b) leads to the same state that (b,a) does. Additionally, because pre(b) and eff(a) are conflict-free
when pre(b)∩s≠∅, executing action a does not affect the future applicability of action b. Therefore,
actions in Cs(A) can be safely expanded first during the search, while actions outside it can be expanded later.

A simple procedure, shown in Algorithm 1, can be used to find the action closure for a given action set A.

input : A SAS+ task with action set O, an action set A⊆O, and a state s

output : An action closure C(A) of A

C(A) ← A;
repeat
    foreach action a in C(A) applicable in s do
        foreach action b in O∖C(A) do
            if pre(b)∩s ≠ ∅ and pre(b) and eff(a) are not conflict-free then
                C(A) ← C(A) ∪ {b};
            end if
            if eff(b) and eff(a) are not conflict-free then
                C(A) ← C(A) ∪ {b};
            end if
        end foreach
    end foreach
until C(A) does not change;
return C(A);

Algorithm 1 A procedure to find action closure
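An executable rendering of Algorithm 1, under the same hypothetical dict encoding of actions used above; indices into `actions` stand in for the actions themselves:

```python
def applicable(a, s):
    return all(s.get(x) == v for x, v in a['pre'].items())

def conflict_free(p, q):
    """Two partial assignments are conflict-free iff no variable
    receives two different values."""
    return all(q.get(x, v) == v for x, v in p.items())

def action_closure(A, actions, s):
    """Fixpoint computation of C_s(A) as in Algorithm 1."""
    C = set(A)
    changed = True
    while changed:
        changed = False
        for i in list(C):
            a = actions[i]
            if not applicable(a, s):
                continue
            for j, b in enumerate(actions):
                if j in C:
                    continue
                pre_b_in_s = any(s.get(x) == v for x, v in b['pre'].items())
                if (pre_b_in_s and not conflict_free(b['pre'], a['eff'])) \
                        or not conflict_free(b['eff'], a['eff']):
                    C.add(j)       # b conflicts with a; pull it in
                    changed = True
    return C
```

On the two-action example of Section 4.2 (independent actions a and b), the closure of {a} stays {a}, which is what allows the reduction.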

The proposed POR algorithm, called stubborn action core (SAC), works as follows. At any given
state s, the expansion set E(s) of state s is determined by Algorithm 2.

input : A SAS+ planning task and state s

output : The expansion set E(s)

Find a landmark action set L at s;

Calculate the action core ACs(L) of L;

Calculate the action closure Cs(ACs(L)) of the core using Algorithm 1, and use it as E(s);

Algorithm 2 The SAC algorithm

There are various ways to find a landmark action set for a given state.
Here we give one example that is used in
our current implementation. To find a landmark action set L at s, we utilize the DTGs associated with the SAS+ formalism. We first find the transition set that includes all outgoing transitions (si,vi) in an unachieved goal-related DTG Gi,
where si is the current assignment of Gi in s. It is easy to see
that all actions that label transitions in this set make up a landmark action set: since Gi is unachieved, at least one transition starting from si has to be taken in any solution plan.

There are also other ways to find a landmark action set. For instance, the pre-processor
in the LAMA planner [Richter
et al. (2008)] can be used to find landmark facts, and all actions that lead to these landmark facts also make up a landmark action set.
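The DTG-based construction can be sketched as follows. Treating a DTG as the projection onto a single variable is a simplification for illustration, and the dict encoding of actions is again hypothetical:

```python
def landmark_actions(actions, s, goal):
    """One way to build a landmark action set (Definition 29): pick an
    unachieved goal variable x; every plan must move x away from its
    current value s[x], so the actions labelling the outgoing
    transitions of s[x] in that DTG form a landmark set."""
    for x, v in goal.items():
        if s.get(x) == v:
            continue                      # this goal is already achieved
        return {i for i, a in enumerate(actions)
                if a['eff'].get(x) not in (None, s[x])   # leaves s[x]
                and a['pre'].get(x, s[x]) == s[x]}       # starts at s[x]
    return set()
```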

Theorem 7

For a state s, the expansion set E(s) defined by the SAC algorithm is a stubborn set at s.

\proof

We first prove that our expansion set E(s) satisfies condition
A1 in Definition 11, namely, for any action
b∈E(s), and actions b1,⋯,bk∉E(s),
if (b1,⋯,bk,b) is a valid path from s, then
(b,b1,⋯,bk) is also a valid path, and
leads to the same state that (b1,⋯,bk,b) does.

To simplify this proof, we can treat action
sequence (b1,⋯,bk) as a “macro” action
B, where an assignment xt=vt is in pre(B) if and only if
xt=vt is in the precondition of some bi∈B and
xt=vt is not in the effects of a previous action bj(j<i),
and an assignment
xt=vt is in eff(B) if and only if
xt=vt is in the effect set of some bi∈B, and
xt is not assigned to any value other than vt in the effects of later action bj(j>i). In the following proof, we use the macro action B in place of the path (b1,⋯,bk).
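For concreteness, the macro construction can be sketched as follows (hypothetical dict encoding of actions; the sketch is an aid to the proof, not part of it):

```python
def macro(seq):
    """Compose a sequence (b1, ..., bk) into the macro action B used in
    the proof: pre(B) keeps preconditions not produced by an earlier
    effect, eff(B) keeps each variable's last assigned value."""
    pre, eff = {}, {}
    for b in seq:
        for x, v in b['pre'].items():
            if x not in eff and x not in pre:
                pre[x] = v            # demanded from the starting state
        for x, v in b['eff'].items():
            eff[x] = v                # later assignments win
    return {'pre': pre, 'eff': eff}
```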

To prove A1, we only need to prove that if (B,b) is a valid path, then
s:b⇒B. According to Theorem 4,
s:b⇒B if and only if the following four propositions are true.

a) Action b must be applicable in s. We prove this by contradiction. Let s′=apply(s,B).
If b is not applicable in s
but applicable in s′, then
B supports b. Since all effects of B come from
actions in the path (b1,⋯,bk), there exists
an action bi∈{b1,⋯,bk} such that bi supports b. By
Definition 32, bi is then in the transitive closure
of b in ASG(s), so according to our algorithm, bi should be in E(s). This
contradicts our assumption that bi∉E(s). Thus, b must be applicable
in s.

b) pre(B) and eff(b) are conflict-free. We prove this proposition by contradiction.
If pre(B) and eff(b) are not conflict-free, we assume that
pre(B) has xt=vt that conflicts with an assignment in