SMT Fundamentals and Details of the RCLL Encoding

Page Table Of Contents

In this wikientry we present the fundamentals of the SMT planning approach and the details of our encoding of the RCLL.
Detail about the encoding are explained in plugin clips-smt.

SMT Fundamentals

SMT is a procedure to evaluate satisfiability of FOL formulas.
Each formula has a equi-satisfiable version in conjunctive normal form (CNF) such that we can assume to observe only top level clauses (which we will call constraints).
If a formula is satisfiable a so-called SMT solver returns a model containing the assignments for each variable.
SMT can be combined with a procedure called Bounded Model Checking (BMC).
The general idea of BMC is to determine a certain bound and test if some properties are satisfied within this bound.
Depending on the answer we are interested in some higher bound or break the procedure.

It is possible to encode a planning domain inside a FOL formula.
The formula will consist of the fluents, how actions influence the fluents and the initial and goal conditions.
Using the principles of BMC we use a plan horizon ph to check if a plan exists using ph many actions.
If the solver states that the formula is unsatisfiable we know that no plan with ph many actions exists.
Otherwise a plan with ph many actions exist and we can extract the plan by checking the assignments for each action step.

Depending on the size and complexity of the formula, solving can be inefficient.
In this case the task is to reduce complexity by adding additional constraints which reduce the search space of possible assignments (representing additional knowledge over the domain).

The general skeleton of our planning encoding is constructed as follows:

Bounds on variables

Initial consistency of fluents

Inductive consistency of fluents

Influence of actions on fluents

Initial constraints

Goal constraints

Each fluent Fluent is is encoded by variables FluentA/B_i with i representing the action step and A the value before applying the action in step i and B the value after.
The consistency constraints ensure that the values do not change over the evolution of actions, by enforcing that FluentA_i = FluentB_(i-1).
Additional variables store relevant information.
For example A_i specifies which action is executed in which step.

Because we define the variable A_i to be an int the solver will try several values.
What would happen if we define now up to n actions but forget to add a bound for A_i.
Then (if none of the intended actions is applicable) the solver will test non-intended value for A_i which are not tied done by an action definition.
Because of this behavior adding bounds is necessary.

Other relevant information depends on the domain and can be used to keep track of some metric.

OMT

Based on SMT it is possible use OMT which aims to find a satisfying assignment optimizing a certain variable.
This can be used for planning problems in order to return a plan, e.g., reducing the distance traveled by all robots.
The principle behind this technique is that we keep track of the heuristic inside a variable we want to optimize.
Then for each satisfying assignment we add some more constraint stating that the variable must be below or above (with respect to the optimization direction) the currently found assignment.
As long as solutions exist improving the heuristic we find them.
At some point the modified formula becomes unsatisfiable and we stop the procedure with the model before the unsatisfiable call as the optimal assignment.

Of course OMT is way more expensive with respect to solving time than SMT, because we have to call SMT over and over again.

Details of the RCLL Encoding

In this part of the wikientry we will have a closer look on how the RCLL domain is encoded in a FOL formula.

Fluents

We encode what robots hold holdA/B_i, where they are pos, what is inside insideA/B_i and outside outsideA/B_i a machine.
Note that even though we have several machines and robots we only have on fluent for each time step.
Here we use the information that each action is performed by one robot and influencing one machine.
The make advantage of this by keeping the fluents consistent with respect to the corresponding robots.

Imagine that R-1 holds a base after performing the action in step 1 and R-2 does not hold anything after its action in step 2. If R-1 is chosen for step 3 holdA_3 = holdB_1 and if R-2 is about to do something we need holdA_3 = holdB_2.
This principle allows us to represent all fluents in a compact way.

Additional variables in the context of the RCLL are

M_i stating which machine is worked with in step i.

R_i stating which robot is performing in step i.

t_j_i stating how much time robot R-j spent up till step i. Actually we are using distances of the navgraph multiplied with some constant plus machine duration for specific operations to simulate the time a robot needs.

t_i stating how much time all robots spent in total up till step i.

Actions

A action specifies a configuration of fluents before applying the action (precondition) and after (effect).
The general structure of an action encoding for some action j looks like:

This formula encodes that if we chose action j in step j the additional constraints have to be satisfied.
If A_i is not assigned to index_action_j then the complete formula is directly true.
Because we put a bound on all A_i at least ono right-hand-side of a action encoding must be satisfied, thus representing the execution of a action in the domain.

Note that if some fluent will not change (if fluent_a_action_j = fluent_b_action_j) it is possible to merge them into one line:

fluentB_i = fluentA_i

Domain specifc Knowledge

In this part we present how to use domain specific knowledge of the RCLL to improve the general encoding of planning problems such that it is manageable to use in real cases.

Determining the correct plan horizon

Following the general principle of bounded model checking (BMC) we would start with a plan horizon of 1 and increase it up to some upper bound.
Fortunately we can use domain knowledge to precompute the exact plan horizon we need to construct a certain product.
We know that for a C0 complexity order we always need 8 macro action which are described in the next Section.
Then for every ring we need at least 2 macro actions plus two times the amount of additional bases because we need 2 macro actions to get a base and put it to the ring station.
Thus, based on the amount of macro actions which have to be executed in all cases and the ones depending on the rings it is possible to precompute the exact plan horizon.

Macro Actions

After running experiments with pure action representing skills (prepare-bs, wp-put, etc.) we concluded that this accuracy yields unsolvable formulas (with respect to solving time) and thus introduced macro actions.
Each macro action encodes a sequence of actions.
This can be obtained by simply using the fluent_a_action_k constants of the first skill k for the fluent_a_macroaction_j constants and the fluent_b_action_l constants of the last skill l for the fluent_b_macroaction_j constants.
With this technique we reduced the total number of actions to 13 covering all complexities.
Before we already had 11 actions for producing a C0 complexity product.

The original pure action representing skills have been:

prepare-bs

get-bs

get-shelf

prepare-cs-for-retrieve

prepare-cs-for-mount

put-cs-with-capcarrier

put-cs-with-subproduct

get-cs-with-capcarrier

get-cs-with-product

prepare-ds

put-ds

A get action indicates taking something from the machine (a base, the cap carrier after retrieving the cap, the final product before delivering).
A put action in contrast encodes putting something inside a machine (a cap carrier, a product with no cap yet, the final product into the delivery station).
The prepare actions for the cap station are split up into two actions because retrieving the cap from a cap carrier and mounting this cap onto a subproduct are significantly different in their effect.

These are the 11 pure actions for a C0 complexity. If we would extend the pure actions to all complexity we would need:

put-rs-additional-base

prepare-rs-ring1

put-rs-ring1

get-rs-ring1

prepare-rs-ring2

put-rs-ring2

get-rs-ring2

prepare-rs-ring3

put-rs-ring3

get-rs-ring3

The main motivation for these macro actions is that the prepare skills have to be executed before the corresponding put/get and thus it is not necessary to reason about when a prepare should be done and which robot should do it.
We are adding domain specific knowledge to
After constructing macro actions we obtain:

get-shelf

operate-cs-retrieve

get-cs

operate-bs-get

operate-cs-mount

operate-ds-put

put-rs-additional-base

operate-rs-put-ring1

operate-rs-get-ring1

operate-rs-put-ring2

operate-rs-get-ring2

operate-rs-put-ring3

operate-rs-get-ring3

The macro actions with the prefix operate indicate that the prepare and put/get skill are combined.
In total we reduce the number of 21 pure actions to 13 macro actions.
Experiments show that before it was possible to obtain a plan for a C0 action within 1 minute.
By use of macro actions this speed is already significantly improved.

Other additional Information

We know that in the starting position R-1 is the first robot to enter the field, followed by R-2 and then R-3.
Thus we fix the partial order that first R-1 has to be used in any plan and if R-3 is about to be used R-3 must have been used before.
This technique is called symmetry breaking as each plan is equivalent to the plans if we replace the robots by some permutation (if we assume that all robots are on the field).
The search space is reduced and thus the solving process is improved.
Moreover because R-1 is the first robot to enter the field, if the plan was already generated the first action can be executed directly. If the first robot would be R-2 or R-3 and the first action of R-1 depends on R-2 or R-3 we would observe unnecessary waiting time of R-1.

Macro actions exist which need to be performed by the same robot.
In this case we can add a constraint representing this information.
Note that this only works if the macro action needs to be performed only once in the entire plan.
We fix that following pairs of macro actions must be executed by the same robot:

get-shelf and operate-cs-retrieve

operate-rs-get-ring1 and operate-rs-put-ring2 or operate-cs-mount depending on the complexity

operate-rs-get-ring2 and operate-rs-put-ring3 or operate-cs-mount depending on the complexity

operate-rs-get-ring3 and operate-cs-mount depending on the complexity

Some actions have to be executed before/after other actions.
Therefor we introduce inter-action-dependencies.

These additional constraints allow us to generate plans for all complexities in less than 20 seconds.
Moreover optimizing plans for C0 complexity products is possible, but not for higher complexities yet.

Translate sequential into concurrent plan

In order to translate the sequential plan offered by the model we make use of domain knowledge similar to the inter-action-dependencies.
The basic idea is to tag every skill with an id and if necessary add ids of actions which happen earlier in the sequential plan which need to be executed before.
Once every action has a list or so-called parent-ids before executing this action the agent has to make sure that all actions indicated by the parent-ids must be already FINAL.

While translating the macro action plan into a concurrent plan of skills we store some additional information which helps us identifying the correct parent-ids.

Within a macro action every skill has to follow the one before.
Further we skills which need to wait for other skills performed by other robots.

wp-get in get-cs.
For the skill wp-get we have to distinguish between what is currently located at the output of the cap station.
If it is the cap carrier without any cap we have to wait for the last skill of macro action operate-cs-retrieve, otherwise the one of macro actionoperate-cs-mount.

move in operate-bs-get.
For the skill move we have to pay attention if the base we are seeking is the one of the final product.
In this case we have to wait for all additional bases required by the first ring to happen before,
because otherwise we would block the base station or the ring station.
Moreover we tell the agent that we will obtain the working piece representing the final product (WP1) and not a work piece representing an additional base (WP2).

move in operate-cs-mount.
Depending on the complexity one has pay attention to the last skill completing the subproduct without cap.

prepare-cs in operate-cs-mount.
The prepare-cs skill depends on the retrieval of the cap from the cap carrier and thus we have to add the id of the last instance of get-cs.
Note that the second instance getting the final product will not bother us because this action is based on get-cs.

move in operate-ds-put.
This action depends on the last skill of the last instance of the macro action get-cs.

move in put-rs-additional-base.
This action depends on the last skill of the instance of the macro action operate-rs-put-ring1/2/3 depending on the location if this ring station has been filled already with up to 3 additional bases (the maximum number of additional bases for a ring station).

move in operate-rs-put-ring1.
This action depends on the first n-many put-rs-additional-base actions with n the amount of additional bases needed for ring1.
We use only a subset of all possible put-rs-additional-base action in order to mount ring1 as soon as possible.

wp-get in operate-rs-get-ring1.
This action depends on the last skill of macro action operate-rs-put-ring1.

move in operate-rs-put-ring2.
This action depends on all put-rs-additional-base actions.
We use the set of all possible put-rs-additional-base action to simplify.
Besides this skill has to wait for the last skill of macro action operate-rs-get-ring1.

wp-get in operate-rs-get-ring2.
This action depends on the last skill of macro action operate-rs-put-ring2.

move in operate-rs-put-ring3.
This action depends on all put-rs-additional-base actions.
We use the set of all possible put-rs-additional-base action to simplify.
Besides this skill has to wait for the last skill of macro action operate-rs-get-ring1.

wp-get in operate-rs-get-ring3.
This action depends on the last skill of macro action operate-rs-put-ring3.