Tip:
Highlight text to annotate it
X
Before we finish with plan-space search, I want to introduce another refinement to
the generic algorithm we've just seen. This is the algorithm as it was
implemented in the UCPOP planner that was developed about the early 90s,' if I
remember correctly. In the algorithm as we've seen it so far,
it considers two types of flaws, unachieved goals and threats. And every
time we go through our main loop, one of these flaws is removed from the plan and
potentially others introduced. The UCPOP planner only considers
unachieved goals and deals with threads as part of the main loop.
So, here is the data flow of the PSP algorithm as we've seen it so far.
We start off with our initial plan, pi 0, that plan consists of actions, orderings,
bindings, and causal links. Then, what we do is we compute the
threats and the open goals which are the flaws in our over all plan.
Then, if we find there is no flaw in our plan, we can return this plan as a
solution plan. However, if there are flaws in the plan,
then we got to select one of these flaws, compute the resolvers.
If there are no resolvers for the selected flaw, then we can return failure
because this plan can never be refined into a solution plan.
If there is a resolver, then we've got to choose one of the resolvers that are
possible, apply those resolver. And while we apply the resolver, we've
got to retain our ordering constraints and our binding constraints such that
variable bindings and orders are always consistent.
Then, we go back in our loop and start with the beginning with the partial plan
that we consider. The UC pop planner is different and that
it does not deal with the computation and handling of threats at this stage in the
algorithm. In fact, threats are moved to a different
part. Namely, after the open goals are dealt
with. The algorithim will deal with all the
threats that are caused by the resolver for the open goal flaw.
And this may cause a little loop here, namely one where all the threats are
dealt With. The input to the PoP planner or to the
UCPOP planner to be more specific, looks slightly different from what we've seen
so far. We have an extended input that takes a
partial plan as input, that is the same initial plan we've seen before.
And the second component it takes as input is an agenda, an agenda of things
that still need to be done. Abd this is the set of pairs of actions
and preconditions of these actions. So, these are all the open goal flaws
that exist in the current partial plan. Initially, that is, of course, all the
preconditions of the goal dummy action. The agenda is simply the way to avoid
recomputing the open goals every time we go through the loop.
Where the loop here is implemented as a recursive function.
Search control of the algorithm is then by flaw type.
That means, we have the unachieved sub-goals as part of our agenda, and
every time we go through our algorithm, we go through the loop.
We select one of the unachieved sub-goals and remove that as a flaw from our
partial plan, potentially introducing others.
Threats, on the other hand, are resolved as part of the successor generation.
So, internally to the algorithm, we remove all the threats so that when we
generate a new search node, that will have no threats in it.
Finally, we get to the pseudo code for our partial order planner.
This is essentially the algorithm that is implemented by the UCPOP planning system.
Unfortunately, the algorithm is a little more complex than the algortithms we've
seen previously, so I have to distribute it over two slides.
So, this is the first part of the algorithm.
And what we define here, is a function PoP, which implements our partial order
planner. And this function takes two arguments.
Firstly, a partial order plan, which initially is our initial plan pi 0.
And it takes an agenda, which consists of all the unsatisfied subgoals in our
current plan. In our initial plan, these were only the
preconditions of the goal dummy action. Then, the algorithm begins by testing
whether the agenda is empty. So, if there are no more unsatisfied
preconditions, that means our plan contains no more
flaws. And that means it's a solution plan for
our planning problem, and we can return this plan as a
solution. However, if there are still items left on
the agenda, then we need to select one of these as the flaw we want to work on
next. And the selection is deterministic choice
point, which means we don't need to backtrack over it.
The reason is, again, that we need to work on all agenda items eventually and
it's only important for efficiency in which order we do that but not for
completeness. So, now we've chosen a pair, which is an
agenda entry, consistent of an action ag, and the precondition pg.
The next step, simply removes the chosen entry from our agenda.
Then, we have to compute all the providers for our chosen precondition pg.
A provider, of course, can be any action that has an effect that can unify with
pg. This could be an action that is already
in the plan or it could be a new instance of an operator that we introduce into the
plan. So this gives us our set of relevant
actions. If this set is empty,
if there are no relevant actions, then there is no way to resolve this flaw,
which means we can return failure. So, what we do at this point is go up in
our search tree to the nearest backtrack point and try an alternative branch.
Now, if our set of relevant actions wasn't empty, then we can do this now, we
can choose one. This is a nondeterministic choice point,
which means here we set up a backtrack point where we branch in our search tree.
And if any of those branches returns failure, we simply go back and explore
the next branch. Choosing a resolver gives us three
things. Firstly, it gives us an action that we've
chosen as the provider. It gives us a proposition that is one of
its effects that unifies with our unachieved sub-goal, and the substitution
sigma is the substitution that makes the unachieved sub-goal and the effect the
same. Given our chosen resolver, we can now
start to refine the plan accordingly. And the first thing we do is we add a causal
link and the new causal link links our chosen provider, ap, to our consumer, ag.
And the proposition that we are trying to protect is sigma of pg, the instantiated
goal proposition. Note that this is the same as sigma of pp
as sigma unifies the two. And in addition to this causal link, we
also need to add new bindings to our variable binding constraints.
And those are exactly those bindings that are defined by the substitution sigma.
This is the first part of the algorithm, and the next part follows in the next
slide. So far, we have added cause of links and
variable bindings to our partial plan as necessary.
The next thing, the next component we want to modify in our partial plan is the
set of actions. And this depends on whether our chosen
provider, ap, was already an action or plan, or whether it was a new operator
instance that we introduced into the plan.
If it is a new instance of an operator, so if it was not an element of our plan,
then what we have to do is simply add that new action to our plan.
and we also have to add to the agenda all it's preconditions as unachieved
sub-goals. If our chosen provider was already part
of the plan, then we don't need to modify the set of actions of our partial plan.
In the next step, we simply generate a copy of our plan that we can modify in
the loop that follows. As I've explained earlier, the
introduction of new actions or new causal links into a partial plan may also
introduce new threats. And this is what we'll deal with next.
So, what we have here is a loop over all potentially new threats.
These are threats on our newly introduced causal link, or threats that are due to
the new action. Of course, if this was an existing
action, this cannot introduce new threats into the plan.
But we can always have threats on our new causal link.
So, for each of these threats, we have to compute the set of all possible resolvers
for the threat in our current plan. And if there are no resolvers for one of
those threats, that means we can return failure because there's no way or
removing that flaw from the plan. Again,
returning failure here means going to the next backtrack point and trying a
different branch in the search tree. If there are resolvers, then we need to
choose one of them. And again, this is a non deterministic
choice point, which means we may have to back track to
this point. If we fail in one of the branches, we go
back and try a different branch until we reach one where we succeed.
And then, we take our resolver that we've just chosen and apply that to our current
plan. So, we take the new plan and refine it according to that resolver, which
gives us a modified new plan. And then, we can simply do the recursive
call. So, we call our procedure PoP with our
new plan and our modified agenda. And, of course, our new plan only
contains flaws of the type unacheived precondition as all the threats have just
been resolved in the loop we've just gone through. And the agenda contains all
those unacheived preconditions. And that concludes our partial order of
planning algortihm, and it almost concludes this segment.
Now, before we finish with plan-space planning, here is a quick overview of how
plan-space planning compares to state-space planning.
Firstly, in state-space planning, we are dealing with a finite search space.
In each state, we only have an finite number of objects, we have a finite
number of relations. Which means, there's a finite number of
atoms that can relate the different objects to each other.
And that means there's only a finite number of states that we have in our
search space. In plan-space planning, on the other
hand, are search spaces potentially infinite.
You can easily see a plan can have potentially infinite length so there must
be an infinite number of plans in our search space.
But, of course, what matters in practice is the part of the search space that is
generated or that is explored before a solution plan is found.
And also, in state-space planning, our search space is usually a graph but what
we're searching is a tree, and the tree is infinite.
But, what we do have in state-space planning is an explicit representation of
the intermediate states that we're going through on our way to the goal.
In plan-space planning, during the planning process,
we don't know which intermediate states we will encounter.
This is quite an important distinction, as we will see later in the course,
because most of the modern heuristics that are used during planning rely on an
explicit representations of intermediate states that we are searching.
So, the heuristics are defined on states, on world states.
Next, in state-space planning, the ordering of the actions really reflects
the search control strategy that we, we've been using.
Whereas in plan-space planning the choice and the organization that we have in our
plan is independent of our search strategy.
So, in state-space planning, we consider one action after another, and the
ordering in which we consider them is also the ordering in which they end up in
our plan, even if there is no ordering constraint necessary between these
actions. In plan-space planning, we are much more
flexible. In fact, we have seen that in partial
plans, two actions in the plan don't necessarily need to be ordered with
respect to each other. They can be in parallel.
Also, in plan-space planning, we have an explicit representation of the rationale
why actions are in the plan that is given by the causal links.
There is no such structure in state-space planning.
We only have the sequence of actions that we need to execute.
This is quite important for the execution of the plan, namely when things go wrong.
If we then have the explicit knowledge of why we did an action, then it is much
easier to fix the current plan. And finally, in state-space planning the
search nodes we consider are relatively simple. That is, each search node is a
set of ground atoms, and the successors were relatively easy to compute.
I say relatively easy because there may still be an exponential number of
applicable actions, of course. But in practice, this is rarely the case.
In plan-space search, on the other hand, the search nodes are quite complex.
Meaning that we have constrained networks that we need to maintain as part of our
refinement process. So that means successors are potentially
expensive to compute. Overall, one can say that plan-space
planning was a big step forward when it was introduced.
But nowadays, due to efficient heuristics state-space planning is the more
eddicient way for searching for solutions.
This concludes the segment on plan-space search.
In this approach, we have introduced a new search space in which search nodes
are partial plans, and steps in our search space are planned
refinement operations. This is, of course, completely different
from the search-space we've considered during state-space search.
That means we also had to define a new initial state and a new goal test.
And what we did in our PSP algorithm was search for a plan that has no flaws.
Where a flaw can be either an unachieved sub-goal or a threat.
And finally, we've looked at a refinement, of this generic algorithm
that was implemented in the UCPOP planner where the only flaw we considered were
unachieved sub-goals, and threats were dealt with as part of the node generation
process. And that's all I was going to tell you on
partial order planning.