This video is a segue between bad news and good news.
The bad news, which we have now discussed, is NP completeness.
The fact that there are computationally intractable problems out there in the
world. In fact, they are fairly ubiquitous and
you're likely to encounter them in your own projects.
The good news is that NP completeness is hardly a death sentence.
Indeed, our algorithmic tool box is now rich enough to provide many different
strategies toward coping with NP complete problems.
So suppose you have identified a computational problem on which the
success of your new startup company rests.
Maybe you've spent the last several weeks throwing the kitchen sink at it.
All the algorithm design paradigms you know, all the data structures, all the
primitives, nothing works. Finally, you decide to try to prove the problem is NP
complete, and you succeed. Now you have an explanation for why your
weeks of effort have come to naught, but that doesn't change the fact that this is
the problem that governs the success of your project.
What should you do? Well, the good news is, NP completeness is certainly not a
death sentence. There are people solving, or at least
approximately solving, NP complete problems all the time.
However, knowing that your problem is NP complete does tell you where to set your
expectations. You should not expect some general
purpose, super-fast algorithm, like we have for other computational problems,
like say, sorting, or single source shortest paths.
Unless you are dealing with unusually small, or well structured inputs, you're
going to have to work pretty hard to solve this problem, and also, possibly,
make some compromises. The rest of this course is devoted to
strategies for solving or approximately solving NP-complete problems.
In the rest of the video, I'll give you an orientation for what those strategies
are, and what you can expect to come. So as usual, I'm going to focus here
on general purpose strategies that cut across multiple application domains.
As usual, these general principles should just be a starting point.
You should take them, and run with them, augmenting them with whatever domain
expertise you have, in the specific problem that you need to solve.
The first strategy is to focus on computationally tractable special cases
of an NP-complete problem. Relatedly, you want to think
about what's special about your domain or about the data sets that you're working
with, and try to understand if there's special structure which can be exploited
in your algorithm. Let me point out, we've already done this
in a couple of cases in this course. The first example we saw concerns the
weighted independent set problem. We first studied this problem on path graphs,
but the computational problem makes perfect sense in general graphs.
The general problem is I give you as input, an undirected graph, every vertex
has a weight, and I want the maximum weight subset of vertices that is an
independent set. And remember, in an independent set, you
are forbidden to take any 2 vertices that are neighbors.
So in an independent set, none of the pairs of vertices that you've picked are
joined by an edge. In general graphs, the weighted
independent set problem is NP-complete, so we certainly don't expect it to have a
polynomial time algorithm. But, in the special case where the graph
is a path, as we saw, there's a linear time, dynamic programming algorithm, that
exactly solves the weighted independent set problem.
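To make this concrete, here's a minimal sketch of that linear-time dynamic program (my own illustrative Python, not the course's reference code):

```python
def max_weight_independent_set(weights):
    """Weighted independent set on a path graph, where weights[i] is
    the weight of the i-th vertex along the path.

    Linear-time dynamic program: the best solution on the first i
    vertices either skips vertex i (value A[i-1]) or takes it together
    with the best solution on the first i-2 vertices."""
    prev2, prev1 = 0, 0  # A[i-2] and A[i-1], rolled into two scalars
    for w in weights:
        prev2, prev1 = prev1, max(prev1, prev2 + w)
    return prev1
```

For instance, on the path with weights 1, 4, 5, 4, the optimum is 8: take the two weight-4 vertices, which are not adjacent.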
So path graphs form a special case of the weighted independent set
problem that's computationally tractable, solvable in polynomial, even linear,
time. In fact, the frontier of tractability can
be pushed well beyond path graphs. On the homework, I asked you to think
through the case of graphs that are trees, and notice that you could still do
dynamic programming efficiently to compute maximum-weight independent sets in trees.
You can even get computationally efficient algorithms for a broader class
of graphs, known as bounded tree width graphs.
So the definition of that is a little outside the scope of this course, but you
can go even beyond trees. The second example comes from our
dynamic programming algorithm for the Knapsack problem; we discussed its
running time and explained why it's exponential.
The running time of our dynamic programming Knapsack algorithm is n, the
number of items, times capital W, the knapsack capacity.
And because it only takes log W bits to specify the capacity capital W, we don't
call that a polynomial time algorithm. But, imagine you only have to solve a
knapsack instance where the capacity is not too big, say the capacity
capital W is O(n), and you definitely will see knapsack instances in
practice which possess that property. Well then, our dynamic programming
algorithm just runs in time O(n^2), and that's a bona fide polynomial time
algorithm for this special case of a small knapsack capacity.
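As an illustration, here is a sketch of that dynamic program in Python (a space-saving one-dimensional variant of the table from the earlier lecture, not necessarily the exact recurrence presented there):

```python
def knapsack(values, weights, W):
    """Maximum value of a subset of items with total weight at most W.

    A[x] holds the best value achievable with capacity x using the
    items processed so far; each item costs one pass over the table,
    for O(n * W) time overall -- a genuine polynomial bound in the
    special case where W = O(n)."""
    A = [0] * (W + 1)
    for v, w in zip(values, weights):
        # Scan capacities high-to-low so each item is used at most once.
        for x in range(W, w - 1, -1):
            A[x] = max(A[x], A[x - w] + v)
    return A[W]
```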
So next, let me mention a couple of examples we're going to see in the
forthcoming videos. The first one is going to concern the
2-SAT problem. 2-SAT is a type of constraint
satisfaction problem. You remember, in a constraint
satisfaction problem, you have a bunch of variables, each one gets assigned a
value. So the simplest case is the Boolean case, where each variable can be
0 or 1, false or true, and then you have a bunch of clauses, which specify the
permitted joint values of a collection of variables.
The 2 in 2-SAT refers to the fact that each constraint involves the joint values
of only a pair of variables. So a canonical constraint in a 2-SAT
instance is going to, for two variables, specify three joint assignments that are
allowed and one that's forbidden. For example, maybe it will say, for the
variables x3 and x7, it's okay to set them both to true, it's okay to set them
both to false, it's okay to set x3 to true and x7 to false, but it's forbidden to set
x3 to false and x7 to true. So that's a canonical constraint in a
2-SAT instance. 3-SAT, it's the same thing, except the constraints involve the
joint values of a triple of variables, and each constraint forbids one out of
the eight possibilities. Now, the 3-SAT problem is a canonical NP-complete
problem. It was really singled out by Cook and
Levin as being sufficiently expressive to encode all problems in NP.
But if each constraint has size only two, then, as we'll see, the problem
becomes polynomial-time solvable. There are a couple of ways of proving that.
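As a preview, here is a toy sketch of such a local search in Python (the clause encoding, restart schedule, and step budget here are my own assumptions, not the lecture's exact parameters):

```python
import math
import random

def two_sat_local_search(n, clauses, restarts=None):
    """Randomized local search for 2-SAT.

    n       -- number of Boolean variables, indexed 0..n-1
    clauses -- pairs of literals; a literal (var, want) is satisfied
               when variable `var` is assigned the value `want`
    Returns a satisfying assignment, or None if none was found (in
    which case the instance is unsatisfiable with high probability)."""
    if restarts is None:
        restarts = max(1, int(math.log2(n)) + 1)
    for _ in range(restarts):
        # Fresh random starting assignment for each restart.
        a = [random.random() < 0.5 for _ in range(n)]
        for _ in range(2 * n * n):
            bad = [c for c in clauses
                   if not (a[c[0][0]] == c[0][1] or a[c[1][0]] == c[1][1])]
            if not bad:
                return a  # every clause satisfied
            # Pick an unsatisfied clause, flip one of its two variables.
            var, _ = random.choice(random.choice(bad))
            a[var] = not a[var]
    return None
```

Each flip fixes the chosen clause but may break others; the analysis of why this random walk nevertheless succeeds is coming up in a later video.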
We're going to talk about a local search algorithm that checks whether or not
there is indeed an assignment to the variables that simultaneously satisfies
all of the given constraints. So the final example, we'll discuss in
more detail later, but just very briefly, we're going to discuss the vertex cover
problem. This is a graph problem,
and a vertex cover is just the complement of an independent set.
So while an independent set cannot take two vertices from the same edge, in the
vertex cover problem, you have to take at least one vertex from every single edge.
And then what you want is you want the vertex cover that minimizes the sum of
the vertex weights. Yet again, this is an NP complete problem
in general, but we're going to focus on the special case where the optimal
solution is small. That is, we're going to focus on graphs,
where there's a small number of vertices, such that every single edge has at least
one endpoint in that small set, and we'll see that for that special case, using a smart
kind of exhaustive search we'll actually be able to solve the problem in
polynomial time. So let me reiterate that these tractable
special cases are meant primarily to be building blocks, upon which you can draw
in a possibly more sophisticated approach to your NP complete problem.
So just to make this a little more concrete, let me just kind of dream up
one scenario to let you know what I am thinking about.
Imagine, for example, that you have a project where unfortunately it's not
really 2-SAT that you're confronting, it's actually a 3-SAT instance. So you're
feeling kind of bummed, 3-SAT is NP complete, and maybe you have 1000
variables, and certainly you can't do brute force search over the 2 to the
1,000 possible ways of assigning values to your 1000 variables.
But, maybe the good news is, because you have domain expertise, because you
understand this problem instance, you know that, yeah,
there's 1,000 variables, but there's really 20 that are crucial.
You have a feeling, that all of the action, basically, is boiling down to how
these 20 core variables get assigned. Well now, maybe you can mix together some
brute force search with some of these tractable special cases.
For example, you can enumerate over all 2 to the 20 ways of assigning values to
this core set of 20 variables. 2 to the 20 is roughly a million, that's
not so bad. And now, what you're going to do is, for
each of these million scenarios, you check whether there's a way to extend
that tentative assignment of 20 values to the 20 variables, to the other 980
variables, so that all of the constraints get satisfied.
The original problem is solvable if and only if there exists a way of assigning
values to these 20 variables that successfully extends to the other 980.
Now, because these are the crucial variables and it's where all the action
is, maybe as soon as you assign all of them 0's and 1's, the residual SAT
instance is tractable. For example, maybe it just becomes a
simple 2-SAT instance, and then you can solve it in polynomial time.
So this gives you a hybrid approach.
Brute force search at the top level, tractable special cases for each guess of
assignment of the 20 variables, and you're off to the races.
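A toy version of this hybrid scheme might look like the following (entirely my own illustration: the clause encoding is made up, and the residual check is brute force just to keep the sketch self-contained; in the scenario above you'd plug in a polynomial-time 2-SAT solver instead):

```python
from itertools import product

def clause_satisfied(clause, a):
    # A clause is a tuple of literals (var, want); it's satisfied if
    # any of its literals matches the current assignment.
    return any(a[v] == want for v, want in clause)

def satisfiable_via_core(n, clauses, core):
    """Top level: enumerate all 2^|core| assignments to the core
    variables.  For each guess, check whether it extends to the
    remaining variables so that every clause is satisfied.  (Here the
    extension check is brute force over the leftover variables purely
    for illustration; the payoff of the hybrid scheme comes when the
    residual instance is a tractable special case such as 2-SAT.)"""
    rest = [v for v in range(n) if v not in core]
    for core_vals in product([False, True], repeat=len(core)):
        a = [None] * n
        for v, val in zip(core, core_vals):
            a[v] = val
        for rest_vals in product([False, True], repeat=len(rest)):
            for v, val in zip(rest, rest_vals):
                a[v] = val
            if all(clause_satisfied(c, a) for c in clauses):
                return True
    return False
```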
And I hope it's clear, I mean this as just one possible way that you might
combine the various building blocks we're developing into a more elaborate approach
to tackling an NP-complete problem. And that's generally what it takes, a
fairly elaborate approach, because, after all, these problems are NP-complete;
you've got to respect that. So with that digression complete, let me
mention what are the other two strategies we're going to be exploring in the
lectures to come. So the second strategy, which is
certainly one very common in practice, is to resort to heuristics.
That is, to algorithms which are not guaranteed to be correct.
We haven't really seen examples of heuristics in the course thus far. For those
of you who are alumni of Part 1, perhaps we could classify Karger's randomized
minimum cut algorithm as a heuristic, because it did have a small
probability of failing to find the minimum cut.
But rather, I'm going to focus on some examples in the upcoming lectures.
I'm going to use the Knapsack problem as a case study,
and what we'll see, is that our toolbox, which contains various algorithm design
paradigms, is useful not just for designing correct algorithms, but it's
also useful for designing heuristics. So in particular, we'll get a pretty good
algorithm for the Knapsack problem, using the greedy algorithm design paradigm,
and we'll get an excellent heuristic for Knapsack, using the dynamic programming
algorithm design paradigm. The final strategy is for situations
where you are unwilling to relax correctness.
You're unwilling to consider heuristics. Now, of course, for an NP complete
problem, if you're always going to be correct, you don't expect to
run in polynomial time, but there are still opportunities to have algorithms
that, while exponential time in the worst case, are smarter than naive brute force
search. So we have in fact already seen one
example that can be interpreted as an implementation of this strategy, that's
for the knapsack problem. So, in the knapsack problem, naive brute
force search, would just run over all possible subsets of the items.
It would check if a subset of items fit in the knapsack.
If it does, it remembers the value, and at the end it returns the feasible
solution with maximum value.
That has time proportional to 2 to the n, where n is the number of items.
Our dynamic programming algorithm, has running time n times W.
Now, of course, this is no better than 2 to the n if the knapsack capacity
is huge, if W is itself on the order of 2 to the n. But, as we argued, if W is smaller, this
algorithm is going to be faster. And also, as you learned on the third
programming assignment, sometimes even though W is big, dynamic programming's
going to beat the crap out of brute force search.
So I'll show you a couple of other examples.
We'll talk about the travelling salesman problem, where naive brute force search
would roughly take n factorial time, where n is the number of vertices.
We'll give an alternative dynamic programming-based solution which runs in
time only 2 to the n, which is much better than n factorial.
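A compact sketch of that dynamic program, commonly known as the Bellman-Held-Karp algorithm (illustrative Python; its running time is O(n^2 * 2^n), which is the "essentially 2 to the n" bound, still far better than n!):

```python
from itertools import combinations

def held_karp(dist):
    """Cheapest tour that starts and ends at vertex 0 and visits every
    vertex exactly once; dist[i][j] is the cost of edge (i, j).

    C[(S, j)] = cheapest path from 0 to j whose vertex set is exactly
    S (a frozenset containing 0 and j).  Subproblems are filled in by
    increasing |S|, for O(n^2 * 2^n) total work -- exponential, but a
    huge improvement over the roughly n! cost of trying all tours."""
    n = len(dist)
    C = {(frozenset([0]), 0): 0}
    for size in range(2, n + 1):
        for rest in combinations(range(1, n), size - 1):
            S = frozenset(rest) | {0}
            for j in rest:
                C[(S, j)] = min(C[(S - {j}, k)] + dist[k][j]
                                for k in S - {j} if (S - {j}, k) in C)
    full = frozenset(range(n))
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))
```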
The third example, which we'll cover in a forthcoming video, we already alluded to
briefly on the last slide. It's for the vertex cover problem.
So this is where you're given a graph, every vertex has a weight, you want the
minimum weight subset of vertices that includes at least one endpoint from every
edge. We're going to consider the version of
the problem where you want to check whether or not it's possible to have a
vertex cover that uses only k vertices, whether or not there exists k vertices
that includes one endpoint at least, from each edge.
The naive brute force search will run in time n to the k, which gets absurd, even
when k is quite small, but we're going to show that there's a
smarter algorithm which, while still exponential in k, runs in time only 2
to the k times the size of the graph.
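The smart exhaustive search described above can be sketched as follows (unweighted decision version, my own illustration):

```python
def has_vertex_cover(edges, k):
    """Is there a set of at most k vertices touching every edge?

    Key observation: for any edge (u, v), every vertex cover must
    contain u or v.  So pick an edge, branch on the two choices, and
    recurse with budget k - 1.  The search tree has depth k and
    branching factor 2, giving O(2^k * m) time on a graph with m
    edges, instead of the naive n-to-the-k enumeration of covers."""
    if not edges:
        return True           # nothing left to cover
    if k == 0:
        return False          # edges remain, but the budget is spent
    u, v = edges[0]
    # Adding u to the cover removes every edge incident to u;
    # likewise for v.
    return (has_vertex_cover([e for e in edges if u not in e], k - 1) or
            has_vertex_cover([e for e in edges if v not in e], k - 1))
```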