This video is a segue between bad news and good news.
The bad news, which we have now discussed, is NP completeness.
The fact that there are computationally intractable problems out there in the
world. In fact, they are fairly ubiquitous and
you're likely to encounter them in your own projects.
The good news is that NP completeness is hardly a death sentence.
Indeed, our algorithmic tool box is now rich enough to provide many different
strategies toward coping with NP complete problems.
So suppose you have identified a computational problem on which the
success of your new startup company rests.
Maybe you've spent the last several weeks throwing the kitchen sink at it.
All the algorithm design paradigms you know, all the data structures, all the
primitives, nothing works. Finally, you decide to try to prove the problem is NP
complete, and you succeed. Now you have an explanation for why your
weeks of effort have come to naught, but that doesn't change the fact that this is
the problem that governs the success of your project.
What should you do? Well, the good news is, NP completeness is certainly not a
death sentence. There are people solving, or at least
approximately solving, NP complete problems all the time.
However, knowing that your problem is NP complete does tell you where to set your
expectations. You should not expect some general
purpose, super-fast algorithm, like we have for other computational problems,
like say, sorting, or single source shortest paths.
Unless you are dealing with unusually small, or well structured inputs, you're
going to have to work pretty hard to solve this problem, and also, possibly,
make some compromises. The rest of this course is devoted to
strategies for solving or approximately solving NP-complete problems.
In the rest of the video, I'll give you an orientation for what those strategies
are, and what you can expect to come. So as usual, I'm going to focus here
on general purpose strategies that cut across multiple application domains.
As usual, these general principles should just be a starting point.
You should take them, and run with them, augmenting them with whatever domain
expertise you have, in the specific problem that you need to solve.
The first strategy is to focus on computationally tractable special cases
of an NP-complete problem. Relatedly, you want to think
about what's special about your domain or about the data sets that you're working
with, and try to understand if there's special structure which can be exploited
in your algorithm. Let me point out, we've already done this
in a couple of cases in this course. The first example we saw concerns the
weighted independent set problem. We first studied this problem on path graphs,
but the computational problem makes perfect sense in general graphs.
The general problem is I give you as input, an undirected graph, every vertex
has a weight, and I want the maximum weight subset of vertices that is an
independent set. And remember, in an independent set, you
are forbidden to take any 2 vertices that are neighbors.
So in an independent set, none of the pairs of vertices that you've picked are
joined by an edge. In general graphs, the weighted
independent set problem is NP-complete, so we certainly don't expect it to have a
polynomial time algorithm. But, in the special case where the graph
is a path, as we saw, there's a linear time, dynamic programming algorithm, that
exactly solves the weighted independent set problem.
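To make this concrete, here's a minimal sketch of that linear-time dynamic program (my own illustrative Python, not the course's reference code):

```python
def max_weight_independent_set(weights):
    """Weighted independent set on a path graph, where weights[i] is
    the weight of the i-th vertex along the path.

    Linear-time dynamic program: the best solution on the first i
    vertices either skips vertex i (value A[i-1]) or takes it together
    with the best solution on the first i-2 vertices."""
    prev2, prev1 = 0, 0  # A[i-2] and A[i-1], rolled into two scalars
    for w in weights:
        prev2, prev1 = prev1, max(prev1, prev2 + w)
    return prev1
```

For instance, on the path with weights 1, 4, 5, 4, the optimum is 8: take the two weight-4 vertices, which are not adjacent.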
So path graphs form a special case of the weighted independent set
problem that's computationally tractable, solvable in polynomial, even linear,
time. In fact, the frontier of tractability can
be pushed well beyond path graphs. On the homework, I asked you to think
through the case of graphs that are trees, and notice that you could still do
dynamic programming efficiently to compute maximum-weight independent sets in trees.
You can even get computationally efficient algorithms for a broader class
of graphs, known as bounded tree width graphs.
So the definition of that is a little outside the scope of this course, but you
can go even beyond trees. The second example comes from our
dynamic programming algorithm for the Knapsack problem; we discussed its
running time and explained why it's exponential.
The running time of our dynamic programming Knapsack algorithm is n, the
number of items, times capital W, the knapsack capacity.
And because it only takes log W bits to specify the capacity capital W, we don't
call that a polynomial time algorithm. But, imagine you only have to solve a
knapsack instance where the capacity is not too big, say the capacity
capital W is O(n), and you definitely will see knapsack instances in
practice which possess that property. Well then, our dynamic programming
algorithm just runs in time O(n^2), and that's a bona fide polynomial time
algorithm for this special case of a small knapsack capacity.
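As an illustration, here is a sketch of that dynamic program in Python (a space-saving one-dimensional variant of the table from the earlier lecture, not necessarily the exact recurrence presented there):

```python
def knapsack(values, weights, W):
    """Maximum value of a subset of items with total weight at most W.

    A[x] holds the best value achievable with capacity x using the
    items processed so far; each item costs one pass over the table,
    for O(n * W) time overall -- a genuine polynomial bound in the
    special case where W = O(n)."""
    A = [0] * (W + 1)
    for v, w in zip(values, weights):
        # Scan capacities high-to-low so each item is used at most once.
        for x in range(W, w - 1, -1):
            A[x] = max(A[x], A[x - w] + v)
    return A[W]
```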
So next, let me mention a couple of examples we're going to see in the
forthcoming videos. The first one is going to concern the
2-SAT problem. 2-SAT is a type of constraint
satisfaction problem. You remember, in a constraint
satisfaction problem, you have a bunch of variables, each one gets assigned a
value. So the simplest case is the Boolean case, where each variable can be
0 or 1, false or true, and then you have a bunch of clauses, which specify the
permitted joint values of a collection of variables.
The 2 in 2-SAT refers to the fact that each constraint involves the joint values
of only a pair of variables. So a canonical constraint in a 2-SAT
instance is going to, for two variables, specify three joint assignments that are
allowed and one that's forbidden. For example, maybe it will say, for the
variables x3 and x7, it's okay to set them both to true, it's okay to set them
both to false, it's okay to set x3 to true and x7 to false, but it's forbidden to set
x3 to false and x7 to true. So that's a canonical constraint in a
2-SAT instance. 3-SAT, it's the same thing, except the constraints involve the
joint values of a triple of variables, and each constraint forbids one out of
the eight possibilities. Now, the 3-SAT problem is a canonical NP-complete
problem. It was really singled out by Cook and
Levin as being sufficiently expressive to encode all problems in NP.
But if each constraint has size only two, then, as we'll see, the problem
becomes polynomial-time solvable. There are a couple of ways of proving that.
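As a preview, here is a toy sketch of such a local search in Python (the clause encoding, restart schedule, and step budget here are my own assumptions, not the lecture's exact parameters):

```python
import math
import random

def two_sat_local_search(n, clauses, restarts=None):
    """Randomized local search for 2-SAT.

    n       -- number of Boolean variables, indexed 0..n-1
    clauses -- pairs of literals; a literal (var, want) is satisfied
               when variable `var` is assigned the value `want`
    Returns a satisfying assignment, or None if none was found (in
    which case the instance is unsatisfiable with high probability)."""
    if restarts is None:
        restarts = max(1, int(math.log2(n)) + 1)
    for _ in range(restarts):
        # Fresh random starting assignment for each restart.
        a = [random.random() < 0.5 for _ in range(n)]
        for _ in range(2 * n * n):
            bad = [c for c in clauses
                   if not (a[c[0][0]] == c[0][1] or a[c[1][0]] == c[1][1])]
            if not bad:
                return a  # every clause satisfied
            # Pick an unsatisfied clause, flip one of its two variables.
            var, _ = random.choice(random.choice(bad))
            a[var] = not a[var]
    return None
```

Each flip fixes the chosen clause but may break others; the analysis of why this random walk nevertheless succeeds is coming up in a later video.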
We're going to talk about a local search algorithm that checks whether or not
there is indeed an assignment to the variables that simultaneously satisfies
all of the given constraints. So the final example, we'll discuss in
more detail later, but just very briefly, we're going to discuss the vertex cover
problem. This is a graph problem,
and a vertex cover is just the complement of an independent set.
So while an independent set cannot take two vertices from the same edge, in the
vertex cover problem, you have to take at least one vertex from every single edge.
And then what you want is you want the vertex cover that minimizes the sum of
the vertex weights. Yet again, this is an NP complete problem
in general, but we're going to focus on the special case where the optimal
solution is small. That is, we're going to focus on graphs,
where there's a small number of vertices, such that every single edge has at least
one endpoint in that small set, and we'll see that for that special case, using a smart
kind of exhaustive search we'll actually be able to solve the problem in
polynomial time. So let me reiterate that these tractable
special cases are meant primarily to be building blocks, upon which you can draw
in a possibly more sophisticated approach to your NP complete problem.
So just to make this a little more concrete, let me just kind of dream up
one scenario to let you know what I am thinking about.
Imagine, for example, that you have a project where unfortunately it's not
really 2-SAT that you're confronting, it's actually a 3-SAT instance. So you're
feeling kind of bummed, 3-SAT is NP complete, and maybe you have 1000
variables, and certainly you can't do brute force search over the 2 to the
1,000 possible ways of assigning values to your 1000 variables.
But, maybe the good news is, because you have domain expertise, because you
understand this problem instance, you know that, yeah,
there's 1,000 variables, but there's really 20 that are crucial.
You have a feeling, that all of the action, basically, is boiling down to how
these 20 core variables get assigned. Well now, maybe you can mix together some
brute force search with some of these tractable special cases.
For example, you can enumerate over all 2 to the 20 ways of assigning values to
this core set of 20 variables. 2 to the 20 is roughly a million, that's
not so bad. And now, what you're going to do is, for
each of these million scenarios, you check whether there's a way to extend
that tentative assignment of 20 values to the 20 variables, to the other 980
variables, so that all of the constraints get satisfied.
The original problem is solvable if and only if there exists a way of assigning
values to these 20 variables that successfully extends to the other 980.
Now, because these are the crucial variables and it's where all the action
is, maybe as soon as you assign all of them 0's and 1's, the residual SAT
instance is tractable. For example, maybe it just becomes a
simple 2-SAT instance, and then you can solve it in polynomial time.
So this gives you a hybrid approach.
Brute force search at the top level, tractable special cases for each guess of
assignment of the 20 variables, and you're off to the races.
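A toy version of this hybrid scheme might look like the following (entirely my own illustration: the clause encoding is made up, and the residual check is brute force just to keep the sketch self-contained; in the scenario above you'd plug in a polynomial-time 2-SAT solver instead):

```python
from itertools import product

def clause_satisfied(clause, a):
    # A clause is a tuple of literals (var, want); it's satisfied if
    # any of its literals matches the current assignment.
    return any(a[v] == want for v, want in clause)

def satisfiable_via_core(n, clauses, core):
    """Top level: enumerate all 2^|core| assignments to the core
    variables.  For each guess, check whether it extends to the
    remaining variables so that every clause is satisfied.  (Here the
    extension check is brute force over the leftover variables purely
    for illustration; the payoff of the hybrid scheme comes when the
    residual instance is a tractable special case such as 2-SAT.)"""
    rest = [v for v in range(n) if v not in core]
    for core_vals in product([False, True], repeat=len(core)):
        a = [None] * n
        for v, val in zip(core, core_vals):
            a[v] = val
        for rest_vals in product([False, True], repeat=len(rest)):
            for v, val in zip(rest, rest_vals):
                a[v] = val
            if all(clause_satisfied(c, a) for c in clauses):
                return True
    return False
```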
And I hope it's clear, I mean this as just one possible way that you might
combine the various building blocks we're developing into a more elaborate approach
to tackling an NP-complete problem. And that's generally what it takes, a
fairly elaborate approach, because, after all, these problems are NP-complete;
you've got to respect that. So with that digression complete, let me
mention what are the other two strategies we're going to be exploring in the
lectures to come. So the second strategy, which is
certainly one very common in practice, is to resort to heuristics.
That is, to algorithms which are not guaranteed to be correct.
We haven't really seen examples of heuristics in the course thus far. For those
of you who are alumni of Part 1, perhaps we could classify Karger's randomized
minimum cut algorithm as a heuristic, because it did have a small
probability of failing to find the minimum cut.
But rather, I'm going to focus on some examples in the upcoming lectures.
I'm going to use the Knapsack problem as a case study,
and what we'll see, is that our toolbox, which contains various algorithm design
paradigms, is useful not just for designing correct algorithms, but it's
also useful for designing heuristics. So in particular, we'll get a pretty good
algorithm for the Knapsack problem, using the greedy algorithm design paradigm,
and we'll get an excellent heuristic for Knapsack, using the dynamic programming
algorithm design paradigm. The final strategy is for situations
where you are unwilling to relax correctness.
You're unwilling to consider heuristics. Now, of course, for an NP complete
problem, if you're always going to be correct, you don't expect to
run in polynomial time, but there are still opportunities to have algorithms
that, while exponential time in the worst case, are smarter than naive brute force
search. So we have in fact already seen one
example that can be interpreted as an implementation of this strategy, that's
for the knapsack problem. So, in the knapsack problem, naive brute
force search, would just run over all possible subsets of the items.
It would check if a subset of items fit in the knapsack.
If it does, it remembers the value, and at the end it returns the feasible
solution with maximum value.
That has time proportional to 2 to the n, where n is the number of items.
Our dynamic programming algorithm, has running time n times W.
Now, of course, this is no better than 2 to the n if the knapsack capacity
is huge, if W is itself on the order of 2 to the n. But, as we argued, if W is smaller, this
algorithm is going to be faster. And also, as you learned on the third
programming assignment, sometimes even though W is big, dynamic programming's
going to beat the crap out of brute force search.
So I'll show you a couple of other examples.
We'll talk about the travelling salesman problem, where naive brute force search
would roughly take n factorial time, where n is the number of vertices.
We'll give an alternative dynamic programming-based solution which runs in
time only 2 to the n, which is much better than n factorial.
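A compact sketch of that dynamic program, commonly known as the Bellman-Held-Karp algorithm (illustrative Python; its running time is O(n^2 * 2^n), which is the "essentially 2 to the n" bound, still far better than n!):

```python
from itertools import combinations

def held_karp(dist):
    """Cheapest tour that starts and ends at vertex 0 and visits every
    vertex exactly once; dist[i][j] is the cost of edge (i, j).

    C[(S, j)] = cheapest path from 0 to j whose vertex set is exactly
    S (a frozenset containing 0 and j).  Subproblems are filled in by
    increasing |S|, for O(n^2 * 2^n) total work -- exponential, but a
    huge improvement over the roughly n! cost of trying all tours."""
    n = len(dist)
    C = {(frozenset([0]), 0): 0}
    for size in range(2, n + 1):
        for rest in combinations(range(1, n), size - 1):
            S = frozenset(rest) | {0}
            for j in rest:
                C[(S, j)] = min(C[(S - {j}, k)] + dist[k][j]
                                for k in S - {j} if (S - {j}, k) in C)
    full = frozenset(range(n))
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))
```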
The third example, which we'll cover in a forthcoming video, we already alluded to
briefly on the last slide. It's for the vertex cover problem.
So this is where you're given a graph, every vertex has a weight, you want the
minimum weight subset of vertices that includes at least one endpoint from every
edge. We're going to consider the version of
the problem where you want to check whether or not it's possible to have a
vertex cover that uses only k vertices, whether or not there exists k vertices
that includes one endpoint at least, from each edge.
The naive brute force search will run in time n to the k, which gets absurd, even
when k is quite small, but we're going to show that there's a
smarter algorithm which, while still exponential in k, runs in time only 2
to the k times the size of the graph.
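The smart exhaustive search described above can be sketched as follows (unweighted decision version, my own illustration):

```python
def has_vertex_cover(edges, k):
    """Is there a set of at most k vertices touching every edge?

    Key observation: for any edge (u, v), every vertex cover must
    contain u or v.  So pick an edge, branch on the two choices, and
    recurse with budget k - 1.  The search tree has depth k and
    branching factor 2, giving O(2^k * m) time on a graph with m
    edges, instead of the naive n-to-the-k enumeration of covers."""
    if not edges:
        return True           # nothing left to cover
    if k == 0:
        return False          # edges remain, but the budget is spent
    u, v = edges[0]
    # Adding u to the cover removes every edge incident to u;
    # likewise for v.
    return (has_vertex_cover([e for e in edges if u not in e], k - 1) or
            has_vertex_cover([e for e in edges if v not in e], k - 1))
```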