Algorithms - Greedy algorithm - 01 - Introduction - 01 - Arabic - English subtitles

Peace be upon you Today we'll talk about another technique an optimization technique like Dynamic Programming, and strongly related to it This technique is called Greedy Algorithm or Greedy Technique Quickly, let's remember what Dynamic Programming is, to figure out with what Greedy is Dynamic Programming is also optimization: minimization or maximization, according to the problem in hand What does it minimize or maximize over? - Over all the possible solutions we have So, we always had: a recursive state, this state is a root from which substates come out These substates may include child1, child2, child3, .. etc e.g., in the Longest-Increasing-Subsequence Problem, we had 2 choices, either to leave the item currently existing and recurse with the next one, or take it and recurse with the next one So, I had 2 choices, right? And supposedly, we didn't know what the optimal solution is, so we were just trying everything, leave the item or take the item, leave, or take, leave or take .. etc and the number of states was exponential, so we would memorize, .. etc Greedy refers to the English word "greedy", greedy in the sense that it wants to make something called "Greedy Choice" What is Greedy Choice about? .. In the above example, you have several substates you choose one from Greedy Choice, on the contrary, is about deciding "somehow" that my choice is childX E.g., assume an example where each state had 3 children Greedy Choice would say, for example, the second child (state) is the optimal then after going to the state of the second child, [it'd choose] the first child is the optimal then after going to the first child, it'd choose, for example, the first is the optimal, .. etc Each time it makes one Greedy Choice So, DP considers all choices, while Greedy considers all choices, but chooses only one choice to continue with DP recurses on each choice and comes back, trying everything but Greedy has just only one way Greedy has two kinds: one that finds a Global Optimal Solution, and one that does not The one that does not is called approximate Finding the global optimal solutions means that it's like DP. DP considers everything and tells you the optimal of all If Greedy makes one Greedy Choice per step. Until it reaches the base case, and its answer would be then proved to the optimal, then this is the Global Optimal Solution The other one is called approximate In contests, only the Global Optimal Solution kind is the one that matters because, assume the problem wants the minimum answer. So, if you solution produces 54 instead of 50 the judge would only want 54 So, in contests we'd only care about the Global-Optimal one. But in real life, both are important, and the approximate is even more as many real-life problems are hard, so we only try to find a close-enough approximate solution You'll find people talking about intuition. We'll talk about "intuition" and "proof" It's about having a hunch of which state is the answer More info on the topic: Greedy Choice makes the choice that "looks the best at the moment" That's why they call its choice "Local Optimal Choice" Standing at some state, it considers some stuff and chooses according to them As we mentioned before, if the collection of Local Optimal Choices is proved to be global, then we have our solution You must have proof, in case globally optimal ... That is, if you claim your solution using Greedy, the substate you choose each time, is optimal you must have proof for that, especially in a contest While we're in a contest, mostly time is strict. So, what most contestants try to do is to find an intuition to have a strong observation, to have a strong sense that the solution is X, and does not prove Instead, would generate several examples to challenge his idea After contest, many contestants don't care to know the proof of the solutions It's VERY IMPORTANT to think after contest why your greedy was correct To be able to prove, you need to use a proving technique. There're different proof techniques Just as you want to be good in programming, you need to be good as well in proving techniques Spare some time of your life to study different proof techniques Although Greedy is very related to DP solutions (there's usually both a DP and a Greedy solution to the problem), in many cases Greedy is the only solution, and DP won't work because state of the DP is [Incomprehensible], so it's worthless and you have to think Greedy Today, we'll give two examples to illustrate the topic One of the most famous classical examples taught in textbooks is called activity selection problem Someones has a number of activities, and each activity starts at some time and ends at some time and they do just one activity at any moment So, there's an activity starting @ 2 ending @ 4.5, and another starts @ 5 ending @ 7 .. etc So, these activities overlap. So, they can't do all of them So, they want to choose a subset of these activities such that: 1- There's no overlap. 2- The number of elements in the subset is as big as possible Last info: An activity can start at the end of another activity That is, for example, an activity that starts @ 10 and ends @ 15 can start after an activity that started @ 7 and ended @ 10 But, an activity that @ starts @ 10, can start after an activity that ended @ 11, ok? This problem has a naive DP solution, just like any DP problem we solve When you think, you'll find this is clearly a subset-style one That is, any activity is after all either 0 or 1, just like previous problems So, "Take-or-leave" idea is just applied to every activity But, you have a little problem: Now, if I pick the first activity: Previously, we used to go to the next Now, you have 2 problems you should notice: Assume the first activity was 10-11, and the second one was 6-7, and third was 9-12 You need to sort these activities to be able to deal with. This's the first thing You may sort them according to their start time, or alternatively you can sort them according to their finish time The 2nd problem: Previously, in Knapsack and such, we would pick -> go to next, then leave -> go to next Now, if I leave, I can go to the next one easily. But if I pick the current state, can I go to the next one (then pick or leave it, .. etc)? No, wait a second. You may be intersecting with the next one So, the correct thing is not to go to the next right away The correct thing is that you'll need to loop to pick the next one with which you don't overlap So, it's a pick-or-leave, just 3-line code, you either take the state or leave it but if you take it, you'll iterate a little to find the next event to start with Of course, this is possible as you'd sorted the activities in the beginning Now, we had 2 choice: pick or leave The question is: Can I, in another way, choose an activity (I have n activities), and say this is part of the optimal solution, remove it, and then have n - 1 activities to solve as an independent subproblem? That is, MAKE ONE CHOICE. Can I do that? If I can, then I've found the Greedy Solution Let's start thinking about that First thing one may think about: start to get an intuition, try to guess the solution So, first solution one may think about: what about counting with each activity how many activities intersect with it? So, for example, if I (an activity) intersect with 2 activities. So, if you take me, there're 2 activities you won't be able to take, and so on. So, we'll do this move, and start picking the activities according to the activity that has the lowest number of intersections Then, when I take this activity, I'll remove the activities that intersected with it and recurse with the rest So, that was a nice idea. But, is it correct? The answer is No Look at this example In this photo, each line represents an activity with a starting time and ending time The correct solution is these 4 blue things Now, what activities have the smallest number of intersections in these? The red one intersects with 2 only If you take the red one as the first activity, you'll need to remove the 2 upper blue ones And the optimal then would be the activity here and the activity here So, your solution consists of 3, while the correct one consists of 4 So, our idea is wrong Let's think of another intuition What about trying to make a selection such that the number of the remaining activities is as big as possible? However, the idea of choosing an activity that intersects with smallest number of activities does so, doesn't it? Yes, it does. But I'll do it in another way I'll try to pick an activity. Let's say for example, I'll sort them according to start time So, the activity in the left, if I pick it, it may help me a little The answer: This idea is wrong. You're telling me to sort the activities according to the start time, right? Imagine for instance, that the first activity starts the earliest and ends the latest So, for example, it starts @ 1 and ends @ 12 such that if I do this activity, I won't be able to do anything else So, that's a total failure But, wait! what about sorting them according to finish time and not start time? Then what? Then, pick the activity that finishes the earliest That is, pick the first-ending activity, remove the activities intersecting with it I'm currently trying to do something we call Boundary Thinking trying to get an activity from one of the sides (the beginning or the end), then solve for the rest The idea makes sense. If you try it on several test cases, you'll find it working However, you'll still be worried whether that solution is correct or no, and you can't know As I told you, if we're in a contest, just try it on several test cases. If it works, code it Otherwise, from inside of your idea, you try to get the proof of it Let me show you the proof of this idea to realize the importance of proofs I started with this example as the solution makes sense, but you'd still be afraid of your solution Let's assume, that these activities are sorted by their finish time, and then numbered 1 to n So, activity 1 is the one that finishes the earliest, then 2, .. etc Let's assume we have a subset A of these activities and they're the optimal solution And A is sorted according to finish time Now, we're claiming that the first activity in A would be activity 1 and we'd still be able to get the optimal solution. That's we said before, right? That after we sort according to finish time, the activity you'll pick is activity 1, then remove the overlapping ones with it and recurse over the new subproblem? So, from your claim, if you have a set of activities number 1 to n, and you chose a subset from them, the first activity of this subset would be 1, right? Now, let's see how we'll prove it, and try to focus as these techniques are [Incomprehensible] general techniques Let's assume that the first activity is called K Now, you're assuming that subset A is optimal subset So, I'm saying let's consider the first item in it, as my proof or my sense depended on the idea of the first one K has 2 possible values: it either be 1. So, it's the first activity, and in this case that'd be as our claim And it either be something other than 1 So, the first item is either 1 or NOT 1. If it's 1, then I'm correct. If the first time is NOT 1, I wanna prove that I can remove the first item of subset A, and replace it with 1 without having overlap in A And if no overlapping occurs, then I've managed to make a new subset whose length is the same as the optimal subset I have Now we're examining the possibility that K != 1 Let's make another subset called B, in which we'll remove the item K and replace it with 1 The question: Now, would there be overlapping in subset B, and so we have to remove 1? The answer is: Impossible. There can be no overlapping because we've agreed that item 1 has the smallest finish time, and so its finish time is smaller than the finish time of the first item in A and since it's smaller than it, the activities in B are also disjoint, no two will overlap in it So, since the elements in B are disjoint, AND you've just removed an item from A and replaced it with an item, then B is the optimal solution So, the proof is all about: If we have an optimal subset, the first item of it is either the smallest one, or not the smallest one but I can remove it and replace it with the smallest one and still no overlap occurs. So, we have an optimal solution So, this proof makes you sure that you solution is surely correct I've explained the proof, but usually I won't be explaining proofs as much as I'll be talking about the intuition to increase it for you But you MUST get used to proofs When you'll start reading proofs, you'll find it boring, cumbersome, you'll be sick of it. All of this is normal especially for people with Computer-Science background, as proofs are not easy for them as for Mathematics but keep reading the proof, understand it, increase your proving techniques, and you'll be better with time Let's talk about another example, a much easier one Remember the Knapsack problem? We have a knapsack of size, for example, 20 and we would enter house, take the items or not, and we wanted to get the items with highest benefit, each time had a price and weight In the Fractional Knapsack: Previously, we used to think as 0 or 1, take the item or don't Now, they tell you you can divide the item into small pieces, and pick the piece you want from them So, it's something like, when you're entering an apartment which has 5-kilo rice, where you can either steal the whole sack or take just 1 kilo Previously, the problem was about undividable items, like a TV for example In this problem for example, imagine you entered an apartment which has 10-kilo rice, and you know these 10 kilos cost 30 L.E. and you found 5-kilo cheese and they cost 50 L.E and I'm telling you that your bag can carry only 7 kilo. What would the optimal solution be? The optimal solution here consists of the 5-kilo cheese, and 2 kilos from the rice What would the DP solution be like? Just like the DP solution of the ordinary Knapsack, or actually it's just like the coin change problem When you pick something, for example, 1 kilo of the items, you'll try to recurse on yourself and see if you'll pick more or that's enough, and try to maximize And what about the Greedy solution? Let me ask you a question. If you have in this sack just 1 R. That is, you went to the apartment and you are not allowed to take anything other than 1 kilo from which items would you take? would you take from the thing with the highest price? Surely, the answer is no Why? Let me edit this example a little or never mind Now if you take from the item with the highest price, note that you'll take only 1 kilo from it you won't take the whole price, you'll take a kilo only So, you don't care about the item with the highest price. You only care about the item with the highest per-kilo price If I'd edit this example to illustrate the point more, I'd make the 20 kilo of the first type for 60 L.E So, following the sense, you'd take from the one with the highest price: take a kilo from the 60-L.E one However, the kilo of the first type is for 60 / 20 = 3 L.E So, you benefit only 3 L.E. from it. However, from the below one, you'd gain 50 / 5 = 10 L.E. So, it's most beneficial to me if I pick from the second one So, if I'm allowed to take 1 kilo only, with some intelligence, I wouldn't keep trying whether I should take 1 kilo from the first or the second Just take 1 kilo from the highest one (highest per-kilo price) And if you can take 2 kilos? just take 1 kilo from the highest one, then you'll have a new subproblem And in the new subproblem, the second item would be 1 reduced. So, instead of 5 kilos for 50 L.E., it'd be 4 kilos for 40 L.E. And the take the highest one from it, .. etc What I'm saying here is that we've developed a Greedy Solution here The greedy solution says: "Go take from the highest per-kilo type till it runs out. Then take from the next highest one till it runs outs, and so on" So, the solution would be: Sort items according to ratio of Money / Weight Then take from the first item as much as you can. And when finished go to the next one, take as much as you can, and so on And for complexity: Sorting is O(NlogN), and the processing is O(N) So, what did we do in this example? The highest one was this one, and we have up to 7 kilo. So, we took from it 5 kilos and it ran out And now we have 2 kilos left, take them from the other one Note that we got our intuition in this problem a lot faster, and believed in it so fast However, the proof of this problem is a full page That is, the problem's idea may be too easy, however, its proof is cumbersome And the problem or the idea can be very cumbersome, but has a very easy proof So, in all cases you need to read proofs So, that's for "Greedy". Greedy is important and shows up in lots of places So, it needs work. It's a hard technique as it depends on observations It needs you to think a lot, needs you to proof. So, try to work on it well Peace be upon you :)