Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
In machine learning, we often encounter classification problems where we have to decide whether an
image depicts a dog or a cat. We'll have an intuitive, but simplified example where we
imagine that the red dots represent dogs, and the green ones are the cats. We first
start learning on a training set, which means that we get a bunch of images that are
points on this plane, and from these points we try to paint the parts of the plane red
and green. This way, we can specify which regions correspond to the concept of dogs
and cats.
After that, we'll get new points that we don't know anything about, and we'll ask the algorithm,
for instance, a neural network to classify these unknown images, so it tells us whether
it thinks that it is a dog or a cat. This is what we call a test set.
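As a minimal sketch of this setup, here is a tiny 1-nearest-neighbor classifier on made-up 2D points: we "learn" from labeled training points, then ask for labels of new test points. The coordinates, labels, and the `nearest_label` helper are all hypothetical illustrations, not anything from the video.

```python
def nearest_label(train, point):
    """Return the label of the training point closest to `point`."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(train, key=lambda item: dist2(item[0], point))[1]

# Training set: ((x, y), label) pairs -- red "dog" dots on the left,
# green "cat" dots on the right (made-up coordinates).
train = [((0.0, 0.0), "dog"), ((1.0, 0.5), "dog"),
         ((5.0, 0.0), "cat"), ((6.0, 1.0), "cat")]

# Test set: new points the classifier has never seen before.
print(nearest_label(train, (0.5, 0.2)))  # a point on the left -> "dog"
print(nearest_label(train, (5.5, 0.5)))  # a point on the right -> "cat"
```

A real neural network would paint smooth red and green regions instead of just copying the nearest point, but the train-then-test workflow is the same.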
We have had a lot of fun with neural networks and deep learning in previous Two Minute Papers
episodes; I've put some links in the description box, check them out!
In this example, it is reasonably easy to tell that the reds roughly correspond to the
left, and the greens to the right. However, if we just jump on the deep learning hype
train and don't know much about neural networks, we may get extremely poor results
like this.
What we see here is the problem of overfitting. Overfitting means that our beloved neural
network does not learn the concept of dogs or cats, it just tries to adapt as much as
possible to the training set.
As an intuition, think of poorly made real-life exams. We have a textbook where we can practice
with exercises, so this textbook is our training set.
Our test set is the exam. The goal is to learn from the textbook and obtain knowledge that
proves to be useful at the exam.
Overfitting means that we simply memorize parts of the textbook instead of obtaining
real knowledge. If you're on page 5, and you see a bus, then the right answer is B. Memorizing
patterns like this is not real learning. The worst case is if the exam questions are
also from the textbook, because then you can get a great grade just by overfitting. So, this
kind of overfitting has been a big looming problem in many education systems.
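The "lazy student" can be sketched as a classifier that literally memorizes the textbook: a lookup table that stores every training question verbatim and falls back to a default guess for anything new. The class name, questions, and default answer here are all hypothetical, just to make the analogy concrete.

```python
class Memorizer:
    """Memorizes training examples verbatim; guesses a default otherwise."""
    def __init__(self, default="B"):
        self.table = {}
        self.default = default

    def fit(self, examples):
        for question, answer in examples:
            self.table[question] = answer

    def predict(self, question):
        return self.table.get(question, self.default)

train = [("q1", "A"), ("q2", "C"), ("q3", "B")]
model = Memorizer()
model.fit(train)

# Perfect score when the exam repeats the textbook...
train_acc = sum(model.predict(q) == a for q, a in train) / len(train)
print(train_acc)  # 1.0

# ...but on a genuinely new question, it can only guess its default.
print(model.predict("q4"))  # "B"
```

Perfect training accuracy with no real understanding: that is overfitting in its purest form.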
Now the question is, which kind of neural network do we want? Something that works like
a lazy student, or one that can learn many complicated concepts?
If we're aiming for the latter, we have to combat overfitting, which is the bane of so
many machine learning techniques.
Now, there are several ways of doing that, but today we're going to talk about one possible
solution by the name of L1 and L2 regularization.
The intuition of our problem is that the deeper and bigger neural networks we train, the more
potent they are, but at the same time, they get more prone to overfitting. The smarter
the student is, the more patterns they can memorize.
One solution is to hurl a smaller neural network at the problem. If this smaller version is
powerful enough to take on the problem, we're good. A student who cannot afford to memorize
all the examples is forced to learn the actual underlying concepts.
However, it is very possible that this smaller neural network is not powerful enough to solve
the problem. So we need to use a bigger one. But, bigger network, more overfitting. Damn.
So what do we do?
Here is where L1 and L2 regularization comes to save the day. It is a tool to favor simpler
models instead of complicated ones. The idea is that the simpler the model is, the better
it transfers the textbook knowledge to the exam, and that's exactly what we're looking for.
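The way L1 and L2 regularization favor simpler models is by adding a penalty on the size of the weights to the training loss, so that large, complicated weight settings become more expensive. Here is a minimal sketch of that idea; the function names and the strength parameter `lam` are my own illustrative choices.

```python
def l1_penalty(weights, lam):
    """L1: penalize the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2: penalize the sum of squared weight values."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Total loss = how well we fit the data + how complicated the model is."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return data_loss + penalty

weights = [3.0, -1.0, 0.5]
print(regularized_loss(2.0, weights, lam=0.1, kind="l1"))  # 2.0 + 0.1*4.5  = 2.45
print(regularized_loss(2.0, weights, lam=0.1, kind="l2"))  # 2.0 + 0.1*10.25 = 3.025
```

Minimizing this total loss pulls the weights toward zero, which is the mathematical version of "keep the model simple."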
Here you see images of the same network with different regularization strengths. The first
one barely helps at all, and as you can see, overfitting is still rampant. With a stronger
L2 regularization, you see that the model is simplified substantially, and is likely
to perform better on the exam. However, if we add more regularization, it might be that
we simplified the model too much, and it is almost the same as a smaller neural network
that is not powerful enough to grasp the underlying concepts of the exam. Keep your neural network
as simple as possible, but not simpler.
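This tradeoff between fitting the data and staying simple can be seen in the smallest possible case: one-dimensional ridge (L2-regularized) regression, where the best weight has a closed form, w = Σxy / (Σx² + λ). The data and the `ridge_weight` helper below are hypothetical, chosen only to show how the strength λ shrinks the model.

```python
# Made-up data with a true relationship y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def ridge_weight(xs, ys, lam):
    """Closed-form optimal weight for 1D ridge regression."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

for lam in [0.0, 1.0, 100.0]:
    print(lam, ridge_weight(xs, ys, lam))
# lam = 0 recovers w = 2.0 exactly; as lam grows, w is pushed toward 0,
# until the model is too simple to capture the y = 2x relationship at all.
```

Too little regularization and nothing changes; too much and the model is squashed into uselessness — exactly the balance the episode describes.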
One has to find the right balance which is an art by itself, and it shows that training
deep neural networks takes a bit of expertise. It is more than just a plug and play tool
that solves every problem by magic.
If you want to play with the neural networks you've seen in this video, just click on the
link in the description box. I hope you'll have at least as much fun with it as I had!
Thanks for watching, and for your generous support, and I'll see you next time!