Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
In machine learning, we often encounter classification problems where we have to decide whether an
image depicts a dog or a cat. We'll have an intuitive, but simplified example where we
imagine that the red dots represent dogs, and the green ones are the cats. We first
start learning on a training set, which means that we get a bunch of images that are
points on this plane, and from these points we try to paint the parts of the plane red
and green. This way, we can specify which regions correspond to the concept of dogs
and cats.
After that, we'll get new points that we don't know anything about, and we'll ask the algorithm,
for instance, a neural network to classify these unknown images, so it tells us whether
it thinks that it is a dog or a cat. This is what we call a test set.
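As a minimal sketch of this setup, here is a tiny 1-nearest-neighbor classifier on made-up 2D points: we "learn" from labeled training points, then ask for labels of new test points. The coordinates, labels, and the `nearest_label` helper are all hypothetical illustrations, not anything from the video.

```python
def nearest_label(train, point):
    """Return the label of the training point closest to `point`."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(train, key=lambda item: dist2(item[0], point))[1]

# Training set: ((x, y), label) pairs -- red "dog" dots on the left,
# green "cat" dots on the right (made-up coordinates).
train = [((0.0, 0.0), "dog"), ((1.0, 0.5), "dog"),
         ((5.0, 0.0), "cat"), ((6.0, 1.0), "cat")]

# Test set: new points the classifier has never seen before.
print(nearest_label(train, (0.5, 0.2)))  # a point on the left -> "dog"
print(nearest_label(train, (5.5, 0.5)))  # a point on the right -> "cat"
```

A real neural network would paint smooth red and green regions instead of just copying the nearest point, but the train-then-test workflow is the same.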
We have had a lot of fun with neural networks and deep learning in previous Two Minute Papers
episodes; I've put some links in the description box, check them out!
In this example, it is reasonably easy to tell that the reds roughly correspond to the
left, and the greens to the right. However, if we just jump on the deep learning hype
train and don't know much about neural networks, we may get extremely poor results
like this.
What we see here is the problem of overfitting. Overfitting means that our beloved neural
network does not learn the concept of dogs or cats, it just tries to adapt as much as
possible to the training set.
As an intuition, think of poorly made real-life exams. We have a textbook where we can practice
with exercises, so this textbook is our training set.
Our test set is the exam. The goal is to learn from the textbook and obtain knowledge that
proves to be useful at the exam.
Overfitting means that we simply memorize parts of the textbook instead of obtaining
real knowledge. If you're on page 5, and you see a bus, then the right answer is B. Memorizing
patterns like this is not real learning. The worst case is if the exam questions are
also from the textbook, because then you can get a great grade just by overfitting. So, this
kind of overfitting has been a big looming problem in many education systems.
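The "lazy student" can be sketched as a classifier that literally memorizes the textbook: a lookup table that stores every training question verbatim and falls back to a default guess for anything new. The class name, questions, and default answer here are all hypothetical, just to make the analogy concrete.

```python
class Memorizer:
    """Memorizes training examples verbatim; guesses a default otherwise."""
    def __init__(self, default="B"):
        self.table = {}
        self.default = default

    def fit(self, examples):
        for question, answer in examples:
            self.table[question] = answer

    def predict(self, question):
        return self.table.get(question, self.default)

train = [("q1", "A"), ("q2", "C"), ("q3", "B")]
model = Memorizer()
model.fit(train)

# Perfect score when the exam repeats the textbook...
train_acc = sum(model.predict(q) == a for q, a in train) / len(train)
print(train_acc)  # 1.0

# ...but on a genuinely new question, it can only guess its default.
print(model.predict("q4"))  # "B"
```

Perfect training accuracy with no real understanding: that is overfitting in its purest form.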
Now the question is, which kind of neural network do we want? Something that works like
a lazy student, or one that can learn many complicated concepts?
If we're aiming for the latter, we have to combat overfitting, which is the bane of so
many machine learning techniques.
Now, there are several ways of doing that, but today we're going to talk about one possible
solution by the name of L1 and L2 regularization.
The intuition of our problem is that the deeper and bigger neural networks we train, the more
potent they are, but at the same time, they get more prone to overfitting. The smarter
the student is, the more patterns they can memorize.
One solution is to hurl a smaller neural network at the problem. If this smaller version is
powerful enough to take on the problem, we're good. A student who cannot afford to memorize
all the examples is forced to learn the actual underlying concepts.
However, it is very possible that this smaller neural network is not powerful enough to solve
the problem. So we need to use a bigger one. But, bigger network, more overfitting. Damn.
So what do we do?
Here is where L1 and L2 regularization comes to save the day. It is a tool to favor simpler
models instead of complicated ones. The idea is that the simpler the model is, the better
it transfers the textbook knowledge to the exam, and that's exactly what we're looking for.
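The way L1 and L2 regularization favor simpler models is by adding a penalty on the size of the weights to the training loss, so that large, complicated weight settings become more expensive. Here is a minimal sketch of that idea; the function names and the strength parameter `lam` are my own illustrative choices.

```python
def l1_penalty(weights, lam):
    """L1: penalize the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2: penalize the sum of squared weight values."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Total loss = how well we fit the data + how complicated the model is."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return data_loss + penalty

weights = [3.0, -1.0, 0.5]
print(regularized_loss(2.0, weights, lam=0.1, kind="l1"))  # 2.0 + 0.1*4.5  = 2.45
print(regularized_loss(2.0, weights, lam=0.1, kind="l2"))  # 2.0 + 0.1*10.25 = 3.025
```

Minimizing this total loss pulls the weights toward zero, which is the mathematical version of "keep the model simple."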
Here you see images of the same network with different regularization strengths. The first
one barely helps at all, and as you can see, overfitting is still rampant. With a stronger
L2 regularization, you see that the model is simplified substantially, and is likely
to perform better on the exam. However, if we add more regularization, it might be that
we simplified the model too much, and it is almost the same as a smaller neural network
that is not powerful enough to grasp the underlying concepts of the exam. Keep your neural network
as simple as possible, but not simpler.
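This tradeoff between fitting the data and staying simple can be seen in the smallest possible case: one-dimensional ridge (L2-regularized) regression, where the best weight has a closed form, w = Σxy / (Σx² + λ). The data and the `ridge_weight` helper below are hypothetical, chosen only to show how the strength λ shrinks the model.

```python
# Made-up data with a true relationship y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def ridge_weight(xs, ys, lam):
    """Closed-form optimal weight for 1D ridge regression."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

for lam in [0.0, 1.0, 100.0]:
    print(lam, ridge_weight(xs, ys, lam))
# lam = 0 recovers w = 2.0 exactly; as lam grows, w is pushed toward 0,
# until the model is too simple to capture the y = 2x relationship at all.
```

Too little regularization and nothing changes; too much and the model is squashed into uselessness — exactly the balance the episode describes.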
One has to find the right balance which is an art by itself, and it shows that training
deep neural networks takes a bit of expertise. It is more than just a plug and play tool
that solves every problem by magic.
If you want to play with the neural networks you've seen in this video, just click on the
link in the description box. I hope you'll have at least as much fun with it as I had!
Thanks for watching, and for your generous support, and I'll see you next time!