Hi, my name's Steve Malmskog with Netskope, and I'd like to
look into another Movie Line Monday.
The topic today is about machine learning.
And today's quote is "I'm sorry, Dave.
I'm afraid I can't do that." It comes from the movie 2001:
A Space Odyssey.
And for those of you who are familiar with the movie, you
know that that's not said by a person, but
by a computer.
And today, we're going to be talking
about machine learning.
And we're not going to be talking about designing
systems that are as sophisticated as HAL, but we are going to be
talking about how machine learning is used in enterprise
and other commercial solutions, and give you a
brief introduction to what that is.
And I want to start with a definition
about machine learning.
There are some more formal definitions.
The department head of the Machine Learning Department at
CMU, Carnegie Mellon, Tom Mitchell--
he actually has a very good formal definition.
We're going to be a little lighter weight here.
I want to use something that doesn't bog us down too much.
And so the definition that I'm adopting here is simply this.
The idea of machine learning is the study of designing
systems that learn from data.
And the two key words here that I want to look at in this
video are the idea of learning and the idea of data.
And if we start with data, if you're at all familiar with
where things have headed in the last few years, the whole
world of data is actually predicated on
this idea of big data.
And if you looked on Google Trends, for example, the
number of searches for big data has gone up by more than,
I think, 10-fold in the last 24 months.
So that's a lot of people who are interested in this area.
But the reason it's interesting is not because of
big data itself, but because of what one of my colleagues, Ron,
talked about in another Movie Line Monday: small
data, which is the idea of taking big data and putting it
in some format that you can make sense of.
And one of the tools that we use to make that possible is
this idea of machine learning.
So machine learning is one of those tools in the toolbox
where we can take things like big data and make sense of it.
And if we look at the other half of--
we have data on one side.
And then the other half is this idea of learning.
So machine learning is not simply about processing data.
For example, you could have scripts that are run to
compress data.
You could have things that are substituting words or doing
word searches.
Those are just data processing tasks.
Machine learning is more about actually
learning from the data.
And when we say learning, what we mean by that is we mean
improvement over time.
So we start at some point in time, and as time goes on, the
actual results that we get are better than they were before.
And this idea of learning and machine learning is very
similar to how we learn ourselves.
For example, if you were learning to play a piece on
the piano--
let's say, Beethoven's "Moonlight Sonata"--
you might start out the first time playing and you might
only get 40% of the notes right.
But over time, you keep practicing, you keep
practicing, your brain adapts to the process of playing that
piece until you hopefully master it at some point.
In the same way, machines and machine learning algorithms
can mimic that behavior through the application of
data and their algorithms themselves to give that same
improvement over time.
And a great example that I like to use for that is this
idea of email.
So if you had an email account on an internet provider 15
years ago, you probably saw just about as much spam in
your email as you saw real email.
And it reached a point where, in many cases, email was
becoming nearly unusable.
And that's mainly because a lot of commercial vendors had
not yet adopted the means of figuring out what was real
email and what's spam.
But today, in your modern email inbox, you probably see
very, very few pieces of spam at all.
You might see a few once in a while.
But for the most part, it's not there.
And what changed in that time is really the ability to take
email as an input and apply a machine learning algorithm to it.
And at the output, we get either an actual email, or we
figure out that it's spam.
The process of making this possible is really due to the
machine learning algorithm.
And in fact, it's not that the spam has disappeared.
If you actually go into your spam folder and take a look,
there are hundreds and hundreds of spam messages that
you are still getting.
But machine learning has made a tool that would by now be
completely unusable perfectly functional because of this
ability to improve over time, take inputs of data, and
turn them into something useful: either deliver a message as
email or reject it because it's spam.
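
To make the email example a bit more concrete, here is a minimal sketch of a spam filter that learns from labeled messages. It is not taken from the video and is not how any particular email provider works: the choice of a naive Bayes word-count model, the class name NaiveBayesSpamFilter, and the tiny training set are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical toy filter: word counts per label plus add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # "Learning" here is just updating counts from one labeled example.
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior, so an untrained filter never takes log(0).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Smoothed log likelihood of this word given the label.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        # Ties (for example, before any training) fall back to "ham".
        return "spam" if scores["spam"] > scores["ham"] else "ham"


# Made-up training data, purely for illustration.
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team on monday", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
before = spam_filter.classify("claim your free prize")  # no data seen yet
for text, label in training:
    spam_filter.train(text, label)
after = spam_filter.classify("claim your free prize")   # after seeing examples
print(before, after)
```

Before it has seen any data, the sketch lets the message through as "ham"; after four labeled examples, it flags the same message as "spam". That is the improvement over time described above, on a very small scale.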
So just to wrap up here, what I want to talk about is this
idea, this relationship that exists between the data and
the machine learning algorithm.
So as we apply data to the machine learning algorithm
itself, there are two things that need to happen to get a
satisfactory result.
The first thing is that the data itself
has to be good data.
Even though the term "big data" is very popular, what's
really important at the end of the day is that the data is
actually usable.
You can do something with it.
Having a lot of data that you can't actually run a machine
learning algorithm against is not very useful.
At the same time, you need to have a machine learning
algorithm that is also good.
You need to choose a machine learning algorithm that
actually works for the problem that you have at hand.
When you combine these two together, you
get really good results.
If you mess up on either one of these, you
can expect poor results.
So that about wraps it up for another Movie Line Monday.
If you have questions, either about machine learning or
maybe another topic that you'd like to see, feel free to
email us at movielinemonday@netskope.com.
Again, I'm Steve Malmskog with Netskope
and thanks for watching.