Tip:
Highlight text to annotate it
X
In this video, I'm going to describe some relatively simple models of neurons.
I'll describe a number of different models starting with simple linear and threshold
neurons, and then, describing slightly more complicated models.
These are much simpler than real neurons, but they're still complicated enough, to
allow us to make neural nets, that do some very interesting kinds of machine
learning. In order to understand anything
complicated, we have to idealize it. That is, we have to make simplifications
that allow us to get a handle on how it might work.
With atoms, for example, we simplify them as behaving like little solar systems.
Idealization removes the complicated details that are not essential for
understanding the main principles. It allows us to apply mathematics, and to
make analogies to other familiar systems. And once we understand the basic
principles, it's easy to add complexity, and make the model more faithful to
reality. Of course, we have to be careful when we
idealize something, not to remove the thing that's giving it its main
properties. It's often worth understanding models that
are known to be wrong, as long as we don't forget they're wrong.
So for example, a lot of work on neural networks uses neurons that communicate
real values rather than discrete spikes of activity, and we know cortical neurons
don't behave like that, but it's still worth understanding systems like that, and
in practice they can be very useful for machine learning.
The first kind of neuron I want to tell you about is the simplest, it's a linear
neuron. It's simple.
It's computationally limited in what it can do.
It may allow us to get insights into more complicated neurons.
But it may be somewhat misleading. So in a linear neuron, the output Y.
Is a function of a bi-asset in your run B and the sum of all its incoming
connections of the activity on an input line times the weight on that line that's
the synaptic weight on the input line and if you plot that as curve, then if you
plot on the X-axis, the buyers plus the weighted activities on the input line we
get a straight line that goes through zero.
Very different from linear neurons, are binary threshold neurons that were
introduced by McCulloch and Pitts. They actually influenced Von Roenam when
he was thinking about how to design a universal computer.
In a binary threshold neuron you first compute a weighted sum of the inputs and
then you send out a spike of activity if that weighted sum exceeds the threshold.
[inaudible] and Pitts thought that the spikes were like the truth values of
propositions. So each neuron is combining the truth
values it gets from other neurons to produce the truth value of its own.
And that's like combining some propositions to compute the truth value of
another proposition. At the time in the 1940's logic was the
main paradigm for how the mind might work. Since then people thinking about how the
brain computes have become much more interested in the idea the brain is
combining lots of different sources of unreliable evidence.
And so logic isn't such a good pardigm for what the brain's up to.
For a binary threshold neuron, you can think of its input/output function as if
the weighted input is above the threshold, it gives an output of one.
Otherwise, it gives an output of zero. There are actually two equivalent ways to
write the equations for a binary threshold neuron.
We can say that the total input Z is just the activities on the input lines times
the weights. And then the output Y is one if that Z is
above the threshold and zero otherwise. Alternatively, we could say that the total
input includes a bias term. So the total input is what comes in on the
input lines, times the weights, plus this bias term.
And then we could say the output is one if that total input is above zero and is zero
otherwise. And the equivalence is simply that the
threshold in first formulation is equal to the negative of the bias in the second
formulation. A kind of neuron that combines the
properties of both linear neurons and binary threshold neurons is a rectified
linear neuron. It first computes a linear weighted sum of
its inputs, but then it gives an output that's a non-linear function of this
weighted sum. So we compute Z in the same way as before.
If z is below zero, we give an output of zero.
Otherwise, we give an output that's equal to z.
So above zero is linear, and at zero, it makes a hard decision.
So the input/output curve looks like this. It's definitely not linear, but above zero
it is linear. So with a neuron like this, we can get a
lot of the nice properties of linear systems, when it's above zero.
We can also get the ability to make decisions, at zero.
The neurons that we'll use a lot in this course, and are probably the commonest
kinds of neurons to use in artificial neuron [inaudible], are sigmoid neurons.
They give a real valued output that is a smooth and bounded function of their total
input. It's typical to use the logistic function,
where the total input is computed as before, as a bias plus what comes in on
the input lines, weighted. The output for a logistic neuron is one
over one plus e to the minus, the total input.
If you think about that, if the total input's big and positive.
E to the minus a big positive number is zero.
And so, the output will be one. If the total input's big and negative, E
to the minus a big negative number is a large number, and so the output will be
zero. So the input output function looks like
this. When, the total input's zero, e to the
minus zero is one, so the output's a half. And the nice thing about a sigmoid is it
has, smooth derivatives. The derivatives, change continuously.
And, so they're nicely behaved, and they make it easy to do learning as we'll see
in lecture three. Finally the stochastic binary neurons.
They use just the same equations as logistic units.
They compute their total input the same way and they use the logistic function to
compute a real value which is the probability that they will output a spike.
But then instead of outputting that probability as a real number they actually
make a probabilistic decision, and so what they actually output is either a one or a
zero. They're intrinsically random.
So they're treating the P as the probability of producing a one, not as a
real number. Of course, if the input is very big and
positive they will almost always produce a one.
If the input's big and negative they'll almost always produce a zero.
We can do a similar trick with rectified linear units.
We can say that the output, there's real value that comes out of a rectified linear
unit, if its input is above zero, is the rate of producing spikes.
So that's deterministic. But once we've figured out these rate of
producing spikes, the actual times at which spikes are produced is a random
process. It's a Poisson process.
So the rectified linear unit determines the rate, but intrinsic randomness in the
unit determines when the spikes are actually produced.