To extend the learning rule for a linear neuron to a learning rule we can use for
multilayer nets of nonlinear neurons, we need two steps.
First, we need to extend the learning rule to a single nonlinear neuron.
And we're going to use logistic neurons, although many other kinds of nonlinear
neurons could be used instead. We're now going to generalize the learning
rule for a linear neuron to a logistic neuron, which is a nonlinear neuron.
So, a logistic neuron computes its logit, z, which is its total input: its bias
plus the sum over all its input lines of the value on an input line, xi, times
the weight on that line, wi. It then gives an output y that's a smooth
nonlinear function of that logit. As shown in the graph here, that function
is approximately zero when z is big and negative, approximately one when z is big
and positive, and in between it changes smoothly and nonlinearly.
The fact that it changes continuously gives it nice derivatives, which make
learning easy.
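To make that concrete, here is a minimal sketch of the forward computation in Python (the function name and values are just for illustration, not from the lecture):

    import numpy as np

    def logistic_neuron(x, w, b):
        # logit: the bias plus the weighted sum of the inputs
        z = b + np.dot(w, x)
        # the output is a smooth, nonlinear squashing of the logit into (0, 1)
        y = 1.0 / (1.0 + np.exp(-z))
        return z, y

For example, logistic_neuron(np.array([1.0, 2.0]), np.array([0.5, -0.3]), 0.1) gives a logit of 0.0 and an output of 0.5.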
So, to get the derivatives of a logistic neuron's output with respect to the
weights, which is what we need for learning, we first need to compute the
derivative of the logit itself, that is, of the total input, with respect to a
weight. That's very simple. The logit is just the bias plus the sum over
all the input lines of the value on the input line times the weight.
So, when we differentiate with respect to wi, we just get xi.
So, the derivative of the logit with respect to wi is xi, and similarly, the
derivative of the logit with respect to xi is wi.
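A quick way to sanity-check this (an illustrative sketch, not part of the lecture): nudge one weight by a small amount and the logit should move by roughly that amount times xi.

    # finite-difference check that dz/dwi is xi (values are arbitrary)
    x, w, b = [1.0, 2.0], [0.5, -0.3], 0.1
    eps = 1e-6
    logit = lambda weights: b + sum(wi * xi for wi, xi in zip(weights, x))
    print((logit([w[0] + eps, w[1]]) - logit(w)) / eps)   # approximately x[0] = 1.0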
The derivative of the output with respect to the logit is also simple if you express
it in terms of the output. The output is y = 1 / (1 + e^-z), and dy
by dz is just y times (1 - y). That's not obvious.
For those of you who like to see the math, I've put it on the next slide.
The math is tedious but perfectly straightforward, so you can go through it yourself.
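If you don't have that slide to hand, the algebra amounts to this (written in LaTeX):

\[
y = \frac{1}{1+e^{-z}}, \qquad
\frac{dy}{dz} = \frac{e^{-z}}{(1+e^{-z})^{2}}
             = \frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}}
             = y\,(1-y),
\]

since \(1 - y = e^{-z}/(1+e^{-z})\).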
Now that we've got the derivative of the output with respect to the logit and
the derivative of the logit with respect to the weight, we can figure out the
derivative of the output with respect to the weight. We just use the chain rule again.
So, dy by dw is dz by dw times dy by dz. And dz by dw, as we just saw, is xi,
and dy by dz is y times (1 - y), so dy by dwi is xi times y times (1 - y).
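Written out in Python for a single training case, the chain rule looks like this (a sketch with illustrative names, not the lecture's code):

    import numpy as np

    def dy_dw(x, w, b):
        z = b + np.dot(w, x)                # the logit
        y = 1.0 / (1.0 + np.exp(-z))        # the logistic output
        # chain rule: dy/dwi = (dz/dwi) * (dy/dz) = xi * y * (1 - y)
        return x * y * (1.0 - y)            # one derivative per weight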
And so, we now have the learning rule for a logistic neuron.
We've got dy by dw, and all we need to do is use the chain rule once more, and
multiply it by dE by dy. And we get something that looks very like the delta rule.
So, the way the error changes as we change the weight, dE by dwi, is just the sum
over all the training cases, n, of the value on the input line, xin, times the
residual, that is, the difference between the target and the actual output of the
neuron. But it's got this extra term in it, which comes from the slope of the
logistic function, which is yn times (1 - yn).
So, a slight modification of the delta rule gives us the gradient descent learning
rule for training a logistic unit.
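Putting the whole thing together, here is a minimal sketch of one batch gradient descent step for a logistic unit with squared error; the names X (one training case per row), t (targets), and epsilon (learning rate) are assumptions for the example, not the lecture's notation:

    import numpy as np

    def gradient_descent_step(X, t, w, b, epsilon=0.1):
        z = b + X @ w                        # logits, one per training case
        y = 1.0 / (1.0 + np.exp(-z))         # logistic outputs
        # the delta-rule residual (t - y), scaled by the extra slope term y * (1 - y)
        delta = (t - y) * y * (1.0 - y)
        # dE/dwi = -sum over cases of xin * yn * (1 - yn) * (tn - yn), for squared error
        grad_w = -X.T @ delta
        grad_b = -np.sum(delta)
        # move the weights and bias a little way down the gradient
        return w - epsilon * grad_w, b - epsilon * grad_b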