3. Some Simple Models of Neurons

In this video, I'm going to describe some relatively simple models of neurons. I'll describe a number of different models starting with simple linear and threshold neurons, and then, describing slightly more complicated models. These are much simpler than real neurons, but they're still complicated enough, to allow us to make neural nets, that do some very interesting kinds of machine learning. In order to understand anything complicated, we have to idealize it. That is, we have to make simplifications that allow us to get a handle on how it might work. With atoms, for example, we simplify them as behaving like little solar systems. Idealization removes the complicated details that are not essential for understanding the main principles. It allows us to apply mathematics, and to make analogies to other familiar systems. And once we understand the basic principles, it's easy to add complexity, and make the model more faithful to reality. Of course, we have to be careful when we idealize something, not to remove the thing that's giving it its main properties. It's often worth understanding models that are known to be wrong, as long as we don't forget they're wrong. So for example, a lot of work on neural networks uses neurons that communicate real values rather than discrete spikes of activity, and we know cortical neurons don't behave like that, but it's still worth understanding systems like that, and in practice they can be very useful for machine learning. The first kind of neuron I want to tell you about is the simplest, it's a linear neuron. It's simple. It's computationally limited in what it can do. It may allow us to get insights into more complicated neurons. But it may be somewhat misleading. So in a linear neuron, the output Y. Is a function of a bi-asset in your run B and the sum of all its incoming connections of the activity on an input line times the weight on that line that's the synaptic weight on the input line and if you plot that as curve, then if you plot on the X-axis, the buyers plus the weighted activities on the input line we get a straight line that goes through zero. Very different from linear neurons, are binary threshold neurons that were introduced by McCulloch and Pitts. They actually influenced Von Roenam when he was thinking about how to design a universal computer. In a binary threshold neuron you first compute a weighted sum of the inputs and then you send out a spike of activity if that weighted sum exceeds the threshold. [inaudible] and Pitts thought that the spikes were like the truth values of propositions. So each neuron is combining the truth values it gets from other neurons to produce the truth value of its own. And that's like combining some propositions to compute the truth value of another proposition. At the time in the 1940's logic was the main paradigm for how the mind might work. Since then people thinking about how the brain computes have become much more interested in the idea the brain is combining lots of different sources of unreliable evidence. And so logic isn't such a good pardigm for what the brain's up to. For a binary threshold neuron, you can think of its input/output function as if the weighted input is above the threshold, it gives an output of one. Otherwise, it gives an output of zero. There are actually two equivalent ways to write the equations for a binary threshold neuron. We can say that the total input Z is just the activities on the input lines times the weights. And then the output Y is one if that Z is above the threshold and zero otherwise. Alternatively, we could say that the total input includes a bias term. So the total input is what comes in on the input lines, times the weights, plus this bias term. And then we could say the output is one if that total input is above zero and is zero otherwise. And the equivalence is simply that the threshold in first formulation is equal to the negative of the bias in the second formulation. A kind of neuron that combines the properties of both linear neurons and binary threshold neurons is a rectified linear neuron. It first computes a linear weighted sum of its inputs, but then it gives an output that's a non-linear function of this weighted sum. So we compute Z in the same way as before. If z is below zero, we give an output of zero. Otherwise, we give an output that's equal to z. So above zero is linear, and at zero, it makes a hard decision. So the input/output curve looks like this. It's definitely not linear, but above zero it is linear. So with a neuron like this, we can get a lot of the nice properties of linear systems, when it's above zero. We can also get the ability to make decisions, at zero. The neurons that we'll use a lot in this course, and are probably the commonest kinds of neurons to use in artificial neuron [inaudible], are sigmoid neurons. They give a real valued output that is a smooth and bounded function of their total input. It's typical to use the logistic function, where the total input is computed as before, as a bias plus what comes in on the input lines, weighted. The output for a logistic neuron is one over one plus e to the minus, the total input. If you think about that, if the total input's big and positive. E to the minus a big positive number is zero. And so, the output will be one. If the total input's big and negative, E to the minus a big negative number is a large number, and so the output will be zero. So the input output function looks like this. When, the total input's zero, e to the minus zero is one, so the output's a half. And the nice thing about a sigmoid is it has, smooth derivatives. The derivatives, change continuously. And, so they're nicely behaved, and they make it easy to do learning as we'll see in lecture three. Finally the stochastic binary neurons. They use just the same equations as logistic units. They compute their total input the same way and they use the logistic function to compute a real value which is the probability that they will output a spike. But then instead of outputting that probability as a real number they actually make a probabilistic decision, and so what they actually output is either a one or a zero. They're intrinsically random. So they're treating the P as the probability of producing a one, not as a real number. Of course, if the input is very big and positive they will almost always produce a one. If the input's big and negative they'll almost always produce a zero. We can do a similar trick with rectified linear units. We can say that the output, there's real value that comes out of a rectified linear unit, if its input is above zero, is the rate of producing spikes. So that's deterministic. But once we've figured out these rate of producing spikes, the actual times at which spikes are produced is a random process. It's a Poisson process. So the rectified linear unit determines the rate, but intrinsic randomness in the unit determines when the spikes are actually produced.