In this segment, I'm going to talk briefly about the IEEE 754 standard.
We're only going to talk about how we represent a single-precision number.
The IEEE 754 standard is used to standardize the representation of numbers
in various computers, as well as the arithmetic operations of multiplication,
addition, division, and subtraction in those computers.
So the point of the standard is that different computers,
whether you're using a VAX, a Cray computer, a PC, or a Macintosh,
will all be able to represent numbers in a similar fashion, and the same is the
case with the arithmetic operations which we do on those computers.
Now, there's a nice paper,
and you can find the link to it in the PowerPoint presentation for this
segment. So if you go to the keyword floating point on the numerical methods
website, you will see a link to the PowerPoint, but you can also get it right from here, if you
just want to punch the URL directly into your
browser. This is a very good paper, "What Every Computer Scientist Should Know
About Floating-Point Arithmetic," and in my opinion, even if you are not a computer scientist,
you can gain quite a bit of knowledge about floating point operations,
or floating point arithmetic, from this particular paper. It's a long paper, it's about 150 to 175
pages long. I don't expect that you are going to go through all the 175 pages,
but what I would like to see is that you skim through it, see some of the initial details,
some of the summary details of the paper, and I think you will learn quite a bit about
how this floating point arithmetic works, and how the IEEE 754 standard
works for floating point operations. And, again, I said
that we are limiting our discussion to the single-precision number. We're not even talking about arithmetic
operations, just the single-precision format, so that you get a good
feel for it. We already talked about the floating point representation in a
hypothetical ten-bit
word, and here, what we are trying to do is use the
thirty-two bits which are actually used for single precision in real life, and see how that is different.
Now, one of the things which you see is that you have thirty-two bits for your single-precision
number. You're going to use one bit for the sign of the number, and then you
use eight bits for the biased exponent, and we'll talk about
the biased exponent in a little bit, and then use twenty-three bits for the mantissa.
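
The 1 + 8 + 23 bit split described above can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the helper name split_fields is hypothetical, and the sketch assumes the standard struct module packs a Python float into the 32-bit single-precision layout.

```python
import struct

def split_fields(x):
    """Return (sign, biased_exponent, mantissa) for x stored in single precision.

    split_fields is a hypothetical helper name used only for this sketch.
    """
    # Pack x as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned 32-bit integer so the individual bits can be masked out.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # top 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # next 8 bits
    mantissa = bits & 0x7FFFFF             # low 23 bits
    return sign, biased_exponent, mantissa

sign, exponent, mantissa = split_fields(-6.5)
print(sign, format(exponent, "08b"), format(mantissa, "023b"))
# prints: 1 10000001 10100000000000000000000
```

Here -6.5 is -1.101 in binary times 2 to the power 2, so the sign bit is 1 and the twenty-three mantissa bits start with 101.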
Now what does that mean? The (-1) raised to the power s, where s
is the sign bit, which can be either a 0 or a 1, dictates whether it's a negative number or a
positive number. The mantissa is twenty-three bits which go right here, those 0s and 1s
go right here, and what you're seeing here is that before the radix point, you have a 1.
What does that mean? We know that when we do scientific notation, we use a
nonzero digit before the decimal point. When we're talking about a base-10 number, that nonzero
digit has to be 1, 2, 3, 4, 5, 6, 7, 8, or 9, depending on
what the number is, but when we talk about binary numbers, the only nonzero digit which we have is 1.
So the 1 is automatically assumed to be before the radix point, and that's why
you don't put that 1 in the mantissa part of the number at all.
Because it is already assumed, there's no need to store it, and you are saving some
space, and also increasing the precision of your numbers by doing that. Now, so far as the
exponent is concerned, it's a biased exponent. What we mean by biased exponent is that
you'll have to subtract 127 from the exponent which is stored here to get the actual exponent.
What you are finding out here in the biased exponent is that there is no bit which is used for the sign
of the exponent, yet you are going to need negative exponents. Let's suppose you have 2 to the power -50.
How are we going to represent that in an exponent which has no bit for the sign of
the exponent? The way it is done is that you bias it: whatever number is
stored in the exponent, you subtract 127
from it to get the real exponent which is being represented. Going the other way, if you have
some actual exponent, then you add 127 to it, and then you store the result right
here in the eight bits of the biased exponent. For 2 to the power -50, you would store -50 + 127 = 77.
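
Putting the sign bit, the assumed leading 1, and the bias of 127 together, a minimal sketch of the decoding might look like the following. The names decode_single and stored_exponent are hypothetical. The sketch also assumes a normalized number: stored exponents of 0 and 255 are reserved by the standard for special cases that this segment does not cover, so the sketch rejects them.

```python
BIAS = 127  # the bias for single precision, as described above

def stored_exponent(true_exponent):
    """Biased value placed in the 8 exponent bits (hypothetical helper)."""
    return true_exponent + BIAS

def decode_single(bits):
    """Rebuild the value of a normalized single-precision bit pattern.

    bits is the 32-bit pattern as an unsigned integer.
    """
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if biased_exponent in (0, 255):
        # Reserved patterns; outside the scope of this sketch.
        raise ValueError("special case not handled in this sketch")
    # The leading 1 before the radix point is assumed, not stored.
    significand = 1 + mantissa / 2**23
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - BIAS)

print(stored_exponent(-50))       # prints: 77
print(decode_single(0xC0D00000))  # prints: -6.5
```

The pattern 0xC0D00000 is sign 1, stored exponent 129, and mantissa bits 101 followed by zeros, so the value is -1.625 times 2 to the power (129 - 127), which is -6.5.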