In this lesson, we're going to continue our discussion of how to read from files. One thing we really haven't discussed yet is how do you
read all the information from the file? Well, that's going to depend on how much information you know about the file.
You may or may not know how big the file is. One thing is for certain: you must know the format of the file. You have to know what format the data
is in. That's going to determine how you write your code. But you have to remember that you're not necessarily going to know how much
data is in the file. Are there 15 items or 15 million? You may not know, and you may be handed a file and told there are about 1 million pieces of data.
Well, is it a million or a million and a half? You may not know. Or you may know, and you may also know some other
particular aspect of the file that is going to be important. So, let's take a look. The very first thing that you
have to realize is when you're reading from a file
you don't need prompts. So, if you look at this right here, who is going to be reading that when you read from a file? Nobody is.
So, if you were going to read nine pieces of data, the prompt pops up on the screen nine times and you hardly even notice it. But if you're going to read a
file with a million pieces of data in it, you probably shouldn't have a prompt. Okay, secondly: one way that you can deal with the issue of how to
read all of a file (and of course you're going to be using some sort of loop, because you are repeating the process) is to mark the
end of the data with a particular value. For instance, let's suppose all of the data is positive. Then you could put a negative value at the
end of the data and use a loop to read until you get to a negative value. So, in general, when you're reading from
a file like this you can kind of pretend like there's this little pointer here. As soon as you open the file it points to the first data item. And every time
you read it moves over to the next data item. When I read in the very first value here that's going to put 3 into that variable there and then the
little pointer is going to move over to the next data item. In this case,
what we're going to do is we're going to jump into a loop while "num != -1". Well, we read in a 3 so it is not equal to -1. I'm going to
process the data and then read the next value.
So, I will read in 3, and then 11, and 34, each time processing, until I read in that -1. That marks the end of the data and that's what's going to terminate
that loop. Well, of course, that depends on you being able to put a value there
that is of the same type but stands out in some fashion. And you may not be able to do that at all times. So, let's take a look at another case.
Suppose you know, a priori, the size of the dataset. Somebody hands you a file and says, "I know there are exactly 16,284 pieces of data in
there." Well then you can create a for loop because you know ahead of time
exactly how many data items there are. And so you jump in to the loop, you read in, you process the data, whatever it is,
and loop. Now, suppose you don't know how
much data is in the dataset. The size of the dataset is unknown. Of course, you're looking at it here and you know there are 9, but suppose you can't look at it.
You don't know. There are two ways to do this, two ways to read the entire set of data. One, I think, is much easier than the other. And that is
simply to form a while loop and put the reading of the data in the expression for the while. This will evaluate to true
as long as a value was actually read. So you read in, you process the data, whatever it is, and then
you jump back through to the expression of the while loop. This works beautifully. There is a way for you to screw up though.
Let's take a look: "while (fin >> num)" and then, inside the loop, "fin >> num; process_data(num);".
What's the problem here? The problem is that you read in the first element in the while expression, then you read in the next element inside the loop, throwing away the first one.
It skips every other datum, and that's a bad idea. So, be careful about doing something like that.
Let's take a look at a case where you've got two different kinds of data. In this case we've got a number and then a string.
How do I read this information in? We're going to use the extraction of data in the expression of the while loop to get the integer information,
and then inside the loop we'll do something similar: we're going to extract the name data.
So, I read in a number, read in a name, process those two items, and then go back and loop through. Another way to read in
an entire file and stop at the appropriate point is using the "eof" function. "Eof" is a function call,
it's a member function of the class of objects to which "fin" belongs. You have to understand how "eof" works. Now, "eof" will become
true when you're at the end of the file. It's false as long as you're not. Watch carefully.
And I put in red here the most important aspect of this that you have to understand. That is "eof"
becomes true only after trying to read past the last datum.
I read the 3, "eof" is false, read the 1, "eof" is false, 34 false, 56 false, 3 false, 14 false, 12 false, 6 false. I read the 124, "eof" is still
false. Then I attempt to read nothing. There's nothing there. It's only after I attempt to read nothing,
not before, but after I attempt to read something that isn't there, that "eof" becomes true.
Well, if you realize that, then you know the code has to be like this. We have to first read in the very first value,
and that's necessary just in case it's an empty file. Then we test "eof", and if that is false we go into the loop.
Well, you know, there is an error here on the slide: this should be "not eof", right here. Then I'm going to process the data and read in the next value.
As a last example, we're going to read data from a file and write the squared values out to another file.
So, I declare an "ifstream", I declare an "ofstream", I'm gonna open my connection from
the "in" stream to input.dat, that's where my data is. I'm gonna open the connection from "out" to output.dat,
jump into a loop and I'm gonna read, in the loop, every value
that's in the input.dat file. Then I'm gonna write it out to the output file. I'm not gonna write out just the data, but the data squared
and I'll put a space between them
so I can tell one from another. I'm done, I close the in, I close the out and I return and that's the end of that. So, now you can
create file streams, connect them to files, input data, use that data, process it, massage it, do whatever you need to do
and you can output data to a file. Incidentally,
you see that we created this output file stream and connected it to this output.dat. Does that file have to exist when you connect? No.
This is going to create it. If you were to run this program and then go out to the Unix environment and look at
the listing of your files, you'd find that you've got a new file called output.dat.
It will actually create it. The file does not have to exist. Of course, input.dat
does have to exist when you connect. So, that's the end of this session.