The Problem
This article addresses the many posts in Dream.In.Code's C and C++ forum where a poster has doomed their program to weird behaviour under the innocent assumption that using "EOF" is a sure way to know that there's no more data to read from their file.

What?? E.O.F. stands for 'End Of File', surely that means there's no more data?

Yes, kind of. Sorry if this hurts your head, but 'end of file' in C++ does not mean the same thing as 'no more data'.

Take a look at this short program which works with a file storing 5 names, and more importantly, what happens when 'eof' is used as the breaking condition for reading data into a vector.

Uh oh! First the program tells you that it has read 6 names when there are only 5 names in the file, and then it goes on to display the last name twice.

What has happened is that the while loop has been repeated 6 times, because it repeats until namefile has its internal "EOF" flag set. Remember that ifstream is not a file, it's a stream, which means that it cannot know whether or not there is any more data in a file until after an attempt to read has run into the end of the file. After it read "Harry" from the file, the stream had not yet hit the end, therefore the EOF flag was still unset. The 6th read then failed and left the input variable unchanged, so "Harry" was stored a second time.

There's an even worse (more subtle) problem with this, based on how the >> operator works; it will always stop reading at the first 'whitespace' character it encounters - a space, a newline, a carriage return or a tab.

The file used in this example has a trailing whitespace character after the last name. However, if you modify the file so that its final character is the y from "Harry", the stream will run into the end of the file during the attempt to read "Harry", since it will not encounter any whitespace. The read itself still succeeds, but the eof flag will be set during that 5th and final read, and the program will "appear" to be working OK.
Obviously this is extremely unreliable. The one thing worse than a bug which you can see is a bug which you can't see! Any user, any other program which modifies this file, or even a part of this program which appends to it may at some point add a newline or space after the final data entry. If you rely on the absence of trailing whitespace, your program will break as soon as that happens.

Not only this, but there are reasons other than EOF that a stream can fail. For example, a file may have been corrupted with some control characters, and the stream may fail while attempting to read invalid data. This will send the program into an infinite loop, since EOF will never be reached.

The Solution
The above example has established that "Read while not EOF" is a terrible, unreliable idiom prone to breaking, and must be avoided. Luckily, the solution is extremely simple; the correct idiom could be termed "Read while data exists to be read".

It simply involves moving the expression which reads data from the file into the while condition itself. This may seem a little odd, since namefile >> input doesn't really look like a true/false statement - it isn't! But that doesn't matter, because the 'return' of a >> operation is a reference to the stream from which data is extracted.

"Read while data exists to be read"

while (namefile >> input)
    names.push_back(input);

Streams are deliberately allowed to evaluate to a bool for exactly this purpose. They use a technique called a conversion operator, which is a form of operator overloading that allows objects to behave like other data types.
In the case of std::istream, an "operator bool" exists (an explicit one since C++11; older standard libraries provide an "operator void*" which has the same effect in a condition), and will yield false if a read operation has failed for any reason, whether it be invalid data, a nonexistent file, EOF, or some other reason.
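
If conversion operators are new to you, a toy class may help. This is only a sketch in C++11 syntax: countdown and run_to_zero are invented for the illustration and have nothing to do with the stream library itself.

```cpp
// A toy class (hypothetical, not part of the standard library) whose
// objects can be tested in a condition, much as a stream can.
class countdown
{
    int remaining_;
public:
    explicit countdown(int n) : remaining_(n) {}

    // Use up one step, never going below zero.
    void tick() { if (remaining_ > 0) --remaining_; }

    // Conversion operator: true while there are steps left.
    // 'explicit' (C++11) limits it to contexts such as if, while and !.
    explicit operator bool() const { return remaining_ > 0; }
};

// Counts how many times the loop body runs before c converts to false.
int run_to_zero(countdown c)
{
    int steps = 0;
    while (c)          // contextual conversion to bool
    {
        c.tick();
        ++steps;
    }
    return steps;
}
```

The while (c) line is doing the same job as while (namefile >> input): the object is asked "are you still usable?" and the loop ends when the answer is no.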

Using the same input file with a trailing space, or even using one without a trailing space, the output is the same - there are no 'overruns'; moreover, the new program is now hardened against other kinds of stream failures.

This solution with struct and class types
Inevitably, it's more likely that you want to be able to read text files into complicated types which the >> operator doesn't know how to handle on its own. The problem described above still applies, but so can the solution with a little extra work.
The easy remedy is to overload >> so that data can be extracted directly into a struct/class object without the need for long, ugly while statements and a list of temporary variables. The ideal interface ought to allow file reading to look exactly the same as the code above.

Take a common example of a student class (output operator included for testing, though output formatting is beyond the scope of this article).

The final two lines allow overloaded insertion << and extraction >> operators to work closely with the student class, almost as part of its interface. They declare these operators to be friends of the student class, allowing them unrestricted access to private members, although the operators are not members of the class itself.

The student input file might use a comma-separated format of Name, ID, Fees.

There are no obviously easy ways to input an entire student at once - the data is comma separated, with one student on each line. The function which reads the file will need to grab a line and parse the comma separated data for each of the 3 student attributes. Luckily, reading from a file is usually fairly reliable - if the format (layout) of that file is known to be consistent throughout, a fairly rigid procedure can be written to parse each line.

Parsing the student file
Looking closely at the data and the student class, the format of a student is [string] <comma> [int] <comma> [double]

getline() is capable of retrieving data up to a delimiter, so the student's name can be easily retrieved:

std::getline(input, s.name, ',');

getline will automatically discard the comma which follows the name data.

The next extraction will retrieve 'id' but leave the following comma alone:

input >> s.id;

The next instruction must explicitly discard that comma:

input.ignore();

Finally, the fees attribute can be retrieved:

input >> s.fees;

There's still one problem: a 'newline' character remains in the stream. It needs to be discarded, otherwise the next call to this overloaded operator will encounter the newline while it's trying to read the student name.

Notice that the final code in main() which populates the vector<student> is almost identical to the simple program which populated the vector<std::string>, even though the data in the file is far less trivial.

Summary
This article has addressed a common 'gotcha' for learners who are toying with streams and file I/O in C++. Overloading the stream insertion and extraction operators provides the user of a class with a clean and idiomatic way to handle I/O. The mess of parsing data is wrapped in an overloaded operator, allowing multiple objects of that class to be easily retrieved from a file. In addition, handling file read errors does not need to be a concern of the code which handles data parsing.

Great tutorial! I was getting tired of explaining to people why their loops controlled by if (!file.eof() ) didn't work and had it in mind to write a tutorial about it ... and here it is already.

Just one minor suggestion:

Your examples bad.cpp, good.cpp and main.cpp won't compile on Linux systems under GCC-3.x or GCC-4.x as they now stand. Apparently <iterator> (needed for ostream_iterator) is pulled in by one of the other headers under Microsoft's compiler, but not under GCC, so to be portable these programs should have

#include <iterator>

added to them.

Nice find! I hope a moderator can edit the examples and add this. Annoyingly, it seems that the Comeau try-it-out online compiler (which usually picks up discrepancies like this between MS and the standard) also compiles happily without including <iterator>.

Hello,
I was using the old way of reading a file, but I don't use the >> operator. I am dealing with binary data larger than one byte (int, double, ...) and using the read and write functions to write to file.txt.
I am wondering how I can use your solution, because the repeated last value causes problems in the system I am trying to implement, especially when I try to plot the output of my system.
This is how I write the data to the file with write:

which will continue to enter the loop as long as there is data in the file.

If that isn't what you wanted, please try to clarify your question.

I am trying to implement a digital communication system. Like every such system it has a number of blocks; I write the output of each block to a file and read the input to each block from some values. Right now I am building an NCO (numerically controlled oscillator): when I input a numerical value, it generates a cosine wave with a certain frequency according to that value.
So beforehand I created a cosine wave with 8192 points and stored the points in a file.
Now, when the numerical value is entered, the NCO calculates the new frequency and the number of points needed to generate a cosine wave with that frequency. It then opens the file where the 8192 points are stored and reads the wanted points using a sequence of read, seekg, write.

read reads the wanted point from the 8192-point file, seekg sets the file pointer to the next wanted point, and write writes the point to the output file.
This is inside the loop:

while (!in.eof())
{
    // read, seekg, write
}

As you know, the last value gets written twice, so when I plot the wave I get a wrong value in my system. How can I use the solution in this case?
I hope you now have the complete picture of my problem.
Regards

This post has been edited by eng. elaph al_abasy: 12 March 2011 - 10:50 AM

But that doesn't make sense to me. You say that you are generating a new wave, so why are you reading the data of the old wave and writing the same data to the output file?

I also don't understand what you are doing with seekg. After each 4-byte read, the file pointer automatically advances 4 bytes. Isn't your data contiguous? Did you write something else in between the wave points that you want to skip over? What is s? Why are you moving the pointer?

edit: If you have a reason to skip some of the points from the input file, that's fine. while(in.read(reinterpret_cast<char*>(&cos), sizeof(int))) will read and enter the loop only as long as there is more data in the file. It won't duplicate the last entry.