I was working on an audio program over the weekend and got some weird results on BSD. After tracking it down, the problem seems to be related to fseek() and read(). It occurs on both:

* NetBSD 4.0.1 -release i386

* OpenBSD 4.4 -release i386

What happens is this: an input file consists of a whole number of frames (frame = 2352 bytes). I fseek() a whole number of frames into the file, and then try to read() the rest of the file frame-by-frame. I noticed that the wrong number of frames gets read, and that the last read() doesn't return a whole frame (too few bytes are read). From the read(2) man page:

Quote:

The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
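In other words, on a normal file read() only comes up short when it runs into end-of-file. Code that also has to cope with pipes, terminals or signals usually loops until the buffer is full. A minimal sketch; read_full is a hypothetical helper name, not a standard function:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to nbyte bytes from fd into buf, retrying
 * on short reads and on EINTR. Returns the number of bytes actually read
 * (less than nbyte only at end-of-file), or -1 on error with errno set
 * by read(). */
ssize_t read_full(int fd, void *buf, size_t nbyte)
{
    size_t got = 0;
    char *p = buf;

    while (got < nbyte) {
        ssize_t n = read(fd, p + got, nbyte - got);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;          /* real error: errno is meaningful here */
        }
        if (n == 0)
            break;              /* end-of-file: return what we have */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

With a 5000-byte file and 2352-byte frames, successive calls should return 2352, 2352, 296 and then 0.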

After tracking down the problem a bit, here's a simple demo that reproduces it. First, make an input file:

BTW, I had also tried fseeko(3), and it had the same problem as fseek(3). I'm still puzzled why fseek doesn't work with read(2), as they both seem to be rather legacy functions of this kind.

As for FreeBSD, I don't have it installed and have never used it, so I don't know if the results are the same there. If any of the local FreeBSD users wish to try it and report the result, that would be interesting!

> I'm still puzzled why fseek doesn't work with read(2), as they both seem to be rather legacy functions of this kind.

There is a difference: fseek is a stream I/O function of the C standard library, while lseek is the system call for seeking in a file. Maybe they use different data structures (DS) and perhaps those are not kept in sync? <guess/>
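For what it's worth, that guess matches how stdio is generally built: a FILE * carries its own user-space buffer and position, while read() and lseek() work on the kernel's file offset for the descriptor. A small sketch (compare_positions is a hypothetical helper) that reads one byte through stdio and then asks the kernel where the descriptor is. With typical buffering the two numbers differ, which is the situation a later read(fileno(f), ...) would walk into; how much stdio reads ahead is implementation-specific.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for illustration: read one byte through the FILE *
 * stream, then report the stream's position and the kernel's file offset
 * for the same file. stdio usually fills a whole buffer, so they differ. */
int compare_positions(FILE *f, long *stream_pos, off_t *fd_pos)
{
    if (fgetc(f) == EOF)
        return -1;
    *stream_pos = ftell(f);                   /* where stdio says we are */
    *fd_pos = lseek(fileno(f), 0, SEEK_CUR);  /* where read() would start */
    return (*stream_pos == -1 || *fd_pos == -1) ? -1 : 0;
}
```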

Thanks TerryP for trying it on FreeBSD. Looks like you're also getting unexpected behaviour there, but different in detail. My sense is that, assuming the test code is properly written for the various platforms, then they should all give the same output. Since they don't, something is likely wrong.

Quote:

Originally Posted by TerryP

Hmm, I wonder if errno gets set to anything useful.

Well, since fseek() and read() are tested for their respective error condition (-1 in each case) and those cases aren't entered, probably errno won't be set, right?
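That matches the usual rule: errno is only meaningful right after a call has reported failure, because a successful call can leave an old value in it and no library function resets it to zero. A sketch of that pattern; seek_or_errno is a hypothetical wrapper:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: errno is inspected only when lseek() actually
 * reports failure by returning -1. Returns 0 on success, else the errno
 * value (e.g. EBADF for a bad descriptor, ESPIPE for a pipe). */
int seek_or_errno(int fd, off_t off)
{
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return errno;        /* failure: errno now describes the error */
    return 0;                /* success: errno is not examined */
}
```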


FWIW: I tried creating a version that checks errno after each call via a macro, only to have it segfault on run. Then I yanked the version in your post (again) to a temp file, compiled and ran it as in the last post, and it segfaulted exactly the same way (same machine).

I've never tried to mix standard I/O functions with I/O system calls (why does anyone need to do that, normally?), but I remember a comment in the book Programming Perl warning that mixing things like read() and sysread() should only be done if you are into wizardry, pain, or both. (read() and sysread() in Perl are basically the equivalents of Unix/C's fread() and read(), respectively.) I would reckon that if you manipulate the file descriptor without updating the structure on the other side of the FILE *, as the f*() functions do, things could get out of sync between the integer file descriptor and the FILE * stream, and get pissed off accordingly if certain operations were done; hypothetically, anyway.

I'd really suggest trying it with fread() and such instead, as BSDFan suggests.
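A minimal sketch of the all-stdio route, assuming the 2352-byte frames from the first post (count_frames_stdio is a hypothetical name): seek with fseek(), then read with fread(), so the FILE buffer and the stream position can't disagree.

```c
#include <stdio.h>

#define FRAME_SIZE 2352   /* bytes per frame, as in the original program */

/* Skip `skip` whole frames with fseek(), then read the rest of the stream
 * frame-by-frame with fread(), staying entirely within stdio.
 * Stores the size of any trailing partial frame in *partial.
 * Returns the number of complete frames read, or -1 on error. */
long count_frames_stdio(FILE *f, long skip, size_t *partial)
{
    unsigned char frame[FRAME_SIZE];
    long frames = 0;
    size_t n;

    *partial = 0;
    if (fseek(f, skip * FRAME_SIZE, SEEK_SET) != 0)
        return -1;

    while ((n = fread(frame, 1, FRAME_SIZE, f)) == FRAME_SIZE)
        frames++;
    if (ferror(f))
        return -1;
    *partial = n;   /* 0 when the file holds a whole number of frames */
    return frames;
}
```

On a file of 10 frames, skipping 3 should give 7 frames and no partial tail.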


According to the documentation, the read() system call returns the number of bytes read, 0 at EOF, or -1 (setting errno) if a cork popped. So if it's not reading the specified amount, I would rather assume it hit EOF and returned what was read up to that point (i.e. a number of bytes that is > 0 but < 2352).

Upon successful completion, where nbyte is greater than 0, read() shall mark for update the st_atime field of the file, and shall return the number of bytes read. This number shall never be greater than nbyte. The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading. For example, a read() from a file associated with a terminal may return one typed line of data.
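Putting that into code, each read() of one frame can be classified as a full frame, a short read, end-of-file (0) or an error (-1, where EINTR just means "try again"). A sketch assuming a regular file read from the start; struct read_tally and tally_reads are hypothetical names:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define FRAME_SIZE 2352

/* Tallies of what read() returned while walking a file frame-by-frame. */
struct read_tally {
    long full;          /* read() returned exactly FRAME_SIZE */
    long short_reads;   /* read() returned 0 < n < FRAME_SIZE */
    long short_bytes;   /* total bytes delivered by those short reads */
};

/* Call read() once per frame until it returns 0 (end-of-file) and
 * classify each result. Returns 0 on success, or -1 on error with errno
 * set by read(). */
int tally_reads(int fd, struct read_tally *t)
{
    unsigned char frame[FRAME_SIZE];
    ssize_t n;

    t->full = t->short_reads = t->short_bytes = 0;
    for (;;) {
        n = read(fd, frame, sizeof frame);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return 0;              /* end-of-file */
        if (n == FRAME_SIZE) {
            t->full++;
        } else {
            t->short_reads++;
            t->short_bytes += n;
        }
    }
}
```

On a file holding 3 frames plus 100 bytes this should report 3 full reads and one short read of 100 bytes: the same "last read() is short" symptom, but here coming from a genuinely partial tail rather than from mixing fseek() with read().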

Thanks BSDfan666 and TerryP again, that is really helpful stuff. It looks like fread() may be the missing piece. In trying to understand the origins of my confusion on this there seem to be 3 factors:

1) There are a lot of similar functions available here. Although I was aware of the distinction between those using stdio FILE*'s and the lower level ones using file descriptors, I wasn't aware of lseek(), and it seems I had forgotten about fread() due to:

2) I don't work with these things on a very regular basis, so things get fuzzy.

3) The program was originally developed on Linux, where fseek() and read() seem to work together ok. (BTW a quick check on SunOS showed it was ok there too.) This is good in a way, but it led to a false sense of security as to the general situation.

Quote:

Originally Posted by TerryP

I've never tried to mix standard I/O functions with I/O system calls (why does anyone need to do that, normally?)

I guess in my case I just found the read() interface a bit cleaner, combined with not having problems with it previously. Live and learn ...

Quote:

So if it's not reading the specified amount, I would rather assume it hit EOF and returned what was read up to that point (i.e. a number of bytes that is > 0 but < 2352)

Agreed, that was my assumption too. In the little demo program I wanted to do an extra read just to make sure there was no more. Of course it could have checked for EOF explicitly then too, but this was getting beyond the first-order problem and I wanted to keep it short and clear.

So ... yesterday I re-wrote the thing to use lseek() [pointed out by ephemera]. But it seems I should really use fread() and re-assess things concerning lseek vs fseeko.
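For comparison, a sketch of a descriptor-only version (lseek() plus read()), again assuming 2352-byte frames and a regular file; count_frames_fd is a hypothetical name, and a trailing partial frame is simply not counted:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define FRAME_SIZE 2352

/* Skip `skip` whole frames with lseek() and read the remainder with
 * read(), using only the file descriptor API so there is no stdio buffer
 * to fall out of step with. Assumes a regular file, where read() only
 * returns less than a frame at end-of-file.
 * Returns the number of complete frames read, or -1 on error. */
long count_frames_fd(int fd, off_t skip)
{
    unsigned char frame[FRAME_SIZE];
    long frames = 0;
    ssize_t n;

    if (lseek(fd, skip * FRAME_SIZE, SEEK_SET) == (off_t)-1)
        return -1;

    while ((n = read(fd, frame, sizeof frame)) != 0) {
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == FRAME_SIZE)
            frames++;
    }
    return frames;
}
```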

Short history lesson: the first Unix system ran on the PDP-7, and that machine had 18-bit integers, not 16-bit.

Interesting, as that's an odd size. At any rate, my comment re K&R referred to page 164 where they say:

Quote:

Originally Posted by K&R

In pre-version 7 UNIX, the basic entry point to the I/O system is called seek. seek is identical to lseek, except that its offset argument is an int rather than a long. Accordingly, since PDP-11 integers have only 16 bits, the offset specified by seek is limited to 65,535.

I'll try to edit the post to include 18-bits as well.

Quote:

Also, fseeko/ftello use off_t; under OpenBSD this type is always 64-bit, but you'll need to define _FILE_OFFSET_BITS=64 on Linux.
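A sketch of that setup, assuming glibc on Linux for the macro (seek_frame is a hypothetical helper): the define has to come before any system header, and fseeko() then takes a 64-bit off_t on both systems.

```c
/* On 32-bit Linux with glibc, off_t is 32 bits unless this macro is
 * defined before any system header is included. It is harmless elsewhere:
 * OpenBSD's off_t is always 64-bit, and 64-bit Linux already uses a
 * 64-bit off_t. */
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <sys/types.h>

#define FRAME_SIZE 2352

/* Position f at the start of frame `frame_no`. fseeko() takes an off_t
 * rather than a long, so offsets beyond 2 GiB work even where long is
 * only 32 bits. Returns 0 on success, -1 on error. */
int seek_frame(FILE *f, off_t frame_no)
{
    return fseeko(f, frame_no * (off_t)FRAME_SIZE, SEEK_SET);
}
```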

Correct, that is what I wrote too, but maybe the headings were not clear. I'll see if I can punctuate them better or something. Thanks.