On 12/27/16, Laslo Hunhold <dev_AT_frign.de> wrote:
> On Tue, 6 Dec 2016 02:16:56 -0800
> Michael Forney <mforney_AT_mforney.org> wrote:
>
> Hey Michael,
>
>> fread reads the entire requested size (BUFSIZ), which causes tools to
>> block if only small amounts of data are available at a time. At best,
>> this causes unnecessary copies and inefficiency, at worst, tools like
>> tee and cat are almost unusable in some cases since they only display
>> large chunks of data at a time.
>
> sbase uses FILE-pointers exclusively and mixing both "logics" is not
> very helpful for the clarity of the expressed code.

fread and read behave differently and have different use cases. I don't
see a problem with using whichever API is best suited for the task.

> The writeall() function looks like an util function that could be
> introduced to completely convert sbase from fp to fd stream handling.

I don't think writeall will help at all with reading files line-by-line
as many of the tools require.

> I think this is not the place here to discuss this, but I would be glad
> if you could create a new thread here on hackers_AT_ where you give some
> examples where buffered IO is problematic.

I feel like I have already justified this change enough in previous
comments.

However, I will take this rebase patch set as an opportunity to sneak in
one final response.

It is impossible to implement tee or cat -u correctly with fread unless
you read one byte at a time.

cat -u should "Write bytes from the input file to the standard output
without delay as each is read". If you fread into a BUFSIZ byte buffer,
it will call read successively until the buffer is filled or EOF is hit,
delaying the write.

tee "shall not buffer output", but by reading into intermediate buffer
with fread before the fwrite, we are buffering the output.

setbuf and setvbuf are not relevant because the problem is with the
fread behavior of returning "the number of elements successfully read
which is less than nitems only if a read error or end-of-file is
encountered".

The current implementation makes cat and tee not very useful when used
in a pipeline connected to a tty.

You could argue that we should just use fwrite instead of writeall, but
I think writeall is simple enough, and avoids mixing file descriptors
and FILE * in concat.

> The necessity to add functions like fshut() really made me think twice
> about the advantages of buffered IO.

Yes, I think stdio is a pretty unfortunate API, but replacing it
completely in sbase is a much bigger change with less benefit.

Changes since v2:
- Rebase on latest git.

Changes since v1:
- Changed concat to return -2 on write error so that callers have the option to
handle it differently.
- Added a patch to libutil/cp.c to preserve atime/mtime for symlinks.
- Changed the od overflow fix to be a bit clearer and avoid casting off_t to
size_t.
- Switched tail to use raw IO and concat as well.
- Added a patch to tail to make -c offsets refer to bytes instead of runes.