
Last week there was a series of posts regarding some optimized code
within phobos streams. A question posed was: without those same
optimizations, would tango.io be slower than the improved phobos? [1]
As these new phobos IO functions are now available, Andrei's "benchmark"
[2] was run on both Win32 and linux to see where tango.io could use some
improvement.
The results indicate:
1) On linux, the fastest variation of the revised phobos code runs 40%
slower than the generic tango.io equivalent. On the other hand, the new
phobos code seems a bit faster than perl.
2) On Win32, similar testing shows tango.io to be more than six times
faster than the improved phobos code. Tweaking the tango.io library a
little makes it over eight times faster than the phobos equivalent [3].
3) On Win32, generic tango.io is more than twice as efficient as the
fastest C version identified. It's also notably faster than MinGW 'cat',
which apparently performs various under-the-cover optimizations.
4) By making some further optimizations in the phobos client code using
setvbuf() and fputs(), the improved phobos version can be sped up
significantly; at that point tango.io is only three times faster than
phobos on Win32. These adjustments require knowledge of tweaking the
underlying C library; thus, they may belong to the group of C++ tweaks
which Walter quibbled with last week. The setvbuf() tweaks make no
noticeable difference on linux, though the fputs() improvements are
accounted for in #1 (above).
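For readers unfamiliar with the setvbuf() tweak in item 4, here is a minimal C sketch of the same idea, since setvbuf(), fgets() and fputs() are the C library calls the phobos client ends up relying on. The helper names use_full_buffering and copy_lines are hypothetical, no timings are implied, and with a NULL buffer the C library may treat the size argument only as a hint.

```c
#include <stdio.h>

/* Hypothetical helper: switch a freshly opened stream to full (_IOFBF)
 * buffering, as the phobos client does for stdin and stdout. setvbuf must
 * be called before any other operation on the stream. With a NULL buffer
 * the C library allocates the buffer itself and may treat `size` only as
 * a hint. */
int use_full_buffering(FILE *stream, size_t size)
{
    return setvbuf(stream, NULL, _IOFBF, size);
}

/* Hypothetical helper: copy `in` to `out` one line at a time. fgets keeps
 * the '\n' and appends a NUL, so fputs can write each piece back unchanged;
 * a line longer than the local array is simply copied in several pieces. */
int copy_lines(FILE *in, FILE *out)
{
    char line[1024];

    while (fgets(line, sizeof line, in) != NULL)
        if (fputs(line, out) == EOF)
            return -1;
    if (ferror(in))
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

Calling use_full_buffering(stdin, 2 * 1024) and use_full_buffering(stdout, 2 * 1024) at the top of main, before any other I/O, followed by copy_lines(stdin, stdout), roughly mirrors the phobos client shown further down.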
Note that tango.io is not explicitly optimized for this behaviour. While
some quick hacks to the library have been shown to make it around 20%
faster than the generic package (for this specific test), the efficiency
benefits apparently derive from the approach more than anything
else. With some changes to a core tango.io module, similar performance
multipliers could presumably be exhibited on linux platforms also. That
is: tango.io is relatively sedate on linux, compared to its Win32 variant.
FWIW: if some of those "Language Shootout" tests are IO-bound, perhaps
tango.io might help? Can't imagine they'd apply that as a "language"
test, but stranger things have happened before.
Here's the tango.io client (same as last week):
-------------
import tango.io.Console;

void main()
{
    char[] content;
    // passing true keeps the line terminator, so each line is echoed intact
    while (Cin.nextLine (content, true))
        Cout (content);
}
------------
and here's the fastest phobos equivalent. Removing the setvbuf() code
makes it consume around twice as much time on Win32. Note that this
version is faster than the equivalent code posted last week, though
obviously more specialized and verbose:
------------
import std.stdio;
import std.cstream;

void main() {
    char[] buf = new char[1000];
    size_t len;
    const size_t BUFSIZE = 2 * 1024;
    // switch both C streams to full buffering, with a 2 KB size hint
    setvbuf(stdin, null, _IOFBF, BUFSIZE);
    setvbuf(stdout, null, _IOFBF, BUFSIZE);
    while ((len = readln(buf)) != 0) {
        assert(len < 1000);
        buf[len] = '\0';          // fputs needs a NUL-terminated string
        fputs(buf.ptr, stdout);
    }
}
------------
[1] Timing measurements can be supplied to those interested.
[2] The recent changes within phobos apparently stemmed from Andrei
piping large text files through his code, and this "benchmark" is a
reflection of that process.
[3] That ~20% optimization has been removed from the generic package at
this time, since we feel it doesn't contribute very much to the overall
IO picture. It can be restored if people find that necessary, and there
is no change to client code.

kris wrote:
> Last week there was a series of posts regarding some optimized code
> within phobos streams. A question posed was: without those same
> optimizations, would tango.io be slower than the improved phobos? [1]
>
> As these new phobos IO functions are now available, Andrei's "benchmark"
> [2] was run on both Win32 and linux to see where tango.io could use some
> improvement.
[snip]
On my machine, Tango does 4.3 seconds and the following phobos program
(with Walter's readln) does 5.4 seconds:
#!/usr/bin/env rundmd
import std.stdio;

void main() {
    char[] line;
    while (readln(line)) {
        write(line);
    }
}

where write is a function that isn't yet in phobos, of the following
implementation:

size_t write(char[] s) {
    return fwrite(s.ptr, 1, s.length, stdout);
}
Also, the Tango version has a bug. Running Tango's cat without any pipes
does not read lines from the console and output them one by one, as it
should; instead, it reads many lines and buffers them internally,
echoing them only after the user has pressed end-of-file (^D on Linux),
or possibly after the user has entered a large amount of data (I didn't
have the patience). The system cat program and the phobos implementation
correctly process each line as it was entered.
This bug should be fixed for the programs to be comparable. After that,
it would help to give numbers comparing all of tango, phobos, and cat,
with the perl baseline.
Andrei

Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
>
>> Last week there was a series of posts regarding some optimized code
>> within phobos streams. A question posed was: without those same
>> optimizations, would tango.io be slower than the improved phobos? [1]
>>
>> As these new phobos IO functions are now available, Andrei's
>> "benchmark" [2] was run on both Win32 and linux to see where tango.io
>> could use some improvement.
>
> [snip]
>
> On my machine, Tango does 4.3 seconds and the following phobos program
> (with Walter's readln) does 5.4 seconds:
On Win32, the difference is very much larger. As noted before, several
times faster. Those benefits will likely translate to linux going forward.
>
> #!/usr/bin/env rundmd
> import std.stdio;
>
> void main() {
>     char[] line;
>     while (readln(line)) {
>         write(line);
>     }
> }
>
> where write is a function that isn't yet in phobos, of the following
> implementation:
>
> size_t write(char[] s) {
>     return fwrite(s.ptr, 1, s.length, stdout);
> }
Wondered where that had gone.
>
> Also, the Tango version has a bug. Running Tango's cat without any pipes
> does not read lines from the console and output them one by one, as it
> should; instead, it reads many lines and buffers them internally,
> echoing them only after the user has pressed end-of-file (^D on Linux),
> or possibly after the user has entered a large amount of data (I didn't
> have the patience). The system cat program and the phobos implementation
> correctly process each line as it was entered.
If you mean something that you've written, that could presumably be
rectified by adding the isatty() test Walter had mentioned before. That
has not been added to tango.io since (a) it would likely make programs
behave differently depending on whether they were redirected or not.
It's not yet clear whether that is an appropriate specialization, as
default behaviour, and (b) there has been no ticket issued for it.
Again, please submit a ticket so we don't forget about that detail. We'd
be interested to hear if folk think the "isatty() test" should be
default behaviour, or would perhaps lead to corner-case issues instead.

kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>
>>> Last week there was a series of posts regarding some optimized code
>>> within phobos streams. A question posed was: without those same
>>> optimizations, would tango.io be slower than the improved phobos? [1]
>>>
>>> As these new phobos IO functions are now available, Andrei's
>>> "benchmark" [2] was run on both Win32 and linux to see where tango.io
>>> could use some improvement.
>>
>> [snip]
>>
>> On my machine, Tango does 4.3 seconds and the following phobos program
>> (with Walter's readln) does 5.4 seconds:
>
> On Win32, the difference is very much larger. As noted before, several
> times faster. Those benefits will likely translate to linux going forward.
If I understand things correctly, it looks like the hope is to derive
more speed from further dropping phobos and C I/O compatibility, a path
that I personally don't consider attractive.
Also, the fact that the tango version is "more than twice as efficient
as the fastest C version identified" suggests a problem with the testing
method or with the C code. Are they comparable? If you genuinely have a
method to push bits through two times faster than the fastest C can do,
you may as well go ahead and patent it. Your method would speed up
many programs, since many use C's I/O and are I/O bound. It's huge news.
I'm not even kidding. But I doubt that that's the case.
>> Also, the Tango version has a bug. Running Tango's cat without any
>> pipes does not read lines from the console and outputs them one by
>> one, as it should; instead, it reads many lines and buffers them
>> internally, echoing them only after the user has pressed end-of-file
>> (^D on Linux), or possibly after the user has entered a large amount
>> of data (I didn't have the patience). The system cat program and the
>> phobos implementation correctly process each line as it was entered.
>
> If you mean something that you've written, that could presumably be
> rectified by adding the isatty() test Walter had mentioned before. That
> has not been added to tango.io since (a) it would likely make programs
> behave differently depending on whether they were redirected or not.
> It's not yet clear whether that is an appropriate specialization, as
> default behaviour
What is absolutely clear is that the current version has a bug. It can't
read a line from the user and write it back. There cannot be any
question that that's a problem.
>, and (b) there has been no ticket issued for it
>
> Again, please submit a ticket so we don't forget about that detail. We'd
> be interested to hear if folk think the "isatty() test" should be
> default behaviour, or would perhaps lead to corner-case issues instead
I was actually pointing out a larger issue: incompatibility with phobos'
I/O and C I/O. Tango's version is now faster (thank God we got past the
\n issue and bummer it's not the default parameter of nextLine) but it
is incompatible with both phobos' and C's stdio. (It's possible that the
extra speed is derived from skipping C's stdio and using read and write
directly.) Probably you could reimplement phobos and bundle it with
Tango to give the users the option to link phobos code with Tango code
properly, but still C stdio compatibility is lost, and phobos code has
access to it.
Andrei

kris wrote:
> On Win32, the difference is very much larger. As noted before, several
> times faster.
I suspect that much of the slowness difference is from using C's fputs,
along with the need to append a 0 to use fputs.
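To make that cost concrete: a D char[] is a pointer plus a length with no terminating NUL, while fputs() needs one. The C sketch below (struct slice and both helpers are hypothetical, purely for illustration) contrasts copying and terminating a slice before fputs() with handing the pointer and length straight to fwrite(), which is essentially what Andrei's write() wrapper does. The phobos client avoids the copy by storing the NUL in spare capacity (buf[len] = '\0'), but fputs() still has to find that terminator before writing.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for a D char[]: pointer plus length, no NUL. */
struct slice {
    const char *ptr;
    size_t length;
};

/* fputs needs a NUL-terminated string, so the slice is copied into a
 * scratch buffer and terminated first; fputs then locates the NUL again. */
int write_with_fputs(struct slice s, FILE *out, char *scratch, size_t cap)
{
    if (s.length + 1 > cap)
        return -1;
    memcpy(scratch, s.ptr, s.length);
    scratch[s.length] = '\0';
    return fputs(scratch, out) == EOF ? -1 : 0;
}

/* fwrite takes the length explicitly, so the slice is written as-is. */
int write_with_fwrite(struct slice s, FILE *out)
{
    return fwrite(s.ptr, 1, s.length, out) == s.length ? 0 : -1;
}
```

One side effect worth knowing: fputs() stops at the first embedded NUL, whereas fwrite() writes exactly the number of bytes it is given.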
std.stdio.readln will also automatically convert to char[] if the stream
is in wide character mode (as will all the phobos stdio functions). This
test is inlined and fast under Windows, but is a function call under
Linux which will hurt performance significantly.
> If you mean something that you've written, that could presumably be
> rectified by adding the isatty() test Walter had mentioned before. That
> has not been added to tango.io since (a) it would likely make programs
> behave differently depending on whether they were redirected or not.
> It's not yet clear whether that is an appropriate specialization, as
> default behaviour, and (b) there has been no ticket issued for it
>
> Again, please submit a ticket so we don't forget about that detail. We'd
> be interested to hear if folk think the "isatty() test" should be
> default behaviour, or would perhaps lead to corner-case issues instead
Using isatty() to switch between line and block buffered I/O access is
routine when using C's stdio, and in fact is relied upon in DMC's
internal implementation of buffering. It's been this way for 25 years,
every C stdio implementation I've heard of uses it, and I've never heard
a complaint about it.
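Here is a minimal sketch of the convention Walter describes, in C and assuming a POSIX system (isatty() and fileno() are POSIX, not ISO C): pick line buffering when the stream is a terminal and full buffering otherwise. The function name pick_buffering is hypothetical. C libraries typically make this choice internally when the stream's buffer is first set up, rather than through an explicit call like this.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: line-buffer a stream attached to a terminal, so
 * each completed line appears at once; fully buffer it otherwise (pipes,
 * files), where throughput matters. Must be called before any other
 * operation on the stream. Returns the chosen mode, or -1 on failure. */
int pick_buffering(FILE *stream)
{
    /* isatty() reports whether the underlying descriptor is a terminal. */
    int mode = isatty(fileno(stream)) ? _IOLBF : _IOFBF;

    /* A NULL buffer lets the library allocate one; BUFSIZ is a size hint. */
    if (setvbuf(stream, NULL, mode, BUFSIZ) != 0)
        return -1;
    return mode;
}
```

This is also the source of the behavioural difference kris raises below: the same program flushes each line on a console, but only when the buffer fills (or on an explicit fflush()) when its output is redirected.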

Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
>
>> Andrei Alexandrescu (See Website For Email) wrote:
>>
>>> kris wrote:
>>>
>>>> Last week there was a series of posts regarding some optimized code
>>>> within phobos streams. A question posed was: without those same
>>>> optimizations, would tango.io be slower than the improved phobos? [1]
>>>>
>>>> As these new phobos IO functions are now available, Andrei's
>>>> "benchmark" [2] was run on both Win32 and linux to see where
>>>> tango.io could use some improvement.
>>>
>>>
>>> [snip]
>>>
>>> On my machine, Tango does 4.3 seconds and the following phobos
>>> program (with Walter's readln) does 5.4 seconds:
>>
>>
>> On Win32, the difference is very much larger. As noted before, several
>> times faster. Those benefits will likely translate to linux going
>> forward.
>
>
> If I understand things correctly, it looks like the hope is to derive
> more speed from further dropping phobos and C I/O compatibility, a path
> that I personally don't consider attractive.
Nope. That's not the case at all. The expectation (or 'hope', if you
like) is that we can make the linux version operate more like the Win32
version.
>
> Also, the fact that the tango version is "more than twice as efficient
> as the fastest C version identified" suggests a problem with the testing
> method or with the C code. Are they comparable? If you genuinely have a
> method to push bits through two times faster than the fastest C can do,
> you may as well go ahead and patent it. Your method would speed up
> many programs, since many use C's I/O and are I/O bound. It's huge news.
That's good for D then?
There's no reason why C could not take the same approach. Yet, one might
imagine, the IO strategies it exposes and the wide variety of special
cases may 'discourage' the implementation of a more efficient approach?
That's pure speculation on my part, and I'm quite positive the C version
could be sped up notably if one reimplemented a bunch of things.
> I'm not even kidding. But I doubt that that's the case.
You're most welcome to your doubts, Andrei. However, just because "C
does it that way" doesn't mean it is, or ever was, the "best" approach.
>
>>> Also, the Tango version has a bug. Running Tango's cat without any
>>> pipes does not read lines from the console and outputs them one by
>>> one, as it should; instead, it reads many lines and buffers them
>>> internally, echoing them only after the user has pressed end-of-file
>>> (^D on Linux), or possibly after the user has entered a large amount
>>> of data (I didn't have the patience). The system cat program and the
>>> phobos implementation correctly process each line as it was entered.
>>
>>
>> If you mean something that you've written, that could presumably be
>> rectified by adding the isatty() test Walter had mentioned before.
>> That has not been added to tango.io since (a) it would likely make
>> programs behave differently depending on whether they were redirected
>> or not. It's not yet clear whether that is an appropriate
>> specialization, as default behaviour
>
>
> What is absolutely clear is that the current version has a bug. It can't
> read a line from the user and write it back. There cannot be any
> question that that's a problem.
Only with the way that you've written your program. In the general case,
that is not true at all. But please do submit that bug-report :)
>
>> , and (b) there has been no ticket issued for it
>>
>> Again, please submit a ticket so we don't forget about that detail.
>> We'd be interested to hear if folk think the "isatty() test" should be
>> default behaviour, or would perhaps lead to corner-case issues instead
>
>
> I was actually pointing out a larger issue: incompatibility with phobos'
> I/O and C I/O. Tango's version is now faster (thank God we got past the
> \n issue and bummer it's not the default parameter of nextLine) but it
> is incompatible with both phobos' and C's stdio. (It's possible that the
> extra speed is derived from skipping C's stdio and using read and write
> directly.) Probably you could reimplement phobos and bundle it with
> Tango to give the users the option to link phobos code with Tango code
> properly, but still C stdio compatibility is lost, and phobos code has
> access to it.
The issue you raise here is that of interleaved and shared access to
global entities, such as the console, where some incompatibility between
tango.io and C IO is exhibited.
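The interleaving problem is easy to reproduce whenever two layers keep separate buffers over one file descriptor. In the C sketch below (assuming POSIX; interleave is a hypothetical name), a raw write(2) on the descriptor stands in for a second I/O library with its own buffering. This is an assumption for illustration, not tango.io code. With a fully buffered stream, the raw write overtakes the text still sitting in the stdio buffer; flushing first restores call order.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Two writers share one descriptor: a buffered stdio stream and a raw
 * write(2). If the stream is fully buffered, "first" waits in the stdio
 * buffer while "second" goes straight to the descriptor, so the output
 * order is reversed. flush_between = 1 drains the stdio buffer first. */
int interleave(FILE *stream, int flush_between)
{
    static const char raw[] = "second (raw)\n";

    if (fputs("first (stdio)\n", stream) == EOF)
        return -1;
    if (flush_between && fflush(stream) != 0)
        return -1;
    if (write(fileno(stream), raw, sizeof raw - 1) != (ssize_t)(sizeof raw - 1))
        return -1;
    return fflush(stream) == 0 ? 0 : -1;
}
```

Routing all output through one buffer, or flushing at every hand-off between libraries, is what keeps mixed output in order; that coordination is part of the compatibility tradeoff being argued here.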
If you really dig into it, you'll perhaps conclude that (a) the number
of real-world scenarios where this would truly become an issue is
diminishingly small, and (b) the vast (certainly on Win32) performance
improvement is worth that tradeoff. Even then, it is certainly possible
to intercept C IO functions and route them to tango.io equivalents instead.
It has been said before, but is probably worth repeating:
- Tango is not a phobos clone. Nor is it explicitly designed to be
compatible with phobos; sometimes it is worthwhile taking a different
approach. Turns out that phobos can be run alongside tango in many
situations.
- Tango is for D programmers; not C programmers.
- Tango, as a rule, is intended to be flexible, modular, efficient and
practical. The goal is to provide D with an exceptional library, and we
reserve the right to break a few eggs along the way ;)

Walter Bright wrote:
> kris wrote:
>
>> On Win32, the difference is very much larger. As noted before, several
>> times faster.
>
>
> I suspect that much of the slowness difference is from using C's fputs,
> along with the need to append a 0 to use fputs.
Okay. Oh, seemingly dout.write() has some io-synch problems when used in
this manner?
> std.stdio.readln will also automatically convert to char[] if the stream
> is in wide character mode (as will all the phobos stdio functions). This
> test is inlined and fast under Windows, but is a function call under
> Linux which will hurt performance significantly.
Well, phobos is running as fast as perl under linux, so perhaps it
doesn't seem to be much of an issue there? Under Win32, tango.io seems
to leave everything else in the dust.
Seems kinda obvious why that is, when you look at what the "benchmark"
is really testing? To me, it's likely spending most of its time
constructing each line, so it's really not an IO test per se? Tango
takes an alternate approach to such tasks, which would explain why it
is so fast under Win32. What surprises us is that tango.io is almost
sedate on linux by comparison. I can't explain that right now, but
suspect it may have something to do with file locks, or something :)
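One way to check how much of this "benchmark" is line handling rather than raw I/O is to time a copy that never looks for line boundaries. The C sketch below (copy_blocks is a hypothetical name, and the 8 KB block size is an arbitrary choice) moves bytes with fread()/fwrite() in fixed blocks. It is a point of comparison only, not a description of how tango.io works internally.

```c
#include <stdio.h>

/* Copy `in` to `out` in fixed-size blocks, ignoring line boundaries.
 * Returns the number of bytes copied, or -1 on a read or write error. */
long copy_blocks(FILE *in, FILE *out)
{
    char block[8192];
    long total = 0;
    size_t n;

    while ((n = fread(block, 1, sizeof block, in)) > 0) {
        if (fwrite(block, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    if (ferror(in) || fflush(out) != 0)
        return -1;
    return total;
}
```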
>> If you mean something that you've written, that could presumably be
>> rectified by adding the isatty() test Walter had mentioned before.
>> That has not been added to tango.io since (a) it would likely make
>> programs behave differently depending on whether they were redirected
>> or not. It's not yet clear whether that is an appropriate
>> specialization, as default behaviour, and (b) there has been no ticket
>> issued for it
>>
>> Again, please submit a ticket so we don't forget about that detail.
>> We'd be interested to hear if folk think the "isatty() test" should be
>> default behaviour, or would perhaps lead to corner-case issues instead
>
>
> Using isatty() to switch between line and block buffered I/O access is
> routine when using C's stdio, and in fact is relied upon in DMC's
> internal implementation of buffering. It's been this way for 25 years,
> every C stdio implementation I've heard of uses it, and I've never heard
> a complaint about it.
That's useful input ... thanks. It was noted that a program used both as
a console process and a child process might behave differently, since
flush would be automatic on the console yet not always so for the child
(with redirected handles)?

kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
> - Tango is for D programmers; not C programmers.
D programmers sometimes like to call 3rd party code written in other
languages, and pretty much any interop in D has to happen via C
compatibility (e.g. pyD). So I'm guessing that if my D code calls some
Python code that prints to the console, somewhere down the line that
output eventually ends up on C's stdout. I could be wrong, but at least that's
why I *think* Andrei and Walter keep saying that C compatibility is
important.
Andrei -- by "compatibility" does that mean if I rebind stdio/stdout to
something different that both D and C's output go to the new place? Or
is it still necessary to rebind them individually? I did this once for
some legacy code in C++, and found that I had to rebind 3 things: the C
streams, the C++ old-style streams from <iostream.h> (was under MSVC 6),
and the new-style C++ streams from <iostream>. And then I had to do
the interleaving myself, which didn't really work (because all the
streams were just writing to output buffers individually). If what
you're talking about with compatibility would avoid that kind of mess, that
would certainly be a good thing.
--bb

kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>
>>> Andrei Alexandrescu (See Website For Email) wrote:
>>>
>>>> kris wrote:
>>>>
>>>>> Last week there was a series of posts regarding some optimized
>>>>> code within phobos streams. A question posed was: without those
>>>>> same optimizations, would tango.io be slower than the improved
>>>>> phobos? [1]
>>>>>
>>>>> As these new phobos IO functions are now available, Andrei's
>>>>> "benchmark" [2] was run on both Win32 and linux to see where
>>>>> tango.io could use some improvement.
>>>>
>>>>
>>>> [snip]
>>>>
>>>> On my machine, Tango does 4.3 seconds and the following phobos
>>>> program (with Walter's readln) does 5.4 seconds:
>>>
>>>
>>> On Win32, the difference is very much larger. As noted before,
>>> several times faster. Those benefits will likely translate to linux
>>> going forward.
>>
>>
>> If I understand things correctly, it looks like the hope is to derive
>> more speed from further dropping phobos and C I/O compatibility, a
>> path that I personally don't consider attractive.
>
> Nope. That's not the case at all. The expectation (or 'hope', if you
> like) is that we can make the linux version operate more like the Win32
> version
>
>>
>> Also, the fact that the tango version is "more than twice as efficient
>> as the fastest C version identified" suggests a problem with the
>> testing method or with the C code. Are they comparable? If you
>> genuinely have a method to push bits through two times faster than the
>> fastest C can do, you may as well go ahead and patent it. Your
>> method would speed up many programs, since many use C's I/O and are
>> I/O bound. It's huge news.
>
> That's good for D then?
>
> There's no reason why C could not take the same approach. Yet, one might
> imagine, the IO strategies it exposes and the wide variety of special cases
> may 'discourage' the implementation of a more efficient approach? That's
> just pure speculation on my part, and I'm quite positive the C version
> could be sped up notably if one reimplemented a bunch of things.
>
>> I'm not even kidding. But I doubt that that's the case.
>
> You're most welcome to your doubts, Andrei. However, just because "C
> does it that way" doesn't mean it is, or ever was, the "best" approach
I think we're not on the same page here. What I'm saying is that, unless
you cut a deal with Microsoft to provide you with a secret D I/O API
that nobody knows about, all fast APIs in existence come with a C
interface. It's very hard to contest that. Probably you are referring to
the C stdio, and I'm in agreement with that. Of course there's a variety
of means to be faster than stdio on any given platform, at various
compatibility costs. It's known how to do that. "Hot water has been
invented."
>>>> Also, the Tango version has a bug. Running Tango's cat without any
>>>> pipes does not read lines from the console and outputs them one by
>>>> one, as it should; instead, it reads many lines and buffers them
>>>> internally, echoing them only after the user has pressed end-of-file
>>>> (^D on Linux), or possibly after the user has entered a large amount
>>>> of data (I didn't have the patience). The system cat program and the
>>>> phobos implementation correctly process each line as it was entered.
>>>
>>>
>>> If you mean something that you've written, that could presumably be
>>> rectified by adding the isatty() test Walter had mentioned before.
>>> That has not been added to tango.io since (a) it would likely make
>>> programs behave differently depending on whether they were redirected
>>> or not. It's not yet clear whether that is an appropriate
>>> specialization, as default behaviour
>>
>>
>> What is absolutely clear is that the current version has a bug. It
>> can't read a line from the user and write it back. There cannot be any
>> question that that's a problem.
>
> Only with the way that you've written your program. In the general case,
> that is not true at all. But please do submit that bug-report :)
This is the fourth time we need to discuss this. I don't understand why I
need to _argue_ that this is a bug.
Let me spell it out again: Cin.nextLine is incorrect. It cannot be used
(without possibly some extra incantations I don't know about) to
implement a program that does this:
$ ./test.d
Please enter your name: Moe
Hello, Moe!
$ _
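For reference, the behaviour being asked for is easy to get from C stdio with an explicit flush. In the sketch below (greet is a hypothetical name, written in C rather than against Tango or phobos), the fflush() after the prompt makes it appear before the program blocks for input, whether the output stream is line buffered (a terminal) or fully buffered (a pipe or file). The name "Moe" shown in the session is echoed by the terminal, not by the program.

```c
#include <stdio.h>
#include <string.h>

/* Prompt, read one line, and greet, as in the session above. The explicit
 * fflush makes the prompt visible before the read regardless of the
 * buffering mode of `out`. */
int greet(FILE *in, FILE *out)
{
    char name[256];
    size_t len;

    fputs("Please enter your name: ", out);
    if (fflush(out) != 0)
        return -1;
    if (fgets(name, sizeof name, in) == NULL)
        return -1;
    len = strlen(name);
    if (len > 0 && name[len - 1] == '\n')
        name[len - 1] = '\0';  /* drop the line terminator */
    fprintf(out, "Hello, %s!\n", name);
    return fflush(out) == 0 ? 0 : -1;
}
```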
I don't have an account on the Tango site, and in a fraction of the time
it would take me to create one, you can submit the bug report.
>>> , and (b) there has been no ticket issued for it
>>>
>>> Again, please submit a ticket so we don't forget about that detail.
>>> We'd be interested to hear if folk think the "isatty() test" should
>>> be default behaviour, or would perhaps lead to corner-case issues
>>> instead
>>
>>
>> I was actually pointing out a larger issue: incompatibility with
>> phobos' I/O and C I/O. Tango's version is now faster (thank God we got
>> past the \n issue and bummer it's not the default parameter of
>> nextLine) but it is incompatible with both phobos' and C's stdio.
>> (It's possible that the extra speed is derived from skipping C's stdio
>> and using read and write directly.) Probably you could reimplement
>> phobos and bundle it with Tango to give the users the option to link
>> phobos code with Tango code properly, but still C stdio compatibility
>> is lost, and phobos code has access to it.
>
> The issue you raise here is that of interleaved and shared access to
> global entities, such as the console, where some incompatibility between
> tango.io and C IO is exhibited.
>
> If you really dig into it, you'll perhaps conclude that (a) the number
> of real-world scenarios where this would truly become an issue is
> diminishingly small, and (b) the vast (certainly on Win32) performance
> improvement is worth that tradeoff. Even then, it is certainly possible
> to intercept C IO functions and route them to tango.io equivalents instead.
What Win32 primitives does tango use?
> It has been said before, but is probably worth repeating:
>
> - Tango is not a phobos clone. Nor is it explicitly designed to be
> compatible with phobos; sometimes it is worthwhile taking a different
> approach. Turns out that phobos can be run alongside tango in many
> situations.
>
> - Tango is for D programmers; not C programmers.
>
> - Tango, as a rule, is intended to be flexible, modular, efficient and
> practical. The goal is to provide D with an exceptional library, and we
> reserve the right to break a few eggs along the way ;)
Sounds great.
Andrei

> It has been said before, but is probably worth repeating:
>
> - Tango is not a phobos clone. Nor is it explicitly designed to be
> compatible with phobos; sometimes it is worthwhile taking a different
> approach. Turns out that phobos can be run alongside tango in many
> situations.
>
> - Tango is for D programmers; not C programmers.
>
> - Tango, as a rule, is intended to be flexible, modular, efficient and
> practical. The goal is to provide D with an exceptional library, and we
> reserve the right to break a few eggs along the way ;)
Totally agree!