Benji suggested run-time inheritance and at least from a design
perspective I like some of his thoughts.
I've got a few questions though:
a) People here said that a virtual call will make it slow. How slow? How
much overhead does it add on modern hardware, considering that this is an
area where hardware manufacturers spend effort on optimizations?
b) Can't a string class use implicit casts, and maybe some syntactic
sugar, to pretend to be a regular array, in a way that avoids the virtual
call and is still useful?
You can already write array.func(...) instead of func(array, ...), so this
could be used with a string class that implicitly converts to a char array.
c) Compile-time interfaces? (aka concepts)
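Question (b) can be sketched outside D. Below is a small Python illustration (CharString and count_char are invented names for this sketch): a string class that exposes the same surface as a plain array (length plus indexing), so generic "array" code accepts it with no interface or vtable indirection in between.

```python
# Sketch of question (b): a string type that behaves like a char array.
# CharString and count_char are illustrative names, not from any library.
class CharString:
    def __init__(self, data: str):
        self._data = data

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, i) -> str:
        return self._data[i]       # raises IndexError past the end

def count_char(arr, ch) -> int:
    """Generic function written for arrays; works on list, str, CharString."""
    return sum(1 for c in arr if c == ch)
```

Because CharString satisfies the sequence protocol, count_char needs no knowledge of which concrete type it receives.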

superdan wrote:
> for example look at this fn from std.string.
>
> int cmp(C1, C2)(in C1[] s1, in C2[] s2);
>
> so it looks like cmp accepts arrays of any character type. that is
> cool but the [] limits the thing to builtin arrays. the correct sig
> is
>
> int cmp(S1, S2)(in S1 s1, in S2 s2) if (isSortaString!(S1) &&
> isSortaString!(S2));
>
> correct?
Yes. It's just that template constraints came along later than
std.string.cmp :-)
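Since isSortaString above is only a sketch of such a constraint, here is a rough Python analog of the same idea, with a runtime_checkable Protocol standing in for a concept: cmp accepts anything "sorta string-like" rather than only built-in arrays. All names below are illustrative.

```python
from typing import Protocol, runtime_checkable

# Stand-in for the isSortaString!(S) constraint: anything with a length
# and integer indexing counts as "sorta a string".
@runtime_checkable
class SortaString(Protocol):
    def __len__(self) -> int: ...
    def __getitem__(self, i: int): ...

def cmp(s1, s2) -> int:
    """Lexicographic compare of any two string-like sequences."""
    if not (isinstance(s1, SortaString) and isinstance(s2, SortaString)):
        raise TypeError("cmp wants string-like operands")
    for a, b in zip(s1, s2):
        if a != b:
            return -1 if a < b else 1
    # Equal prefix: the shorter operand sorts first.
    return (len(s1) > len(s2)) - (len(s1) < len(s2))
```

The isinstance check plays the role of the `if (...)` template constraint, except it fires at run time instead of at overload resolution.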

Yigal Chripun wrote:
> a) People here said that a virtual call will make it slow. How slow? How
> much overhead does it add on modern hardware, considering that this is an
> area where hardware manufacturers spend effort on optimizations?
Virtual function calls have been a problem for hardware optimization.
Direct function calls can be speculatively executed, but not virtual
ones, because the hardware cannot predict where it will go. This means
virtual calls can be much slower than direct function calls.

superdan wrote:
>> For starters, with strings implemented as character arrays, writing
>> library code that accepts and operates on strings is a bit of a pain in
>> the neck, since you always have to write templates and template code is
>> slightly less readable than non-template code. You can't distribute your
>> code as a DLL or a shared object, because the template instantiations
>> won't be included (unless you create wrapper functions with explicit
>> template instantiations, bloating your code size, but more importantly
>> tripling the number of functions in your API).
>
> so u mean with a class the encoding char/wchar/dchar won't be an
> issue anymore. that would be hidden behind the wraps. cool.
>
> problem is that means there's an indirection cost for every character
> access. oops. so then apps that decided to use some particular encoding
> consistently must pay a price for stuff they don't use.
So, I was thinking about the actual costs involved with the String class
and CharSequence interface design that I'd like to see (and that exists
in languages like Java and C#).
There's the cost of the class wrapper itself, the cost of internally
representing and converting between encodings, the cost of routing all
method calls through an interface vtable. Characters, if always
represented using two bytes, would consume twice the memory. And
returning characters from method-calls has got to be slower than
accessing them directly from arrays. Right?
So I wrote some tests, in Java and in D/Tango.
The source code files are attached. Both of the tests perform a common
set of string operations (searching, splitting, concatenating, and
character-iterating). I tried to make the functionality as identical as
possible, though I wasn't sure which technique to use for splitting text
in Tango, so I used both the "Util.split" and "Util.delimit" functions.
I ran both tests using a 5MB text file, "The Complete Works of William
Shakespeare", from the Project Gutenberg website:
http://www.gutenberg.org/dirs/etext94/shaks12.txt
You can grab it for yourself, or you can just run the code against your
favorite large text file.
I compiled and ran the Java code in the 1.6.0_06 JDK, with the "-server"
flag. The D code was compiled with DMD 1.034 and Tango 0.99.7, using the
"-O -release -inline" flags.
My test machine is an AMD Turion 64 X2 dual-core laptop, with 2GB of RAM
and running WinXP SP3.
I ran the tests eight times each, using fine-resolution timers. These
are the median results:
LOADING THE FILE INTO A STRING: D/Tango wins, by 428%
D/Tango: 0.02960 seconds
Java: 0.12675 seconds
ITERATING OVER CHARS IN A STRING: Java wins, by 280%
D/Tango: 0.10093 seconds
Java: 0.03599 seconds
SEARCHING FOR A SUBSTRING: D/Tango wins, by 218%
D/Tango: 0.02251 seconds
Java: 0.04915 seconds
SEARCH & REPLACE INTO A NEW STRING: D/Tango wins, by 226%
D/Tango: 0.17685 seconds
Java: 0.39996 seconds
SPLIT A STRING ON WHITESPACE:
Java wins, by 681% (against tango.text.Util.delimit())
Java wins, by 313% (against tango.text.Util.split())
D/Tango (delimit): 8.28195 seconds
D/Tango (split): 3.80465 seconds
Java (split): 1.21477 seconds
CONCATENATING STRINGS: Java wins, by 884%
D/Tango (array concat, no pre-alloc): 4.07929 seconds
Java (StringBuilder, no pre-alloc): 0.46150 seconds
SORT STRINGS (CASE-INSENSITIVE): D/Tango wins, by 226%
D/Tango: 1.62227 seconds
Java: 3.66389 seconds
It looks like D mostly falls down when it has to allocate a lot of
memory, even if it's just allocating slices. The D performance for
string splitting really surprised me.
I was interested to see, though, that Java was so much faster at
iterating through the characters in a string, since I used the charAt(i)
method of the CharSequence interface, rather than directly iterating
through a char[] array, or even calling the charAt method on the String
instance.
And yet, Java's character iteration is almost 3 times as fast as D's.
Down with premature optimization! Design the best interfaces possible,
to enable the most pleasant and flexible programming idioms. The
performance problems can be solved. :-P
--benji
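For anyone who wants to reproduce a rough version of these measurements without the attached sources, here is a minimal harness in the same spirit, in Python for illustration. The corpus is synthetic; point it at shaks12.txt (or any large file) for a closer reproduction. Absolute timings are machine-dependent and not comparable to the numbers above.

```python
import time

# Time one call of fn, returning elapsed seconds.
def time_once(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

# Take the best of several runs, as benchmark harnesses usually do.
def best_of(fn, reps=3):
    return min(time_once(fn) for _ in range(reps))

# Synthetic stand-in for the 5MB Shakespeare file.
text = "To be, or not to be: that is the question. " * 20000

results = {
    "search":  best_of(lambda: text.find("question")),
    "split":   best_of(lambda: text.split()),
    "concat":  best_of(lambda: "".join(["chunk"] * 10000)),
    "iterate": best_of(lambda: sum(1 for _ in text)),
}
```

Taking the minimum (or median, as above) of several runs smooths out timer jitter and warm-up effects.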

superdan wrote:
> Benji Smith Wrote:
>> A char[] is actually an array of UTF-8 encoded octets, where each
>> character may consume one or more consecutive elements of the array.
>> Retrieving the str.length property may or may not tell you how many
>> characters are in the string. And pretty much any code that tries to
>> iterate character-by-character through the array elements is
>> fundamentally broken.
>
> try this:
>
> foreach (dchar c; str)
> {
> process c
> }
Cool. I had no idea that was possible. I was doing this:
void myFunction(T)(T[] array) {
    foreach (T c; array) {
        doStuff(c);
    }
}
--benji
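For readers outside D, the difference between the two loops above can be shown with any non-ASCII UTF-8 string; a plain Python illustration: `foreach (char c; str)` walks UTF-8 code units (bytes), while `foreach (dchar c; str)` decodes to code points (characters). The two only agree on pure-ASCII text.

```python
# Code units vs code points in a UTF-8 string.
s = "naïve café"
code_units = list(s.encode("utf-8"))   # what a char[] loop sees
code_points = list(s)                  # what a dchar loop sees

assert len(code_units) == 12           # 'ï' and 'é' take two bytes each
assert len(code_points) == 10
```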

Walter Bright wrote:
> Benji Smith wrote:
>> You can't distribute your code as a DLL or a shared object, because
>> the template instantiations won't be included (unless you create
>> wrapper functions with explicit template instantiations, bloating your
>> code size, but more importantly tripling the number of functions in
>> your API).
>
> Is the problem you're referring to the fact that there are 3 character
> types?
Basically, yeah.
With three different character types and two array types (static &
dynamic), and in D2 with const, invariant, and mutable (and soon shared
and unshared), the number of ways of representing a "string" in the type
system is overwhelming.
This afternoon, I was writing some string-processing code that I intend
to distribute in a library, and I couldn't help but think to myself
"This code is probably broken, for anything but the most squeaky-clean
ASCII text".
I don't mind that there are different character types, or that there are
different character encodings. But I want to deal with those issues in
exactly *one* place: in my string constructor (and, very rarely, during
IO). But 99% of the time, I want to just think of the object as a
String, with all the ugly details abstracted away.
>> Another good low-hanging argument is that strings are frequently used
>> as keys in associative arrays. Every insertion and retrieval in an
>> associative array requires a hashcode computation. And since D strings
>> are just dumb arrays, they have no way of memoizing their hashcodes.
>
> True, but I've written a lot of string processing programs (compilers
> are just one example of such). This has never been an issue, because the
> AA itself memoizes the hash, and from then on the dictionary handle is
> used.
Cool. The hashcode-memoization thing was really just a catalyst to get
me thinking. It's really at the periphery of my concerns with Strings.
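The memoized-hashcode idea under discussion (the strategy java.lang.String uses internally) can be sketched in a few lines of Python. MemoString and its instrumentation counter are invented for this illustration: the hash is computed lazily on first use, cached, and reused for every later insertion or lookup.

```python
# A string wrapper that caches its hash after the first computation.
class MemoString:
    def __init__(self, data: str):
        self._data = data
        self._hash = None
        self.hash_computations = 0   # counts actual hashing work

    def __hash__(self) -> int:
        if self._hash is None:
            self.hash_computations += 1
            self._hash = hash(self._data)
        return self._hash

    def __eq__(self, other) -> bool:
        return isinstance(other, MemoString) and self._data == other._data
```

A key reused across many dictionary lookups pays for the hash exactly once; a dumb array key would pay on every lookup.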
>> We've already observed that D assoc arrays are less performant than
>> even Python maps, so the extra cost of lookup operations is unwelcome.
>
> Every one of those benchmarks that purported to show that D AA's were
> relatively slow turned out to be, on closer examination, D running the
> garbage collector more often than Python does. It had NOTHING to do with
> the AA's.
Ah. Good point. Thanks for clarifying. I didn't remember all the
follow-up details.
>> Most importantly, the contract between the regex engine and its
>> consumers would provide a well-defined interface for processing text,
>> regardless of the source or representation of that text.
>
> I think a better solution is for regexp to accept an Iterator as its
> source. That doesn't require polymorphic behavior via inheritance, it
> can do polymorphism by value (which is what templates do).
That's a great idea.
I should clarify that my referring to an "interface" was in the informal
sense. (Though I think actual interfaces would be a reasonable
solution.) But any sort of contract between text-data-structures and
text-processing-routines would fit the bill nicely.
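Walter's iterator suggestion, sketched in Python for illustration: the processing routine consumes any iterable of characters, so strings, arrays, and streams all plug in with no inheritance relationship at all. The function names here are made up.

```python
from typing import Iterable, Iterator

# The "regex engine" stand-in: it only cares that it can iterate chars.
def count_vowels(chars: Iterable[str]) -> int:
    return sum(1 for c in chars if c in "aeiou")

# One possible source: a generator standing in for, e.g., file IO.
def stream_chars(text: str) -> Iterator[str]:
    yield from text
```

This is polymorphism by value rather than by vtable: each concrete source gets its own specialized iteration path, with no virtual dispatch per character.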
>> But then again, I haven't used any of the const functionality in D2,
>> so I can't actually comment on relative usability of compiler-enforced
>> immutability versus interface-enforced immutability.
>
> From my own experience, I didn't 'get' invariant strings until I'd used
> them for a while.
I actually kind of think I'm on the other side of the issue.
I've been primarily a Java programmer (8 years) and secondarily a C#
programmer (3 years), so immutable Strings are the only thing I've ever
used. Lots of the other JDK classes are like that, too.
So, from my perspective, it seems like the ideal, low-impact way of
enforcing immutability is to have the classes enforce it on themselves.
I've never felt the need for compiler-enforced const semantics in any of
the work I've done.
Thanks for your replies! I always appreciate hearing from you.
--benji

"Walter Bright" <newshound1@digitalmars.com> wrote in message
news:g90iia$2jc4$3@digitalmars.com...
> Yigal Chripun wrote:
>> a) People here said that a virtual call will make it slow. How slow? How
>> much overhead does it add on modern hardware, considering that this is an
>> area where hardware manufacturers spend effort on optimizations?
>
> Virtual function calls have been a problem for hardware optimization.
> Direct function calls can be speculatively executed, but not virtual ones,
> because the hardware cannot predict where it will go. This means virtual
> calls can be much slower than direct function calls.
Modern x86 branch prediction treats indirect calls the same as conditional
branches. They get a slot in the branch target buffer, so they do get
speculatively executed. And if correctly predicted it's only a couple of
cycles more costly than direct calls.
See the thread "Feature Request: nontrivial functions and vtable
optimizations" about 2 weeks ago.
I cited the technical docs and a few doubters ran benchmarks, which proved
that virtual methods are not as evil as many people think. In fact they are
no more evil than a conditional branch.

superdan wrote:
>> class LinkedList(T)
>> {
>>     Node!(T) head;
>>
>>     /** Gets the ith item of the list. Throws: SIGSEGV if i >= length of
>>      *  the list. Time complexity: O(N) for a list of length N. This
>>      *  operation is provided for completeness and not recommended for
>>      *  frequent use in large lists. */
>>     T opIndex(int i)
>>     {
>>         auto current = head;
>>         while (i--)
>>         {
>>             current = current.next;
>>         }
>>         return current.value;
>>     }
>> }
>
> oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.
WRONG!
Those sorting algorithms are correct. Their runtime is now O(n^2 log n)
for this linked list.
> this design is a stillborn.
>
> what needs done is to allow different kinds of containers implement different interfaces. in fact a better way to factor things is via iterators, as stl has shown.
You didn't ask for an O(1) opIndex on a linked list. You asked for a
correct opIndex on a linked list. Any sorting algorithm you name that
would work on an array would also work on this linked list. Admittedly,
insertion sort would be much faster than qsort, but if your library
provides that, you, knowing that you are using a linked list, would
choose the insertion sort algorithm.
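To make the point concrete, here is a Python sketch of the same container: indexing is correct but walks i links, so it costs O(i) per access. An index-based sort over it remains correct; only the complexity bound that assumed O(1) indexing is lost.

```python
# Minimal singly linked list with a correct but O(i) index operator.
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class LinkedList:
    def __init__(self, items):
        self.head = None
        for item in reversed(list(items)):
            self.head = Node(item, self.head)

    def __len__(self):
        n, cur = 0, self.head
        while cur:
            n, cur = n + 1, cur.next
        return n

    def __getitem__(self, i):
        cur = self.head
        while i:                 # walks i links: O(i), not O(1)
            cur = cur.next
            i -= 1
        return cur.value         # falls over past the end, as documented
```

Feeding these indexed reads to any comparison sort yields correct output; each of the O(n log n) accesses just costs O(n) instead of O(1).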
>>> how do you make a hashtable, an array, and a linked list obey the same interface? guess hashtable has stuff that others don't support?
>> You have an interface for collections (you can add, remove, and get the
>> length, maybe a couple other things).
>
> this is an incomplete response. what do you add for a vector!(T)? A T. what do you add for a hash!(T, U)? you tell me. and you tell me how you make that signature consistent across vector and hash.
interface List(T) : Collection!(T) {}
class Vector(T) : List!(T) {}
class HashMap(T, U) : Collection!(KeyValuePair!(T, U)) {}
Have a look at C#'s collection classes. They solved this problem.
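A rough Python sketch of that hierarchy (modeled loosely on the C#/Java collection interfaces; class names invented): add keeps one consistent signature because a map is treated as a collection of key-value pairs.

```python
from abc import ABC, abstractmethod

# The common Collection interface: every container can add an element
# of its element type and report its size.
class Collection(ABC):
    @abstractmethod
    def add(self, item): ...

    @abstractmethod
    def __len__(self): ...

class Vector(Collection):
    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):          # the List extra: indexing
        return self._items[i]

class HashMap(Collection):
    def __init__(self):
        self._data = {}

    def add(self, item):               # its element type: a (key, value) pair
        key, value = item
        self._data[key] = value

    def __len__(self):
        return len(self._data)

    def get(self, key):
        return self._data[key]
```

Code written against Collection can grow either container; only code that needs indexing or keyed lookup has to care which concrete type it holds.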
>> You have an interface for lists (they're collections, and you can index
>> them).
>
> wrong. don't ever mention a linear-time indexing operator in an interview. you will fail it right then. you can always define linear-time indexing as a named function. but never masquerade it as an index operator.
If you create a linked list with O(1) indexing, that might suffice to
get you a PhD. If you claim that you can do so in an interview, you
should be required to show proof; and should you fail to do so, you will
probably be shown the door.
Even if you did prove it in the interview, they would probably consider
you overqualified, unless the company's focus was data structures.
>> Then you can use all the collection-oriented stuff with lists, and you
>> can do special list-type things with them if you want.
>>
>>> these are serious questions. not in jest not rhetorical and not trick.
>> Yes, but they're solved problems.
>
> apparently not since you failed at'em.
You claimed that a problem with an inefficient solution has no solution,
and then you execrate me for providing an inefficient solution. Why?
>>>> Someone else comes along and writes a library of algorithms. The
>>>> algorithms can operate on any container that implements the necessary
>>>> operations.
>>> hm. things are starting to screech a bit. but let's see your answers to your questions above.
>>>
>>>> When someone clever comes along and writes a new sorting algorithm, I
>>>> can plug my new container class right into it, and get the algorithm for
>>>> free. Likewise for the guy with the clever new collection class.
>>> things ain't that simple.
>> Collection-oriented library code will care sufficiently about
>> performance that this mix-and-match stuff is not feasible.
>
> what's that supposed to mean? you sayin' stl don't exist?
No, I'm saying that for efficiency, you need to know about the internals
of a data structure to implement a number of collection-oriented
algorithms. Just like the linked list example.
>> Almost
>> anything else doesn't care enough to take only an
>> AssociativeArrayHashSet and not a TreeSet or a LinkedList or a primitive
>> array.
>
> wrong for the reasons above.
They expose very similar interfaces. You might care about which you
choose because of the efficiency of various operations, but most of your
code won't care which type it gets; it would still be correct.
Well, sets have a property that no element appears in them twice, so
that is an algorithmic consideration sometimes.
>>> saw this flick "the devil wears prada", an ok movie but one funny remark stayed with me. "you are in desperate need of chanel."
>>>
>>> i'll paraphrase. "you are in desperate need of stl." you need to learn stl and then you'll figure why you can't plug a new sorting algorithm into a container. you need more guarantees. and you need iterators.
>>>
>>>> We don't bat an eye at the idea of containers & algorithms connecting to
>>>> one another using a reciprocal set of interfaces.
>>> i do. it's completely wrong. you need iterators that broker between containers and algos. and iterators must give complexity guarantees.
>> I don't. If I'm just going to iterate through the items of a collection,
>> I only care about the opApply. If I need to index stuff, I don't care if
>> I get a primitive array or Bob Dole's brand-new ArrayList class.
>
> you too are in desperate need for stl.
You are in desperate need of System.Collections.Generic. Or
tango.util.container.

Benji Smith:
> It looks like D mostly falls down when it has to allocate a lot of
> memory, even if it's just allocating slices. The D performance for
> string splitting really surprised me.
String splitting requires a lot of work from the GC. The GC of HotSpot is light years ahead of the current D GC. You can see this by measuring just the time the D GC takes to deallocate a large array of split substrings.
I have posted several benchmarks here about this topic.
> I was interested to see, though, that Java was so much faster at
> iterating through the characters in a string, since I used the charAt(i)
> method of the CharSequence interface, rather than directly iterating
> through a char[] array, or even calling the charAt method on the String
> instance.
HotSpot is able to inline a lot of virtual methods too; D currently can't do those things.
> Down with premature optimization! Design the best interfaces possible,
> to enable the most pleasant and flexible programing idioms. The
> performance problems can be solved. :-P
They currently can't be solved by the backends of DMD and GDC; only HotSpot (and maybe the .NET compiler on Windows) is able to do that. I don't know if LLVM will be able to perform some of those optimizations.
Bye,
bearophile

Walter Bright:
> > We've already observed that D assoc arrays are less performant than even
> > Python maps, so the extra cost of lookup operations is unwelcome.
>
> Every one of those benchmarks that purported to show that D AA's were
> relatively slow turned out to be, on closer examination, D running the
> garbage collector more often than Python does. It had NOTHING to do with
> the AA's.
Really? I must have missed those conclusions, despite reading all the posts on the subject. What solutions do you propose for the problem, then? I recall that disabling the GC didn't improve the situation much. So does the problem now become how to improve the D GC?
On my site I keep a gallery of tiny benchmarks where D code (compiled with DMD) is 10 or more times slower than closely equivalent Python, C, or Java code (about 12 programs so far, very different from each other; there's a benchmark regarding associative arrays too). Hopefully it will become useful once people start tuning D implementations.
Bye,
bearophile