There is a simple follow on patch that could enable StringImpl to automatically share the buf from a UString, but I'm slightly more worried about potential memory regressions (due to UString's buffers being bigger than StringImpl), so I wanted to get in this part first.

If anyone wants to start looking at the diff, it would be great.
However, I'm looking at changing making some changes to mrowe's comments in irc. My summary: Basically "SharedDataHolder" is confusing and badly named (and I goofed in the changelog "visa versa" should be "vice versa").
So I've be changing SharedDataHolder some how to make it clearer in how to use it (and likely changing its name).

Created attachment 26550[details]
Attempted to make the SharedDataHolder usage more clear.
This has all known changes done to it.
There is one known issue with the change:
JavaScriptCore/wtf/SharedDataHolder.h no longer matches the contents of the file.
This means that I should change the name of the file and the references to it in various places (build files, includes, and ChangeLog). However, in case there is more feedback about the naming of this class, I avoid those changes for the moment.

Comment on attachment 26609[details]
Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.
This looks good.
> +static const unsigned numCharactersToStore = 0x100; // Note that SmallStrings::m_singleCharacterStrings should be the same size.
We have COMPILE_ASSERT so that such things can be checked rather than just mentioned in comments.
> +} // namespace JSC
> +
> +#endif // SmallStrings_h
If the patch is going to touch some files just to add comments, then I will be in full nitpick mode. I suggest only one space before "//" to match the style elsewhere, rather than two spaces.
> - rep->capacity = newCapacity;
> + rep->baseString()->capacity = newCapacity;
I would have expected you to use "base" here rather than re-fetching rep->baseString(). Is this better in some way?
> - bool baseIsSelf() const { return baseString == this; }
> + bool baseIsSelf() const { return m_identifierTableAndFlags.isFlagSet(BaseStringFlag); }
Why can't we keep the old definition, but move it down in the file where it can see the BaseString class definition? Or can't this check if baseString is zero instead? Do we really need the bit in the flags? Does the bit help performance or something?
> - UChar* data() const { return baseString->buf + baseString->preCapacity + offset; }
> + UChar* data() const;
I'm surprised this can be non-inline without hurting performance slightly.
> + void setBaseString(PassRefPtr<BaseString> base) { ASSERT(base != this); m_baseString = base.releaseRef(); }
> + BaseString* baseString() { return reinterpret_cast<BaseString*>(baseIsSelf() ? this : m_baseString); }
> + const BaseString* baseString() const { return const_cast<const BaseString*>(const_cast<Rep*>(this)->baseString()); }
It's annoying to have these reinterpret_cast here. Instead you should just define these inline functions outside the class definition, in a place that can see the definition of BaseString.
> + void* m_baseString; // If "this" is a BaseString instance, it is 0. BaseString* otherwise.
Seems really ugly to have this be a void*. Can't it be a BaseString* instead? When is the fact that it's a void* helpful?
> +
> + static BaseString& null() { return *nullBaseString; }
> + static BaseString& empty() { return *emptyBaseString; }
> + private:
> + friend void initializeUString();
I'd prefer a blank line before the private line.
> + inline UChar* UString::Rep::data() const
> + {
> + const UString::BaseString* base = baseString();
> + return base->buf + base->preCapacity + offset;
> + }
Do you need the UString:: qualifier on BaseString here?
> + UString::BaseString* base = m_rep->baseString();
And here?
I'm going to say review- because of the reinterpret_cast question, and the question of whether we really do need a bit to tell whether something is a base string. But this does seem like a great change and about ready to land.

Created attachment 26618[details]
Addressed comments -- Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.
> > +static const unsigned numCharactersToStore = 0x100; // Note that SmallStrings::m_singleCharacterStrings should be the same size.
>We have COMPILE_ASSERT so that such things can be checked rather than just mentioned in comments.
Fixed.
> > +} // namespace JSC
> > +
> > +#endif // SmallStrings_h
> I suggest only one space before "//" to match the style elsewhere, rather than two spaces.
Fixed.
> > - rep->capacity = newCapacity;
> > + rep->baseString()->capacity = newCapacity;> I would have expected you to use "base" here rather than re-fetching
> rep->baseString(). Is this better in some way?
rep has just been recreated, so the local variable base is no longer valid at this point.
> > + UChar* data() const;
> I'm surprised this can be non-inline without hurting performance slightly.
Still inlined just not defined in the class definition.
> > + void setBaseString(PassRefPtr<BaseString> base) { ASSERT(base != this); m_baseString = base.releaseRef(); }
> > + BaseString* baseString() { return reinterpret_cast<BaseString*>(baseIsSelf() ? this : m_baseString); }
> > + const BaseString* baseString() const { return const_cast<const BaseString*>(const_cast<Rep*>(this)->baseString()); }> It's annoying to have these reinterpret_cast here. Instead you should just
> define these inline functions outside the class definition, in a place that can
> see the definition of BaseString.
Fixed.
> > + static BaseString& empty() { return *emptyBaseString; }
> > + private:
> > + friend void initializeUString();
> I'd prefer a blank line before the private line.
Fixed.
> + inline UChar* UString::Rep::data() const
> + {
> + const UString::BaseString* base = baseString();
> Do you need the UString:: qualifier on BaseString here?
Fixed.
> > + UString::BaseString* base = m_rep->baseString();
>And here?
Fixed.
> > - bool baseIsSelf() const { return baseString == this; }
>> + bool baseIsSelf() const { return m_identifierTableAndFlags.isFlagSet(BaseStringFlag); }> Why can't we keep the old definition, but move it down in the file where it can...
and
> > + void* m_baseString; // If "this" is a BaseString instance, it is 0. BaseString* otherwise.
> Seems really ugly to have this be a void*. Can't it be a BaseString* instead?
> When is the fact that it's a void* helpful?
and
> I'm going to say review- because of the reinterpret_cast question, and the
> question of whether we really do need a bit to tell whether something is a base
> string.
In the next change, I'll make this point to something else if it is a BaseString. This is the reason for the flag and putting 0 in there in the case of the BaseString and for making a it void* that no code can use directly.
Why re-use this field? In order to not make BaseString any bigger. (I've been under the impression that this is desired.)

(In reply to comment #14)
> In the next change, I'll make this point to something else if it is a
> BaseString. This is the reason for the flag and putting 0 in there in the case
> of the BaseString and for making a it void* that no code can use directly.
A union is typically better than a void* to use the same storage for pointers of two different types.

Comment on attachment 26618[details]
Addressed comments -- Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.
> + COMPILE_ASSERT(numCharactersToStore == sizeof(SmallStrings::m_singleCharacterStrings) / sizeof(SmallStrings::m_singleCharacterStrings[0]),
> + WTF_IsNumCharactersConstInSyncWithClassUsage);
No need to put that WTF_ prefix on the COMPILE_ASSERT argument. That name is only used for a local symbol, which can't conflict with anything outside this function. The only place you *might* want that prefix would be if you used COMPILE_ASSERT in a header, and even there I don't think it’s needed.
Do you need those SmallStrings:: qualifiers? I'd think this would compile without them since we’re inside a SmallStrings constructor definition.
> + void setBaseString(PassRefPtr<BaseString> base);
This is a classic case where we would not name the argument in the function declaration, since the argument's type and the function name make it perfectly clear without a name.
r=me

Created attachment 26629[details]
Part3: Allow UString to share a UChar* with another class.
The underlying class used for sharing also allows for sharing across threads which will be used in StringImpl::copy.

I currently like the names
FastRefCountForThreads
or
FastThreadableRefCount
instead of
RefCountedMalloced
but I'd like a second opinion before I change it. (The fact that it hold something that was malloc'ed is easily changeable if it is ever desired to be used to for another allocation type.)

(In reply to comment #21)
> I currently like the names
> FastRefCountForThreads
> or
> FastThreadableRefCount
>
> instead of
> RefCountedMalloced
>
> but I'd like a second opinion before I change it. (The fact that it hold
> something that was malloc'ed is easily changeable if it is ever desired to be
> used to for another allocation type.)
I like the general direction here; the new suggested names seem better.
Despite our unfortunate use of the word "fast" in "fastMalloc", I generally would like to avoid the word "fast". It makes me want to ask the question, "When we would I ever *not* use this?" I don't want my other code to be slow!
This is easier to do in person, but generally I like to describe the class and what it's for in prose, and then select words from that discussion to name the class. That exercise often works well.

> describe the class and what it's for in prose
Here's my attempt:
The class is to allow for sharing an item with multiple objects across threads that can each control its lifetime independently (using a ref counting scheme) which is similar to RefCounted.
It particularly useful in performance critical scenarios where ref/deref are done often without significantly impacting performance and avoids using atomic* until absolutely necessary (unlike ThreadSafeShared which does it on every call).
In return, the way to use it is slightly more complicated than ThreadSafeShared. An instance of the class is not threadsafe, but it is designed to be usable in a multi-thread scenario. The code has to call ::copy() to get another instance which can be used on another thread (while still holding on to the same object).

Created attachment 27266[details]
Part 3: Add CrossThreadRefCounted.
Ready for review.
Also I broke up the previous (part 3) patch into something smaller with the hope that it will be easier to review like this, and came up with a better name for my class. The class isn't in the build yet because it isn't used in this patch. (It will be used in the next patch.)

Comment on attachment 27266[details]
Part 3: Add CrossThreadRefCounted.
> + // ThreadSafeShared can have a siginificant perf impact when used in low level classes
Typo here in the spelling of "significant".
> + // data is freed using *fastFree*.
I believe this comment is wrong. I think the data is deallocated with delete, not fastFree.
> + // Used to make an instance that can be used on another thread.
> + PassRefPtr<CrossThreadRefCounted<T> > copy();
I think this naming is getting increasingly unclear. The functions that copy objects to be used on other threads probably need a name that somehow talks about threads rather then just the word "copy".
> + bool isShared() const { return !m_refCounter.hasOneRef() || dataAccessMustBeThreadSafe(); }
I don't understand the semantics of this function. When would you call it and what would you do based on the result.
> + bool dataAccessMustBeThreadSafe() const { return m_threadSafeRefCounter && !m_threadSafeRefCounter->hasOneRef(); }
Same question.
> + ~CrossThreadRefCounted()
> + {
> + if (m_data && !m_threadSafeRefCounter)
> + delete m_data;
> + }
The null check of m_data is redundant with what the compiler will already do automatically, so I suggest omitting it.
> +#if USE(LOCKFREE_THREADSAFESHARED)
> + if (atomicDecrement(&m_refCount) <= 0)
> +#else
> + {
> + MutexLocker locker(m_mutex);
> + --m_refCount;
> + }
> + if (m_refCount <= 0)
> +#endif
> + return true;
I think this would read more clearly if the return statement was repeated isnide the #if.
r=me