And maybe (b) can be implemented by making gc_alloc / gc_free
overridable function pointers? Then we can override their values
and use scope guards to revert them back to the values they were
before.

Yea, I was thinking this might be a way to go. You'd have a global
(well, thread-local) allocator instance that can be set and reset
through stack calls.
You'd want it to be RAII or delegate based, so the scope is clear.
    with_allocator(my_alloc, {
        do whatever here
    });
or
    {
        ChangeAllocator!my_alloc dummy;
        do whatever here
    } // dummy's destructor ends the allocator scope
I think the former is a bit nicer, since the dummy variable is a bit
silly. We'd hope that delegate can be inlined.
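
A minimal sketch of both forms, assuming the hooks are plain function pointers held in a thread-local variable. The names Allocator, currentAllocator, withAllocator and this ChangeAllocator are hypothetical (here the guard takes the allocator as a constructor argument rather than a template parameter); druntime's real gc_alloc/gc_free are not overridable like this.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical allocator record: a pair of overridable function pointers.
struct Allocator
{
    void* function(size_t size) alloc;
    void function(void* p) dealloc;
}

void* mallocAlloc(size_t size) { return malloc(size); }
void mallocDealloc(void* p) { free(p); }

// Module-level variables are thread-local by default in D.
Allocator currentAllocator = Allocator(&mallocAlloc, &mallocDealloc);

// Delegate form: install `a`, run the block, restore the previous allocator.
void withAllocator(Allocator a, scope void delegate() block)
{
    auto saved = currentAllocator;
    currentAllocator = a;
    scope (exit) currentAllocator = saved; // also runs if block throws
    block();
}

// RAII form: the destructor ends the allocator scope.
struct ChangeAllocator
{
    private Allocator saved;

    @disable this();     // a guard must be given an allocator
    @disable this(this); // copying a guard would restore twice

    this(Allocator a)
    {
        saved = currentAllocator;
        currentAllocator = a;
    }

    ~this() { currentAllocator = saved; }
}
```

Either way the previous allocator comes back when the scope ends, so nested scopes unwind in the right order.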

Actually, D's frontend leaves something to be desired when it comes to
inlining delegates. It *is* done sometimes, but not as often as one may
like. For example, opApply generally doesn't inline its delegate, even
when it's just a thin wrapper around a foreach loop.
But yeah, I think the former has nicer syntax. Maybe we can help the
compiler with inlining by making the delegate a compile-time parameter?
But it forces a switch of parameter order, which is Not Nice (hurts
readability 'cos the allocator argument comes after the block instead of
before).
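
A rough sketch of the compile-time-parameter variant, to show the argument order it forces. The currentAllocatorName variable is a hypothetical stand-in for the real allocator slot, and whether the compiler actually inlines the block is not guaranteed.

```d
// Hypothetical stand-in for the thread-local "current allocator" slot.
string currentAllocatorName = "gc";

// The block is an alias (compile-time) parameter, so the call inside is a
// direct call to a known function rather than an indirect delegate call.
auto withAllocator(alias block)(string name)
{
    auto saved = currentAllocatorName;
    currentAllocatorName = name;
    scope (exit) currentAllocatorName = saved;
    return block();
}

string demo()
{
    // Note the order: the block is written before the allocator argument.
    return withAllocator!(() => currentAllocatorName)("malloc");
}
```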

But, the template still has a big advantage: you can change the
type. And I think that is potentially enormously useful.

True. It can use different types for different allocators that do (or
don't) do cleanups at the end of the scope, depending on what the
allocator needs to do.

Another question is how to tie into output ranges. Take std.conv.to.
    auto s = to!string(10); // currently, this hits the gc
What if I want it to go on a stack buffer? One option would be to
rewrite it to use an output range, and then call it like:
    char[20] buffer;
    // returns the slice of the buffer it actually used
    auto s = to!string(10, buffer);
(and we can do overloads so to!string(10, radix) still works, as
well as to!string(10, radix, buffer). Hassle, I know...)
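
Something close to this can be sketched today on top of std.format.sformat, which formats into a caller-supplied buffer and returns the slice it used. The helper name toStringIn is hypothetical and not part of Phobos.

```d
import std.format : sformat;

// Hypothetical helper: format `value` into `buffer` without touching the GC
// heap for the result, and return the slice of `buffer` that was written.
char[] toStringIn(T)(T value, char[] buffer)
{
    return sformat(buffer, "%s", value);
}
```

Usage would be `char[20] buffer; auto s = toStringIn(10, buffer[]);`. A buffer that is too small makes sformat fail instead of allocating, so the caller has to size it.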

I think supporting the multi-argument version of to!string() is a good
thing, but what to do with library code that calls to!string()? It'd be
nice if we could somehow redirect those GC calls without having to comb
through the entire Phobos codebase for stray calls to to!string().
[...]

The fun part is the output range works for that, and could also work
for something like this:
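    import core.stdc.stdlib : realloc;

    struct malloced_string {
        char* ptr;
        size_t length;
        size_t capacity;
        void put(char c) {
            if(length >= capacity) {
                // grow geometrically; start at 16 so an empty string can grow at all
                capacity = capacity ? capacity * 2 : 16;
                ptr = cast(char*) realloc(ptr, capacity);
            }
            ptr[length++] = c;
        }
        char[] slice() { return ptr[0 .. length]; }
        alias slice this;
        mixin RefCounted!this; // pretend this works
    }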
    {
        malloced_string str;
        auto got = to!string(10, str);
    } // str is out of scope, so it gets free()'d.
Unsafe though: if you stored a copy of got somewhere, it is now a
pointer to freed memory.
I'd kinda like language support of some sort to help mitigate that
though, like being a borrowed pointer that isn't allowed to be
stored, but that's another discussion.

Nice!

And that should work. So then what we might do is provide these
little output range wrappers for various allocators, and use them on
many functions.
So we'd write:
    import std.allocators;
    import std.range;
    // mallocator is provided in std.allocators and offers the goods
    OutputRange!(char, mallocator) str;
    auto got = to!string(10, str);

I like this. However, it still doesn't address how to override the
default allocator in, say, Phobos functions.

What's nice here is the output range is useful for more than just
allocators. You could also to!string(10, my_file) or a delegate,
blah blah blah. So it isn't too much of a burden, it is something
you might naturally use anyway.
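
To illustrate, here is a small sketch where one conversion function writes to three different sinks: a stack buffer, an Appender, and a delegate. BufferSink and writeValue are hypothetical names; std.format.formattedWrite and std.array.appender are existing Phobos functions.

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical fixed-capacity sink over caller-owned memory (e.g. a stack buffer).
struct BufferSink
{
    char[] buffer;
    size_t length;

    void put(char c)
    {
        assert(length < buffer.length, "BufferSink overflow");
        buffer[length++] = c;
    }

    void put(const(char)[] s)
    {
        foreach (c; s)
            put(c);
    }

    // The part of the buffer written so far.
    char[] slice() { return buffer[0 .. length]; }
}

// Hypothetical conversion that never picks an allocator itself: whatever
// memory gets used is decided by the sink the caller passes in.
void writeValue(Sink, T)(ref Sink sink, T value)
{
    formattedWrite(sink, "%s", value);
}
```

With `char[20] storage; auto sink = BufferSink(storage[]);` the result lands in the stack buffer; passing an `appender!string()` or a `void delegate(const(char)[])` instead needs no change to writeValue.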

Now *that* is a very nice idea. I like having a way of bypassing using a
string buffer, and just writing the output directly to where it's
intended to go. I think to() with an output range parameter definitely
should be implemented. It doesn't address all of the issues, but it's a
very big first step IMO.

Also, we may have the problem of the wrong allocator
being used to free the object.

Another reason why encoding the allocator into the type is so nice.
For the minimal D I've been playing with, the idea I'm running with
is all allocated memory has some kind of special type, and then
naked pointers are always assumed to be borrowed, so you should
never store or free them.

Interesting idea. So basically you can tell which allocator was used to
allocate an object just by looking at its type? That's not a bad idea,
actually.

    // but....
    struct A {
        // not allowed, because you don't know when lol is going to be freed
        char[] lol;
    }
foo frees itself with refcounting.

This is a bit inconvenient. So your member variables will have to know
what allocation type is being used. Not the end of the world, of course,
but not as pretty as one would like.
On Wed, Jun 26, 2013 at 03:24:57AM +0200, Adam D. Ruppe wrote:

I was just quickly skimming some criticism of C++ allocators, since
my thought here is similar to what they do. On one hand, maybe D can
do it right by tweaking C++'s design rather than discarding it.
On the other hand, with all the C++ I've done, I have never actually
used STL allocators, which could say something about me or could say
something about them.
One thing I saw said making the differently allocated object a
different type sucks. ...but must it? The complaint there was "so
much for just doing a function that takes a std::string". But, the
way I'd want to do it in D is the function would take a char[]
instead, and our special allocated type provides that via opSlice
and/or alias this.

Yeah I think alias this adds a whole new factor into the equation.
Having a distinct type makes it much easier to implement,
and allows you to mix differently-allocated objects without having to
worry about things like calling the right version of gc_free to cleanup
properly. You can even have the same underlying data type be allocated
in two different ways, and the cleanup will happen correctly.
Basically, when you allocate some object O of class C using allocator A,
then it follows that no matter what you do with the gc_alloc/gc_free
function pointers afterwards, O must be freed using A.free. So in a
sense, O needs to carry around a function pointer to A.free in its dtor
(or whoever frees it). So this actually argues for having a distinct
type for an instance of C allocated using A, vs. an instance of C
allocated using a different allocator B. You need to store that function
pointer to A.free and B.free *somewhere*, otherwise things won't work
properly.
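
A minimal sketch of that argument, assuming the allocator is a template parameter of the owning type, so the destructor always calls the matching deallocate no matter what any global hook points to later. CountingMalloc, Owned and countSpaces are hypothetical names for illustration.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical allocator with a per-instantiation live counter for the demo.
struct CountingMalloc(string name)
{
    static size_t live;
    static void* allocate(size_t n) { ++live; return malloc(n); }
    static void deallocate(void* p) { --live; free(p); }
}

alias AllocA = CountingMalloc!"A";
alias AllocB = CountingMalloc!"B";

// Hypothetical owning string; the allocator is part of the type, so the
// destructor always knows which deallocate to call.
struct Owned(A)
{
    private char* ptr;
    private size_t len;

    @disable this(this); // single owner, no accidental double free

    this(const(char)[] text)
    {
        len = text.length;
        ptr = cast(char*) A.allocate(len ? len : 1);
        assert(ptr !is null, "allocation failed");
        ptr[0 .. len] = text[];
    }

    ~this()
    {
        if (ptr !is null)
            A.deallocate(ptr);
    }

    // Lends the contents as a plain slice; callers must not store or free it.
    char[] borrow() { return ptr[0 .. len]; }
    alias borrow this;
}

// Takes a plain slice, so it accepts any Owned!A via alias this.
size_t countSpaces(const(char)[] s)
{
    size_t n;
    foreach (c; s)
        if (c == ' ')
            ++n;
    return n;
}
```

Owned!AllocA and Owned!AllocB are different types holding the same kind of data, and both can be passed to countSpaces unchanged.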
[...]

Anyway, bottom line is I don't think that criticism necessarily
applies to D.

Agreed, in D, distinct types per allocator is, at the very least, not as
bad as it is in C++.

But there's surely many others and I'm more or less a
n00b re c++'s allocators so idk yet.

Who *isn't* a n00b wrt C++'s allocators, since so few people actually
use them? :-P
T
--
He who sacrifices functionality for ease of use, loses both and deserves
neither. -- Slashdotter

I think supporting the multi-argument version of to!string() is
a good thing, but what to do with library code that calls
to!string()? It'd be nice if we could somehow redirect those GC
calls without having to comb through the entire Phobos codebase
for stray calls to to!string().

Let's consider what kinds of allocations we have. We can break
them up into two broad groups: internal and visible.
Internal allocations, in theory, don't matter. These can be on
the stack, the gc heap, malloc/free, whatever. The function
itself is responsible for their entire lifetime.
Changing these either optimizes, in the case of reusing a region,
or leaks, if you switch it to manual and the function doesn't know
it.
Visible allocations are important because the caller is
responsible for freeing them. Here, I really think we want the
type system's help: either it should return something that we
know we're responsible for, or take a buffer/output range from us
to receive the data in the first place.
Either way, the function signature should reflect what's going on
with visible allocations. It'd possibly return a wrapped type and
it'd take an output range/buffer/allocator.
With internals though, the only reason I can see why you'd want
to change them outside the function is to give them a region of
some sort to work with, especially since you don't know for sure
what it is doing - these are all local variables to the
function/call stack. And here, I don't think we want to change
the allocator wholesale.
At most, we'd want to give it hints that what we're doing is
short-lived. (Or, better yet, have it figure this out on its own,
like a generational gc.)
So I think this is more about tweaking the gc than replacing it,
at most adding a couple new functions to it:
    GC.hint_short_lived // returns a helper struct with a static refcount:
    struct TempGcAllocator {
        static int tempCount = 0;
        static void* localRegion;
        this() { tempCount++; } // pretend this works
        ~this() {
            tempCount--;
            if(tempCount == 0)
                gc.tryToCollect(localRegion);
        }
        T create(T, Args...)(Args args) { return GC.new_short_lived!T(args); }
    }
and gc.tryToCollect() does a quick scan for any pointers into the
local region. If there are none, it frees the whole
thing. If there are, in the name of memory safety, it just
reintegrates that local region into the regular memory and gc's
its components normally.
The reason the count is static is that you don't have to pass
this thing down the call stack. Any function that wants to adapt
to this generational hint system just calls hint_short_lived. If
you're a leaf function, that's ok, the static count means you'll
inherit the region from the function above you.
You would NOT use this in main(), as that defeats the purpose.
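
The static-count part of this can be sketched in isolation. The version below is an assumption-heavy toy: TempRegion, hintShortLived and allocate are hypothetical names, the region is a fixed 4096-byte bump buffer, and it resets unconditionally when the outermost scope ends. It does NOT perform the pointer scan described above, so unlike the proposal it is not memory safe if a slice escapes.

```d
// Hypothetical region helper; names and sizes are illustrative only.
struct TempRegion
{
    static int depth;       // thread-local count of active scopes
    static ubyte[] region;  // backing store shared by all nested scopes
    static size_t used;     // bump offset into region

    private bool active;    // false for TempRegion.init, so its destructor is a no-op

    @disable this(this);

    // Stand-in for the proposed GC.hint_short_lived.
    static TempRegion hintShortLived()
    {
        if (region is null)
            region = new ubyte[4096];
        ++depth;
        return TempRegion(true);
    }

    ~this()
    {
        // Only the outermost scope releases the region. A real version
        // would first scan for pointers into it, as described above.
        if (active && --depth == 0)
            used = 0;
    }

    // Bump allocation from the shared region; falls back to the GC heap
    // when no scope is active or the region is full.
    static ubyte[] allocate(size_t n)
    {
        if (depth == 0 || used + n > region.length)
            return new ubyte[n];
        auto slice = region[used .. used + n];
        used += n;
        return slice;
    }
}
```

A leaf function that calls hintShortLived() while a caller already holds a scope just bumps the count and keeps allocating from the caller's region, which is the inheritance described above.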

I think to() with an output range parameter definitely
should be implemented.

No doubt about it, we should aim for most phobos functions not to
allocate at all, if given an output range they can use.

Interesting idea. So basically you can tell which allocator was
used to allocate an object just by looking at its type?

Right, then you'll know if you have to free() it. (Or it can free
itself with its destructor.)

This is a bit inconvenient. So your member variables will have
to know what allocation type is being used. Not the end of the
world, of course, but not as pretty as one would like.

Yeah, you'd need to know if you own them or not too (are you
responsible for freeing that string you just got passed? If no,
are you sure it won't be freed while you're still using it?), but
I just think that's a part of memory management you can't
sidestep.
There's two easy answers: 1) always make a private copy of
anything you store (and perhaps write to) or 2) use a gc and
trust it to always be the owner.
In any other case, I think you *have* to think about it, and the
type telling you can help you make that decision.

and allows you to mix differently-allocated objects without
having to

Important to remember though that you are borrowing these
references, not taking ownership.
I think the rule that all pointers/slices are borrowed is fairly
workable though. With the gc, that's ok, you don't own anything.
The garbage collector is responsible for it all, so store away.
(Though if it is mutable, you might want to idup it so you don't
get overwritten by someone else. But that's a separate question
from allocation method.... and already encoded in D's type
system).
So never free() a naked pointer unless you know what you're
doing (like interfacing with a C library); prefer to only free a
ManuallyAllocated!(pointer).
Hell, a C library binding could change the type too; it'd still be
binary compatible. RefCounted!T wouldn't be, but
ManuallyAllocated!T would just be a wrapper around T*.
I think I'm starting to ramble!
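
A small sketch of the ManuallyAllocated idea, assuming it is nothing more than a struct around T*. The names allocate, release and readValue are hypothetical; the static assert only checks that the wrapper is the same size as a raw pointer, not any particular calling convention.

```d
import core.stdc.stdlib : free, malloc;

// Thin owning wrapper: same size as T*, but a distinct type.
struct ManuallyAllocated(T)
{
    T* ptr;
    alias ptr this; // lends the raw pointer to code that only borrows it
}

static assert(ManuallyAllocated!(int).sizeof == (int*).sizeof);

// Hypothetical helper: malloc one T and hand back the owning wrapper.
ManuallyAllocated!T allocate(T)()
{
    auto p = cast(T*) malloc(T.sizeof);
    assert(p !is null, "allocation failed");
    *p = T.init;
    return ManuallyAllocated!T(p);
}

// Only the wrapper can be released; a naked T* does not match this signature.
void release(T)(ref ManuallyAllocated!T owned)
{
    free(owned.ptr);
    owned.ptr = null;
}

// Borrowing code keeps taking plain pointers.
int readValue(const(int)* borrowed) { return *borrowed; }
```

Calling `release(p.ptr)` with a naked pointer fails to compile, which is the "never free a naked pointer" rule enforced by the type system.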