Community

So I've been thinking about this problem off and on for awhile,
and I think I have a rather simple solution. I highly doubt I'm
the first to think of it, but haven't found any info about a
similar idea on the net. So in the interest of science I'm
posting it here so anyone interested can attempt to poke holes in
the idea, which you folks have been so good at in the past.
This isn't really D related, but I usually like the feedback I
get from this community, plus, if there's any value in the idea,
maybe it would be a worth while discussion for future directions
D might make use of.
I'll be illustrating with Pseudo-code. The basic idea spawns from
conceptually separating variables as either "memory owners" or
"memory references". Basically, the pseudo-code has three types:
var - Memory Owner, never null, deleted at end of scope**
ref - Memory Reference, can't "own" memory
ptr - Unsafe manually managed memory pointer (like C pointers)
So, 'ptr's aside for now, 'var's would be the only way to
allocate memory, and memory is automatically allocated to them at
their definition**. Vars would be deleted at the end of their
scope (or when they're removed from dynamic-arrays, etc) unless
they're returned, in which case they're "injected" into the
callers scope, and deleted at the end of that scope. Without too
much lengthy explanation, here's some code:
type Foo
{
var bar = 0
}
func main
{
ref f : Foo
ref b : int
scope
{
var foo = Foo # allocates Foo
f => foo # f points to foo
f.bar = 1 # foo.bar = 1
b => f.bar # b points to foo.bar
b = 3 # foo.bar = 3
# auto 'clean up code' at end of scope:
#
# Memory.delete(foo)
# f => null
#
}
f.bar = 4 # ERROR: f is null
var n = 6
b => n
b = 7 # n = 7
# No clean-up code needed, 'n' is stack-allocated
#
# Also, notice we didn't need to set 'b' to 'null' at
# the end of the scope, since it wasn't accessed
# directly after the end of the scope.
}
So this is sort of like ref-counted scopes (smart scopes), except
the compiler has a higher degree of guarantee on the lifetime of
memory, and there's a lot of static analysis that can happen to
optimize the clean up code (like how we didn't need to assign 'b'
to null in the example above). All "ref counting" would be done
through static-analysis at compile-time, and injected by the
compiler, so it wouldn't have the runtime overhead of
ref-counting, nor would there be a need for "weak references"
since references are guaranteed to never "own" memory.
There's more to it of course. For instance, in order to have
expanding memory, you'd need the concepts of a dynamic-array and
type inheritance. You'd also need to not clean up memory that was
returned... so...
type Foo
{
var name : text
init { name = "Foo" }
func write { Console.write(name) }
}
type Bar : Foo
{
init { name = "Bar" }
}
ref global_foo : Foo
func main
{
var foos = Foo[]
foos += Foo # allocation
foos += Foo # allocation
foos += Bar # allocation
foos[].write() # prints: 'Foo', 'Foo', 'Bar'
foos -= foos[1] # deallocation
foos[].write() # prints: 'Foo', 'Bar'
# ---
func getFoo
{
var f = Foo
f.name = "Fip"
return f
# no cleanup, since 'f' is returned
}
global_foo = getFoo()
global_foo.write() # prints: 'Fip'
# auto cleanup:
#
# Memory.delete(foos)
# Memory.delete(global_foo)
# global_foo => null
#
}
Okay, so with that you could have global var arrays, and assign
global refs to function return values, etc. If you need extra
control, there's also 'ptr' types, which would require manual
clean-up. However, 'ptr's would be automatically cleaned when the
program exist, so they would be safe to use for global references
which only sometimes needed to point to usable memory.
Anyways.. comments and criticisms would be great. Sorry this
isn't more D related. Like I said, I hope it does have value D
might make use of at some point.

On Mon, Jan 21, 2013 at 06:37:36PM +0100, F i L wrote:
> So I've been thinking about this problem off and on for awhile, and
> I think I have a rather simple solution. I highly doubt I'm the
> first to think of it, but haven't found any info about a similar
> idea on the net. So in the interest of science I'm posting it here
> so anyone interested can attempt to poke holes in the idea, which
> you folks have been so good at in the past.
>
> This isn't really D related, but I usually like the feedback I get
> from this community, plus, if there's any value in the idea, maybe
> it would be a worth while discussion for future directions D might
> make use of.
>
> I'll be illustrating with Pseudo-code. The basic idea spawns from
> conceptually separating variables as either "memory owners" or
> "memory references". Basically, the pseudo-code has three types:
>
> var - Memory Owner, never null, deleted at end of scope**
> ref - Memory Reference, can't "own" memory
> ptr - Unsafe manually managed memory pointer (like C pointers)
[...]
I invented this scheme (independently -- I'm sure the concept has been
thought of before) way back when I was coding in C and C++.
Of course, I didn't have language/compiler support, but I did adopt a
convention of marking every pointer with [O] (for owner) and [R] (for
reference) in comments, in all my code. Owner pointers can never be
assigned a reference pointer, and owner pointers can only be
destructively copied (the original must be set to NULL after copying).
Owner pointers must be free()'d if they are non-NULL and going out of
scope (unless they are returned from the function, in which case I
consider it as a conceptual destructive copy followed by NULLing). I did
allow owner pointers to be NULL -- mainly because it is required for
destructive copy to be sound.
In newer versions of C++ (I invented this scheme back in the 90's when
templates and STL were only starting to emerge), owner pointers are
essentially equivalent to STL's auto_ptr<T>.
This was essentially as far as I developed the concept -- and probably
it's as far as it will get, because the concept has limitations:
1) It required code to be manually written to deal with owner pointers
correctly (auto_ptr mostly made this automatic, but it is certainly NOT
a drop-in replacement for pointers);
2) It can't deal with cyclic structures correctly;
3) It is still unsafe: you can have a dangling reference to owned
memory, because the owner pointer goes out of scope and the memory gets
deallocated, but there can still be references lingering around
somewhere.
IOW, it doesn't solve the problem of memory management. It is one level
up from having no distinction whatsoever between pointers (and scarily
enough, I've seen "enterprise" code that was obviously written without
any such distinction in mind), for sure, but it's not much more than
that.
T
--
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs

On Monday, 21 January 2013 at 17:37:38 UTC, F i L wrote:
> This isn't really D related, but I usually like the feedback I
> get from this community, plus, if there's any value in the
> idea, maybe it would be a worth while discussion for future
> directions D might make use of.
>
I haven´t found any hole or improvement suggestion yet, my only
comment is, I`d really love to use a system like this.

On 2013-01-21 18:37, F i L wrote:
> So I've been thinking about this problem off and on for awhile, and I
> think I have a rather simple solution. I highly doubt I'm the first to
> think of it, but haven't found any info about a similar idea on the net.
> So in the interest of science I'm posting it here so anyone interested
> can attempt to poke holes in the idea, which you folks have been so good
> at in the past.
Seems a bit like ARC (Automatic Reference Counting) that Clang has
implemented for Objective-C (and some C libraries on Mac OS X). The
compiler automatically inserts calls to "retain" and "release" where
appropriate.
--
/Jacob Carlborg

F i L wrote:
> [...code...]
> global_foo = getFoo()
> [...code...]
Typo, that should have been:
global_foo => getFoo()
H. S. Teoh wrote:
> [...] because the concept has limitations:
>
> 1) It required code to be manually written to deal with owner
> pointers
> correctly (auto_ptr mostly made this automatic, but it is
> certainly NOT
> a drop-in replacement for pointers);
I don't do a whole lot of C++, so I don't know a lot about their
smart_ptr types, however, I doubt you could do anything like my
idea in C++ (without a lot of extra syntax like you described)
since it requires a lot of automatic per-scope bookkeeping I
doubt C++ types alone could accomplish.
> 2) It can't deal with cyclic structures correctly;
That's why there would be a 'ptr' type :)
> 3) It is still unsafe: you can have a dangling reference to
> owned
> memory, because the owner pointer goes out of scope and the
> memory gets
> deallocated, but there can still be references lingering around
> somewhere.
This is why there would need to be a lot of magic happening in
the compiler. When the compiler can't know, for sure, weather a
reference that was assigned to a local-scope var is *still*
referencing that var at the end of the scope (pretty common),
then it asks before setting assigning to null. If the compiler
can't determine, at all, which refs will be pointing to a
particular local-var, it falls back to regular ref-counting (for
the var in question).
I've given it a bit of thought, and while runtime checks and even
regular ref-counting wouldn't be too uncommon, due to it's
var/ref design there would be a great many places where the
compiler could optimize away all the runtime checks and you
normally wouldn't have to deal with the cyclic-ref problem
ref-counting memory-management usually bring to the table
(granted ptr's aren't much better). Also, I can't remember
exactly, but because var's are tied to scopes, I think there's
more optimization that can happen with ref-counting because it
happens at the scope level... more thoughts needed here though.
One benefit of this I forgot to mention is that, like
ref-counting, memory is deterministic (with optional
late-collection). That wou
Thanks for your feedback!
Arne wrote:
> I haven´t found any hole or improvement suggestion yet, my only
> comment is, I`d really love to use a system like this.
Me too! Let me know if you think of any flaws.
Jacob Carlborg wrote:
> Seems a bit like ARC (Automatic Reference Counting) that Clang
> has implemented for Objective-C (and some C libraries on Mac OS
> X). The compiler automatically inserts calls to "retain" and
> "release" where appropriate.
Interesting, I will have to look into that. Thanks for the info.

F i L wrote:
> ... Also, I can't remember exactly, but because var's are tied
> to scopes, I think there's more optimization that can happen
> with ref-counting because it happens at the scope level... more
> thoughts needed here though.
Ah right I remember. Since vars only ever get collected at the
end of a scope, you don't need to check for zero-refs on every
dereference, only at the end of the var's scope.

On Mon, Jan 21, 2013 at 09:27:20PM +0100, F i L wrote:
[...]
> H. S. Teoh wrote:
[...]
> >3) It is still unsafe: you can have a dangling reference to owned
> >memory, because the owner pointer goes out of scope and the memory
> >gets deallocated, but there can still be references lingering around
> >somewhere.
>
> This is why there would need to be a lot of magic happening in the
> compiler. When the compiler can't know, for sure, weather a reference
> that was assigned to a local-scope var is *still* referencing that var
> at the end of the scope (pretty common), then it asks before setting
> assigning to null. If the compiler can't determine, at all, which refs
> will be pointing to a particular local-var, it falls back to regular
> ref-counting (for the var in question).
[...]
Hmm. So you're essentially saying that the compiler will do compile-time
analysis of variable usage, and based on that choose the simplest memory
management model for it?
That's not a bad idea. You can have a series of increasingly complex
management techniques, and the compiler could choose the least complex
one that it can statically prove is safe. So if something can be proven
not to escape the current scope, it can even be stack-allocated (or if
it's big, put it on the manual heap and delete it on scope exit). If it
is returned with no other references to it, then it can be a
single-reference pointer (i.e. your owner type). If it has cyclic
references, use a reference-counted pointer. If all else fails, fall
back to the GC.
The problem is that quite often, the compiler will not be able to prove
much about the data structure, esp. when you have non-trivial
manipulation of pointers -- it requires solving the halting problem to
achieve that level of analysis on arbitrary code -- so it will probably
fall back to the GC quite often, depending on what the code does.
T
--
My program has no bugs! Only undocumented features...

Am 21.01.2013 22:25, schrieb H. S. Teoh:
> On Mon, Jan 21, 2013 at 09:27:20PM +0100, F i L wrote:
> [...]
>> H. S. Teoh wrote:
> [...]
>>> 3) It is still unsafe: you can have a dangling reference to owned
>>> memory, because the owner pointer goes out of scope and the memory
>>> gets deallocated, but there can still be references lingering around
>>> somewhere.
>>
>> This is why there would need to be a lot of magic happening in the
>> compiler. When the compiler can't know, for sure, weather a reference
>> that was assigned to a local-scope var is *still* referencing that var
>> at the end of the scope (pretty common), then it asks before setting
>> assigning to null. If the compiler can't determine, at all, which refs
>> will be pointing to a particular local-var, it falls back to regular
>> ref-counting (for the var in question).
> [...]
>
> Hmm. So you're essentially saying that the compiler will do compile-time
> analysis of variable usage, and based on that choose the simplest memory
> management model for it?
>
> That's not a bad idea. You can have a series of increasingly complex
> management techniques, and the compiler could choose the least complex
> one that it can statically prove is safe. So if something can be proven
> not to escape the current scope, it can even be stack-allocated (or if
> it's big, put it on the manual heap and delete it on scope exit). If it
> is returned with no other references to it, then it can be a
> single-reference pointer (i.e. your owner type). If it has cyclic
> references, use a reference-counted pointer. If all else fails, fall
> back to the GC.
>
> The problem is that quite often, the compiler will not be able to prove
> much about the data structure, esp. when you have non-trivial
> manipulation of pointers -- it requires solving the halting problem to
> achieve that level of analysis on arbitrary code -- so it will probably
> fall back to the GC quite often, depending on what the code does.
>
>
> T
>
I see how this could work in a JIT environment, but not in a static
compiler with modules compiled in separate ways.
--
Paulo

On 2013-01-21 19:22:04 +0000, Jacob Carlborg <doob@me.com> said:
> On 2013-01-21 18:37, F i L wrote:
>> So I've been thinking about this problem off and on for awhile, and I
>> think I have a rather simple solution. I highly doubt I'm the first to
>> think of it, but haven't found any info about a similar idea on the net.
>> So in the interest of science I'm posting it here so anyone interested
>> can attempt to poke holes in the idea, which you folks have been so good
>> at in the past.
>
> Seems a bit like ARC (Automatic Reference Counting) that Clang has
> implemented for Objective-C (and some C libraries on Mac OS X). The
> compiler automatically inserts calls to "retain" and "release" where
> appropriate.
And I'll confirm that it's no magic either. Avoiding cyclic references
isn't that easy when you start mixing implicitly retained pointers with
Objective-C's blocks (lambas in C++ or delegate literals in D).
I'm a little sad Apple decided to deprecate its Objective-C garbage
collector. Having the choice between GC and reference counting was
great (although it surely sucks for library developers who need to
support both modes).
--
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

On Mon, Jan 21, 2013 at 11:36:58PM +0100, Paulo Pinto wrote:
> Am 21.01.2013 22:25, schrieb H. S. Teoh:
[...]
> >Hmm. So you're essentially saying that the compiler will do
> >compile-time analysis of variable usage, and based on that choose the
> >simplest memory management model for it?
> >
> >That's not a bad idea. You can have a series of increasingly complex
> >management techniques, and the compiler could choose the least
> >complex one that it can statically prove is safe. So if something can
> >be proven not to escape the current scope, it can even be
> >stack-allocated (or if it's big, put it on the manual heap and delete
> >it on scope exit). If it is returned with no other references to it,
> >then it can be a single-reference pointer (i.e. your owner type). If
> >it has cyclic references, use a reference-counted pointer. If all
> >else fails, fall back to the GC.
> >
> >The problem is that quite often, the compiler will not be able to
> >prove much about the data structure, esp. when you have non-trivial
> >manipulation of pointers -- it requires solving the halting problem
> >to achieve that level of analysis on arbitrary code -- so it will
> >probably fall back to the GC quite often, depending on what the code
> >does.
[...]
> I see how this could work in a JIT environment, but not in a static
> compiler with modules compiled in separate ways.
[...]
It can be made to work with separately-compiled modules if there's
language support (say in the form of exported pointer attributes). But
probably not feasible in the current version of D.
T
--
Designer clothes: how to cover less by paying more.