How common are circular references? The less common they are, the fewer hard cases you have if you are writing in a language with only a reference-counting GC. Are there any cases where it wouldn't work well to make one of the references a "weak" reference so that reference counting still works?

It seems like you should be able to have a language use only reference counting and weak references and have things work just fine most of the time, with efficiency as the goal. You could also have tools to help you detect memory leaks caused by circular references. Thoughts, anyone?

It seems that Python uses reference counting (I don't know for sure whether it occasionally uses a tracing collector as well), and I know that Vala uses reference counting with weak references; so I know it's been done before, but how well would it work?

As for efficiency: tracing garbage collection can actually be more efficient. Just consider a system that generates many short-lived objects; releasing each of them takes some effort. A generational garbage collector, on the other hand, simply transfers the surviving objects to the next generation, thus avoiding many separate deallocations.
– user281377 Dec 28 '10 at 15:33

Does anyone know of any research that indicates where existing methods are most efficient?
– compman Dec 28 '10 at 15:34

Garbage collection can even be more efficient than malloc/free in some circumstances.
– dan_waterworth Dec 28 '10 at 15:39

It seems that there is some confusion about what you are asking. All the answers so far that address the language-design part of your question seem to have completely misunderstood it. They are talking about manual memory management using reference counting vs. any kind of automatic memory management, while you are asking about automatic memory management using a tracing garbage collector vs. automatic memory management using a reference-counting garbage collector. Those are two completely different questions! You may want to edit your question to make it clearer.
– Jörg W Mittag Dec 28 '10 at 20:46

8 Answers

Weak references work great for breaking loops, as long as it's clear a loop is being made and as long as the developer makes provision for it with a weak reference. This isn't the same as a more robust GC, which is intended to handle such things automatically so the developer doesn't have to think about them at all.
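To illustrate the "developer makes provision for it" part, here is a minimal Java sketch (the Owner/Resource names are made up for this example) where the back-pointer is declared weak up front, so no strong cycle ever forms:

```java
import java.lang.ref.WeakReference;

// Hypothetical pair of mutually-referencing objects. The back-reference
// is weak, so it does not keep its owner alive: under reference counting
// this pair would not form a leaking cycle.
final class Owner {
    final Resource resource;
    Owner() { this.resource = new Resource(this); }
}

final class Resource {
    // Weak back-reference breaks the Owner <-> Resource cycle.
    final WeakReference<Owner> owner;
    Resource(Owner o) { this.owner = new WeakReference<>(o); }
}

public class WeakCycleDemo {
    public static void main(String[] args) {
        Owner owner = new Owner();
        // While 'owner' is strongly reachable, the weak reference still resolves.
        System.out.println(owner.resource.owner.get() == owner); // true
    }
}
```

The point of the answer stands, though: the developer had to know, at design time, which side of the relationship is secondary.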

Is it enough of a problem to warrant increased inefficiency? EDIT: That assumes that reference counting is faster, but...
– compman Dec 28 '10 at 15:25

@user9521: Enough of a problem? Maybe. Nor is efficiency the major concern here. C++ uses reference counting in smart pointers to integrate memory handling into RAII, and being able to do that is very useful. Also note that, while reference counting is usually faster, it's less regular. Standard GC can be designed to spread out the processing, but chained deletes have to be done at a particular time.
– David Thornley Dec 28 '10 at 15:55

Actually, there are reference-counting implementations out there, notably because the "easy" mark-and-sweep has a nasty freeze-the-world effect. The trouble is that reference-counting garbage collectors must implement some cycle-detection algorithm, and that is not easy.
– Matthieu M. Jan 8 '11 at 16:37

@user9521: That's a bad analogy. Unlike LISP, GC is widespread, and I'm not talking about whether GC becomes widespread, but whether a certain implementation of an already widespread feature would. A better analogy: if there were a feature that would make LISP much faster, wouldn't it be common in LISP implementations by now? (But then it might not, because LISP isn't in as much demand as GC is, so there's less pressure to optimize it. Anyway, this analogy would work.)
– sbi Jan 10 '11 at 9:39

They are very common. One ubiquitous example is hierarchical structures where each node has a pointer back to its parent, and the parent, of course, has a collection of its children. User interface toolkits like Swing work like this.
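A minimal Java sketch of that hierarchy (the Widget class is hypothetical, loosely modeled on a UI toolkit): parents own their children strongly, while each child points back to its parent only weakly, so the pair never forms a strong cycle.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical UI-style node. Strong parent-to-child references,
// weak child-to-parent back-references.
final class Widget {
    final String name;
    final List<Widget> children = new ArrayList<>();
    WeakReference<Widget> parent = new WeakReference<>(null);

    Widget(String name) { this.name = name; }

    void add(Widget child) {
        children.add(child);                      // strong: parent keeps child alive
        child.parent = new WeakReference<>(this); // weak: child does not keep parent alive
    }

    Widget getParent() { return parent.get(); }   // may be null once the parent is gone
}

public class HierarchyDemo {
    public static void main(String[] args) {
        Widget panel = new Widget("panel");
        Widget button = new Widget("button");
        panel.add(button);
        System.out.println(button.getParent().name); // prints "panel"
    }
}
```

The price is the null check: any code following the parent pointer must be prepared for the parent to have been collected already.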

In this case, the existence of the child UI element should not stop the parent UI element from being destroyed, so you could use a normal reference from the parent to the child and a weak reference from the child to the parent, thus fixing the problem. Right?
– compman Dec 28 '10 at 15:19

Yes, weak references could mitigate the problem in that case.
– user281377 Dec 28 '10 at 15:28


@user Yes, weak references can do this. But as David Thornley wrote, that's not the programmer's job in a language that claims to have a GC.
– user7043 Dec 28 '10 at 15:40

You are right, you should be able to do garbage collection only using reference counting. Objective-C on iPhone is an example of a platform that does just that.

Why is it not done in practice? Because sloppy code and memory leaks are a reality and having a computer deal with them automatically enables every average pig to write code that does not constantly run out of memory.

1: Entities

I have run into this problem with entities. Suppose that I have two classes, Author and Article. Furthermore, I have render(Author) and render(Article) functions which return HTML documents describing either item. The author will have a list of articles, and the article will have information about its author.

Author and Article form a reference cycle. We can attempt to break this cycle by making one of the references weak. What if the weak reference is from Article to Author? Then render(Article) will not keep the Author alive, and the Author object will be deallocated before it is used to get the author's name. If instead the weak reference is from Author to Article, render(Article) works fine, but render(Author) will not keep the Articles alive, and they will be deallocated before their titles can be fetched.

Introducing weak references solves the problem so long as all of the relationships have a definite direction where the opposite direction is clearly secondary. But some relations do not work that way.
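Since Java itself has a tracing GC, the leak can't be shown directly, but here is a sketch that simulates a reference-counting runtime by hand (the retain/release scheme and all class names are made up for this example), showing that the Author/Article cycle pins both counts above zero forever:

```java
// Manual reference-counting simulation: Java's real GC plays no part here,
// the counts only mimic what a refcounting runtime would do.
abstract class RefCounted {
    int count = 1;            // starts owned by its creator
    boolean freed = false;
    void retain() { count++; }
    void release() {
        if (--count == 0) { freed = true; onFree(); }
    }
    void onFree() {}          // release owned references here
}

final class Author extends RefCounted {
    Article article;          // strong counted reference
    void setArticle(Article a) { a.retain(); this.article = a; }
    @Override void onFree() { if (article != null) article.release(); }
}

final class Article extends RefCounted {
    Author author;            // strong counted reference: completes the cycle
    void setAuthor(Author a) { a.retain(); this.author = a; }
    @Override void onFree() { if (author != null) author.release(); }
}

public class CycleLeakDemo {
    public static void main(String[] args) {
        Author author = new Author();
        Article article = new Article();
        author.setArticle(article);
        article.setAuthor(author);

        // Drop both external references.
        author.release();
        article.release();

        // The cycle keeps both counts at 1: neither object is ever freed.
        System.out.println(author.freed);  // false
        System.out.println(article.freed); // false
    }
}
```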

On the other hand: this precise problem only arises where we have associations between objects, that is, where each object needs to hold a mutual reference to the other. Perhaps the language could support such associations natively and give them special treatment under reference counting. That could be made to work.

2: Data Structures

Data structures very often contain reference cycles. The most common example is a doubly-linked list. It is also not uncommon for tree structures to have next/prev pointers in the leaves for quick iteration.

On the other hand: I suspect most such cycles are actually broken by the normal operation of the data structure. For example, when a node is removed from a linked list, the two references to it are reset, clearing the cycle.
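A minimal doubly-linked list in Java makes that concrete (a hypothetical sketch, not any library's implementation): remove() nulls out the prev/next links, which is exactly what dissolves the two-node cycles that reference counting would otherwise trip over.

```java
// Minimal doubly-linked list. Each pair of neighbouring nodes forms a
// prev/next reference cycle; remove() clears both links.
final class DList<T> {
    static final class Node<T> {
        T value;
        Node<T> prev, next;
        Node(T value) { this.value = value; }
    }

    Node<T> head, tail;

    Node<T> append(T value) {
        Node<T> n = new Node<>(value);
        if (tail == null) { head = tail = n; }
        else { tail.next = n; n.prev = tail; tail = n; }
        return n;
    }

    void remove(Node<T> n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null; // clear the cycle-forming references
    }
}
```

Note, though, that this only helps if elements are actually removed one by one; dropping the whole list at once still leaves the internal cycles for the collector to deal with.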

3: Deterministic Deletes

One reason to use reference counting is that you get deterministic deletes. Objects will be deleted at predictable times. This means you can do things like close files, release locks, etc. because you know the objects will get deleted in a timely manner.
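Sketched in Java with a hand-rolled count (the Handle class is hypothetical; it stands in for a file handle or lock), the determinism looks like this: the resource is released at the exact statement where the last reference goes away.

```java
// Hypothetical reference-counted resource handle: cleanup happens at a
// predictable point, the moment the last reference is released.
final class Handle {
    private int count = 1;
    private boolean closed = false;

    void retain() { count++; }

    void release() {
        if (--count == 0) {
            closed = true;    // e.g. close a file, release a lock
        }
    }

    boolean isClosed() { return closed; }
}

public class DeterministicDemo {
    public static void main(String[] args) {
        Handle h = new Handle();
        h.retain();                       // a second owner appears
        h.release();                      // first owner done; still open
        System.out.println(h.isClosed()); // false
        h.release();                      // last owner done; closed right here
        System.out.println(h.isClosed()); // true
    }
}
```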

The problem is that this tends to break. If I accidentally create a cycle, my files do not get closed and my locks do not get released. Furthermore, I receive no errors as a result.

This isn't really a problem with reference counting itself. It is a case where the advertised advantage (deterministic deletes) does not really work out, leaving me with less reason to want reference counting.

On the other hand: the real issue might be argued to be that we did not have any automated testing to ensure the objects were destroyed. If we had such a thing, we would be informed when this didn't work.

4: Speed

Another reason to want reference counting is that it is faster. However, this does not seem to be true in practice. All the increments and decrements are actually pretty expensive. (It gets really bad once you start considering threading.) So it's not clear that any speed advantage is gained here; in fact, it may be faster to use another GC technique.
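To see where the threading cost comes from, here is a sketch (hypothetical class, not a real runtime's implementation) of a thread-safe count in Java: every simulated "pointer copy" costs two atomic read-modify-write operations, which contend between threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Thread-safe reference count: each retain/release is an atomic RMW.
final class AtomicRefCounted {
    final AtomicInteger count = new AtomicInteger(1);
    void retain() { count.incrementAndGet(); }
    boolean release() { return count.decrementAndGet() == 0; }
}

public class AtomicCostDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicRefCounted obj = new AtomicRefCounted();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                // Each simulated pointer copy costs a retain plus a release.
                for (int j = 0; j < 100_000; j++) { obj.retain(); obj.release(); }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // Counts balance out, but only after 800,000 contended atomic ops.
        System.out.println(obj.count.get()); // 1
    }
}
```

A tracing collector does no per-assignment bookkeeping at all, which is one reason the naive "refcounting is faster" intuition fails.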

5: Convenience

When working in a reference-counting language, I have to worry about not creating reference cycles, and I found myself creating them on a regular basis. In most circumstances I could fix this by introducing a weak reference. But why bother? The supposed advantages of reference counting didn't seem to pan out (see 3 & 4), so it just didn't seem worth it to fight the cycles.

In Conclusion

Reference counting simply doesn't seem to deliver enough benefits over typical GC implementations to make up for the extra headache it gives the coders.

Your first example could just be an instance of bad program design. I don't know for sure, since I haven't seen the source code, but those names don't make it sound like great design. In your second example, no garbage collection system could have saved you.
– compman Jan 5 '11 at 23:03

@user9521, I can't disagree: that design had issues. But those issues weren't really related to the issue at hand. I rewrote the post to explain better.
– Winston Ewert Jan 6 '11 at 1:10

Note that thread-safe reference counting using atomic decrements is non-deterministic and reference counting is considered to be one of the slowest forms of garbage collection (hence it is almost unheard of in production-quality GCs like HotSpot and the CLR).
– Jon Harrop Oct 14 '13 at 14:40

Python and Objective-C use reference counting, and they are behind some important pieces of software. As with any other garbage-collection strategy, reference-counting implementations can be tweaked to make them efficient for the common case.

Languages that opt for reference-counting usually include several important data structures in the language definition or in the standard libraries, so programmers are shielded from thinking about how memory management happens in them.

Only one particular Python implementation uses refcounting (CPython - yes, that's the reference implementation and the one most people use, but it's still an implementation detail), and it does have a real GC as well, which runs periodically and breaks cyclic references.
– user7043 Dec 28 '10 at 19:35

You can use reference counting for any object that can never refer to itself, transitively. This is similar to the kind of heuristic analysis you can do to determine that a function is non-recursive, and so can be called without recourse to a stack in a single-threaded environment.

In a type-safe, statically typed language, this can be determined statically. If a set of concrete classes can be ordered so that each class's properties can only contain references to later classes in the ordering, then that set of types can be reference counted.

So, given a Java Point class,

final class Point {
    double x;
    double y;
}

We know the class contains no references, so a VM could allocate Point instances in a reference counted block of memory as an optimization.

And given a simple class like,

final class Person {
    String name;
}

since we know that String cannot refer to a Person, we could do a similar reference-counting optimization for instances of Person.

But the following class cannot be reference counted:

final class Pair {
    Object a;
    Object b;
}

because a could be a Pair, or could be an object that refers to a Pair.

The following class cannot be reference counted either:

final class Tuple {
    List<Object> items;
}

since items could contain references to Tuples or to objects that can contain references to Tuples.