I was involved in a programming discussion today where I made some statements that basically assumed axiomatically that circular references (between modules, classes, whatever) are generally bad. Once I got through with my pitch, my coworker asked, "what's wrong with circular references?"

I've got strong feelings on this, but it's hard for me to verbalize them concisely and concretely. Any explanation I come up with tends to rely on other items that I also consider axioms ("can't use in isolation, so can't test", "unknown/undefined behavior as state mutates in the participating objects", etc.), but I'd love to hear a concise reason why circular references are bad that doesn't take the kinds of leaps of faith that my own brain does, having spent many hours over the years untangling them to understand, fix, and extend various bits of code.

Edit: I am not asking about homogeneous circular references, like those in a doubly-linked list or pointer-to-parent. This question is really asking about "larger scope" circular references, like libA calling libB which calls back to libA. Substitute 'module' for 'lib' if you like. Thanks for all of the answers so far!

15 Answers

Circular class references create high coupling; both classes must be recompiled every time either of them is changed.

Circular assembly references prevent static linking, because B depends on A but A cannot be assembled until B is complete.
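The static-linking point is at heart a build-ordering problem: a build tool needs a topological order of the dependency graph, and a cycle means no such order exists. A minimal sketch using Python's standard-library `graphlib` (Python 3.9+); the `build_order` helper and the library names are made up for the example:

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps):
    """Return an order in which each library is built after everything it depends on.

    `deps` maps a library name to the set of libraries it needs at build time.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


# One-way dependencies: app -> libB -> libA.
print(build_order({"app": {"libB"}, "libB": {"libA"}, "libA": set()}))
# ['libA', 'libB', 'app']

# Circular dependencies: libA needs libB and libB needs libA.
try:
    build_order({"libA": {"libB"}, "libB": {"libA"}})
except CycleError as err:
    # err.args[1] lists the nodes on the detected cycle.
    print("no valid build order:", err.args[1])
```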

Circular object references can crash naïve recursive algorithms (such as serializers, visitors and pretty-printers) with stack overflows. The more advanced algorithms will have cycle detection and will merely fail with a more descriptive exception/error message.
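To illustrate the serializer point, here is a minimal sketch of a recursive pretty-printer over nested lists, with and without cycle detection. The `naive_dump` and `safe_dump` helpers are hypothetical, not from any library; Python's own `json.dumps` takes the second approach and raises `ValueError` for a circular structure.

```python
import json


def naive_dump(value):
    """Recursively render nested lists; recurses forever on a cycle."""
    if isinstance(value, list):
        return "[" + ", ".join(naive_dump(item) for item in value) + "]"
    return repr(value)


def safe_dump(value, _active=None):
    """Same rendering, but tracks the lists currently being rendered."""
    active = set() if _active is None else _active
    if isinstance(value, list):
        if id(value) in active:
            raise ValueError("circular reference detected")
        active.add(id(value))
        try:
            return "[" + ", ".join(safe_dump(item, active) for item in value) + "]"
        finally:
            active.discard(id(value))  # only lists on the current path count
    return repr(value)


cyclic = [1, 2]
cyclic.append(cyclic)  # the list now contains itself

try:
    naive_dump(cyclic)
except RecursionError:
    print("naive_dump: stack overflow (RecursionError)")

try:
    safe_dump(cyclic)
except ValueError as err:
    print("safe_dump:", err)

try:
    json.dumps(cyclic)
except ValueError as err:
    print("json.dumps:", err)
```

Because `safe_dump` removes a list from `active` once it has been rendered, a list that is merely shared (appearing twice without containing itself) still renders normally; only a genuine cycle is rejected.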

I particularly appreciate the last point; "cognitive load" is something that I am very conscious of but never had a great concise term for.
–
dash-tom-bang, Oct 14 '10 at 16:40

Good answer. It would be better if you said something about testing. If modules A and B are mutually dependent, they must be tested together. This means they are not really separate modules; together they are one broken module.
–
kevin cline, Oct 29 '12 at 18:35

@kevincline: While that's a true statement, I haven't been able to convince myself that it's germane to circular references; even a one-way module dependency implies that at least one module must be tested together with another (dependent) module, unless some type of abstraction is used - in which case, the same abstraction would make the circular reference testable. If I'm overlooking something, can you clarify with a specific example?
–
Aaronaught, Oct 29 '12 at 22:46

A circular reference is twice the coupling of a non-circular reference.

If Foo knows about Bar, and Bar knows about Foo, you have two things that need changing (when the requirement comes that Foos and Bars must no longer know about each other). If Foo knows about Bar, but a Bar doesn't know about Foo, you can change Foo without touching Bar.

Cyclical references can also cause bootstrapping problems, at least in environments that last for a long time (deployed services, image-based development environments), where Foo depends on Bar working in order to load, but Bar also depends on Foo working in order to load.
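In Python, this bootstrapping problem shows up as a circular import. The sketch below writes two throwaway modules into a temporary directory (the names `foo_mod`, `bar_mod` and the `load` helper are made up for the example). In the cyclic version each module imports a name from the other at load time, so whichever loads second finds the first only partially initialized; in the one-way version Bar no longer knows about Foo and both load fine.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def load(modules, entry):
    """Write `modules` (name -> source) to a temp dir, import `entry`, report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in modules.items():
            Path(tmp, f"{name}.py").write_text(textwrap.dedent(source))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(entry)
            return "loaded"
        except ImportError as err:
            return f"ImportError: {err}"
        finally:
            sys.path.remove(tmp)
            for name in modules:
                sys.modules.pop(name, None)  # forget the modules so the demo can rerun


cyclic = {
    "foo_mod": """
        from bar_mod import bar   # Foo needs Bar in order to load...
        def foo():
            return "foo"
    """,
    "bar_mod": """
        from foo_mod import foo   # ...and Bar needs Foo in order to load.
        def bar():
            return foo() + "bar"
    """,
}

one_way = {
    "foo_mod": """
        from bar_mod import bar   # Foo still needs Bar...
        def foo():
            return bar() + "foo"
    """,
    "bar_mod": """
        def bar():                # ...but Bar no longer knows about Foo.
            return "bar"
    """,
}

print(load(cyclic, "foo_mod"))   # ImportError: cannot import name 'foo' ...
print(load(one_way, "foo_mod"))  # loaded
```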

When you tie two bits of code together, you effectively have one large piece of code. The difficulty of maintaining a bit of code is at least the square of its size, and possibly higher.
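Taking the answer's rule of thumb at face value, and assuming for simplicity that difficulty is exactly quadratic, D(n) = k n^2 for some constant k (an illustrative model, not a measured law), tying together two pieces of sizes a and b costs:

```latex
D(n) = k\,n^2
\quad\Longrightarrow\quad
D(a + b) = k\,(a + b)^2 = \underbrace{k\,a^2 + k\,b^2}_{D(a) + D(b)} + 2k\,ab
```

For two equal halves (a = b), the tied-together unit costs 4ka^2 against 2ka^2 for the two pieces kept apart; the cross term 2kab is what the coupling adds.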

People often look at single class (/function/file/etc.) complexity and forget that you really should be considering the complexity of the smallest separable (encapsulatable) unit. Having a circular dependency increases the size of that unit, possibly invisibly (until you start trying to change file 1 and realize that also requires changes in files 2-127).
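The "smallest separable unit" can be computed mechanically: in a dependency graph, every cycle collapses its members into one strongly connected component, and each component can only be changed and tested as a whole. A minimal recursive sketch of Tarjan's algorithm, fine for small graphs; the file names are made up for the example:

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. `graph` maps each node to the nodes it depends on.

    Returns a list of components (sets of nodes); every node in a component
    can reach every other, so together they form one inseparable unit.
    """
    index = {}      # discovery order of each node
    lowlink = {}    # smallest discovery index reachable from the node's DFS subtree
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:  # node is the root of a component
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


deps = {
    "file1": ["file2"],
    "file2": ["file3"],
    "file3": ["file1", "util"],  # file3 -> file1 closes a cycle
    "util": [],
}
for component in strongly_connected_components(deps):
    print(sorted(component))
# ['util']
# ['file1', 'file2', 'file3']
```

Here `util` stays a separate unit, while the cycle turns `file1`, `file2` and `file3` into a single unit three files wide.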

Hmm... that depends on what you mean by circular dependence, because there are actually some circular dependencies which I think are very beneficial.

Consider an XML DOM -- it makes sense for every node to have a reference to its parent, and for every parent to have a list of its children. The structure is logically a tree, but from the point of view of a garbage collection algorithm or similar, the structure is circular.
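A minimal sketch of the DOM-style structure described above, using a made-up `Node` class rather than any real XML API: every child holds a reference to its parent, so the object graph contains cycles, yet walking only the `children` links still terminates because those links form a tree.

```python
class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent      # upward reference: child -> parent
        self.children = []        # downward references: parent -> children
        if parent is not None:
            parent.children.append(self)

    def walk(self):
        """Depth-first walk over children only; never follows `parent`."""
        yield self
        for child in self.children:
            yield from child.walk()

    def path(self):
        """Follow `parent` links up to the root, e.g. 'html/body/p'."""
        node, parts = self, []
        while node is not None:
            parts.append(node.tag)
            node = node.parent
        return "/".join(reversed(parts))


html = Node("html")
body = Node("body", html)
para = Node("p", body)
Node("head", html)

print([n.tag for n in html.walk()])      # ['html', 'body', 'p', 'head']
print(para.path())                        # html/body/p
print(body.parent.children[0] is body)    # True: html -> body -> html is a reference cycle
```

Both traversals terminate because each follows links in only one direction.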

@Conrad: I suppose it could be thought of as a tree, yes. Why?
–
Billy ONeal, Oct 14 '10 at 2:14

I don't think of trees as circular, because you can navigate down their children and will terminate (regardless of the parent reference). Unless a node had a child that was also an ancestor, which in my mind makes it a graph and not a tree.
–
Conrad Frix, Oct 14 '10 at 15:21

A circular reference would be if one of the children of a node looped back to an ancestor.
–
Matt Olenik, Oct 14 '10 at 17:04

In database terms, circular references with proper PK/FK relationships make it impossible to insert or delete data. If you can't delete from table A unless the record is gone from table B, and you can't delete from table B unless the record is gone from table A, you can't delete. The same goes for inserts. This is why many databases do not allow you to set up cascading updates or deletes if there is a circular reference: at some point it becomes impossible. Yes, you can set up these kinds of relationships without formally declaring the PK/FK, but then you will (100% of the time in my experience) have data integrity problems. That's just bad design.
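The insert half of this can be reproduced with SQLite through Python's standard-library `sqlite3`. The sketch assumes both foreign-key columns are `NOT NULL` and checked immediately (SQLite's default once `PRAGMA foreign_keys = ON` is set); the tables `a` and `b` and the `try_insert` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER NOT NULL REFERENCES b(id));
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL REFERENCES a(id));
""")


def try_insert(sql, params):
    """Run one INSERT; return None on success or the error message on failure."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)


# A row in `a` needs an existing row in `b`, and vice versa, so neither can go first.
print(try_insert("INSERT INTO a (id, b_id) VALUES (?, ?)", (1, 1)))
print(try_insert("INSERT INTO b (id, a_id) VALUES (?, ?)", (1, 1)))
# FOREIGN KEY constraint failed  (printed twice)
```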

While I agree with most of the comments here I would like to plead a special case for the "parent"/"child" circular reference.

A class often needs to know something about its parent or owning class: perhaps default behavior, the name of the file the data came from, the SQL statement that selected the column, or the location of a log file.

You can do this without a circular reference by having a containing class so that what was previously the "parent" is now a sibling, but it is not always possible to re-factor existing code to do this.

The other alternative is to pass all the data a child might need in its constructor, which ends up being just plain horrible.
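A minimal sketch of the parent-reference approach, with made-up `DataFile` and `Column` classes: the child keeps a back-reference and asks its parent for context when it needs it, instead of receiving every piece of that context in its constructor.

```python
class DataFile:
    """Parent: owns the columns and the context they occasionally need."""

    def __init__(self, filename, log_path):
        self.filename = filename
        self.log_path = log_path
        self.columns = []

    def add_column(self, name):
        column = Column(name, parent=self)  # child gets a back-reference
        self.columns.append(column)
        return column


class Column:
    """Child: stores only its own data plus a reference to its owner."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent

    def describe(self):
        # Context is looked up through the parent rather than copied in.
        return f"{self.name} (from {self.parent.filename}, logging to {self.parent.log_path})"


data = DataFile("sales.csv", "/tmp/import.log")
price = data.add_column("price")
print(price.describe())  # price (from sales.csv, logging to /tmp/import.log)
```

The cost is that `Column` now depends on `DataFile`'s interface; the benefit is that adding another piece of parent context does not change the `Column` constructor or any of its call sites.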

What situation can you give me where keeping a circular reference model is the best model for what you're trying to build?

From my experience, the best model will pretty much never involve circular references in the way I think you mean it. That being said, there are a lot of models where you use circular references all the time, it's just extremely basic. Parent -> Child relationships, any graph model, etc, but these are well known models and I think you're referring to something else entirely.

It MAY be that a circular linked list (single-linked or double-linked) would be an excellent data structure for the central event queue of a program that's supposed to "never stop": stick the important N things on the queue with a "do not delete" flag set, then simply traverse the queue until it is empty; when new tasks (transient or permanent) are needed, stick them in a suitable place on the queue; whenever you serve an event without the "do not delete" flag, do it, then take it off the queue.
–
Vatine, Oct 14 '10 at 14:11
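A minimal sketch of the queue described in the comment above, as a singly linked circular list. The class and method names are made up for the example; `persistent=True` plays the role of the "do not delete" flag, new tasks are appended at the end of the current round, and the queue keeps a pointer to the tail so that appending and serving are both O(1).

```python
class _Node:
    def __init__(self, task, persistent):
        self.task = task
        self.persistent = persistent  # the "do not delete" flag
        self.next = self              # a lone node points at itself


class CircularEventQueue:
    def __init__(self):
        self._tail = None  # last node of the current round; _tail.next is served next
        self._size = 0

    def __len__(self):
        return self._size

    def add(self, task, persistent=False):
        """Append a task at the end of the current round."""
        node = _Node(task, persistent)
        if self._tail is not None:
            node.next = self._tail.next
            self._tail.next = node
        self._tail = node
        self._size += 1

    def serve(self):
        """Serve the next task; unlink it unless it is persistent. Returns None when empty."""
        if self._tail is None:
            return None
        head = self._tail.next
        if head.persistent:
            self._tail = head              # keep it and move on
        elif head is self._tail:
            self._tail = None              # the last remaining task is gone
            self._size -= 1
        else:
            self._tail.next = head.next    # unlink the transient task
            self._size -= 1
        return head.task


queue = CircularEventQueue()
queue.add("poll-sockets", persistent=True)
queue.add("flush-logs", persistent=True)
queue.add("send-welcome-email")  # transient: served once, then removed
print([queue.serve() for _ in range(5)])
# ['poll-sockets', 'flush-logs', 'send-welcome-email', 'poll-sockets', 'flush-logs']
print(len(queue))  # 2
```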

As long as you don't add any relationships that aren't actually there, you are safe. If you do add them, you get less data integrity (because there is redundancy) and more tightly coupled code.

The thing with circular references specifically is that I haven't seen a case where they are actually needed, except one: self-reference. If you model trees or graphs you need that, and it is perfectly all right, because self-reference is harmless from the code-quality point of view (no dependency is added).

I believe that the moment you start to need a non-self reference, you should immediately ask whether you can model it as a graph (collapse the multiple entities into one: the node). Maybe there is a case in between, where you make a circular reference but modelling it as a graph is not appropriate, but I highly doubt it.

There is a danger that people think they need a circular reference when in fact they don't. The most common case is the "one-of-many" case. For instance, you have a customer with multiple addresses, one of which should be marked as the primary address. It is very tempting to model this situation as two separate relationships, has_address and is_primary_address_of, but that is not correct. The reason is that being the primary address is not a separate relationship between users and addresses; instead, it is an attribute of the has_address relationship. Why? Because its domain is limited to the user's addresses and not to all the addresses there are. You pick one of the links and mark it as the strongest (primary).

(Going to talk about databases now.) Many people opt for the two-relationships solution because they understand "primary" as a unique pointer, and a foreign key is a kind of pointer. So a foreign key should be the thing to use, right? Wrong. Foreign keys represent relationships, but "primary" is not a relationship. It is a degenerate case of an ordering, where one element is above all the others and the rest are unordered. If you needed to model a total ordering, you would of course treat it as an attribute of the relationship, because there is basically no other choice. But the moment you degenerate it, there is a choice, and quite a horrible one: to model something that is not a relationship as a relationship. And so you get relationship redundancy, which is certainly not something to underestimate. The uniqueness requirement should be imposed in another way, for instance by unique partial indexes.
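A minimal SQLite sketch of that design, with made-up table names: "primary" is an attribute (`is_primary`) of the has_address link, and a unique partial index allows at most one primary link per customer, so no second, redundant relationship is needed. SQLite supports partial indexes, and PostgreSQL does as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, line TEXT NOT NULL);

    -- The has_address relationship; is_primary is an attribute of the link.
    CREATE TABLE has_address (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        address_id  INTEGER NOT NULL REFERENCES address(id),
        is_primary  INTEGER NOT NULL DEFAULT 0 CHECK (is_primary IN (0, 1)),
        PRIMARY KEY (customer_id, address_id)
    );

    -- At most one primary address per customer, enforced without a second relationship.
    CREATE UNIQUE INDEX one_primary_per_customer
        ON has_address (customer_id) WHERE is_primary = 1;

    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO address VALUES (10, 'Home'), (11, 'Office'), (12, 'Cottage');
    INSERT INTO has_address VALUES (1, 10, 1), (1, 11, 0);
""")

try:
    with conn:
        conn.execute("INSERT INTO has_address VALUES (1, 12, 1)")  # a second primary
except sqlite3.IntegrityError as err:
    print("rejected:", err)

primary = conn.execute(
    "SELECT a.line FROM has_address h JOIN address a ON a.id = h.address_id "
    "WHERE h.customer_id = ? AND h.is_primary = 1",
    (1,),
).fetchone()[0]
print(primary)  # Home
```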

So, I wouldn't allow a circular reference to occur unless it is absolutely clear that it comes from the thing I am modelling.

(note: this is slightly biased to database design but I would bet it is fairly applicable to other areas too)

The term "circular reference" is somewhat vague; your question needs some context to answer. For example, in a doubly linked list there are references (pointers) back and forth, but that is in no way harmful.

But another meaning (under .NET) is when your assembly references an assembly that in turn references yours. In this case, a "circular reference" breaks compilation.

When one says circular reference, one typically means that following a chain of pointers (e.g. myobj->next->next->next) will eventually lead back to the starting point. Circular implies that there is no "terminating condition" to signal that you've reached the end, which is quite different from a doubly linked list or a tree.
–
dash-tom-bang, Mar 20 '12 at 0:55
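Whether following `next` pointers ever loops back can be checked without extra memory using Floyd's "tortoise and hare" technique. A minimal sketch with a made-up `ListNode` class:

```python
class ListNode:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def has_cycle(head):
    """Floyd's cycle detection: the fast pointer catches the slow one iff there is a loop."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # one step
        fast = fast.next.next     # two steps
        if slow is fast:
            return True
    return False                  # reached the end: there is a terminating condition


# a -> b -> c -> None terminates
c = ListNode("c")
b = ListNode("b", c)
a = ListNode("a", b)
print(has_cycle(a))  # False

c.next = a           # now a -> b -> c -> a -> ...
print(has_cycle(a))  # True
```

A doubly linked list passes this check on its `next` chain even though it has back-pointers, which matches the distinction drawn in the comment.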

Circular references in data structures are sometimes the natural way of expressing a data model. Coding-wise, they're definitely not ideal, and the problem can (to some extent) be solved by dependency injection, pushing it from code to data.
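A minimal sketch of "pushing the problem from code to data", with made-up `Inventory` and `Shop` classes: `Inventory` depends only on a small `Notifier` protocol and never refers to `Shop`, so the code-level dependency is one-way (`Shop` to `Inventory`), while the two-way link exists only between objects, created at run time by whoever wires them together.

```python
from typing import Dict, List, Optional, Protocol


class Notifier(Protocol):
    def notify(self, message: str) -> None: ...


class Inventory:
    """Knows about the Notifier protocol, not about Shop."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = dict(stock)
        self.notifier: Optional[Notifier] = None  # injected later

    def take(self, item: str) -> None:
        self.stock[item] -= 1
        if self.stock[item] == 0 and self.notifier is not None:
            self.notifier.notify(f"{item} is sold out")


class Shop:
    """Depends on Inventory, and happens to satisfy Notifier."""

    def __init__(self, inventory: Inventory) -> None:
        self.inventory = inventory
        self.alerts: List[str] = []

    def notify(self, message: str) -> None:
        self.alerts.append(message)

    def sell(self, item: str) -> None:
        self.inventory.take(item)


# Composition root: the only place where the cycle exists, and only between objects.
inventory = Inventory({"widget": 1})
shop = Shop(inventory)
inventory.notifier = shop

shop.sell("widget")
print(shop.alerts)  # ['widget is sold out']
```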

Hmm.. any garbage collector tripped up by this isn't a true garbage collector.
–
Billy ONeal, Oct 14 '10 at 0:37


I don't know of any modern garbage collector which would have problems with circular references. Circular references are a problem if you're using reference counts, but most garbage collectors are tracing style (where you start with the list of known references and follow them to find all others, collecting everything else).
–
Dean Harding, Oct 14 '10 at 0:40


See sct.ethz.ch/teaching/ws2005/semspecver/slides/takano.pdf, which explains the drawbacks of various types of garbage collectors -- if you take mark and sweep and start optimizing it to reduce the long pause times (e.g. by creating generations), you start to have problems with circular structures (when circular objects are in different generations). If you take reference counts and start fixing the circular reference problem, you end up introducing the long pause times that are characteristic of mark and sweep.
–
Ken Bloom, Oct 14 '10 at 13:45

If a garbage collector looked at Foo and deallocated its memory, which in this example references Bar, it should handle the removal of Bar. Thus at that point there is no need for the garbage collector to go ahead and remove Bar, because it already did. Or vice versa: if it removes Bar, which references Foo, it should remove Foo too, and thus it will not need to go remove Foo, because it did so when it removed Bar? Please correct me if I am wrong.
–
Chris, Oct 14 '10 at 13:46


In Objective-C, circular references make it so the ref count doesn't hit zero when you release, which trips up the garbage collector.
–
DexterW, Oct 14 '10 at 14:06
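The reference-counting versus tracing distinction in these comments can be observed in CPython, which combines reference counting with a separate tracing cycle collector (the `gc` module). A minimal, CPython-specific sketch; the `Node` class and the helper name are made up for the example:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def cycle_survives_refcounting():
    """Return (alive after `del`, alive after gc.collect()) for a two-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running at an arbitrary moment
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a        # a <-> b: each keeps the other's count above zero
        probe = weakref.ref(a)         # observes `a` without keeping it alive
        del a, b
        alive_after_del = probe() is not None
        gc.collect()                   # the tracing collector finds the unreachable pair
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


print(cycle_survives_refcounting())  # (True, False) on CPython
```

Reference counting alone never frees the pair, because each object's count stays at one; the tracing pass frees it because nothing reachable from the program points at either object.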

In my opinion having unrestricted references makes program design easier, but we all know that some programming languages lack support for them in some contexts.

You mentioned references between modules or classes. In that case it's a static thing, predefined by the programmer, and it's clearly possible for the programmer to search for a structure that lacks circularity, though it might not fit the problem cleanly.

The real problem comes with circularity in run-time data structures, where some problems actually can't be defined in a way that gets rid of circularity. In the end, though, it's the problem that should dictate the structure, and requiring anything else forces the programmer to solve an unnecessary puzzle.

I'd say that's a problem with the tools, not a problem with the principle.

Adding a one sentence opinion doesn't significantly contribute to the post or explain the answer. Could you elaborate upon this?
–
MichaelT, May 11 '14 at 2:11

Well, two points: the poster actually mentioned references between modules or classes. In that case it's a static thing, predefined by the programmer, and it's clearly possible for the programmer to search for a structure that lacks circularity, though it might not fit the problem cleanly. The real problem comes with circularity in run-time data structures, where some problems actually can't be defined in a way that gets rid of circularity. In the end, though, it's the problem that should dictate the structure, and requiring anything else forces the programmer to solve an unnecessary puzzle.
–
Josh S, May 11 '14 at 6:37

I have found that it makes it easier to get your program up and running, but that, generally speaking, it ultimately makes the software harder to maintain, since you find that trivial changes have cascading effects. A makes calls into B, which makes calls back to A, which makes calls back to B... I've found it's tough to truly understand the effects of changes of this nature, especially when A and B are polymorphic.
–
dash-tom-bang, May 14 '14 at 20:06