Well, I know that there are things like malloc/free for C, and new/using-a-destructor for memory management in C++, but I was wondering why there aren't "new updates" to these languages that allow the user to have the option to manually manage memory, or for the system to do it automatically (garbage collection)?

Somewhat of a newb-ish question, but only been in CS for about a year.

We've got a module in iPhone development this semester. After coding apps for Android for 2 years, this question struck most of the class pretty hard. Only now do we see how many hours Java has actually saved us in not having to track down nasty memory management errors and not having to write boilerplate code.
–
siamiiOct 8 '11 at 23:28

16

I would hate if they did have GC. I don't think GC is necessary.
–
acidzombie24Oct 9 '11 at 0:00

7

@NullUserException, since it doesn't specify a way to reclaim memory that pretty much implies a GC.
–
Winston EwertOct 9 '11 at 4:48

2

We already have Java and C# and Python and so on. Why must C++ also fit that mold?
–
MaxOct 14 '11 at 11:04

4

Because if they did, then you would complain about something else.
–
JobOct 15 '11 at 4:21

15 Answers

Garbage collection requires data structures for tracking allocations and reference counting. These create overhead in memory, performance, and language complexity. C++ is designed to be "close to the metal"; in other words, it takes the higher-performance side of the tradeoff against convenience features. Other languages make that tradeoff differently. This is one of the considerations in choosing a language: which emphasis you prefer.

That said, there are a lot of schemes for reference counting in C++ that are fairly lightweight and performant, but they are in libraries, both commercial and open source, rather than part of the language itself. Reference counting to manage object lifetime is not the same as garbage collection, but it addresses many of the same kinds of issues, and is a better fit with C++'s basic approach.

A secondary issue is that GC is non-deterministic. The object may or may not still be in memory long after the program has "dropped" it. Refcount lifecycles are deterministic: when the last reference is dropped, the memory is freed. This has implications not only for memory efficiency, but for debugging as well. A common programming error is the "zombie" object, a reference to memory that has theoretically been freed. GC is much more likely to mask this effect and produce bugs that are intermittent and extremely difficult to track down.
–
kylbenOct 8 '11 at 22:17

16

- modern GCs neither track allocations nor count references. They build a graph from everything currently on the stack and just condense and wipe everything else (simplified), and GC normally results in reduced language complexity. Even the performance benefit is questionable.
–
Joel CoehoornOct 9 '11 at 0:56

8

Er, @kylben, the whole point of having automatic GC baked into the language is that it becomes impossible to reference zombie objects, because the GC only frees objects that are impossible to reference! You get the sort of hard-to-track bugs you're talking about when you make mistakes with manual memory management.
–
BenOct 9 '11 at 1:08

10

-1, GC does not count references. Plus, depending on your memory usage and allocation scheme, a GC can be faster (with an overhead in memory usage). So the argument about performance is a fallacy too. Only the "close to the metal" point is actually valid.
–
deadalnixOct 9 '11 at 2:07

9

Neither Java nor C# use reference counting: GC schemes based on reference counting are pretty primitive in comparison and perform much worse than modern garbage collectors (mainly because they need to do memory writes to alter reference counts whenever you copy a reference!)
–
mikeraOct 9 '11 at 2:59

Strictly speaking, there is no memory management at all in the C language. malloc() and free() are not keywords in the language, but just functions that are called from a library. This distinction may seem pedantic now, because malloc() and free() are part of the C standard library and will be provided by any standard-compliant implementation of C, but this wasn't always true in the past.

Why would you want a language with no standard for memory management? This goes back to C's origins as 'portable assembly'. There are many cases of hardware and algorithms that can benefit from, or even require, specialized memory management techniques. As far as I know, there is no way to completely disable Java's native memory management and replace it with your own. This is simply not acceptable in some high performance/minimal resource situations. C provides almost complete flexibility to choose exactly what infrastructure your program is going to use. The price paid is that the C language provides very little help in writing correct, bug free code.

Bah, a video. But nevertheless, interesting.
–
surfasbOct 8 '11 at 23:07

2

interesting video. 21 minutes in, and 55 minutes in were the best bits. Too bad the WinRT calls still looked to be C++/CLI bumpf.
–
gbjbaanbOct 8 '11 at 23:39

2

@dan04: That's true. But then, if you write in C, you get what you ask for.
–
DeadMGOct 14 '11 at 9:16

4

Managing smart pointers is not any more demanding than making sure you don't keep unnecessary references in a garbage-collected environment. GC can't read your mind; it's not magic either.
–
Tamás SzeleiOct 14 '11 at 11:56

"All" a garbage collector is is a process that runs periodically checking to see if there are any unreferenced objects in memory and if there are deletes them. (Yes, I know this is a gross oversimplification). This is not a property of the language, but the framework.

There are garbage collectors written for C and C++ - this one for example.

One reason why one hasn't been "added" to the language could be because of the sheer volume of existing code that would never use it as they use their own code for managing memory. Another reason could be that the types of applications written in C and C++ don't need the overhead associated with a garbage collection process.

But future programs would begin to use the garbage collector, no?
–
Dark TemplarOct 8 '11 at 21:07

4

While garbage collection is theoretically independent from any programming language, it is pretty hard to write a useful GC for C/C++, and even impossible to make a fool-proof one (at least as foolproof as Java's) - the reason Java can pull it off is because it runs in a controlled virtualized environment. Conversely, the Java language is designed for GC, and you'll have a hard time writing a Java compiler that doesn't do GC.
–
tdammersOct 8 '11 at 21:55

1

@tdammers: I agree that garbage collection needs to be supported by the language to be possible. However, the main point is not virtualization or a controlled environment, but strict typing. C and C++ are weakly typed, so they allow things like storing a pointer in an integer variable, or reconstructing pointers from offsets - things that prevent the collector from reliably telling what is reachable (C++11 prohibits the latter to allow at least conservative collectors). In Java you always know what is a reference, so you can collect precisely, even when compiled to native code.
–
Jan HudecOct 10 '11 at 7:50

1

@ThorbjørnRavnAndersen: I can write a valid C program that stores pointers in such a way that no garbage collector could ever find them. If you then hook a garbage collector to malloc and free, you would break my correct program.
–
Ben VoigtOct 15 '11 at 3:52

1

@ThorbjørnRavnAndersen: No, I wouldn't call free until I was done with it. But your proposed garbage collector that doesn't free the memory until I explicitly call free isn't a garbage collector at all.
–
Ben VoigtOct 16 '11 at 14:47

C was designed in an era when garbage collection was barely an option. It was also intended for uses where garbage collection would not generally work - bare-metal, real-time environments with minimal memory and minimal runtime support. Remember that C was the implementation language for the first Unix, which ran on a PDP-11 with 64K bytes of memory. C++ was originally an extension to C - the choice had already been made, and it's very hard to graft garbage collection onto an existing language. It's the kind of thing that has to be built in from the ground up.

@deadalnix - The C++ approach to memory management is newer than garbage collectors. RAII was invented by Bjarne Stroustrup for C++. Destructor cleanup is an older idea, but the rules for ensuring exception safety are key. I don't know exactly when the idea itself was first described, but the first C++ standard was finalized in 1998, Stroustrup's "Design and Evolution of C++" wasn't published until 1994, and exceptions were a relatively recent addition to C++ - after the publication of the "Annotated C++ Reference Manual" in 1990, I believe. GC was invented in 1959 for Lisp.
–
Steve314Oct 15 '11 at 12:25

1

@deadalnix - are you aware that at least one Java VM used a reference-counting GC that could (almost) be implemented using C++-style RAII with a smart pointer class - precisely because it was more efficient for multithreaded code than existing VMs? See www.research.ibm.com/people/d/dfb/papers/Bacon01Concurrent.pdf. One reason you don't see this in C++ in practice is the usual problem with GC collection - it can collect cycles, but can't choose a safe destructor order in the presence of cycles, and thus cannot ensure reliable destructor cleanup.
–
Steve314Oct 15 '11 at 14:10

The real answer is that the only way to make a safe, efficient garbage collection mechanism is to have language-level support for opaque references. (Or, conversely, a lack of language-level support for direct memory manipulation.)

Java and C# can do it because they have special reference types that cannot be manipulated. This gives the runtime the freedom to do things like move allocated objects in memory, which is crucial to a high-performance GC implementation.

For the record, no modern GC implementation uses reference counting, so that is completely a red herring. Modern GCs use generational collection, where new allocations are treated essentially the same way that stack allocations are in a language like C++, and then periodically any newly allocated objects that are still alive are moved to a separate "survivor" space, and an entire generation of objects is deallocated at once.

This approach has pros and cons: the upside is that heap allocations in a language that supports GC are as fast as stack allocations in a language that doesn't support GC, and the downside is that objects that need to perform cleanup before being destroyed either require a separate mechanism (e.g. C#'s using keyword) or else their cleanup code runs non-deterministically.

Note that one key to a high-performance GC is that there must be language support for a special class of references. C doesn't have this language support and never will; because C++ has operator overloading, it could emulate a GC'd pointer type, although it would have to be done carefully. In fact, when Microsoft invented their dialect of C++ that would run under the CLR (the .NET runtime), they had to invent a new syntax for "C#-style references" (e.g. Foo^) to distinguish them from "C++-style references" (e.g. Foo&).

What C++ does have, and what is regularly used by C++ programmers, is smart pointers, which are really just a reference-counting mechanism. I wouldn't consider reference counting to be "true" GC, but it does provide many of the same benefits, at the cost of slower performance than either manual memory management or true GC, but with the advantage of deterministic destruction.

At the end of the day, the answer really boils down to a language design feature. C made one choice, C++ made a choice that enabled it to be backward-compatible with C while still providing alternatives that are good enough for most purposes, and Java and C# made a different choice that is incompatible with C but is also good enough for most purposes. Unfortunately, there is no silver bullet, but being familiar with the different choices out there will help you to pick the correct one for whatever program you're currently trying to build.

You ask why these languages haven't been updated to include an optional garbage collector.

The problem with optional garbage collection is that you can't mix code that uses the different models. That is, if I write code that assumes you are using a garbage collector, you can't use it in a program that has garbage collection turned off. If you do, it'll leak everywhere.

Can you imagine writing a device handler in a language with garbage collection? How many bits could come down the line while the GC was running?

Or an operating system? How could you start the garbage collection running before you even start the kernel?

C is designed for low-level, close-to-the-hardware tasks. The problem is that it is such a nice language that it's a good choice for many higher-level tasks as well. The language czars are aware of these uses, but they need to support the requirements of device drivers, embedded code, and operating systems as a priority.

C good for high level? I snorted my drink all over my keyboard.
–
DeadMGOct 14 '11 at 9:18

2

Well, he did say "many higher level tasks". He could be troll-counting (one, two, many...). And he didn't actually say higher than what. Jokes aside, though, it's true - the evidence being that many significant higher-level projects have been successfully developed in C. There may be better choices now for a lot of those projects, but a working project is stronger evidence than speculation about what might have been.
–
Steve314Oct 15 '11 at 3:11

Although GC was invented before C++, and possibly before C, both C and C++ were implemented before GCs were widely accepted as practical.

You can't easily implement a GC language and platform without an underlying non-GC language.

Although GC is demonstrably more efficient than non-GC for typical application code developed on typical timescales etc., there are cases where extra development effort is a good trade-off and specialized memory management will outperform a general-purpose GC. Besides, C++ is typically demonstrably more efficient than most GC languages even without any extra development effort.

GC is not universally safer than C++-style RAII. RAII allows resources other than memory to be automatically cleaned up, basically because it supports reliable and timely destructors. These cannot be combined with conventional GC methods because of issues with reference cycles.

GC languages have their own characteristic kinds of memory leaks, particularly relating to memory that will never be used again but is still reachable through references that have never been nulled out or overwritten. The need to do this explicitly is no different in principle from the need to delete or free explicitly. The GC approach still has an advantage - no dangling references - and static analysis can catch some cases, but again, there's no one perfect solution for all cases.

Basically, partly it's about the age of the languages, but there will always be a place for non-GC languages anyway - even if it is a bit of a niche place. And seriously, in C++ the lack of GC isn't a big deal - your memory is managed differently, but it isn't unmanaged.

Microsoft's managed C++ has at least some ability to mix GC and non-GC in the same application, allowing a mix-and-match of the advantages of each, but I don't have the experience to say how well this works in practice.

The short and boring answer to this question is that there needs to be a non-garbage collected language out there for the people that write the garbage collectors. It's not conceptually easy to have a language that at the same time allows for very precise control over the memory layout and has a GC running on top.

The other question is why C and C++ don't have garbage collectors. Well, I know C++ has a couple of them around, but they aren't really popular because they're forced to deal with a language that wasn't designed to be GC'ed in the first place, and the people who still use C++ in this age aren't really the kind who miss a GC.

Also, instead of adding GC to an old non-GC-ed language, it is actually easier to create a new language that has most of the same syntax while supporting a GC. Java and C# are good examples of this.

Somewhere on programmers.se or SO, someone claimed to me that someone was working on a self-bootstrapping garbage-collected system - IIRC basically implementing the VM in a GC language, with a bootstrapping subset used to implement the GC itself. I forget the name. When I looked into it, it turned out that they'd basically never made the leap from the subset-without-GC to the working-GC level. This is possible in principle, but AFAIK it has never been achieved in practice - it's certainly a case of doing things the hard way.
–
Steve314Oct 15 '11 at 1:52

@Steve314: I'd love to see that if you ever remember where you found it!
–
hugomgOct 15 '11 at 2:08

@steve314: Having written the answer this thread is attached to, I already receive a notification for all comments. Doing an @-post in this case would be redundant and is not allowed by SE (don't ask me why though). (The real cause, though, is that my number is missing)
–
hugomgOct 15 '11 at 3:09

Because C and C++ are relatively low-level languages meant for general-purpose use - even, for example, to run on a 16-bit processor with 1 MB of memory in an embedded system, which couldn't afford to waste memory on GC.

Garbage collection is fundamentally incompatible with a systems language used for developing drivers for DMA-capable hardware.

It's entirely possible that the only pointer to an object would be stored in a hardware register in some peripheral. Since the garbage collector wouldn't know about this, it would think the object was unreachable and collect it.

This argument holds double for compacting GC. Even if you were careful to maintain in-memory references to objects used by hardware peripherals, when the GC relocated the object, it wouldn't know how to update the pointer contained in the peripheral config register.

So now you'd need a mixture of immobile DMA buffers and GC-managed objects, which means you have all the disadvantages of both.

Arguably all the disadvantages of both, but fewer instances of each disadvantage, and the same for advantages. Clearly there is complexity in having more kinds of memory management to deal with, but there may also be complexity avoided by choosing the right horse for each course within your code. Unlikely, I imagine, but there's a theoretical gap there. I've speculated about mixing GC and non-GC in the same language before, but not for device drivers - more for having a mostly GC application, but with some manually memory-managed low level data structure libraries.
–
Steve314Oct 15 '11 at 5:07

@Steve314: Wouldn't you say that remembering which objects need to be manually freed is as onerous as remembering to free everything? (Of course, smart pointers can help with either, so neither one is a huge problem) And you need different pools for manually managed objects vs collected/compactible objects, since compaction doesn't work well when there are fixed objects scattered throughout. So a lot of extra complexity for nothing.
–
Ben VoigtOct 15 '11 at 13:32

1

Not if there's a clear divide between the high-level code which is all GC, and the low-level code that opts out of GC. I mainly developed the idea while looking at D some years ago, which allows you to opt out of GC but doesn't allow you to opt back in. Take for example a B+ tree library. The container as a whole should be GC, but the data structure nodes probably not - it's more efficient to do a customized scan through the leaf nodes only than to make the GC do a recursive search through the branch nodes. However, that scan does need to report the contained items to the GC.
–
Steve314Oct 15 '11 at 13:53

The point is, that's a contained piece of functionality. Treating the B+ tree nodes as special WRT memory management is no different to treating them as special WRT being B+ tree nodes. It's an encapsulated library, and the application code doesn't need to know that GC behaviour has been bypassed/special-cased. Except that, at least at the time, that was impossible in D - as I said, no way to opt back in and report the contained items to the GC as potential GC roots.
–
Steve314Oct 15 '11 at 13:58

There is no guarantee that your garbage will ever be collected in Java, so it may hang around using up space for a long time, while scanning for unreferenced objects (i.e. garbage) also takes longer than explicitly deleting or freeing an unused object.

The advantage is, of course, that one can build a language without pointers or without memory leaks, so one is more likely to produce correct code.

There can be a slight 'religious' edge to these debates sometimes - be warned!