
In an earlier post, I talked about making programs trash the heap, and someone wanted to know what that was. Trashing the heap is something you’ve seen before, even if you didn’t know the name – it’s behind many of those sudden “this program has encountered a problem and needs to close” crash dialogs.

Here is how it works:

The Heap…

You’ve probably noticed that programs take up computer memory. As they do stuff, they need to store information. The program says, “Hey, I need 8 bytes of memory.” The system finds a free spot in memory big enough to hold something 8 bytes in size, and tells the program where it is. This happens thousands of times a second for a busy program. Get four bytes. Get eight more bytes. Then release the four because you’re done with them. Now get 100 bytes. Now get a megabyte. Now drop the eight bytes.

This sea of data is called “the heap”. It’s also sometimes called the “free store”, but I’ve only ever heard old beardy types use that terminology. I think the last time I saw the words “free store” was in 1991-ish. And the book I was reading was already old.

Anyway, usually this activity is abstracted for a programmer. You just create variables when you need them and throw them away when you’re done.

But sometimes, in some languages, you do need to worry about memory. And this is where things get messy.

…and the trashing thereof.

Let’s say you’re programming in C or C++, and you have the program grab enough memory to store 20 bytes of data. And now you make a perfectly innocent mistake and accidentally copy 4,096 bytes into that 20-byte slot. (This is actually easy to do for a lot of reasons. More on that in a minute.) Your data will fill up those 20 bytes, and then overwrite the next 4,076 bytes of data. Any variables that happened to be occupying that space have now had their values replaced with something different.

Congratulations, you’ve just trashed the heap. If you are very very lucky, the program will crash right away.

If you are not lucky, it will continue to run but begin behaving oddly. See, those 4,076 bytes of memory might have been filled with crucial bits of data needed to keep the program operational. If it crashes instantly you can look at what it was doing just before death and find the trouble spot. But those bytes of memory could have been empty, in which case spewing a bunch of random garbage into that space is “harmless”. (This time.) What you’ve got in this case is a random crash bug. The program may run fine, act oddly, or insta-crash, all based on what happened to be in that spot of memory at the time.

WHYYYYY!?!?

This is the subject of holy wars. Some say that C is a crap language because you can (and must) interact with memory directly. Other people say the people in the first group are just crap programmers.

I am not an expert on languages. Other than doing a lot of BASIC in my teenage years I’ve spent limited time dabbling outside of C, so I can’t make any really good comparisons. But 90% of the problems I have in C are because it doesn’t have a simple way of handling strings of text.

Here is a bit of old-school BASIC code that adds two strings of data together:
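(The original snippet didn’t survive the page transfer; classic BASIC string concatenation looks roughly like this:)

```basic
10 A$ = "HELLO, "
20 B$ = "WORLD!"
30 C$ = A$ + B$
40 PRINT C$
```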

In many languages you can define bits of text, cut them up, join them, or whatever you like and you don’t have to worry about memory. In fact, it’s actually impossible to worry about memory in old-school BASIC – it has no tools for doing so. It’s simple. It’s readable. It’s impossible to make it crash. (Although it’s still possible to create all sorts of other bugs. But trashing the heap is not a risk.)

In C*, the language does not do all of this legwork for you. If you wanted to add those strings together you’d need to measure the length of the first string. Then measure the length of the second. Then allocate a block of memory large enough for both strings plus 1 extra byte. Then copy the first string into that spot of memory. Then copy the second string into the spot just after where the first one ends.

(Note to would-be nitpickers: I’m aware you’d use sprintf to save yourself a few lines, and I know you wouldn’t really do things just this way. This post is for non-coders. Don’t Be That Guy.)

The advantage of the C way is that it is crazy fast and memory efficient. This was important back when machines ran at sub-megahertz speeds and had 64k of memory – which is not even enough memory to store a single image on this page.

But today we’ve got computers with lots of power and spending three minutes trying to save ten bytes of memory is a horrifying waste of programmer resources. It’s like pushing your car through an intersection to save on gas. The effort is much greater than the savings and you’re a lot more likely to cause a crash somewhere.

90% of my memory mishaps are the result of juggling string data like this.

1) Measuring and allocating memory is annoying, adds a lot of extra lines of code, and is prone to mishaps.

2) You have to remember to explicitly free the memory later: “Okay, I’m done with this spot now. Something else can use that memory.” If you forget, then each time this part of the program is run it will grab more memory. (This is called a memory “leak”. The program eats up more and more memory the longer it runs.)

3) You have to remember to not use that spot after you’ve freed it. The variable might still be around, but after you’ve freed the memory it pointed to, it’s just a crash waiting to happen.

4) Some programmers – myself included – save themselves the headache of measuring & allocating by just grabbing a space that’s “always going to be big enough”. Instead of measuring a and b, I’ll just grab… hmmmm… 100 bytes? That sounds good. In effect, I’m routing around the features of the language that were intended to make C fast and memory efficient by deliberately wasting memory. And of course, maybe in some unusual circumstances 100 might not be enough. Did I remember to add a bunch of code to catch and handle that case?

5) Strings must end in an invisible terminating character. When you print or copy a string, the code looks for this terminator to let it know when to stop. If that terminator isn’t there for some reason it will keep printing or copying until it runs right out of the space you’ve allocated and sails off into the heap looking for it. You’ll end up printing a bunch of garbage or (worse) copying a lot more stuff than you intended. This also means that the length of a string (in memory) is the number of characters it contains, plus one. It’s just really easy to make off-by-one mistakes like this.

Interfacing directly with memory is really fast but also dangerous. It’s a powerful tool, like a flamethrower. Critics of C and C++ say that languages shouldn’t have flamethrowers. Supporters say that flamethrowers are fine, you just need to not make any mistakes. I’m of the opinion that having a flamethrower is a good thing, but I shouldn’t need to use it every time I want to light a cigarette.

I don’t mind this hassle when I’m dealing with something big. When I’m loading 10MB texture maps and complex 3d models into memory I don’t mind the overhead involved to take care of them efficiently. They’re big, and you’re usually in an unbelievable hurry when you’re dealing with them. But juggling crappy little 10-byte strings like they were live hand grenades is tedious. Partly because they’re so trivial, and partly because it’s something that needs to be done often. Even after all these years I still get annoyed at how cluttered and inelegant it is when I want to deal with a couple of short strings. What would be a single line of code in any other language ends up being half a dozen. (If done properly. If done improperly it’s just two lines of code now and half an hour of pulling your hair out six months from now when you have to sort out why it’s crashing.)

There are add-ons for C++ out there that will help with this, but they’re not standard. If you use one, you may find your code is no longer portable. Or it will be a headache for other programmers to read and maintain. Or those add-ons might conflict with something else you’re trying to use.

Interesting read, and understandable for C-laymen such as myself. Thank you.

Question, though: Is it only the various C-languages that do this? I can’t remember Java or Delphi forcing me to do this kind of stuff, anyway; are there more programming languages that force this manual memory allocation?

Algol, COBOL, FORTRAN and all these other really old things do this too. But “modern” languages like Java, Delphi (which is pretty much Pascal.v2 with GUI), VBasic, Scala, Python, Ruby or even C# do not require it (much).

C and C++ are just very widely used and therefore prone to a lot of criticism.

It’s primarily the C family of languages, because C was created to write an operating system in. When you’re writing an OS (or any other application that needs direct hardware access and predictable timing), you really need the low-level features that C provides.

Unfortunately, because it worked so well at its original task it became the “hot” language of the time, got used for general purpose applications, and was used by folks that had NO idea what they were doing. This resulted in amazing quantities of really bad code that was capable of crashing the entire system.

For modern application programming, C would be an epically bad choice. C++ inherited all of C, including its low-level features, so although it’s better it’s still booby-trapped and (IMO) a bad choice for new development. Serious application development should be done using an appropriate language. Java & C# are adequate, though my personal favorite is Python and I’ve heard good things about Ruby.

Really, the only reason to be using C/C++ is if you need to be close to the hardware for some reason. I live in it, since I do embedded development, but I’m the exception.

On any modern machine/OS, the only really hardware specific code is (should be) in drivers or the OS’s boot code – this is the code that does things like “put the bytes F1 3E 83 A9 in memory at physical address 5F0000, because that’s where the IO port that sends messages to the CD drive is” (not really).

More probably, it’s because you want to use libraries. OpenGL, FMOD, Havok, maths (matrices and linear algebra, not square root!), networking, threading and other plumbing. Somewhat ironically, you have to be much more knowledgeable about C to use a library from a non-C language in many cases:

C’s binary interface is the de facto standard for pretty much any operating system under the sun, so any generally useful library tends to be written for C usage first, and other languages have to have glue written to connect to them. This glue is incredibly boring *and* difficult to write, so unless it’s both a fairly popular language and a fairly popular library, you will have to learn the C API so you can write the glue anyway. In both cases, all the documentation and the help other users give you will be against the C API, so you have to convert everything in your head. On top of that, every time something goes wrong you have to check whether it’s the glue that’s wrong, on top of your code using the API or the API itself.

Well spoken! I have, at times, written small string libraries to get around most of my annoyances with C strings, but, well, there’s more than one way of implementing a (character) string and some ways are good for some things and other ways for other things, so in the end, I always end up with five partial implementations, all different, none covering the WHOLE need, but all (together) doing so, with painful impedance mismatch everywhere.

Oh, another “fun” thing with overwritten allocations… Sometimes, they change the book-keeping that’s used to keep tabs on what is and isn’t allocated, then it becomes very erratic.

My hat is off to you for daring to write a GUI program in C++…have you thought about porting it to, say, C#? Or would the time sink needed for that not be worth it? The present iteration of .NET is pretty darn good in this programmer’s opinion.

Being an old geezer, I don’t know C#. I realize it’s the hip new thing now and I’ve been sort of meaning to check it out at some point, but on my personal timeline it feels like the language appeared five minutes ago.

There is something called Mono, developed by Novell, which is a free, cross-OS implementation of the .NET runtime. Unity and some medium-visibility Gnome software (Gnome Do, Banshee, F-Spot, Tomboy) run on Mono.

Except .NET 4.0 didn’t really add that much to the plate in terms of features, esp. in the portable field. The biggest focus was interoperability with COM & P/Invoke, which don’t work outside Windows anyways, features for C# 4.0, which are also mostly about interop. The rest is support for nice stuff like full support for dynamic languages, Contracts and Parallel processing and a couple new numeric types.

Sure, there are bugfixes and whatnot, but for the most part, developers can stay on 2.0 (3.0 and 3.5 work on top of the 2.0 CLR, and Mono already implemented most of the features anyway).
And if you’re really desperate, Mono 2.8 is already in preview with support for C# 4.0 (and associated framework changes).

Yes, but in Delphi strings are automatically allocated, reference counted and freed for you without having to do any work (unless you really WANT to do them the old fashioned way). Which helps because as Shamus said you play with strings a LOT in most programs.

Awwww, VB isn’t all bad. I figure it’s kinda like crystal meth. You get this really nice GUI created really fast, but when you come down, you have no idea what you’ve done and no idea of the kind of long-term problems you’ve just caused yourself.

I think you missed the outcry VB devs did when they discovered VB.net, in the spirit of fixing things, made it impossible to trivially migrate. C# and VB.net are NOT VB, and approaching (or the opposite) the languages with that prejudice is nothing short of idiotic.

It is not unlike the move from VB3 to VB4. That was when the move from 16 to 32 bit occurred. VBX’s went bye-bye and the OCX became king. The code changes in that version change were tremendous and time consuming.

Visual Basic .NET is a sanitized version of Visual Basic with true object oriented support (it’s kind of required now) that compiles code to the CLR.

I don’t know where the previous poster got that C# contains a “liberal smattering of VB,” because it’s more Java-inspired than VB-inspired, though in my opinion it’s a bit less clunky and potentially less verbose than Java (that said, I’m far more experienced with C# than I am Java). In fact, I’d go so far as to say that the only reason that C# even remotely resembles VB.NET is because they both use the same support assemblies (kind of like a standard library of sorts).

Basically: C# is kind of like a C++/Java hybrid that is designed to compile to CLR bytecode. VB.NET is a greatly enhanced (and surprisingly sensical, unlike VB6) Visual Basic-inspired language that is designed to compile to CLR bytecode. They are radically different in terms of syntax, and VB.NET tends to do a bit more handholding than C#. Which one you use is personal preference. They are both capable of doing the exact same things.

Ah, true. I suppose C# 3.0 did kind of make a liar out of me there. I haven’t touched VB.NET since VS.NET 2002, so meh.

That being said, that is a language construct. Even if you can’t use that exact syntax you can still pull off the same thing in VB.NET (this also holds true with a couple of other features that I can’t remember off the top of my head).

For most applications, particularly business-oriented ones, you can accomplish the same thing with similar code using both languages. Better? :)

I don’t think C# is at all related to VB. Being a VB programmer in my teenage years, then C++, and now C#, I actually loathe VB with every fibre of my being (pardon, VB programmers :-)), but love C#. So don’t write off C# because of VB. After all, the things they have in common, like the garbage collector, aren’t they really the (few) good things about VB?

You might want to add “+ huge levels of dependency”. Running C# for GUIs means installing a .Net framework (or work-alike like Mono), of which there are several major incarnations which are not entirely compatible with one another, and myriad sub-versions within each that change the behavior of various things from one to the next. This also adds in a layer that rapidly becomes either a march of security update patches (roughly bimonthly) or a potential security problem. Frameworks in general may save a bunch of work on the front end, but in the summation, might end up being simply a push off of work to other people at other times.

While yes, the framework requirement is onerous – as is ensuring the correct version is installed on target hardware, which is often an issue on managed systems that are not allowed automatic updates – it’s not a whole lot worse than the Java requirement for all the Java code out there.

Being a young punk, C# feels like someone listed all the problems with C++, wrote an awkward implementation of the solutions, and declared it the new standard library. Which is exactly what I was looking for.

they dump metric F-TONS of money into getting higher education to teach that microsoft solutions are the be-all-end-all. then they dump metric F-TONS of money into convincing idiot business managers that microsoft solutions are the only way to go and that anyone who questions that should be fired immediately. end result being, if you arent brainwashed in the course of getting a CS degree, you quickly learn to shut your mouth and swallow the microsoft line because thems that goes along, get along, and youd like to keep getting your paycheck so you can pay off that huge student loan.

C# is not without its warts (switch statements, for instance, bug the ever-loving life out of me: case fall-through no longer exists, but a case that doesn’t end in break, return or throw is a compile error), but it’s much easier to live with than C, C++ or Java (Java has always felt kinda clunky to me, for no good reason). IME, YMMV.

While C# doesn’t have case fall-through, it does have something better: case jumps. Which really is the best of both worlds — preventing fall-through eliminates a whole class of bugs caused by “oops, I forgot to ‘break'”, while adding jumps lets you explicitly specify where you’re trying to share implementation. Thusly:
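(The example snippet was lost here; a minimal non-runnable sketch of what C#’s goto case looks like, with invented names:)

```csharp
switch (msg)
{
    case Msg.Redraw:
        Repaint();
        break;
    case Msg.Resize:
        Relayout();
        goto case Msg.Redraw; // explicit jump: share the repaint code
}
```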

(I’ve heard a similar complaint somewhere else, and in my very slim programming experience have never actually run into it at all, so I don’t have any direct experience with what sorts of issues it might cause.)

GoTo programming is very, very, very difficult to follow for anyone, including the person that wrote it. If you ever have to go back and look at the code again, GoTos make your job exponentially more difficult.

Edsger W. Dijkstra’s letter to the editor, which the editor titled “Go To Statement Considered Harmful”, is the most famous argument against the goto statement. But to summarize: goto makes it easy to write “spaghetti code”, where the program logic jumps around all over. It can be hard to identify how you ended up at a particular location. Other language elements like loops, conditionals, and functions provide more explicit entry and exit points.

While all that is true, the idea that goto is always bad is an overreaction. goto is a powerful and potentially dangerous tool. One should avoid it because the other tools are almost always safer and easier to read. But every once in a while goto can actually simplify program flow, making the resulting code easier to read. For example, goto is sometimes the clearest way to handle errors by jumping to a shared recovery section of the code.

The counter-counter point is that it’s dangerous enough that you should scare people away from it. In 15 years of professionally programming, I use goto once every 5 or so years. I see other valid uses every year or two. The concern is that when you put dangerous features into the hands of random programmers, many of them lack the restraint to use it appropriately.

“Are programmers smart enough to use powerful and dangerous features?” is the core question dividing a lot of programming languages. :-)

I’d honestly love it if GOTO was limited to switch cases, since it would make it slightly more obvious when you were falling through to the next case, and you should pretty much never use it outside of a switch case in C#.

No case-fall-through? I consider that an irrelevant feature, not a bug. Honestly, I’ve not ever used that ever, but I have fixed quite a few bugs where people forgot a break. That said, having to write ‘break;’ still seems really pointless, and it is the same as an if/else block. I have never been a fan of switch statements to begin with, they are glorified ifs anyway. The compiler can bloody well leave me alone with its optimisation issues.

Java isn’t so clunky anymore starting from their fifth iteration, which introduces generics. Their compiler really sucks for generics and produces horrible code, but in 99.9% of all cases that doesn’t matter, and having things like Map makes everything so insanely simple to write, and they fixed all their issues with encapsulated primitives too. Compared to newfangled things like Scala or Ruby, it might feel clunky. Compared to C++, it feels as elegant as battles in Crouching Tiger Hidden Dragon.

Nah, switch statements are glorified jump tables (or, rather, jump tables that have a pretty surface syntax) and are, as such, pretty useful. However, what I’d REALLY like is something that takes a whole swoop of various conditional tests, with associated code bodies and execute the first block that has a matching condition.

Which is unfortunate, because I occasionally want to say things like that. Chained “else if” statements do the job (and are logically identical), but aren’t as elegant to read or write. It’s one of several pieces of syntactic sugar I wish C++ would steal, along with Pascal’s “with STRUCT do”.

In, kinda, C-like syntax. And with the ability to test random variables, not a single one. The closest I’ve seen, so far, is COND in Common Lisp (well, most lisp-like environments), but that lacks the ability to fall through and that MAY be nice, I think.

I’ve worked a very small amount (very, very small amount) with Java 5. I am familiar with it, at least in passing. The clunky feeling I’ve gotten from the language hadn’t really disappeared. Can’t really put a finger on the why, though some of it has to do with the load of syntax it takes just to get to “Hello, World,” a problem C# also has. Java is, or, at least, was, conceptually cleaner than C#, but the trade-off seemed to be requiring more code to get from A to B. IMHO, YMMV, etc.

Anyway, case fall-through is handy in limited situations for reducing code repetition. If you have two cases where case B can be summarized as case-A-with-an-extra-step, you can do the extra step of case B, fall through to case A and then break.

Aside from a certain brevity of expression for simple comparisons, it’s pretty much the one feature that switch has over if, and, in C#, it doesn’t even have that. As a result, C#’s switch statement offers very, very little and does so with syntax that no longer even makes sense.

That may be more accurate than you think. C# seems to have come out of nowhere sometime between when I graduated with an AAS in Programming (2005), learned that I couldn’t do anything with that degree and went back to get a BS in Political Science. So I literally haven’t the faintest idea what it is.

I personally haven’t used C# for anything, but my father — life-long coder — had to learn it and is using it for his current personal and professional projects.

He prefers it to C++ because he doesn’t have to deal directly with memory. On another note, the Windows GUI objects that it ships with are garbage. Text labels won’t draw over images, for example. So, it’s capable of quickly making Windows GUI apps, but if the GUI is complex it might start to break.

All the stuff in “System.Windows.Forms” is just a thin wrapper over the native Windows… uh… window APIs. “new Form” becomes “CreateWindow(“”, …)”, “new Button” becomes “CreateWindow(“BUTTON”, …)”, “myButton.Label = “Foo”” becomes “SetWindowText(hwndMyButton, “Foo”)” and so on (Yeah, labels and buttons and all the other controls are windows as well. It’s called Windows for a good reason :).

If you want a fancy new GUI library in C#, use WPF, which implements all the controls and layout you want in fancy DirectX. After you learn all the insanely complex (but powerful) systems that support it.

I worked in C++ years ago, but isn’t one of the differences between C and C++ that C++ supports string concatenation, like stringC = stringA + stringB?

Of course you need to allocate stringC first I suppose.

And I agree with this. Especially since I come from the other side of the problem: C#, where there is no way to manually allocate memory (well, there are ways to use pointers and allocate memory, but you can’t deallocate it later). Yes, 99% of the time you don’t need memory management, but that 1% is where you are trying to do some complex image processing and you need any extra edge you can find.

This is a somewhat pedantic reply, but I’m a programmer; fussing over the trivial details is what I’m paid to do. ;)

C++ is basically the same as C, but with a bunch of new features added. Native support for strings wasn’t one of those new features; C++ still treats strings as raw character arrays. However, STL (the “Standard Template Library”) does provide really useful string implementations which allow you to easily combine, split, and otherwise munge strings to your heart’s content, exactly as easily as you would want.

To my knowledge, the STL is now available pretty much everywhere you’ll find a C++ compiler. It’s not part of C++, but it almost always comes together with your C++ package. I’ve personally used STL’s string classes in programs running on PC, Mac, PlayStation 2, PSP, Dreamcast, Wii, XBox 360, PlayStation 3, and iPhone. So it’s pretty cross-platform and reliable.

I’m one of those old geezers who normally advocates using only the basic compiler primitives, not big add-on libraries, for largely the same reasons that Shamus mentions. But I strongly, strongly recommend using STL strings (or a work-alike) instead of using the built-in C string functionality whenever possible. It’s just too easy to mess up when you’re handling raw C strings directly. It’s just not worth the risk.

The Standard Template Library is part of the C++ ISO Standard. Compilers that don’t come with the STL are not standard-compliant.
Btw Shamus, from your writing I get the sense that you program C++ the old way. Did you ever try the STL? Or Boost? The style of C++ programming has changed a lot this last decade.

That, and a lot of us have legacy libraries and structures that take char* parameters and such. I suppose you can use std::string and then grab the char out from under it when you need it. Probably a worthwhile approach, but I’ve never gotten into the habit.

Yeah, well, when your habits are actively causing problems and headaches, a change of habit is well advised.

Think of it like optimizing yourself; it takes a while to find exactly what needs to be fixed, and takes another while to get that fix up and running, but once you do you’ll wonder why it took you so long to get around to fixing it.

I hope you’re talking about char* as an output parameter. char* as input-only is pure evil (imho). At work we have to use a library whose authors couldn’t care less about const-correctness, so you can’t even read data from objects declared as const. And then it wants input as non-const references, which means that effectively-const functions passing class members to that library require said members to be mutable. I am sure one day someone will stumble across our code and think “What the hell, why on earth are these variables declared mutable?” Then he decides to revert them to a normal declaration, and nothing will compile any more. Heh.

Depending on how widespread the library is, I wouldn’t use mutable, I’d use const_cast in the specific cases that needed it instead (probably by writing wrapper functions around the library functions that had const-correct interfaces).

Of course, either way you’re vulnerable to the library implementation changing something you’re not expecting it to. But you could mitigate that by copying the data yourself before passing it in (in the wrapper functions), to guarantee that the library couldn’t screw it up. That’s definitely the safest option, although it might hurt performance a little.

Well, that library is available as open source, so it’s not that we don’t know whether it might change something (because it doesn’t), but it’s just annoying that it breaks the whole concept of const-correctness. Anyway, don’t ask me why, but I didn’t even think of just copying the member variables before passing them; probably because I’ve become so used to very clear and strict const concepts and const-reference argument passing that I pretty much automatically avoid any unnecessary copying without thinking about it, heh.

I like Win32’s OpenPrinter, I think, which takes a char* (or LPTSTR if you’re into that stuff). I can’t think of any reason it is NOT a const parameter, but it isn’t. One of the annoying cases where you either cast the const off string::c_str() (evil), or have to go and allocate another char array.
Haven’t compared std::string v MFC/ATL::CString for a while… wonder if STL still wins in speed these days.

I was the same. I spent too long stuck in plain C on the PS2 (enforced, actually, because the same code ran on the Gamecube) and when I moved to Windows-based development elsewhere, one of the things I was encouraged to use after a performance review was STL.

Well, simply said, std::string solves the very string operation problems Shamus is talking about – and still provides an interface to have it read by old-fashioned functions, but manipulation goes by included operators and methods.

I guess Shamus knows this anyway, but for those who don’t:

std::string allows you to do the very same thing as in the Basic example:

This alone is already reason enough why I think that for 99.9% of all situations, C++ is clearly superior to C (because no such thing would even be possible there); you can write all your code as low-level as someone would do in C if you wish, but yet take advantage of std::string everywhere.

That and std::vector, which does something similar but for arbitrary types, not specifically characters of text.

std::string is not quite as convenient as having a built-in string class in the language. For example, you can’t do this:

void myFunction (std::string s) {
// Do something with s
}

(...)

myFunction("string" + "literal");

even though std::string has a conversion-from-char* constructor and a + operator. The compiler will still believe that you’re trying to add two string literals, i.e. char* objects, and complain. You have to do this:

myFunction(std::string("string") + "literal");

which is not a whole lot of extra code, admittedly, but still.

Nonetheless, std::string really does solve a lot of the pain of working with strings. It’s still annoying to concatenate strings with numbers, though – something like "text " + 42 won’t do what a newcomer would hope.

Yes, that is true. For some time I’ve done work with Borland Builder, where they have that AnsiString class which has overloads like that.

Luckily, at my current workplace, we are also using Boost (or more precisely – it is installed and we are allowed to use it, which doesn’t mean that people typically do so), which has that absolutely convenient lexical_cast function that works just like static_cast etc., only that it transforms numerical values to strings and back. Not as elegant as writing "string1 = string2 + int1", though.

But… hey, I just got an idea. I could write such an operator outside of the class (but inside the std namespace, of course). I guess I’ll just be making a wrapper for the aforementioned lexical_cast, but it would really be so convenient.

Overloading + to mean “concatenate strings” always rubs me wrong. It seems like a good idea, then people start doing clever stuff, like implementing “operator+(std::string &lhs, int rhs)”. (Which you can do, if you really want.) Now you’re facing code that looks like:

std::string s1 = 20+"10";
std::string s2 = "four: " + 2 + 2;

and you have no idea what should happen. Which will surprise fewer programmers, 30 or 2010, 4 or 22? After all, if you’re silently converting integers into strings, why not convert strings into integers? Lots of languages will do exactly that! You end up with weird rules, like “The left hand side has to be a string,” or needing to pay very close attention to the precedence rules. You’re in the land of special cases and exceptions. You can obviously live there just fine (See also: Java), but I’m not convinced that it’s inherently better.

Given C++’s grounding in C, I think the solution (stringstreams) was a reasonable one. For those not familiar with them, the above code looks like this:

int result = bigCalculation();
std::ostringstream output;
output << "The number is " << result;
// If you want the std::string, use output.str()

While overloading the bitshift operator isn’t great, it is symmetric with the preexisting I/O stream interface C++ had, and bitshifting strings is kinda silly.

As someone who has done a fair amount of string manipulation in C++ over the last few weeks, it’s not a big deal. It’s certainly not as convenient as, say, Perl, but it gets the job done.

Also, your first line "std::string s1 = 20+"10";" shows the *real* reason C++ should *never* have defined string operator+(). That is constructing a string from whatever sits 20 bytes past the start of the static string "10"!

The “STL” correctly only refers to the collections section of the C++ standard library: the stuff that uses iterators. std::string is a bit of an edge case, since it was in the pre-standard C++ library long before the STL, but it is a collection of characters.

After programming C/C++ commercially, I declared that any language that didn’t require me to manage memory was a cake walk. I still stand by that.

I actually failed to get a job once (listen up, Shamus… this could be you) at a Powerbuilder shop. I was leaving a job where I wrote C++. The programming manager asked me what the worst bug I’d ever had was and how I solved it. The bug involved memory management, double-deleted pointers and garbage collection.

He said, “Um… okay.” He had no clue. He quickly realized I shouldn’t work for him and I came to the EXACT SAME CONCLUSION. Win.

I think you can, in fact, use a simple garbage collector to handle strings especially, like the Boehm-Demers-Weiser GC. It’s not necessarily right for every project, but as a fellow C guy, you have probably already taken this lesson to heart.

That’s exactly what he said in the article: the memory doesn’t get deallocated and thus, the program still thinks it’s in use. Next time it passes by that part of the code, it’ll allocate another piece of memory it’ll fail to deallocate, and so forth. Over time, the little pieces of memory that were “leaked” start accumulating and you’ll see the memory footprint of your process grow.

If you ever want to start writing text books and other books on programming, you could probably make a killing making C and C++ clear to new programmers. As someone who suffered through the dry barrens of Deitel & Deitel’s C book, I firmly believe that one of the most daunting aspects of learning a language is the absolute CRAP that passes for writing in those books.

If you ever write an Intro to C book, I’ll buy it. I’d know that it wouldn’t be BORING. When I dive into a book on programming I don’t expect Tolkien or Asimov at their best, but I would like it to be useful for something OTHER than curing insomnia (hello Android documentation as an example of something that gave me much more sleep).

I was tempted to add a “Me too” or some other comment on some of the other stuff here, but this comment caught my eye. Even as someone who writes code for a living, I would like to see Shamus Young’s Guide to Programming. (Or whatever you would call it.) I’m completely serious. You have a good writing style, know how to make things clear, and keep it interesting. Would you ever consider writing one? I know I’d recommend it to people I know. Maybe you could send a couple of these articles to a publisher or something? (I’m not really sure how you would go about starting that.)

Oh man, I just had this picture of Shamus teaching online C classes. 6 People sitting in vent with him, and Shamus teaching them how to make a Hello World program. I am pretty sure I would pay for that course. Like I told him the other day, all I know is Java, and I should really learn something new.

Honestly, some of what Shamus has been writing on the subject reminds me of In the Beginning… Was the Command Line by Neal Stephenson. Though Stephenson was talking about general computer history and not specifically programming, IIRC.

Technically, Stephenson was talking about the history of Operating Systems. He talks about the history of computers, but his focus is clearly on the OS.

The thing that blew my mind from that book was when he talks about all the advances made by computers from the 1950’s to the 1980’s, and how a lot of those advances were basically invisible to the operating system. Whether you’re using punchcards, a keyboard and printer, or a keyboard and monitor, to the computer it’s all the same. Character input and character output. It wasn’t until 1984 that computer interfaces really changed (from the OS perspective).

And standard too! I haven’t coded in C but I mainly use C++ as a hobbyist, so no “real world” problems for me, but I never had trouble using strings (note I’m referring to C++ STL strings). They come as part of the standard so are implemented in every compiler.

On the other hand, I really liked your article, found it well explained and interesting. Keep on!

Thanks for the explanation. It’s good to get another confirmation that my decision to not delve into programming was the right choice. I quit that about the time they started talking about “object-oriented” as I found it tedious and not as rewarding as fixing existing systems. Seems not much has changed apart from the tools and languages since then.

However, I still find scripting necessary in my work with support and maintenance, so the skills haven’t been a complete waste.

Same here; I’m always looking back at my decision to leave the generic “computer science” degree and move to the CCNP netadmin program. In fact it seems we made up our minds at the same point, with the object-oriented stuff. But, as you mention, knowing the fundamentals of programming and how computers act on instructions certainly helps.

And Shamus, if the quality would be the same as this post, I must third the proposal that you write a programming book — I’d buy it and read it even though I don’t plan to write much code in my life, and I’d show it to everyone I know who teaches that kind of thing.

I’m still not doing much more with my coding skills than scripting (I was doing some Fortran coding before), but the concept of object orientation seems to me an extremely good thing, even though it did take a while to wrap my head around it.

Initially it’s just a roundabout way of doing things, but once you’re settled in, you can do amazing things that would require brutal hacks without object orientation.

So I must be wrong, but I thought the OS would take care of protecting a program from reading/writing memory not allocated to it, or is that just for special cases? Of course that doesn’t stop a program from corrupting its own memory space.

Also I never got why the string termination character was necessary _if_ your string takes the whole allocated memory space.

There are usually multiple layers of allocation. What the program requests from the OS tends to be page-sized chunks (multiples of, usually, 4 KB). These are then divvied up by the runtime library into your 1-,3-,7-,23- or anything-else-sized allocation requests (with a bit of padding for the book-keeping, because you never know when you’ll need to reclaim a chunk).

The OS protection in question typically involves throwing an exception or other serious error. Unexpected errors tend to lead to application crashes. However, it is preferable for the at-fault application to crash than having a different application or even the OS crash instead.

String termination allows for strings of variable length to be stored in the same memory location. If the value of string A changes from “Hello, world!” to “Hello!”, you don’t print “Hello! world!” later (or “Hello! “, for that matter).

Additionally, the use of character-terminated strings allows for strings of, theoretically, arbitrary length (in practice, limited by available memory). An alternate approach used in some other languages, length-termination, typically has a known maximum length beyond which strings may not grow (ISTR this is 255 characters in at least some versions of Pascal, for instance).

BS. EVERY modern OS has memory protection. It’s what makes apps crash instead of OSs. However, the OS can’t know if you’re doing a valid assignment or just scribbling all over your variables, so trashing your own heap is perfectly possible.
In *NIX/Linux/BSD/etc they’re called segfaults. In Windows they’re called Illegal operations or whatever. In the end, it’s the same thing.

The modern line of Windows is descended from NT, not 95, and has had high-quality memory protection from day one. Normal applications can merrily write wherever they want in their own address space with zero risk to the rest of the system.

So, why doesn’t it work in practice? A variety of reasons: buggy drivers by third parties remain a common problem. It’s (part of) why Microsoft is so keen on WHQL testing; they’re getting grief for Windows crashing, so they want to reduce the risk. Also, software that doesn’t play by the rules. There is a long list of crap that software shouldn’t do: poke around in the boot sector, install custom drivers, hook directly into the OS memory space. And modern Windows, by default, doesn’t allow it. But software publishers whine that they need to do dangerous things, mostly so they can implement their buggy DRM scheme of the week. So the software gets run at the Administrator or System security level where it’s free to fuck everything up. Add in viruses and spyware playing much the same games and you’ve got a recipe for crash soup.

The problem isn’t the memory protection. It’s a culture of insecurity endemic to commercial software developers that Microsoft allows to continue in the name of backward compatibility. (And understandably: if DangerousGarbage 3.2 works on Windows N but doesn’t on Windows N+1, users will be blaming Microsoft, not DangerousGarbage, for abusing its privileges.)

Given all of this, to an extent it’s impressive how stable Windows is. And, by and large, it is stable. Several friends who work professionally as systems administrators all grudgingly concede that since Windows 2000, it’s been a pretty good operating system.

Believe it or not, I’m not a fan of Microsoft. (My heart goes to Canonical, makers of the Ubuntu distribution of Linux.) They do a lot of terrible stuff. But I have zero complaints about the memory management in modern Windows releases.

Even at admin level (System is impossible to access unless you pull off a privilege-elevating exploit), processes each run in their own address space. Even if you want to, you can’t just scribble over someone else’s memory. To crash another program or Windows (without exploiting inputs), you have to get into the appropriate process, which can’t be done with an EXE. You need to inject a DLL for that. Admin just makes such a thing possible. (It’s how installers and debuggers work, after all.)
Even then, it’s the exact same thing as on other OSs. Once malicious code reaches Admin/SU status, it’s got complete control. Memory protection is not engineered to prevent that; the OS can’t know if something is legit or not. That’s why Admin is not the default account level anymore, and programs have, by and large, learned not to ask for admin unless needed. In fact, Vista’s ever-derided UAC was designed to be a roadblock to privilege escalation. From there on, PEBCAK.

But we’ve digressed now. You’re confusing memory protection with security. Mem protection is just meant to isolate processes. Security is much more complex.

To take what you said a step further, the Administrators group in Windows doesn’t have as much control over the system as root in a *nix system. As a root user, you can easily nuke a filesystem, write garbage to kernel memory, and do a ton of other wicked and nasty things.

In the Windows world, your average Administrator has less direct control over low-level system functionality. You can’t destroy a system disk with “rm -rf /”. You can still do damage if you know what values to garble and such, but it takes far more than a simple typo to do so.

To crash another program or Windows (without exploiting inputs), you have to get into the appropriate process, which can’t be done with an EXE. You need to inject a DLL for that.

Gee, that makes it better – who ever uses DLLs these days? Duh….

programs have, by and large, learned not to ask for admin unless needed.

Yay, MOST programs DON’T opt out anymore. Duh….

As I said, security you can opt out of is NOT security.

(Memory protection IS a form of security, as in the general usage of the word, just not “keeping people (especially hackers) out of the system” security, which is what we use that word to refer to most often when talking about computers these days. No, I was not confusing the two.)

One might also point out that most of the article deals not just with trashing the heap but with trashing the stack as well. Though overwriting 4,000+ bytes on the stack is generally a lot more dangerous than doing it to the heap (i.e., buffer overruns).

So, a door that requires a code typed on a keypad to enter is a form of security… even if the door has a secondary handle on it (that anyone can use, if they notice it) that will open the door without said code?

Having a form of security that is optional at the request of those you are supposed to secure against is NOT security at all.

But it does LOOK like security. Yay! (I would like to thank the TSA for being an even better object lesson on this point.)

I guess this would mean something if all processes were going around installing themselves as system drivers or whatever. Yes, they *can* do this. But I don’t see it as the OS’s job to “defend” me from malicious processes. (One of the reasons I never went to Vista.)

Look, I use windows. A lot. Twelve hours a day. Ten programs running at a time. Firefox with ten tabs open. Everything eating tons of memory. A couple of ill-behaved programs crash themselves now and again. Meanwhile, I’m writing software that sometimes crashes spectacularly. And yet yesterday my machine rebooted for the first time in a month. Yes, I’m aware that a Linux machine is a juggernaut, but your appraisal of the memory system in Windows is wrong. If you were right, this machine wouldn’t last an hour.

Linux crashes for me just like Windows does, and for many of the same reasons. The only time Linux ever crashes for me is because of an errant driver (usually the crapware that NVIDIA and ATI like to call “drivers” [though they have gotten better over time, at least]).

In fact, as of Vista, Windows handles video card driver crap-outs far better than Linux does. While Linux either hard-locks or freaks out, Windows can usually reset the video card driver and have me back up and running in 10-15 seconds. Prior to Vista, that was a driver-specific feature, and I can say that ATI’s GPU reset feature worked fairly well in XP (for some reason, Rollercoaster Tycoon 3 tripped it on my laptop semi-regularly; probably a driver bug).

I do agree with you that it is MUCH better than it used to be, and I use Windows every day, as well (computer programmer, all Windows based – I make my salary off of Microsoft, to a large extent).

But I blue-screened my brand-new computer a few weeks ago when I tried to use PixelCity for the first time on it. Yup, Windows is so stable…

And you rebooted it for the first time in a whole month? That people consider that a long time for a Windows machine (even in a case like yours) makes my case for me.

Windows leaks memory (or allows programs on it to do so, which is logically equivalent). If it didn’t, reboots would only be required after OS patches, and would be a COMPLETE waste of time the rest of the time.

Actually, it works perfectly. One process can’t stomp on another process’s data unless the other process lets it. There’s no way the OS can tell whether a program is making a “good” write or a “bad” write to its own memory, though.

Unfortunately, since Windows is the most popular OS at the moment, it has lots of people writing software for it — even people who shouldn’t be, since they write crappy software.

As Miral said above, it most certainly does work. NT was built as a server and workstation OS. If there were no memory protection, a Windows server could potentially drop like a ton of bricks over a bad PHP script. I’ve gotten upwards of three month uptimes using 2000, XP, Vista, and 7, and generally the only reason I reboot at that point is to do updates and other maintenance tasks. I don’t think I would have broken a week if NT’s memory protection “didn’t work very well.”

The only thing that memory protection does not and can not protect the system from are kernel mode drivers, but that’s nothing new. What do you suppose would happen if you were to compile a Linux kernel module that trampled the system memory?

CmdMarcos, yes and no, depending on what OS, but nothing stops a program from trashing its own allocated memory. Recent OS’s mostly keep you from stomping on other processes, but you can still generally affect them by stomping on shared stuff and resources.

First thing this makes me think of is X-Com, where you’re limited to 80 items because of RAM limitations at the time, and the way old games like the original Pokemons and Sonic 3 would count items over 99 with two digits, one of which was garbled pixels. Very interesting.

+1 to wanting to read a proper programming book by Shamus. Hell, there’s enough unique stuff on this site to put it together and tart it up a bit for a book, like Dave Sirlin did with his Playing to Win website. Could be an idea.

heh, I had always wondered why those programmers had limited themselves to just one byte to represent numbers.

Though, now that I think about it, with as limited as computers were back then, I guess it would make sense to save every possible byte.

But then a new question arises for me: even supposedly “next-gen” games have this weird limit, where space should no longer be an issue. Two specific games that come to mind are Oblivion and the first Starcraft. Starcraft limited the number of upgrades a particular unit could receive to 255, and Oblivion had the same limit on how much you could artificially raise your character’s attributes.

Because you have billions of the things? We’re talking about character statistics, you’re never going to have more than a dozen or so of them. Unless you have millions of (fully detailed) NPCs floating around, using an int instead of a byte isn’t going to make a dent in your memory.

STL is a wonderful band-aid over some of the issues of C++. It also comes with a few of its own caveats, but it takes away a bit of the “here’s more rope to hang yourself with”.

C# [Disclaimer: Microsoft currently pays my mortgage] and its built-in memory management do wonders for letting you NOT worry about trashing the heap or stack–at the price of performance. Great for non-graphics apps that spend 95% of their time waiting on a user or a remote signal anyway.

Umm.. the standard library (see my pedant rant about STL above :)) is the POINT of C++. The “++” is about letting things like std::string and std::vector exist, so better to say “C++ is a wonderful band-aid over some of the issues of C”. Of course, C++ *also* gives you a whole bunch of rope to hang yourself with, if you don’t feel like using C’s gun to shoot yourself in the foot.

Funny thing about (exact, compacting) GCed languages like C#: they are actually *faster* than manual (or conservative GCed) memory management for long-running processes, due to cache coherency and locality. When C# loses, it’s generally because a) the developer used libraries that trade execution speed for development speed (e.g., LINQ), b) the C# compiler generates worse code than C++ (deliberately; it is a much simpler and faster compiler), which only matters rarely, or c) the total running time is short enough that the runtime initialization is dominant.

And even in the cases where systems and compilers are intelligent enough to stop it spewing over random areas of memory, they usually kill the errant process anyway. So you’ve got the same problem, just possibly immediate and easier to track back to the issue.

If someone would be kind enough to ignore the vagueness of my following question:

Is C++ hard to learn? In the Theoretical Physics university course I am about to start, I will be learning Fortran and C++ to code simulations for the problems we have to work out. I don’t code and have never coded, so I would like some idea of the difficulty level of the language.

Questions like this are a little hard to answer. Not because the answer is vague, but because the answer really does depend on _you_. How difficult the language is depends on how you’re learning it, and whether you, yourself, think in a way that works for programming. (And, arguably, a host of other small factors.)

To answer your question, I don’t think so. C++ isn’t hard to learn. Some applications of the language can be difficult because they require you to learn something specific (e.g. graphics) or really focus on how you solve a problem (complex math). C++ is a lot easier to learn than C because it avoids some of the direct ways you can mess something up. I will suggest something, however. Since your course doesn’t sound like it is an introduction to programming, find a resource and start learning now. It can only help you later.

That is what I meant by vague… English is not my strong point; I just sort of blunder through it. Complex maths won’t be a problem, but graphics might be (although I might not need to work heavily with that). I think getting a head start might be a good idea. Thanks for your advice.

Just to clarify: I was only using those examples to illustrate that the language itself shouldn’t take long. You’ll probably be writing a program that successfully answers a given equation before very long at all. Writing a simulation may or may not be a while after that. However, if you do anything more complicated later, that new use of the language will be the difficult part, not using the language itself, since you’ll still be using the same keywords and structures, just in new ways.

I personally learn programming languages and the like through following tutorials and then fooling around with other people’s code. However, something tells me that C++ isn’t the language I want to do that kind of thing in …

I didn’t mean to answer vaguely, but my original response was very long and meandering so I edited. A bit much apparently, but can’t be helped now.

What I meant was that how difficult anyone will find C++ is largely dependent on the person. Just like math is hard for some but easy for others. And that you shouldn’t really worry and just wait and see for yourself.

When learning to program, you have to learn two different sets of skills:

1. The language itself. There are a lot of differences between C, Java, Visual Basic and Fortran. You will have to learn the intricacies at some point.

2. But first, you need to learn to think in terms of process flows, memory, pointers and all those other abstract concepts. The differences in this regard between the languages are very minor (C vs Java) up to completely irrelevant (C# vs Java).

Note that the second part takes quite a few years to get down decently, while the first point can be fixed in months, or even weeks, depending on experience and difficulty of the language.

For 1: When you say this can be done in months/weeks, did you mean when starting from scratch or when learning a new language? If the latter, then would starting from scratch increase the time I would spend learning it?

For 2: That is pretty much what I expected. “Abstract concepts” made me giggle; as my course is Theoretical Physics, abstract concepts encompass ALL that I am going to learn. Thanks for that, it was most helpful.

If you can write code easily in language X, then it will take you three months at the most to get proficient at an acceptable level with language Y, because they only differ in syntax, and not in concepts. Sure, the uber-gurus will still use the features of their chosen language better, but you can painlessly get work done.

That is to say, as long as both languages belong to the same category. Query languages (SQL, XQuery, XSLT) are a very different beast from procedural/OO languages (C*, Java, Python, Pascal), and so are functional and logic languages (Haskell, Prolog).

In the end, it depends on how well you can grok pointers. Everything else is peanuts. If your brain isn’t wired to handle pointers, references and whatnot, C/C++ will be very difficult. The rest is the standard programming mindset: if you can think in an imperative style (“To solve X, you need to do A, then B, then repeat C until N is equal to M”), you’ll do fine.
(Tidbit: another programming style is functional (“X can be represented as a composition of functions F, G & H for input A”), which is gaining popularity since it makes it easy to do stuff in parallel. Naturally, it lends itself well to physics, maths and related fields. Check out Haskell.)

Until I got to the first brackets I didn’t understand anything there. But they are technical terms I’m guessing so not important to know the word itself. I will indeed check out Haskell whatever it is…

Heh. Pointers are variables just like any other variable. But instead of holding data, they hold the address where you can find said data. References are the address itself. You could say pointers are variables that hold references. So access to the data becomes indirect. Instead of accessing X for value N, you access X for location P which has value N.
Some people have trouble thinking in terms of indirect access (and the ramifications thereof), which is what makes pointers so difficult to use.

My experience with language learning:
once I learned Lisp and C and anything with object-oriented features, I was able to learn every other language trivially.

But while learning the language is trivial, the long part is learning the libraries. My first C program, for example, has a function that converts strings containing characters like ’12’ into the actual number 12. Because I didn’t know atoi() existed. I wasted time and thought and debugging power on something that comes with the language – because I didn’t know the libraries.

Learning a language is like learning a grammar and consequently easy, but libraries are more akin to vocabulary, and take more brute memorization.

So it would be like learning that there is a notation for square root instead of deriving it by a formula? I’m feeling swamped here…
If that is what you mean, then judging from the comments I’ve gotten back, that might be what causes me the most problems, but one that can be remedied quickly. Thanks. I have had a lot of stuff to think about from all the people who responded and I am grateful for it.

I want to add something important people never realize when they talk about performance of do-it-yourself memory management. Everyone thinks it’s faster because you don’t have a garbage collector. But that is actually not always true: A GC will do most of its work when the CPU would be idle anyway, while your memory management code will be executed at the point you specify. So your fast and cheap code takes up a few valuable cycles, while the slow GC takes up a ton of worthless cycles.

Would you rather pay someone with a single ounce of gold, or with a pound of dirt? In the end, paying “more” can be cheaper.

That is pretty much the opposite of reality. If you mistrust garbage collection, you are asking for trouble.

Modern GCs do a stunning job, nearly all of the time. At the point where a GC consistently runs into trouble, you are doing very unusual things, such as crazy high-performance calculations or memory allocation in a scale that was considered all but impossible just a few years ago. But your average desktop application requires less than 1 GB of RAM (most get by with a few to a few dozen megabytes even, that is less than 1% of what your crappy netbook offers) and spends 90% of its time waiting for user input. Which means the GC can clean up your memory nine times for every key you press. If that’s not enough, you wrote really shitty code somewhere else. ;)

If you want some evidence: take a look at Minecraft. It is able to simulate worlds that are bigger than the earth, and it runs in Java, which not only has a GC, but doesn’t even directly compile to machine code (which makes everything slower by a ridiculous factor)!

Sort of. Minecraft generates terrain around you, about a kilometer (from my crude eyeballing). If you get close to the edge of the generated terrain, it generates some more. It will theoretically generate an infinite amount of terrain, but the practical limit is disk space.

Also, it’s not accurate to say that it’s simulating it all at once. The game would quickly grind to a halt processing everything. There is a limited zone around you that is being simulated. Everything else exists in a sort of suspended animation. Of course, you’re not there to observe it, so you’re unlikely to notice.

Actually, Java does compile down to machine code, it just does it at runtime using profile-guided optimisation. Look into HotSpot one day and some of the amazing things it can do with JVM byte code.

I meant that when you go around saying you should trust something like that, some dumb young programmer with bright eyes and lazy bones will write a program during the weekend, happily thinking that all his memory management will be done by the garbage collector, ignorant of the fact that the GC doesn’t like it when you do that one thing.

You know, the one that all the experienced X programmers know not to do, because X’s GC always throws a great big sulk over it and refuses to touch that part of the software? The one that either isn’t in the official documents or is hidden like the Ark of the Covenant? So every time the program runs that part of the code some memory gets allocated but not freed until the program is closed completely, and you can bet that part gets called every other second.

Then, just to top it off, it’s supposed to run in the background, hours on end. Like, say, an IM client. And you’d like to be able to browse the net without cutting off a free method of communication with your friends, but you can’t do it because your computer is chugging along, since that bloody client is leaking memory like Pratchett.

And all the other ones are either also written by similar fuckbends who didn’t bother to actually do any optimization because “X is almost as light as C”, forgetting that they’re running the damn thing on a high-end desktop computer and that they barely ever have more than two tabs open because more would confuse and frighten them.

And the rest are written in C by other idiots who either have never heard of the term “feature creep” or never understood why it was a bad thing, so it’s in perpetual alpha. Never to be bugfixed, because “that’s what beta is for”.

Now pardon me, I’ve got an orphanage to burn down. Those bloody gits’ happiness is annoying me and peeing on their parade apparently didn’t do more than temporarily annoy them.

(Seriously though, I meant that you should always make sure that the GC can do its job. For instance, in Flash, if an island gets too big its GC won’t touch it, so you have to keep ’em small (which, incidentally, isn’t/wasn’t in the official documentation, apparently). Basically, instead of managing every damn memory allocation yourself, you just have to make sure that the GC is happy and working.)

(Oh, and I don’t like it when people say things like “it’s stunning” or “it’s awesome”, especially when they’re obviously trying to counterbalance someone else’s cynicism and going overboard with the hype. Which is bad.)

Also: apart from that example you gave (which sounds like a shitty GC anyway), I’ve never heard of special restrictions on GC usage. Certainly not in the JVM or the CLR, at least.
Of course, many people mix up “GC” and “resource disposal”, and that’s where the shit hits the fan. Nothing more fun than discovering the program leaked 4 file handles, 3 brushes, 5 bitmaps and a couple of mutexes. (To throw numbers out there.)

I ran into a blog post by someone who made a game in Flash which leaked memory after a while (can’t remember specifics, but it was a turn-based flight thingie). The reason was that the GC in Flash doesn’t touch islands that get past a certain size and assumes they’re always needed. This wasn’t, according to him, mentioned in the official documentation, and he found out about it by pure luck.

And then got Flash fanboys claiming that it was documented and that it was his fault anyway and blah blah blah.

The source for this piece of knowledge (assuming I remember correctly) mentioned that in every language he had worked in, he had to either manage the memory manually or make sure that the GC was doing it for him. To him it was apparently a case of “GC helps, but not as much as you’re given to believe.”

Also, “bad analogy” as in “inaccurate”, “dude. Not cool” or “just plain sucks”? They’re all par for the course with me. Even at the same time.

@Newbie: As Shamus mentions above, C/C++ offers you a lot of flexibility and performance, at the cost of making it much easier to shoot yourself in the foot. I know you might be constrained by the third-party libraries you’re required to use, but if that’s not the case I’d recommend pretty much _anything_ else.

Almost missed this. Anyway, I don’t think I would have problems with mistakes (if that’s what you mean by shooting myself in the foot); I am obsessive to a very sharp point. And I’m afraid it looks like I don’t have a choice. Thanks for the helpful input nonetheless.

EVERYONE makes mistakes. Everyone. Indeed, the more thorough you are and the fewer mistakes you make, the WORSE the ones you do make usually are (in that they are so much harder and more complicated to find – “dumb” mistakes are the best to make because they are quick to find and easy to fix).

I’m with RobertB – only use a flamethrower if you NEED a flamethrower. Using a flamethrower to cook your food or light your cigarette is REALLY REALLY STUPID. You might do it right the vast majority of the time, but the rare mistake is a very bad thing.

Yes, but with maths and physics I am very capable of going through LOADS of lines of equations, diagrams and working-out to find my mistakes. Not to mention that the mistakes I make will be fewer in number, because when making a mistake can ruin about 3 hours of your life you learn not to make so many (EDIT: also, when your exams are half that time, it helps re-enforce the ability to make fewer mistakes). I don’t know whether I will need the “flamethrower”; I just know I am being taught how to use that and a “lighter?” (by the way, I have no idea whether comparing Fortran to a lighter is correct, I was just trying to be cool =D )

Fortran is more akin to a thermic lance, with a slippery grip and an obscure habit of occasionally swapping the operator and business end around. But, unlike the flamethrower of C and C++, it’s perfectly safe to use in gusting wind.

A pilot light? No. That’s something you see in fiction. A pilot light works great on a boiler in a basement, but not so well on a hand-held tool used outside. Electric igniters don’t get blown out by the wind.

Fortran is a pretty old language. As such, it isn’t really designed to be a teaching language, so it isn’t always user-friendly. On the upside (and probably the reason your university teaches it), it is considered very good for numerical analysis and there are quite a few libraries for that very purpose.

FORTRAN (now Fortran – no longer all caps, yay) is still in use because no language has been created to replace it. It is capable of using numbers of ANY level of precision (limited only by available memory). No other language does that (or at least nothing better enough to replace it – I don’t think it’s even been attempted in a long time), so Fortran is still in use.

The problem isn’t that you’re not going to be able to get a grip on C/C++ from a detail perspective. The problem is that the sorts of errors endemic to C/C++ memory management and pointer manipulation can be easy to make and tricky to find. So _if_ you don’t have a compelling reason to use C/C++ (e.g. you’re locked into legacy libraries, or performance is critical but not so critical you want to use assembler), then you should use something that doesn’t offer these sorts of opportunities for error.

I code for a living, and probably 85-90% of my work is in C/C++. But if I’m doing one-off or smaller-scale apps where the conditions above don’t apply, I’ll do them in Java and not feel the least bit bad about it.

This reminds me of that old programming adage — “Debugging is twice as hard as programming. This means that if you write code in the cleverest way you can, by definition you are not smart enough to debug it.” Or, to put it another way: “keep it simple, stupid!”

I’ve been programming only in C for the last 7 years (right out of school), and have to say that memory management issues really aren’t that much of a problem for us. We have pretty good memory-tracking tools for detecting any kind of memory overwrite or memory leak, and utility functions for a lot of string duplication and manipulation, so you’re not copy/pasting the same code everywhere.

Also, nerd note: snprintf is slow(ish); you’re faster to just do strncpy/strncat/etc. for simple stuff like that, if it’s something where the code will get run a lot. If you don’t care, then the clarity of snprintf is always nice.

I really like that flamethrower analogy – I think it gets the point across better than the handsaw analogy I used in the last C thread here.

Sure, you only hurt yourself if you make a mistake with a flamethrower, but still, NOBODY USES FLAMETHROWERS for their normal sources of heat or flame (cooking food, lighting cigarettes, etc). If someone uses a flamethrower to light their cigarette and burns themself, no one would call them stupid for making a mistake with the flamethrower…

They would call them stupid for USING THE FLAMETHROWER IN THE FIRST PLACE.

And so it is with C. Excellent analogy.

Edit: and I would totally buy the Shamus Young Book on Programming, as so many others have said. Really, Shamus, you have a gift for this stuff.

Weren’t you arguing that C shouldn’t even exist though? Because the flamethrower analogy utterly fails to support that position. There are a number of tasks that flamethrowers are used for, even if they’re not common.

I am a 15-year C programmer (still use it). I heard an interesting stat about 10 years ago: the last 10% of the bugs to be fixed in a C project take 90% of the bug-fix budget, and of those, 90% are memory issues (overruns, rogue pointers, etc.). I don’t know if it’s true, but I can easily believe it.

After spending 14 years fixing memory issues (the first year I did not fix any, only made them), I really appreciate any language with garbage collection (in particular Python and Ruby). You can focus on the problem you are creating the program to solve, not the problems you are creating in the program.

Ultimately, the big thing here is a different attitude towards programming and programming languages. I’ve had to deal with C++ and Java (and done some Python and Smalltalk and some other languages) and the attitude difference is clear:

C/C++: You’re the king, you know what you’re doing, you tell me what to do and I’ll do it. Even if I think it’s stupid, I’ll do it, just in case you know better than I do.

Java: I’m the king, I know what I’m doing, and you get to use me to do what you want. I do as much as I can for you and hide the details, and I won’t let you do something that I think is stupid.

Both have their downsides. C/C++ may take longer to code in some cases, and you have to be very careful that you don’t screw up. But if you need to do something odd, you can do it. On the Java side, you might not be able to do it if you want something non-standard, and sometimes it takes a lot of work to convince Java (GridBag, I’m looking at you here) to do what you want. And in Java, you never have to think about what you’re doing, so sometimes you just don’t know what you’re doing.

The biggest example of the latter is that I recently spent an entire weekend tracking a deadlock on threads where basically normal, don’t-think-about-it locking managed to have two methods lock each other out, just because of the code path that it HAPPENS to follow. C/C++ makes you THINK about locking, while Java says “Just tell me to synchronize and I’ll lock things for you”. Bleh.

For the former, I was recently cursing at the Java compiler because it wouldn’t compile: I hadn’t initialized a variable in a new case I was adding … one that I knew always ran last and didn’t want to initialize to anything anyway at that point. C/C++ would have warned me and moved on.

It depends, really, on what you like. I like more control, so I prefer C/C++, at the cost of one type of error and some extra work in some cases. Some like the ease of use, and so prefer Java at the cost of a different type of error and extra work in other cases. Really, the best language to use is … the best language to use for your feature, taking into account how you like to code.

C/C++ are part of what are now called unmanaged languages. As pointed out in the OP, you have to allocate and deallocate memory yourself. Visual Basic 6, Java, C#, and VB.NET all manage your memory for you and do automated garbage collection.

Having coded and maintained a CAD/CAM application in VB6 for 15 years, along with a C++ add-on for Orbiter Space Simulator (Project Mercury and Project Gemini for Orbiter), I believe unmanaged languages should not be used for general-purpose application development.

They should be used when you absolutely need to use them. In my VB6 CAD/CAM application we have several libraries that are written in C or C++ that are called by VB6. And the reason is that we need low-level access to the computer’s hardware to interface with the metal-cutting machines the software controls.

Also, my advice really only applies to NEW projects. Existing projects should not switch unless there is another compelling reason. For NEW projects, any of the mainstream managed languages will be far more productive than their unmanaged counterparts. The time you save on testing and maintenance is considerable.

A great post, Shamus. Count me in as a prospective buyer of your forthcoming programming book. A lot of programming books these days are like PhD dissertations converted to book form, and are about half as exciting to read. One of my favorite game book writers is Andre LaMothe, who writes in a clear and accessible fashion, with a lot of geeky pop culture references thrown in.

Another factor with C/C++ besides self-managed dynamic memory is security. NONE of the original functions take security into account. Many viruses and malware take advantage of buffer overruns to get into memory areas they would otherwise be blocked from by the OS.

I’ll jump on the bandwagon and say that you could, and should, totally write a programming book, if only because it combines your writing work with your programming work, but also because you’re damn good at making it clear what’s going on in the program, even for me, who, at best, can write in HTML. =/

I see the one-pixel black dot, but from the other comments it sounds like that’s not what you’re asking about?

The single-pixel image actually makes a certain amount of sense in the context of the post (though even my distinctly non-techie brain thinks that 64K sounds like a bit too much for the amount of overhead in an image format…).

+1 to the “I only see a dot” contingent. I’m using IE, and if I try to load the link in its own window, it loads just fine. I’m reading your site directly.

eta: I bet it’s the image’s empty “width” attribute. If you had no “width” at all, it’d behave like your “crash” image up top. But IE and FF seem to treat a “width” attribute differently, so I bet IE is treating it as a width of 0 and not showing anything beyond that.

It looks like the source includes “width=”” in the img tag. I wonder if different browsers handle that differently, with IE scaling the image down to a width of zero and Firefox assuming it means nothing. That’s the only thing I can guess.

Thank you for this, Shamus. I’m not a programmer by any means and have no interest in programming, but you’ve actually managed to explain some things elegantly for someone with no understanding of the subject matter.

You really are an excellent writer. While you’re looking for work you should really consider professional writing.

If you’re writing an operating system, a new programming language, or a high performance game, you probably need to be using C and just learn to deal with memory efficiently and correctly.

If you’re not, save yourself a ton of time and effort and problems and use a language with memory management and reflection and such. You’ll thank yourself later.

Actually, in the game case, ideally you would find a way to reserve C for the graphics parts and do everything ELSE in the high level language. I doubt anyone does this in practice, however. Sad for them.

On a current project at work we drive most of the game through lua but any intense functionality (rendering, pathfinding, math heavy code, anything network related) has to be written in C.

The tools we use for development are largely C# and most of our build scripts are in python.

It really does come down to using the right tool for the job. As people keep saying, you’re an idiot if you use a flamethrower to light a cigarette because learning to use a cigarette lighter might be too hard.

EDIT: Speaking of work, I just noticed someone else in the office using the PixelCity screen saver. +1 wins to Shamus.

I love PixelCity… but it runs SO SLOW, even on the new machine at work. It’s probably some kind of setting or graphics card issue, but it does make me sad. So many other people seem to get to really enjoy it.

a$ = "Twilight is a great book"
b$ = "- for starting arguments!"
c$ = a$ + b$
PRINT c$
Run this, and it will take two phrases:
Twilight is a great book
and:
- for starting arguments!
And merge them together to print the entire sentence:
Twilight is a great book - for starting arguments!

No, it most certainly will not. Do you really believe that the space in your final output between “book” and “-” will magically materialize if it is neither in the end of the former string nor the beginning of the latter? [/pedant] ;)

I don’t agree with you on the “performance used to be important but isn’t any longer” part of the argument. You can easily die a death by a thousand paper cuts if you don’t pay attention to the little things (like string handling, for example!). Even on today’s machines: http://blog.buschnick.net/2010/09/performance-rant.html

In my brief C++ career (about 8 months working on one project) I don’t think I ever trashed the heap by writing something too large to an array; my favourite memory bug was allocating memory to a pointer, moving the pointer, then calling delete() on it.

What I don’t get is why C doesn’t have a Standard Template Library like C++ does. Obviously the Linux ninja-gurus don’t need all the tools the STL gives you, but surely some kind of standard simple string class, for when you just need a standard simple string class like what you get in the STL, isn’t going to hurt (or vectors at least! I never knew how easy array handling could be until I started using vectors for a lot of stuff instead of arrays).

To an extent C can’t. The Template part of the STL requires language support. std::string may look simple, but it’s actually a specialization of std::basic_string. You can easily make strings of wchar_t’s to do unicode. Not good enough? You can make strings of int, or even a custom character class if you really want. You can do roughly equivalent things in C, but they’ll always be crude and a bit error prone. (You’re in the realm of #define macros and casting.)

Okay, what if we jettison templates? Skip super-configurable objects, just give me plain strings, maybe in char and wchar_t variants. That’s much easier. But C won’t help clean up objects on the stack or nested inside other objects. Part of why C++’s classes are awesome is that you write a good destructor and it will be automatically called when the object goes out of scope. C has no equivalent. You still need to pay attention to when you’re done with a given string and explicitly ask for it to be cleaned up. Still much more error-prone than C++’s std::string.

Okay, we’ll accept that. But there is still value in providing smarter string functions that do things like automatically growing to hold data, ensuring null termination, and perhaps tracking length so you can include the null character or at least make strlen really fast. A variety of libraries exist to do exactly that (and for a skilled C programmer it’s a few hours of work to whip one up). Why not include that?

At this point we’re down to language design philosophy and history. C is very much “assembly, but better.” It was written for low-level programming, an area it continues to excel at. You want to keep the assumptions to a bare minimum. Making more powerful constructs part of the default assumptions comes with a cost (e.g. memory usage, code size, speed) that someone developing an operating system, a low-level library, or an embedded system frequently doesn’t want to pay. A given project might want some parts, but not all. There are also tradeoffs: for one project the ability to have null characters in a string is important and worth the cost, for others it isn’t. Something like the STL is quite good, but it’s designed to work for most people, most of the time. If you want more, you can easily build it yourself, get a third-party library, or move to C++.

There is an elegance to C’s minimalism, and I think it’s part of C’s strength. The C Programming Language is, for all practical purposes, a complete definition of C, and is surprisingly thin and readable.

I like C and C++. They’re great tools, each well suited to different tasks. I don’t see a lot of value in trying to make C a more versatile tool when you can easily reach over and pick up C++ instead.

I thank you, and also you’re welcome, since I know you love to talk about this stuff.

I actually understood that. (Though some of the comments seem to be in a foreign language.) Now just teach me all the rest of programming, and we’ll start a software company. Voila! Both our money problems solved!!

The original Pokemon games are famously easy to trash the heap in, yet not programmed to crash if it happens – that Z80-style CPU WILL soldier on no matter what nonsense you’re telling it. Which leads to all kinds of strange corruptions.

I’m a psychologist by training; what I spend most of my time doing is making mods for The Witcher. So the only programming I know how to do is in Neverwinter Nights scripting language, not exactly a big, important language. :-)

But you’ve made the memory problems of C both interesting and understandable for someone like me. I read the whole article and enjoyed it. I read some parts aloud to my husband the Computer Science professor.

You were wondering whether to go for a job coding or one writing. Only you know which way your heart leads. But I can tell you that you’re a gifted writer. Maybe you could write the manuals for a game company? I think you’d be really, really good at that.

[…] which previous investigations have lead me to attribute to memory corruption, most often heap trashing. Anyhow, I have a bunch of verification scripts set up to automatically run the unit tests, static […]
