Sex, software, politics, and firearms. Life's simple pleasures…


C, Python, Go, and the Generalized Greenspun Law

In recent discussion on this blog of the GCC repository transition and reposurgeon, I observed: “If I’d been restricted to C, forget it – reposurgeon wouldn’t have happened at all.”

I should be more specific about this, since I think the underlying problem is general to a great deal more than the implementation of reposurgeon. It ties back to a lot of recent discussion here of C, Python, Go, and the transition to a post-C world that I think I see happening in systems programming.

I shall start by urging that you must take me seriously when I speak of C’s limitations. I’ve been programming in C for 35 years. Some of my oldest C code is still in wide production use. Speaking from that experience, I say there are some things only a damn fool tries to do in C, or in any other language without automatic memory management (AMM, for the rest of this article).

This is another angle on Greenspun’s Law: “Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.” Anyone who’s been in the trenches long enough gets that Greenspun’s real point is not about C or Fortran or Common Lisp. His maxim could be generalized in a Henry-Spencer-does-Santayana style as this:

“At any sufficient scale, those who do not have automatic memory management in their language are condemned to reinvent it, poorly.”

In other words, there’s a complexity threshold above which lack of AMM becomes intolerable. Lack of it either makes expressive programming in your application domain impossible or sends your defect rate skyrocketing, or both. Usually both.

When you hit that point in a language like C (or C++), your way out is usually to write an ad-hoc layer or a bunch of semi-disconnected little facilities that implement parts of an AMM layer, poorly. Hello, Greenspun’s Law!

It’s not particularly the line count of your source code driving this, but rather the complexity of the data structures it uses internally; I’ll call this its “greenspunity”. Large programs that process data in simple, linear, straight-through ways may evade needing an ad-hoc AMM layer. Smaller ones with gnarlier data management (higher greenspunity) won’t. Anything that has to do – for example – graph theory is doomed to need one (why, hello there, reposurgeon!).

There’s a trap waiting here. As the greenspunity rises, you are likely to find that more and more of your effort and defect chasing is related to the AMM layer, and proportionally less goes to the application logic. Redoubling your effort, you increasingly miss your aim.

Even when you’re merely at the edge of this trap, your defect rates will be dominated by issues like double-free errors and malloc leaks. This is commonly the case in C/C++ programs of even low greenspunity.

Sometimes you really have no alternative but to be stuck with an ad-hoc AMM layer. Usually you get pinned to this situation because real AMM would impose latency costs you can’t afford. The major case of this is operating-system kernels. I could say a lot more about the costs and contortions this forces you to assume, and perhaps I will in a future post, but it’s out of scope for this one.

On the other hand, reposurgeon is representative of a very large class of “systems” programs that don’t have these tight latency constraints. Before I get back to the implications of not being latency-constrained, one last thing – the most important thing – about escalating AMM-layer complexity.

At high enough levels of greenspunity, the effort required to build and maintain your ad-hoc AMM layer becomes a black hole. You can’t actually make any progress on the application domain at all – when you try it’s like being nibbled to death by ducks.

Now consider this prospectively, from the point of view of someone like me who has architect skill. A lot of that skill is being pretty good at visualizing the data flows and structures – and thus estimating the greenspunity – implied by a problem domain. Before you’ve written any code, that is.

If you see the world that way, possible projects will be divided into “Yes, can be done in a language without AMM.” versus “Nope. Nope. Nope. Not a damn fool, it’s a black hole, ain’t nohow going there without AMM.”

This is why I said that if I were restricted to C, reposurgeon would never have happened at all. I wasn’t being hyperbolic – that evaluation comes from a cool and exact sense of how far reposurgeon’s problem domain floats above the greenspunity level where an ad-hoc AMM layer becomes a black hole. I shudder just thinking about it.

Of course, where that black-hole level of ad-hoc AMM complexity is varies by programmer. But, though software is sometimes written by people who are exceptionally good at managing that kind of hair, it then generally has to be maintained by people who are less so…

The really smart people in my audience have already figured out that this is why Ken Thompson – who was present at the creation of C – put AMM in Go, in spite of the latency issues.

Ken understands something large and simple. Software expands, not just in line count but in greenspunity, to meet hardware capacity and user demand. In languages like C and C++ we are approaching a point of singularity at which typical – not just worst-case – greenspunity is so high that the ad-hoc AMM becomes a black hole, or at best a trap nigh-indistinguishable from one.

Thus, Go. It didn’t have to be Go; I’m not actually being a partisan for that language here. It could have been (say) OCaml, or any of half a dozen other languages I can think of. The point is that the combination of AMM with compiled-code speed is ceasing to be a luxury option; increasingly it will be baseline for getting most kinds of systems work done at all.

Sociologically, this implies an interesting split. Historically the boundary between systems work under hard latency constraints and systems work without it has been blurry and permeable. People on both sides of it coded in C and skillsets were similar. People like me who mostly do out-of-kernel systems work but have code in several different kernels were, if not common, at least not odd outliers.

Increasingly, I think, this will cease being true. Out-of-kernel work will move to Go, or languages in its class. C – or non-AMM languages intended as C successors, like Rust – will keep kernels and real-time firmware, at least for the foreseeable future. Skillsets will diverge.

It’ll be a more fragmented systems-programming world. Oh well; one does what one must, and the tide of rising software complexity is not about to be turned.

For today, maybe — but the first time I had Greenspun’s Tenth quoted at me was in the late ’90s. [I know this was around/just before the first C++ standard, maybe contrasting it to this new upstart Java thing?] This was definitely during the era where big computers still did your serious work, and pretty much all of it was in either C, COBOL, or FORTRAN. [Yeah, yeah, I know– COBOL is all caps for being an acronym, while Fortran ain’t–but since I’m talking about an earlier epoch of computing, I’m going to use the conventions of that era.]

Now the Object-Oriented paradigm has really mitigated this to an enormous degree, but I seem to recall at that time the argument was that multimethod dispatch (a benefit so great you happily accept the overhead of automatic memory management) was the Killer Feature of LISP.

Given the way the other advantage I would have given Lisp over the past two decades – anonymous functions [lambdas] and treating them as first-class values – is creeping into more mainstream usage, I think automatic memory management is the last visible “Lispy” feature people will associate with Greenspun. [What, are you now visualizing Lisp macros? Perish the thought – anytime I see a foot cannon that big, I stop calling it a feature…]

After looking at the Linear Lisp paper, I think that is where Lutz Mueller got One Reference Only memory management from. For automatic memory management, I’m a big fan of ORO. Not sure how to apply it to a statically typed language though. Wish it was available for Go. ORO is extremely predictable and repeatable, not stuttery.

If Lutz was inspired by Linear Lisp, he didn’t cite it. Actually ORO is more like region-based memory allocation with a single region: values which leave the current scope are copied, which can be slow if you’re passing large lists or vectors around.

Linear Lisp is something quite a bit different, and allows for arbitrary data structures with arbitrarily deep linking within, so long as there are no cycles in the data structures. You can even pass references into and out of functions if you like; what you can’t do is alias them. As for statically typed programming languages… well, there are linear type systems, which as lliamander mentioned are implemented in Clean.

Newlisp in general is… smack in the middle between Rust and Urbit in terms of cultishness of its community, and that scares me right off it. That and it doesn’t really bring anything to the table that couldn’t be had by “old” lisps (and Lutz frequently doubles down on mistakes in the design that had been discovered and corrected decades ago by “old” Lisp implementers).

For a long time I’ve been holding out hope for a ‘standard’ garbage collector library for C. But not gonna hold my breath. One probable reason Ken Thompson had to invent Go is to go around the tremendous difficulty in getting new stuff into C.

>For a long time I’ve been holding out hope for a ‘standard’ garbage collector library for C. But not gonna hold my breath.

Yeah, good idea not to. People as smart/skilled as you and me have been poking at this problem since the 1980s and it’s pretty easy to show that you can’t do better than Boehm–Demers–Weiser, which has limitations that make it impractical. Sigh…

I think it’s not about C. Let me cite a little bit from “The Go Programming Language” (A. Donovan, B. Kernighan) – in the section about Go influences, it states:

«Rob Pike and others began to experiment with CSP implementations as actual languages. The first was called Squeak which provided a language with statically created channels. This was followed by Newsqueak, which offered C-like statement and expression syntax and Pascal-like type notation. It was a purely functional language with garbage collection, again aimed at managing keyboard, mouse, and window events. Channels became first-class values, dynamically created and storable in variables.

The Plan 9 operating system carried these ideas forward in a language called Alef. Alef tried to make Newsqueak a viable system programming language, but its omission of garbage collection made concurrency too painful.»

So my takeaway was that AMM was key to get proper concurrency.
Before Go, I dabbled with Erlang (which I enjoy, too), and I’d say there the AMM is also a key to have concurrency made easy.

(Update: the ellipses I put into the citation were eaten by the engine and didn’t appear when I tried to re-edit my comment; sorry.)

I think this is the key insight.
There are programs with zero MM.
There are programs with orderly MM, e.g. unzip does mallocs and frees in a stacklike formation, Malloc a,b,c, free c,b,a. (as of 1.1.4). This is laminar, not chaotic flow.

Then there is the complex, nonlinear, turbulent flow, chaos. You can’t do that in basic C, you need AMM. But it is easier in a language that includes it (and does it well).

Virtual Memory is related to AMM – too often the memory leaks were hidden (think of your O(n**2) for small values of n) – small leaks that weren’t visible under ordinary circumstances.

Still, you aren’t going to get AMM on the current Arduino variants. At least not easily.

That is where the line is: how many resources you have. Because you require a medium-to-large OS, or the equivalent resources, to do AMM.

Yet this is similar to using FPGAs or GPUs for blockchain coin mining instead of the CPU. Sometimes you have to go big. Your Mini Cooper might be great most of the time, but sometimes you need a big diesel pickup. I think a Mini would fit in the bed of my F250.

C displaced assembler because it had the speed and flexibility while being portable.

Go, or something like it, will displace C where they can get just the right features into the standard library, including AMM/GC.

Maybe we need Garbage Collecting C. GCC?

One problem is you can’t do the pointer aliasing if you have a GC (unless you also do some auxiliary bits which would be hard to maintain). void *x = y; might be decodable, but there are deeper and more complex things a compiler can’t detect. If the compiler gets it wrong, you get a memory leak, or you have to constrain the language to prevent things which manipulate pointers when that is required or clearer.

C++11 shared_ptr does handle the aliasing case. Each pointer object has two fields, one for the thing being pointed to, and one for the thing’s containing object (or its associated GC metadata). A pointer alias assignment alters the former during the assignment and copies the latter verbatim. The syntax is (as far as a C programmer knows, after a few typedefs) identical to C.

The trouble with applying that idea to C is that the standard pointers don’t have space or time for the second field, and heap management isn’t standardized at all (free() is provided, but programs are not required to use it or any other function exclusively for this purpose). Change either of those two things and the resulting language becomes very different from C.

Eric, I love you, you’re a pepper, but you have a bad habit of painting a portrait of J. Random Hacker that is actually a portrait of Eric S. Raymond. The world is getting along with C just fine. 95% of the use cases you describe for needing garbage collection are eliminated with the simple addition of a string class … which nearly everyone has in their toolkit.

>The world is getting along with C just fine. 95% of the use cases you describe for needing garbage collection are eliminated with the simple addition of a string class … which nearly everyone has in their toolkit.

Even if you’re right, the escalation of complexity means that what I’m facing now, J. Random Hacker will face in a couple of years. Yes, not everybody writes reposurgeon…but a string class won’t suffice for much longer even if it does today.

The true nature of a hacker is not so much in being able to handle the most deep and complex situations, but in being able to recognize which situations are truly complex, and to prefer working hard to simplify and reduce complexity rather than writing something to handle the complexity. Dealing with a slain dragon’s corpse is easier than one that is live, annoyed, and immolating anything within a few hundred yards. Some are capable of handling the latter. The wise knight prefers to reduce the problem to the former.

One of the epic fails of C++ is that it was sold as C-but-anyone-can-program-it because of all the safeties. Instead it created bloatware and the very memory leaks it promised to prevent, because the lesser programmers didn’t KNOW (grok, understand) what they were doing. It was all “automatic”.

This is the opportunity and danger of AMM/GC. It is a tool, and one with hot areas and sharp edges. Wendy (formerly Walter) Carlos had a law that said “Whatever parameter you can control, you must control”. Having a really good AMM/GC requires you to respect what it can and cannot do. OK, form a huge linked list that spills into VM. Won’t it just handle everything? NO! You have to think reference counts, at least in the back of your mind. It simplifies the problem but doesn’t eliminate it. It turns the black hole into a pulsar, but you still can be hit.

Many will gloss over and either superficially learn (but can’t apply) or ignore the “how to use automatic memory management” in their CS course. Like they didn’t bother with pointers, recursion, or multithreading subtleties.

I would say that there is a parallel between concurrency models and memory management approaches. Beyond a certain level of complexity, it’s simply infeasible for J. Random Hacker to implement a locks-based solution just as it is infeasible for Mr. Hacker to write a solution with manual memory management.

My worry is that by allowing the unsafe sharing of mutable state between goroutines, Go will never be able to achieve the per-process (i.e. language-level process, not OS-level) GC that would allow for the really low latencies necessary for an AMM language to move closer into the kernel space. But certainly insofar as many “systems” level applications don’t require extremely low latencies, Go will probably be a viable solution going forward.

Putting aside the hard deadlines found in real-time systems programming, it has been empirically determined that a GC’d program requires five times as much memory as the equivalent program with explicit memory management. Applications which are both CPU- and RAM-intensive, where you need to have your performance cake and eat it in as little memory as possible, are thus severely constrained in terms of viable languages they could be implemented in. And by “severely constrained” I mean you get your choice of C++ or Rust. (C, Pascal, and Ada are on the table, but none offer quite the same metaprogramming flexibility as those two.)

I think your problems with reposurgeon stem from the fact that you’re just running up against the hard upper bound on the vector sum of CPU and RAM efficiency that a dynamic language like Python (even sped up with PyPy) can feasibly deliver on a hardware configuration you can order from Amazon. For applications like that, you need to forgo GC entirely and rely on smart pointers, automatic reference counting, value semantics, and RAII.

You mentioned that reposurgeon wouldn’t have been written under the constraints of C. But C++ is not C, and has an entirely different set of constraints. In practice, it’s not that far off from Lisp, especially if you avail yourself of those wonderful features in C++1x. C++ programmers talk about “zero-cost abstractions” for a reason.

Semantically, programming in a GC’d language and programming in a language that uses smart pointers and RAII are very similar: you create the objects you need, and they are automatically disposed of when no longer needed. But instead of delegating to a GC which cleans them up whenever, both you and the compiler have compile-time knowledge of when those cleanups will take place, allowing you finer-grained control over how memory — or any other resource — is used.

Oh, that’s another thing: GC only has something to say about memory – not file handles, sockets, or any other resource. In C++, with appropriate types, value semantics can be made to apply to those too, and they will immediately be destructed after their last use. There is no special “with” construct in C++; you simply construct the objects you need and they’re destructed when they go out of scope.

This is how the big boys do systems programming. Again, Go has barely displaced C++ at all inside Google despite being intended for just that purpose. Their entire critical path in search is still C++ code. And it always will be until Rust gains traction.

As for my Lisp experience, I know enough to know that Lisp has utterly failed and this is one of the major reasons why. It’s not even a decent AI language, because the scruffies won, AI is basically large-scale statistics, and most practitioners these days use C++.

Modern C++ is a long way from C++ when it was first standardized in 1998. You should *never* be manually managing memory in modern C++. You want a dynamically sized array? Use std::vector. You want an ad-hoc graph? Use std::shared_ptr and std::weak_ptr.
Any code I see which uses new or delete, malloc or free will fail code review.
Destructors and the RAII idiom mean that this covers *any* resource, not just memory.
See the C++ Core Guidelines on resource and memory management: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#S-resource

>Modern C++ is a long way from C++ when it was first standardized in 1998.

That’s correct. Modern C++ is a disaster area of compounded complexity and fragile kludges piled on in a failed attempt to fix leaky abstractions. 1998 C++ had the leaky-abstractions problem, but at least it was drastically simpler. Clue: complexification when you don’t even fix the problems is bad.

My experience dates from 2009 and included Boost – I was a senior dev on Battle For Wesnoth. Don’t try to tell me I don’t know what “modern C++” is like.

> My experience dates from 2009 and included Boost – I was a senior dev on Battle For Wesnoth. Don’t try to tell me I don’t know what “modern C++” is like.

C++ in 2009 with boost was C++ from 1998 with a few extra libraries. I mean that quite literally — the standard was unchanged apart from minor fixes in 2003.

C++ has changed a lot since then. There have been 3 standards issued, in 2011, 2014, and just now in 2017. Between them, there is a huge list of changes to the language and the standard library, and these are readily available — both clang and gcc have kept up-to-date with the changes, and even MSVC isn’t far behind. Even more changes are coming with C++20.

So, with all due respect, C++ from 2009 is not “modern C++”, though there certainly were parts of boost that were leaning that way.

>So, with all due respect, C++ from 2009 is not “modern C++”, though there certainly were parts of boost that were leaning that way.

But the foundational abstractions are still leaky. So when you tell me “it’s all better now”, I don’t believe you. I just plain do not.

I’ve been hearing this soothing song ever since around 1989. “Trust us, it’s all fixed.” Then I look at the “fixes” and they’re horrifying monstrosities like templates – all the dangers of preprocessor macros and a whole new class of Turing-complete nightmares, too! In thirty years I’m certain I’ll be hearing that C++2047 solves all the problems this time for sure, and I won’t believe a word of it then, either.

> Are array accesses bounds-checked? Don’t yammer about iterators; what happens if I say foo[3] and foo is dimension 2? Never mind, I know the answer.

You are right, bare arrays are not bounds-checked, but std::array provides an at() member function, so arr.at(3) will throw if the array is too small.

Also, ranged-for loops can avoid the need for explicit indexing lots of the time anyway.

> Are bare, untyped pointers still in the language? Never mind, I know the answer.

Yes, void* is still in the language. You need to cast it to use it, which is something that is easy to spot in a code review.

> Can I get a core dump from code that the compiler has statically checked and contains no casts? Never mind, I know the answer.

Probably. Is it possible to write code in any language that dies horribly in an unintended fashion?

> Yes, C has these problems too. But it doesn’t pretend not to, and in C I’m never afflicted by masochistic cultists denying that they’re problems.

Did I say C++ was perfect? This blog post was about the problems inherent in the lack of automatic memory management in C and C++, and thus why you wouldn’t have written reposurgeon if that’s all you had. My point is that it is easy to write C++ in a way that doesn’t suffer from those problems.

So what I am hearing in this is: “Use these new standards built on top of the language, and make sure every single one of your dependencies holds to them just as religiously as you are. And if anyone fails at any point in the chain you are doomed.”

Using Go has been a revelation, so I mostly agree with Eric here. My only objection is to equating C++03/Boost with “modern” C++. I used both heavily, and given a green field, I would consider C++14 for some of these thorny designs that I’d never have used C++03/Boost for. It’s a qualitatively different experience. Just browse a copy of Scott Meyers’s _Effective Modern C++_ for a few minutes, and I think you’ll at least understand why C++14 users object to the comparison. Modern C++ enables better designs.

Alas, C++ is a multi-layered tool chest. If you stick to the top two shelves, you can build large-scale, complex designs with pretty good safety and nigh unmatched performance. Everything below the third shelf has rusty tools with exposed wires and no blade guards, and on large-scale projects, it’s impossible to keep J. Random Programmer from reaching for those tools.

My personal experience is that C++11 code (in particular, code that uses closures, deleted methods, auto (a feature you yourself recommended for C with different syntax), and the automatic memory and resource management classes) has fewer defects per developer-year than the equivalent C++03-and-earlier code.

This is especially so if you turn on compiler flags that disable the legacy features (e.g. -Werror=old-style-cast), and treat any legacy C or C++03 code like foreign language code that needs to be buried under a FFI to make it safe to use.

Qualitatively, the defects that do occur are easier to debug in C++11 vs C++03. There are fewer opportunities for the compiler to interpolate in surprising ways because the automatic rules are tighter, the library has better utility classes that make overloads and premature optimization less necessary, the core language has features that make templates less necessary, and it’s now possible to explicitly select or rule out invalid candidates for automatic code generation.

I can design in Lisp, but write C++11 without much effort of mental translation. Contrast with C++03, where people usually just write all the Lispy bits in some completely separate language (or create shambling horrors like Boost to try to band-aid over the missing limbs…boost::lambda, anyone? Oh, look, since C++11 they’ve doubled down on something called boost::phoenix).

Does C++11 solve all the problems? Absolutely not, that would break compatibility. But C++11 is noticeably better than its predecessors. I would say the defect rates are now comparable to Perl with a bunch of custom C modules (i.e. exact defect rate depends on how much you wrote in each language).

C++ happily turned into a complexity tar-pit with “everything that could be implemented in the STL with templates should be, instead of in the core language”. And then not deprecating/removing features, instead leaving them there.

> That’s correct. Modern C++ is a disaster area of compounded complexity and fragile kludges piled on in a failed attempt to fix leaky abstractions. 1998 C++ had the leaky-abstractions problem, but at least it was drastically simpler. Clue: complexification when you don’t even fix the problems is bad.

I agree that there is a lot of complexity in C++. That doesn’t mean you have to use all of it. Yes, it makes maintaining legacy code harder, because the older code might use dangerous or complex parts, but for new code we can avoid the danger, and just stick to the simple, safe parts.

The complexity isn’t all bad, though. Part of the complexity arises by providing the ability to express more complex things in the language. This can then be used to provide something simple to the user.

Take std::variant as an example. This is a new facility from C++17 that provides a type-safe discriminated variant. If you have a variant that could hold an int or a string and you store an int in it, then attempting to access it as a string will cause an exception rather than a silent error. The code that *implements* std::variant is complex. The code that uses it is simple.

I won’t argue with you. C++ is error-prone (albeit less so than C) and horrid to work in. But for certain classes of algorithmically complex, CPU- and RAM-intensive problems it is literally the only viable choice. And it looks like performing surgery on GCC-scale repos falls into that class of problem.

I’m not even saying it was a bad idea to initially write reposurgeon in Python. Python and even Ruby are great languages to write prototypes or even small-scale production versions of things because of how rapidly they may be changed while you’re hammering out the details. But scale comes around to bite you in the ass sooner than most people think and when it does, your choice of language hobbles you in a way that can’t be compensated for by throwing more silicon at the problem. And it’s in that niche where C++ and Rust dominate, absolutely uncontested.

> How many times do I have to repeat “reposurgeon would never have been
> written under that constraint” before somebody who claims LISP
> experience gets it?

That speaks to your lack of experience with modern C++, rather than an inherent limitation. *You* might not have written reposurgeon under that constraint, because *you* don’t feel comfortable that you wouldn’t have ended up with a black-hole of AMM. That does not mean that others wouldn’t have or couldn’t have, and that their code would necessarily be an unmaintainable black hole.

In well-written modern C++, memory management errors are a solved problem. You can just write code, and know that the compiler and library will take care of cleaning up for you, just like with a GC-based system, but with the added benefit that it’s deterministic, and can handle non-memory resources such as file handles and sockets too.

In well-written assembler, memory management errors are a solved problem. I hate this idiotic cant repetition about how if you’re just good enough for the language it won’t hurt you – it sweeps the actual problem under the rug while pretending to virtue.

> I hate this idiotic repetition about how if you’re just good enough for the language it won’t hurt you – it sweeps the actual problem under the rug while pretending to virtue.

It’s not about being “just good enough”. It’s about *not* using the dangerous parts. If you never use manual memory management, then you can’t forget to free, for example, and automatic memory management is *easy* to use. std::string is a darn sight easier to use than the C string functions, for example, and std::vector is a darn sight easier to use than dynamic arrays with new. In both cases, the runtime manages the memory, and it is *easier* to use than the dangerous version.

Every language has “dangerous” features that allow you to cause problems. Well-written programs in a given language don’t use the dangerous features when there are equivalent ones without the problems. The same is true with C++.

The fact that there are historically areas where C++ didn’t provide a good solution – and thus programs that don’t use the modern solutions and suffer the consequential problems – is not an inherent problem with the language, but it does make it harder to educate people.

> It’s about *not* using the dangerous parts. … Every language has “dangerous” features that allow you to cause problems. Well-written programs in a given language don’t use the dangerous features when there are equivalent ones without the problems.

Why not use a language that doesn’t have “‘dangerous’ features”?

NOTES: [1] I am not saying that Go is necessarily that language – I am not even saying that any existing language is necessarily that language.
[2] /me is being someplace between naive and trolling here.

Historically, it was because hardware was weak and expensive – you couldn’t afford the overhead imposed by those languages. Now it’s because the culture of software engineering has bad habits formed in those days and reflexively flinches from using higher-overhead safe languages, though it should not.

Our three main advantages, runtime efficiency, innovation opportunity, building on a base of millions of lines of code that run the internet and an international standard.

Our four main advantages…

More seriously, C++ enabled the STL, the STL transforms the approach of its users, with much increased reliability and readability, but no loss of performance. And at the same time your old code still runs. Now that is old stuff, and STL2 is on the way. Evolution.

The rate is obviously lower because I’ve written less code and library code only survives if it is sound. Are you suggesting that reusing code is a bad idea? Or that an indeterminate number of reimplementations of the same functionality is a good thing?

You’re not on the most productive path to effective criticism of C++ here.

This column is too narrow to have a decent discussion. WordPress should rewrite in C++ or I should dig out my Latin dictionary.

Seriously, extending the reach of libraries that become standardised is hard to criticise; extending the reach of the core language is not.

It used to be a thing that C didn’t have built-in functionality for I/O (for example); rather, it was supplied by libraries written in C interfacing to a lower-level system interface. This principle seems to have been thrown out of the window for Go and the others. I’m not sure that’s a long-term win. YMMV.

But use what you like, or what you cannot talk your employer out of using, or what you can get a job using. As long as it’s not Rust.

> Well-written programs in a given language don’t use the dangerous features

Some languages have dangerous features that are disabled by default and must be explicitly enabled prior to use. C++ should become one of those languages.

I am very fond of the ‘override’ keyword in C++11, which allows me to say “I think this virtual method overrides something, and don’t compile the code if I’m wrong about that.” Making that assertion incorrectly was a huge source of C++ errors for me back in the days when I still used C++ virtual methods instead of lambdas. C++11 solved that problem two completely different ways: one informs me when I make a mistake, and the other makes it impossible to be wrong.

Arguably, one should be able to annotate any C++ block and say “there shall be no manipulation of bare pointers here” or “all array access shall be bounds-checked here” or even “…and that’s the default for the entire compilation unit.” GCC can already emit warnings for these without human help in some cases.

Is this a good summary of your objections to C++ smart pointers as a solution to AMM?

1. Circular references. C++ has smart pointer classes that work when your data structures are acyclic, but it doesn’t have a good solution for circular references. I’m guessing that reposurgeon’s graphs are almost never DAGs.

2. Subversion of AMM. Bare new and delete are still available, so some later maintenance programmer could still introduce memory leaks. You could forbid the use of bare new and delete in your project and write a check-in hook to look for violations of the policy, but that’s one more complication to worry about, and it would be difficult to impossible to implement reliably due to macros and the general difficulty of parsing C++.

3. Memory corruption. It’s too easy to overrun the end of arrays, treat a pointer to a single object as an array pointer, or otherwise corrupt memory.

> Circular references. C++ has smart pointer classes that work when your data structures are acyclic, but it doesn’t have a good solution for circular references. I’m guessing that reposurgeon’s graphs are almost never DAGs.

General graphs with possibly-cyclic references are precisely the workload GC was created to deal with optimally, so ESR is right in the sense that reposurgeon _requires_ a GC-capable language to work. In most other programs, you’d still want to make sure that the extent of the resources that are under GC control is properly contained (which a Rust-like language would help a lot with), but it’s possible that even this is not quite worthwhile for reposurgeon. Still, I’d want to make sure that my program is optimized in _other_ possible ways, especially wrt. using memory bandwidth efficiently – and Go looks like it doesn’t really allow that.

Why would reposurgeon’s graphs not be DAGs? Some exotic case that comes up with e.g. CVS imports that never arises in a SVN->Git conversion (admittedly the only case I’ve really looked deeply at)?

Git repos, at least, are cannot-be-cyclic-without-astronomical-effort graphs (assuming no significant advances in SHA1 cracking and no grafts–and even then, all you have to do is detect the cycle and error out). I don’t know how a generic revision history data structure could contain a cycle anywhere even if I wanted to force one in somehow.

The repo graph is, but a lot of the structures have reference loops for fast lookup. For example, a blob instance has a pointer back to the containing repo, as well as being part of the repo through a pointer chain that goes from the repo object to a list of commits to a blob.

Without those loops, navigation in the repo structure would get very expensive.

Aren’t these inherently “weak” pointers though? In that they don’t imply ownership/live data, whereas the “true” DAG references do? In that case, and assuming you can be sufficiently sure that only DAGs will happen, refcounting (ideally using something like Rust) would very likely be the most efficient choice. No need for a fully-general GC here.

> How many times do I have to repeat “reposurgeon would never have been written under that constraint” before somebody who claims LISP experience gets it?

Maybe it is true, but since you do not understand, or particularly wish to understand, Rust scoping, ownership, and zero-cost abstractions, or C++ weak pointers, we hear you say that you would never have written reposurgeon under that constraint.

Which, since no one else is writing reposurgeon, is an argument, but not an argument that those who do get weak pointers and rust scopes find all that convincing.

I am inclined to think that those who write C++98 (which is the gcc default) could not write reposurgeon under that constraint, but those who write C++11 could write reposurgeon under that constraint, and except for some rather unintelligible, complicated, and twisted class constructors invoking and enforcing the C++11 automatic memory management system, it would look very similar to your existing python code.

I’m sure that you understand the _gist_ of all of these notions quite accurately, and this alone is of course quite impressive for any developer – but this is not quite the same as being comprehensively aware of their subtler implications. For instance, both James and I have suggested to you that backpointers implemented as an optimization of an overall DAG structure should be considered “weak” pointers, which can work well alongside reference counting.

For that matter, I’m sure that Rustlang developers share your aversion to “barbed wire and landmines” in a programming language. You’ve criticized Rust before (not without some justification!) for having half-baked async-IO facilities, but I would think that reposurgeon does not depend significantly on async-IO.

>For instance, both James and I have suggested to you that backpointers implemented as an optimization of an overall DAG structure should be considered “weak” pointers, which can work well alongside reference counting.

Yes, I got that between the time I wrote my first reply and JAD brought it up. I’ve used Python weakrefs in similar situations. I would have seemed less dense if I’d had more sleep at the time.

>For that matter, I’m sure that Rustlang developers share your aversion to “barbed wire and landmines” in a programming language.

That acidulousness was mainly aimed at C++. Rust, if it implements its theory correctly (a point on which I am willing to be optimistic) doesn’t have C++’s fatal structural flaws. It has problems of its own which I won’t rehash as I’ve already anatomized them in detail.

There’s also development cost. I suspect that using e.g. Python drastically reduces the cost of developing the code. And since most repositories are small enough that Eric hasn’t noticed accidental O(n**2) or O(n**3) algorithms until recently, it’s pretty obvious that execution time just plainly doesn’t matter. Migration is going to involve a temporary interruption to service and is going to be performed roughly once per repo. The amount of time involved in just stopping the e.g. SVN service and bringing up the e.g. Git hosting service is likely to be longer than the conversion time for the median conversion operation.

So in these cases, most users don’t care about the run-time, and outside of a handful of examples, wouldn’t brush up against the CPU or memory limitations of a whitebox PC.

This is in contrast to some other cases in which I’ve worked such as file-serving (where latency is measured in microseconds and is actually counted), or large data processing (where wasting resources reduces the total amount of stuff everybody can do).

Hmmn, I wonder if the virtual memory of Linux (and Unix, and Multics) is really the OS equivalent of the automatic memory management of application programs? One works in pages, admittedly, not bytes or groups of bytes, but one could argue that the sub-page stuff is just expensive anti-internal-fragmentation plumbing…

–dave
[In polite Canajan, “I wonder” is the equivalent of saying “Hey everybody, look at this” in the US. And yes, that’s also the redneck’s famous last words.]

In my experience, with most of my C systems programming in protocol stacks and transaction processing infrastructure, the MM problem has been one of code, not data structure complexity. The memory is often allocated by code which first encounters the need, and it is then passed on through layers and at some point, encounters code which determines the memory is no longer needed. All of this creates an implicit contract that he who is handed a pointer to something (say, a buffer) becomes responsible for disposing of it. But, there may be many places where that is needed – most of them in exception handling.

That creates many, many opportunities for someone to simply forget to release it. Also, when the code is handed off to someone unfamiliar, they may not even know about the contract. Crises (or bad habits) lead to failures to document this stuff (or to create variable names or clear conventions that suggest one should look for the contract).

I’ve also done a bunch of stuff in Java, both applications level (such as a very complex Android app with concurrency) and some infrastructural stuff that wasn’t as performance constrained. Of course, none of this was hard real-time although it usually at least needed to provide response within human limits, which GC sometimes caused trouble with. But, the GC was worth it, as it substantially reduced bugs which showed up only at runtime, and it simplified things.

On the side, I write hard real time stuff on tiny, RAM constrained embedded systems – PIC18F series stuff (with the most horrible machine model imaginable for such a simple little beast). In that world, there is no malloc used, and shouldn’t be. It’s compile time created buffers and structures for the most part. Fortunately, the applications don’t require advanced dynamic structures (like symbol tables) where you need memory allocation. In that world, AMM isn’t an issue.

> PIC18F series stuff (with the most horrible machine model imaginable for such a simple little beast)
LOL. Glad I’m not the only one who thought that. Most of my work was on the 16F – after I found out what it took to do a simple table lookup, I was ready for a stiff drink.

>In my experience, with most of my C systems programming in protocol stacks and transaction processing infrastructure, the MM problem has been one of code, not data structure complexity.

I believe you. I think I gravitate to problems with data-structure complexity because, well, that’s just the way my brain works.

But it’s also true that I have never forgotten one of the earliest lessons I learned from Lisp. When you can turn code complexity into data structure complexity, that’s usually a win. Or to put it slightly differently, dumb code munching smart data beats smart code munching dumb data. It’s easier to debug and reason about.

Perhaps it’s because my coding experience has mostly been short Python scripts of varying degrees of quick-and-dirtiness, but I’m having trouble grokking the difference between smart code/dumb data vs dumb code/smart data. How does one tell the difference?

Now, as I type this, my intuition says it’s more than just the scary mess of nested if statements being in the class definition for your data types, as opposed to the function definitions which munch on those data types; a scary mess of nested if statements is probably the former. The latter though…I’m coming up blank.

Perhaps a better question than my one above: what codebases would you recommend for study which would be good examples of the latter (besides reposurgeon)?

You almost said my favorite canned example: a big conditional block vs. a lookup table. The LUT can replace all the conditional logic with structured data and shorter (simpler, less bug-prone, faster, easier to read) unconditional logic that merely does the lookup. Concretely in Python, imagine a long list of “if this, assign that” replaced by a lookup into a dictionary. It’s still all “code”, but the amount of program logic is reduced.

So I would answer your first question by saying look for places where data structures are used. Then guesstimate how complex some logic would have to be to replace that data. If that complexity would outstrip that of the data itself, then you have a “smart data” situation.

To expand on this, it can even be worthwhile to use complex code to generate that dumb lookup table. This is so because the code generating the lookup table runs before, and therefore separately from, the code using the LUT. This means that both can be considered in isolation more often, bringing the combined complexity closer to m+n than m*n.

Admittedly I have an SQL hammer and think everything is a nail, but why wouldn’t *every* program include a database, like the SQLite that even comes bundled with Python distros, no sweat, and put that lookup table into it, not in a dictionary inside the code?

Of course the more you go in this direction the more problems you will have with unit testing, in case you want to do such a thing. Generally we SQL-hammer guys don’t do that much, because in theory any function can read any part of the database, making the whole database the potential “inputs” for every function.

That is pretty lousy design, but I think good design patterns for separations of concerns and unit testability are not yet really known for database driven software, I mean, for example, model-view-controller claims to be one, but actually fails as these can and should call each other. So you have in the “customer” model or controller a function to check if the customer has unpaid invoices, and decide to call it from the “sales order” controller or model to ensure such customers get no new orders registered. In the same “sales order” controller you also check the “product” model or controller if it is not a discontinued product and check the “user” model or controller if they have the proper rights for this operation and the “state” controller if you are even offering this product in that state and so on a gazillion other things, so if you wanted to automatically unit test that “register a new sales order” function you have a potential “input” space of half the database. And all that with good separation of concerns MVC patterns. So I think no one really figured this out yet?

I wonder why you talked so much about inventing an AMM layer but said nothing about the garbage collectors that are already available for C. Why invent an AMM layer in the first place, instead of just using a GC?
For example, Bigloo Scheme and the GNU Objective-C runtime, among many others, have used one successfully.

Rust seems like a good fit for the cases where you need the low latency (and other speed considerations) and can’t afford the automation. Firefox finally got to benefit from that in the Quantum release, and there’s more coming. I wouldn’t dream of writing a browser engine in Go, let alone a highly-concurrent one. When you’re willing to spend on that sort of quality, Rust is a good tool to get there.

But the very characteristics necessary to be good in that space will prevent it from becoming the “default language” the way C was for so long. As much fun as it would be to fantasize about teaching Rust as a first language, I think that’s crazy talk for anything but maybe MIT. (And I’m not claiming it’s a good idea even then; just saying that’s the level of student it would take for it to even be possible.) Dunno if Go will become that “default language” but it’s taking a decent run at it; most of the other contenders I can think of at the moment have the short-term-strength-yet-long-term-weakness of being tied to a strong platform already. (I keep hearing about how Swift is going to be usable off of Apple platforms real soon now… just around the corner… just a bit longer….)

>Dunno if Go will become that “default language” but it’s taking a decent run at it; most of the other contenders I can think of at the moment have the short-term-strength-yet-long-term-weakness of being tied to a strong platform already.

I really think the significance of Go being an easy step up from C cannot be overestimated – see my previous blogging about the role of inward transition costs.

Ken Thompson is insidiously clever. I like channels and goroutines and := but the really consequential hack in Go’s design is the way it is almost perfectly designed to co-opt people like me – that is, experienced C programmers who have figured out that ad-hoc AMM is a disaster area.

Go probably owes as much to Rob Pike and Phil Winterbottom for its design as it does to Thompson — because it’s basically Alef with the feature whose lack, according to Pike, basically killed Alef: garbage collection.

I don’t know that it’s “insidiously clever” to add concurrency primitives and GC to a C-like language, as concurrency and memory management were the two obvious banes of every C programmer’s existence back in the 90s — so if Go is “insidiously clever”, so is Java. IMHO it’s just smart, savvy design which is no small thing; languages are really hard to get right. And in the space Go thrives in, Go gets a lot right.

First, `pure` functions and transitive `const`, which make code so much easier to reason about.

Second, almost the entire language is available at compile time. That, combined with templates, enables crazy (in a good way) stuff, like building an optimized state machine for a regex at compile time. Granted, the regex pattern has to be known at compile time, of course. But that’s pretty common.
Can’t find it now, but there were benchmarks which showed it to be faster than any run-time-built regex engine out there. Still, the source code is pretty straightforward – one doesn’t have to be Einstein to write code like that [1].

There is a talk by Andrei Alexandrescu called “Fastware” where he shows how various metaprogramming facilities enable useful optimizations [2].
And a more recent talk, “Design by Introspection” [3], where he shows how these facilities enable much more compact designs and implementations.

As the greenspunity rises, you are likely to find that more and more of your effort and defect chasing is related to the AMM layer, and proportionally less goes to the application logic. Redoubling your effort, you increasingly miss your aim.

Even when you’re merely at the edge of this trap, your defect rates will be dominated by issues like double-free errors and malloc leaks. This is commonly the case in C/C++ programs of even low greenspunity.

Interesting. This certainly fits my experience.

Has anybody looked for common patterns in whatever parasitic distractions plague you when you start to reach the limits of a language with AMM?

I went through a phase earlier this year where I tried to eliminate the concept of an errno entirely (and failed, in the end reinventing Lisp, badly), but sometimes I still think – to the tune of the Flight of the Valkyries – “Kill the errno, kill the errno, kill the ERRno, kill the err!”

I have on several occasions been part of big projects using languages with AMM, many programmers, much code, and they hit scaling problems and died, but it is not altogether easy to explain what the problem was.

But it was very clear that the fact that I could get a short program, or a quick fix up and running with an AMM much faster than in C or C++ was failing to translate into getting a very large program containing far too many quick fixes up and running.

Insightful, but I think you are missing a key point about Lisp and Greenspunning.

AMM is not the only thing that Lisp brings to the table when it comes to dealing with Greenspunity. Actually, the whole point of Lisp is that there is not _one_ conceptual barrier to development, or a few, or even a lot, but that there are _arbitrarily_many_, and that is why you need to be able to extend your language through _syntactic_abstraction_ to build DSLs so that every abstraction layer can be written in a language that is fit for that layer. [Actually, traditional Lisp is missing the fact that DSL tooling depends on _restriction_ as well as _extension_; but Haskell types and Racket languages show the way forward in this respect.]

That is why all languages without macros, even with AMM, remain “blub” to those who grok Lisp. Even in Go, they reinvent macros, just very badly, with various preprocessors to cope with the otherwise very low abstraction ceiling.

(Incidentally, I wouldn’t say that Rust has no AMM; instead it has static AMM. It also has some support for macros.)

To the extent that the compiler’s insertion of calls to free() can be easily deduced from the code without special syntax, the insertion is merely an optimization of the sort of standard AMM semantics that, for example, a PyPy compiler could do.

To the extent that the compiler’s ability to insert calls to free() requires the sort of special syntax about borrowing that means that the programmer has explicitly described a non-stack-based scope for the variable, the memory management isn’t automatic.

Perhaps this is why a google search for “static AMM” doesn’t return much.

In Rust, as in C++ or even C, references have value semantics. That is to say any copies of a given reference are considered to be “the same”. You don’t have to “explicitly describe a non-stack-based scope for the variable”, but the hitch is that there can be one, and only one, copy of the original reference to a variable in use at any time. In Rust this is called ownership, and only the owner of an object may mutate it.

Where borrowing comes in is that functions called by the owner of an object may borrow a reference to it. Borrowed references are read-only, and may not outlast the scope of the function that does the borrowing. So everything is still scope-based. This provides a convenient way to write functions in such a way that they don’t have to worry about where the values they operate on come from or unwrap any special types, etc.

If you want the scope of a reference to outlast the function that created it, the way to do that is to use std::rc::Rc, which provides a regular, reference-counted pointer to a heap-allocated object, the same as Python.

The borrow checker checks all of these invariants for you and will flag an error if they are violated. Since worrying about object lifetimes is work you have to do anyway lest you pay a steep price in performance degradation or resource leakage, you win because the borrow checker makes this job much easier.

Rust does have explicit object lifetimes, but where these are most useful is to solve the problem of how to have structures, functions, and methods that contain/return values of limited lifetime. For example, declaring a struct Foo<'a> { x: &'a i32 } means that any instance of struct Foo is valid only as long as the borrowed reference inside it is valid. The borrow checker will complain if you attempt to use such a struct outside the lifetime of the internal reference.

This is just a cost of doing business. Hacker culture has, for decades, tried to claim it was inclusive and nonjudgemental and yada yada — “it doesn’t matter if you’re a brain in a jar or a superintelligent dolphin as long as your code is good” — but when it comes to actually putting its money where its mouth is, hacker culture has fallen far short. Now that’s changing, and one of the side effects of that is how we use language and communicate internally, and to the wider community, has to change.

But none of this has to do with automatic memory management. In Rust, management of memory is not only fully automatic, it’s “have your cake and eat it too”: you have to worry about neither releasing memory at the appropriate time, nor the severe performance costs and lack of determinism inherent in tracing GCs. You do have to be more careful in how you access the objects you’ve created, but the compiler will assist you with that. Think of the borrow checker as your friend, not an adversary.

Present-day C++ is far from the C++ that was first standardized in 1998. You should *never* be manually managing memory in modern C++. You need a dynamically sized array? Use std::vector. You need an ad-hoc graph? Use std::shared_ptr and std::weak_ptr.

Any code I see which uses bare new or delete, malloc or free, fails code review.

What makes you refer to this as a systems programming project? It seems to me to be a standard data-processing problem. Data in, data out. Sure, it’s hella complicated and you’re brushing up against several different constraints.

In contrast to what I think of as systems programming, you have automatic memory management. You aren’t working in kernel-space. You aren’t modifying the core libraries or doing significant programmatic interface design.

Never user-facing. Often scripted. Development-support tool. Used by systems programmers.

I realize we’re in an area where the “systems” vs. “application” distinction gets a little tricky to make. I hang out in that border zone a lot and have thought about this. Are GPSD and ntpd “applications”? Is giflib? Sure, they’re out-of-kernel, but no end-user will ever touch them. Is GCC an application? Is apache or named?

Inside the kernel is clearly systems. Outside it, I think the “systems” vs. “application” distinction is more about the skillset being applied and who your expected users are than anything else.

I would not be upset at anyone who argued for a different distinction. I think you’ll find the definitional questions start to get awfully slippery when you poke at them.

> What makes you refer to this as a systems programming project? It seems to me to be a standard data-processing problem. Data in, data out. Sure, it’s hella complicated and you’re brushing up against several different constraints.

When you’re talking about Unix, there is often considerable overlap between “systems” and “application” programming because the architecture of Unix, with pipes, input and output redirection, etc., allowed for essential OS components to be turned into simple, data-in-data-out user-space tools. The functionality of ls, cp, rm, or cat, for instance, would have been built into the shell of a pre-Unix OS (or many post-Unix ones). One of the great innovations of Unix is to turn these units of functionality into standalone programs, and then make spawning processes cheap enough to where using them interactively from the shell is easy and natural. This makes extending the system, as accessed through the shell, easy: just write a new, small program and add it to your PATH.

So yeah, when you’re working in an environment like Unix, there’s no bright-line distinction between “systems” and “application” code, just like there’s no bright-line distinction between “user” and “developer”. Unix is a tool for facilitating humans working with computers. It cannot afford to discriminate, lest it lose its Unix-nature. (This is why Linux on the desktop will never be a thing, not without considerable decay in the facets of Linux that made it so great to begin with.)

@tz: you aren’t going to get AMM on the current Arduino variants. At least not easily.

At the upper end you can; the Yun has 64 MB, as do the Dragino variants. You can run OpenWRT on them and use its Python (although the latest OpenWRT release, Chaos Calmer, significantly increased its storage footprint from older firmware versions), which runs fine in that memory footprint, at least for the kinds of things you’re likely to do on this type of device.

Go binaries are statically linked, so the best approach is probably to install Go on your big PC, cross compile, and push the binary out to the device. Cross-compiling is a doddle; simply set GOOS and GOARCH.

> you are likely to find that more and more of your effort and defect chasing is related to the AMM layer

But the AMM layer for C++ has already been written and debugged, and standards and idioms exist for integrating it into your classes and type definitions.

Once built into your classes, you are then free to write code as if in a fully garbage collected language in which all types act like ints.

C++14, used correctly, is a metalanguage for writing domain specific languages.

Now sometimes building your classes in C++ is weird, nonobvious, and apt to break for reasons that are difficult to explain, but done correctly all the weird stuff is done once in a small number of places, not spread all over your code.

Interesting thesis… it was the ‘extra layer of goodness’ surrounding file operations, and not memory management, that persuaded me to move from C to Perl about twenty years ago. Once I’d moved, I also appreciated the memory management in the shape of ‘any size you want’ arrays, hashes (where had they been all my life?) and autovivification — on-the-spot creation of array or hash elements, at any depth.

While C is a low-level language that masquerades as a high-level language, the original intent of the language was to make writing assembler easier and faster. It can still be used for that, when necessary, leaving the more complicated matters to higher level languages.

Autovivification saves you much effort, thought, and coding, because most of the time the perl interpreter correctly divines your intention, and does a pile of stuff for you, without you needing to think about it.

And then it turns around and bites you because it does things for you that you did not intend or expect.

The larger the program, and the longer you are keeping the program around, the more of a problem it is. If you are writing a quick one-off script to solve some specific problem, you are the only person who is going to use the script, and are then going to throw the script away, fine. If you are writing a big program that will be used by lots of people for a long time, autovivification is going to turn around and bite you hard, as are lots of similar Perl features where Perl makes life easy for you by doing stuff automagically.

With the result that there are in practice very few big perl programs used by lots of people for a long time, while there are an immense number of very big C and C++ programs used by lots of people for a very long time.

On esr’s argument, we should never be writing big programs in C any more, and yet, we are.

I have been part of big projects with many engineers using languages with automatic memory management. I noticed I could get something up and running in a fraction of the time that it took in C or C++.

And yet, somehow, strangely, the projects as a whole never got successfully completed. We found ourselves fighting weird shit done by the vast pile of run time software that was invisibly under the hood automatically doing stuff for us. We would be fighting mysterious and arcane installation and integration issues.

This, my personal experience, is the exact opposite of the outcome claimed by esr.

Oh, dear Goddess, no wonder. All three of those languages are notorious sinkholes – they’re where “maintainability” goes to die a horrible and lingering death.

Now I understand your fondness for C++ better. It’s bad, but those are way worse at any large scale. AMM isn’t enough to keep you out of trouble if the rest of the language is a tar-pit. Those three are full of the bones of drowned devops victims.

Yes, Java scales better. CPython would too from a pure maintainability standpoint, but it’s too slow for the kind of deployment you’re implying – on the other hand, PyPy might not be, I’m finding the JIT compilation works extremely well and I get runtimes I think are within 2x or 3x of C. Go would probably be da bomb.

Oh, dear Goddess, no wonder. All three of those languages are notorious sinkholes – they’re where “maintainability” goes to die a horrible and lingering death.

Can confirm — Visual Basic (6 and VBA) is a toilet. An absolute cesspool. It’s full of little gotchas — such as non-short-circuiting AND and OR operators (there are no differentiated bitwise/logical operators) and the cryptic Dir() function that exactly mimics the broken semantics of MS-DOS’s directory-walking system call — that betray its origins as an extended version of Microsoft’s 8-bit BASIC interpreter (the same one used to write toy programs on TRS-80s and Commodores from a bygone era), and prevent you from writing programs in a way that feels natural and correct if you’ve been exposed to nearly anything else.

VB is a language optimized to a particular workflow — and like many languages so optimized as long as you color within the lines provided by the vendor you’re fine, but it’s a minefield when you need to step outside those lines (which happens sooner than you may think). And that’s the case with just about every all-in-one silver-bullet “solution” I’ve seen — Rails and PHP belong in this category too.

It’s no wonder the cuddly new Microsoft under Nadella is considering making Python a first-class extension language for Excel (and perhaps other Office apps as well).

Visual Basic .NET is something quite different — a sort of Microsoft-flavored Object Pascal, really. But I don’t know of too many shops actually using it; if you’re targeting the .NET runtime it makes just as much sense to just use C#.

As for Perl, it’s possible to write large, readable, maintainable code bases in object-oriented Perl. I’ve seen it done. BUT — you have to be careful. You have to establish coding standards, and if you come across the stereotype of “typical, looks-like-line-noise Perl code” then you have to flunk it at code review and never let it touch prod. (Do modern developers even know what line noise is, or where it comes from?) You also have to choose your libraries carefully, ensuring they follow a sane semantics that doesn’t require weirdness in your code. I’d much rather just do it in Python.

VB.NET is unused in the kinds of circles *you know* because these are competitive, status-conscious circles, and anything with BASIC in the name is so obviously low-status and looks so bad on the resume that it makes sense to add that 10-20% more effort and learn C#. C# sounds a whole lot higher status: it has C in the name, so it obviously looks like being a Real Programmer on the resume.

What you don’t know is what happens outside the circles where professional programmers compete for status and jobs.

I can report that there are many “IT guys” who are not in these circles. They don’t have the intra-programmer social life, hence no status concerns, nor do they ever intend to apply for Real Programmer jobs. They are just rural or not-first-world guys who grew up liking computers, took a generic “IT guy” job at some business in a small town, and there taught themselves Excel VBA when the need arose to automate some reports, and then VB.NET when it was time to try to build an actual application for in-house use. They like it because it looks less intimidating – it sends out those “not only meant for Real Programmers” vibes.

I wish we lived in a world where Python filled that non-intimidating, amateur-friendly niche, as it could do the job very well, but we are already deep into path dependence. Bill Gates and Joel Spolsky got it seriously right when they made Excel scriptable. The trick is how to provide a smooth transition between non-programming and programming.

One classic way is that you are a sysadmin, you use the shell, then you automate tasks with shell scripts, then you graduate to Perl.

One relatively new way is that you are a web designer, you write HTML and CSS, and then you slowly get dragged, kicking and screaming, into JavaScript and PHP.

The genius was that they realized that a spreadsheet is basically modern paper. It is the most basic and universal tool of the office drone. I print all my automatically generated reports to xlsx files, simply because for me it is the “paper” of 2017: you can view it on any Android phone, and, unlike PDF and like paper, you can interact and work with the figures, like adding other numbers to them.

So it was automating the spreadsheet, the VBA Excel macro, that led the way from not-programming to programming for an immense number of office drones, who are far more numerous than sysadmins and web designers.

Aaand… I think it was precisely because of those microcomputers, like the Commodore. Out of every 100 office drones in 1991 or so, 1 or 2 had entertained themselves in 1987 typing in BASIC programs published in computer mags. So when they were told Excel was programmable with a form of BASIC, they were not too intimidated…

This created such a giant path dependency that even today, if you want to sell a language to millions and millions of not-Real Programmers, you have to at least make it look somewhat like BASIC.

I think from this angle it was a masterwork of creating and exploiting path dependency. Put BASIC on microcomputers. Have a lot of hobbyists learn it for fun. Create the most universal office tool. Let it be programmable in a form of BASIC – you can just work on the screen, let it generate a macro and then you just have to modify it. Mostly copy-pasting, not real programming. But you slowly pick up some programming idioms. Then the path curves up to VB and then VB.NET.

To challenge it all, one needs to find an application area as important as number crunching and reporting in an office. Excel is basically electronic paper from this angle, and it is hard to come up with something like that. All our nearly computer-illiterate salespeople use it. (90% of the use beyond just typing data into a grid is the auto-sum function.) And they don’t use much else besides that and Word and Outlook and chat apps.

Anyway, supposing such a purpose can be found, you can make it scriptable in Python; it is also important to be able to record a macro so that people can learn from the generated code. Then maybe that dominance can be challenged.

TIOBE says that while VB.NET saw an uptick in popularity in 2011, it’s on its way down now and usage was moribund before then.

In your attempt to reframe my statements in your usual reference frame of Academic Programmer Bourgeoisie vs. Office Drone Proletariat, you missed my point entirely: VB.NET struggled to get a foothold during the time when VB6 was fresh in developers’ minds. It was too different (and too C#-like) to win over VB6 devs, and didn’t offer enough value-add beyond C# to win over the people who would’ve just used C# or Java.

I have been part of big projects with many engineers using languages with automatic memory management. I noticed I could get something up and running in a fraction of the time that it took in C or C++.

And yet, somehow, strangely, the projects as a whole never got successfully completed. We found ourselves fighting weird shit done by the vast pile of run time software that was invisibly under the hood automatically doing stuff for us. We would be fighting mysterious and arcane installation and integration issues.

Sounds just like every Ruby on Fails deployment I’ve ever seen. It’s great when you’re slapping together Version 0.1 of a product, or so I’ve heard. But I’ve never joined a Fails team on version 0.1. The ones I saw were already well established, and between the PFM in Rails itself and the amount of monkeypatching done to system classes, it’s very, very hard to reason about the code you’re looking at. From a management level, you’re asking for enormous pain trying to onboard new developers into that sort of environment, or even to expand the scope of your product with an existing team, without them tripping all over each other.

The clang-tidy checks cppcoreguidelines-* and modernize-* will catch most of the issues that esr complains about, in practice usually all of them, though I suppose that as a project gets bigger, some will slip through.

Remember that gcc and g++ default to C++98, because of the vast base of old-fashioned C++ code which is subtly incompatible with C++11 – C++11 onward being the version of C++ that optionally supports memory safety, hence necessarily subtly incompatible.

To turn on C++11, place the following in your CMakeLists.txt:

cmake_minimum_required(VERSION 3.5)
# set standard required to ensure that you get
# the same version of C++ on every platform
# as some environments default to older dialects
# of C++ and some do not.
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

Originally I made this system because I wanted to try programming a microkernel OS, with protected mode, the PCI bus, USB, ACPI, etc., and I didn’t want to get close to the ‘event horizon’ of memory management in C.

But I didn’t wait for Greenspun’s law to kick in, so I first developed a safe memory system as a runtime, and replaced the standard C runtime and memory management with it.

I wanted zero segfaults or memory errors possible anywhere in the C code, because debugging bare-metal exceptions, without a debugger, with complex data structures made in C, looks very close to the black hole.

I didn’t want to use C++ because C++ compilers have very unpredictable binary formats and function-name decoration, which makes them much harder to interface with at kernel level.

I also wanted a system as efficient as possible for managing lockless shared access to as much of memory as possible between threads (avoiding the ‘exclusive borrow’ syndrome of Rust), with global variables shared between threads and lockless algorithms to access them.

I took inspiration from the algorithms on http://www.1024cores.net/ to develop the basic system, with strong references as the norm, and direct ‘bare pointers’ only as weak references for fast access to memory in C.

What I ended up with is basically a ‘strongly typed hashmap DAG’ that stores the object-reference hierarchy and can be manipulated using ‘lambda expressions’, so that applications manipulate objects only indirectly through the DAG abstraction, without having to touch bare pointers at all.

This also makes a mark-and-sweep garbage collector easier to do, especially with an ‘event based’ system: the main loop can call the garbage collector between two executions of event/message handlers, which has the advantage that it can run at a point where there is no application data on the stack to mark, so it avoids mistaking application data on the stack for a pointer. All references held only in stack variables get automatically garbage-collected when the function exits, much like in C++ actually.

The garbage collector can still be called by the allocator on an out-of-memory error (it will attempt a collection before failing the allocation), but all references on the stack should already have been collected by the time the function returns to the main loop and the garbage collector runs.

As the whole reference hierarchy is expressed explicitly in the DAG, there shouldn’t be any pointers stored in the heap outside of the module’s data section, which corresponds to the C global variables used as the ‘root elements’ of the object hierarchy; these can be traversed to find all the active references to heap data that the code can potentially use. A quick addition would be for the compiler to automatically generate a list of the ‘root references’ among the global variables, to avoid memory leaks when some global data happens to look like a reference.

As each thread has its own heap, this also avoids the ‘stop the world’ syndrome: every thread garbage-collects its own heap. And there is already a system of lockless synchronisation for accessing references through expressions in the DAG, so code doesn’t have to rely only on ‘bare pointers’ to manipulate the object hierarchy; that allows dynamic relocation, and makes it easier to track active references.

It’s also very useful for tracking memory leaks: since the allocator can record the time of each allocation, it’s easy to see all the allocations that happened between two points of the program, and to dump their whole hierarchy and properties from just the ‘bare reference’.

Each thread contains two heaps: one that is manually managed, mostly used for temporary strings or I/O buffers, and another that can be managed either with atomic reference counting or with mark-and-sweep.

With this system, C programs rarely have to call malloc/free directly or manipulate pointers to allocated memory, other than for temporary buffer allocation (like a dynamic stack, for I/O buffers or temporary strings that can easily be managed manually). All memory manipulation goes through a runtime that internally tracks each pointer’s address and size, its data type, and optionally a ‘finalizer’ function that will be called when the pointer is freed.

Since I started using this system to make C programs, alongside my own ABI which can dynamically link binaries compiled with Visual Studio and gcc together, I have tested it on many different use cases. I could build a mini multi-threaded window manager/UI with async IRQ-driven HID driver events, and a system of distributed applications based on blockchain data, including a multi-threaded HTTP server that handles parallel JSON-RPC calls, with an abstraction of the application stack via custom data-type definitions and scripts stored on the blockchain. And I have very few memory problems, even though it’s 100% C, multi-threaded, and deals with heavily dynamic data.

With the mark-and-sweep mode, it becomes quite easy to develop multi-threaded applications with a good level of concurrency, even to build a simple database system driven by a script over async HTTP/JSON-RPC, without having to care about complex memory management.

Even with the reference-count mode, the manipulation of references is explicit, and it should not be too hard to detect leaks with simple parsers. I already did a test with the ANTLR C parser, using a visitor class to walk the grammar and flag potential errors; since all memory referencing happens through specific types instead of bare pointers, detecting potential memory-leak problems with a simple parser is not too hard.

I have one question: do you even need global AMM? Take one element of the graph – when will or should it be released in your reposurgeon? Overall I think the answer is never, because it is usually linked to others in the graph. And do you track how many objects are created and released during operations? I don’t mean temporary strings, but the objects representing the main working set.

Depending on the answer: if you load a graph element and it stays in memory indefinitely, then this could easily be converted to C/C++ by simply never calling `free` on graph elements (and all the problems with memory management go out the window).
If they should be released early, then when should that happen? Do you have code in reposurgeon that purges objects once they are no longer needed? Mere reachability of an object does not mean it is needed; many times it is quite the opposite.

I am now working on a C# application that had a similar bungle, where the previous developers’ “solution” was to restart the whole application instead of fixing the lifetime problems. The correct solution was C++-like code: create the object, do the work, and purge it explicitly. With this, none of the components have memory issues now. Of course the problem there lay in not knowing the tools in use, not in the complexity of the domain. But did you analyze what is needed, what is not, and for how long? AMM does not solve this.

btw, I’m a big fan of the Lisp that lives inside C++11, aka templates: a great pure functional language :D

If I understood this correctly, the situation looks like this:
I have processes that have loaded repos A, B, and C and are actively working on each one.
Now, because of some demand, we need to load repo D.
After we are done, we go back to A, B, and C.
Now the question is: should D’s data be purged?
If there are memory connections from the previous repos, it stays in memory; if not, AMM removes all its data from memory.
If this is a complex graph, then from any element you can crawl to any other element of the graph (this is a simplification, but probably a safe assumption).
The first case (there is a connection) is equivalent to not using `free` in C. Of course, if not all of the graph is reachable there will be a partial purge of its memory (say 10% stays), but what happens when you need to load repo D again? The data still available is hidden deep in other graphs, and most of the data was lost to AMM; you need to load everything again, and now repo D’s size is 110%.

In the case where there is no connection between repos A, B, C and repo D, we can free it entirely.
This is easily done in C++ (some kind of smart pointer that knows whether it points into the same repo or another one).

I only disagree with the word `insane`. C++ has a lot of problems, like UB, lots of corner cases, leaky abstractions, all the crap inherited from C (and my favorite: 1000-line errors from templates), but it is not insane for working on memory problems.

You can easily create tools that make all these problems bearable, and that is the biggest flaw in C++: many problems are solvable, but not out of the box. C++ is good at creating abstractions: https://www.youtube.com/watch?v=sPhpelUfu8Q
An abstraction that fits your domain will not leak much, because it matches the underlying problem exactly.
And you can enforce a lot of things that allow you to reason locally about the behavior of the program.

If creating such a new abstraction is indeed insane, then I think you have a problem in Go too, because the only thing AMM solves is reachability of memory, not how long you actually need it.

btw, the best thing that shows the difference between C++03 and C++11 is `std::vector<std::vector<T>>`: in C++03 this is insanely stupid, and in C++11 it is insanely clever, because it has the performance characteristics of a plain `std::vector` (thanks to `std::move`) and no memory-management problems (keep indexes stable and use `v.at(i).at(j).x = 5;`, or wrap it in a helper class and use `v[i][j].x` that throws on a wrong index).