Violating the “smart enum” pattern in C#

For a while now, I’ve been a big fan of a pattern in C# which mimics Java enums to a certain extent. In general, it’s a lovely pattern. Only after reading a comment on a recent blog post by Eric Lippert did I find out about a horrible flaw. Dubious thanks to John Payson for prompting this post. (And I fully intend to include this in the Abusing C# talk that I give every so often…)

Trigger warning: this post contains code you may wish you’d never seen.

Let’s start off with the happy path – a moment of calm before the storm.

When things go well…

The idea is to create a class – which can be abstract – with a certain fixed set of instances, which are available via static properties on the class itself. The class is not sealed, but it prevents other classes from subclassing it by using a private constructor. Here’s an example, including how it can be used:

(Note the use of the new “automatically implemented read-only property” from C# 6. Yum!)

Obviously there are possible variations of this – there could be multiple instances of one subclass, or we could even create instances dynamically if that were useful. The main point is that no other code can create instances or even declare their own subclasses of Operation. That private constructor stops everything, right? Um…

My eyes! The goggles do nothing!

I’m sure all my readers are aware that every C# constructor either has to chain to another constructor using : this(...) or chain to a base class constructor, either implicitly calling a constructor without arguments or explicitly using : base(...).

What I wasn’t aware of – or rather, I think I’d been aware of once, but ignored – was that constructor initializers can have cycles. In other words, it’s absolutely fine to write code like this:

So now, we have a class where you can call the constructor, and an exception will be thrown. But hey, you still can’t get at the instance which was created, right? Well, this is tricky. We genuinely can’t use this anywhere in code which we expect to execute – there’s no way of reaching a constructor body, because in order to do that we’d have to have some route which took us through the base class constructor, which was can’t access. Field initializers aren’t allow to use this, either explicitly or implicitly (e.g. by calling instance methods). If there’s a way of circumventing that, I’m not aware of it.

So, we can’t capture a reference to the instance before throwing the exception. And after the exception is thrown, it’s garbage, and therefore unavailable… unless we take it out of the garbage bin, of course. Enter a finalizer…

Admittedly if multiple threads call GetInstance() at a time, there’s no guarantee which thread will get which instance, or whether they’ll get the same instance… but that’s fine. There’s very little to guarantee this will ever terminate, in fact… but I don’t care much, as clearly I’m never going to use this code in production. It works on my machine, for the few times I’ve been able to bear running it. Here’s a complete example (just combine it with the Operation class as before) in case you want to run it yourself:

Fixing the problem

The first issue is the cyclic constructors. That’s surely a mistake in the language specification. (It’s not a bug in the compiler, as far as I can tell – I can’t find anything in the spec to prohibit it.) This can be fixed, and it shouldn’t even be too hard to do. There’s nothing conditional in constructor initializers; each constructor chains to exactly one other constructor, known via overload resolution at compile time. Even dynamic doesn’t break this – constructor initializers can’t be dispatched dynamically. This makes detecting a cycle trivial – from each constructor, just follow the chain of constructor calls until either you reach a base class constructor (success) or you reach one of the constructor declarations already in the chain (failure). I will be pestering Mads for this change – it may be too late for C# 6, but it’s never too early to ask about C# 7. This will be a breaking change – but it will only break code which is tragically broken already, always resulting in either an exception or a stack overflow.

Of course, a C# language change wouldn’t change the validity of such code being created through IL. peverify appears to be happy with it, for example. Still, a language fix would be a good start.

There’s a second language change which could help to fix this too – the introduction of private abstract/virtual methods. Until you consider nested classes, neither of those make sense – but as soon as you remember that there can be subclasses with access to private members, it makes perfect sense to have them. If Operation had a private abstract method (probably exposed via a public concrete method) then SneakyOperation (and any further subclasses) would have to be abstract, as it couldn’t possibly override the method. I’ve already blogged about something similar with public abstract classes containing internal abstract methods, which prevents concrete subclasses outside that assembly… but this would restrict inheritance even further.

Does it reach the usefulness bar required for new language features? Probably not. Would it be valid IL with the obvious meaning? I’ve performed experiments which suggest that it would be valid, but not as restrictive as one would expect – but I could be mistaken. I think there’s distinct merit in the idea, but I’m not expecting to see it come to pass.

Conclusion

So, what can you take away from this blog post? For starters, the fact that the language can probably be abused in far nastier ways than you’ve previously imagined – and that it’s very easy to miss the holes in the spec which allow for that abuse. (It’s not like this hole is a recent one… it’s been open forever.) You should absolutely not take away the idea that this is a recommended coding pattern – although the original pattern (for Operation) is a lovely one which deserves more widespread recognition.

Finally, you can take away the fact that nothing gets my attention quicker than a new (to me) way of abusing the language.

I can’t think of any code I’ve written in C# that just screams for the pattern you used in Operator, but it looks extremely similar to how F#’s discriminated unions after they’ve been tossed into a decompiler. Do you know if there’s a proper name for the pattern?

I guess “Jon Skeet’s Almost Java Enums pattern” will do. Despite the lack of static type checking that comes with unions in F#, it seems like this would be great in C# for the same purposes I use unions otherwise. I’ll have to bookmark this and try it out somewhere so I don’t forget.

Jon, with All Hallow’s Eve drawing near, I think it very timely of you to post a piece of code that is, quite literally, so horrifying that one can’t look away. Tonight, I may need to sleep with the lights on and the covers up over my head.

Excellent post, as always! I always thought that looking at bad code can teach you how to write good code. After today, I’m not so sure anymore (The finalizer-resurrect-hack will haunt me forever…). Still, from a learning point-of-view, this is very intresting, thanks!

The serialization hack, using GetUninitializedObject, of course also points out the other gotcha with this pattern, which is that if you serialize and then deserialize instances of your locked down class or its subclasses, you will get additional instances which are not the same ones stored in your static fields – although you are at least assured they won’t be instances of classes you didn’t define. Could be a source of subtle bugs though, if code relies on the identity-equality of an operation with Operation.Add for some purpose.

Nice article, and amazing abuse as usual :D
It seems that the VB.Net compiler team already take that in count (or it was for another reason) because when I wanted to try it in VB.Net (I often do that to see the behavioural differences between .Net languages) the .ctor cycle was detected and flagged as error
Also there are two little “typos” in the article :
– When you present the Sneaky .ctor which takes a Func you’ve kept the comment associated with the divide by 0 from the previous .ctor
– In the complete example at the end, the generic parameter of Func is missing.

Since access restrictions in programming languages are simply a tool for detecting violations of encapsulation (in the same way that type declarations are a tool) and have nothing to do with security, there is no reason to care about such abuses, even more so when it comes to IL, and foolhardy to make language changes merely to prevent such abuse (which is not to say the spec/compiler shouldn’t be changed to prevent constructor loops; allowing them is an obvious error).

Java-style enums are very useful for implementing State and Strategy Pattern, amongst other things. I’m surprised that C# doesn’t have anything quite like them by now. Sure you can simulate them, but appropriate syntactic sugar would be nice.

just a suggestion for the private abstract methods. One could get something similar with internal abstract which is allowed in the current version. This will at least mitigate the inherit from other assembly case.

i’ve just stumbled upon this patter on SO, but i can’t think of use of it, except just “type safe enum that cant be used with case”. will you give a couple of valuable examples please.
in short: i see the elegance and i like it but i can’t see much value yet

Will see if I can find some examples, but don’t have the time to search exhaustively now. Just bear it in mind as a potential pattern – and consider that unlike a regular C# enum, it can behave polymorphically.

i, looks like, asked something i wasn’t implying. i need not examples as code snippets, but just some real-world application of pattern described in a few words. for me, things for Chain of Responsibility or Adaptor etc are instantly obvious. but for mentioned pattern — nothing. almost! as Strategy and State were both mentioned here. i think these are good examples. especially State becomes so neat!

I think the most obvious application of a type safe enum is the ability to provide a custom ToString/Description for each enum value without needing to resort to attribute+reflection+extension methods. It provides a bit of encapsulation, putting the code related to the enum within the enum itself instead of externally.

Oh, and there is discussion about integrating pattern matching into switch/case that would allow arbitrary expressions. The feature isn’t well defined yet, but it may be able to be used for matching the type safe enum objects.https://github.com/dotnet/roslyn/issues/5757