I was wondering whether using (almost) empty derived classes to give additional semantics to a class hierarchy was a good idea (or practice) ?

Because a (not so short) example is always better than a long speech, let's assume we're writing a class hierarchy for SQL generation. You want to represent a data manipulation command (INSERT, UPDATE or DELETE), so you have two alternatives :

I personally don't like the approach of adding empty subclasses. I think that List<Item> and ItemList have the same semantics.
–
superMJul 17 '12 at 11:27

I suppose the second version might be "better" if there are situations where some code could expect an InsertCommand object and instead receives a DeleteCommand, you get that tested at compile time. I'm not sure it would be as easy if you had to evaluate the CommandType property.
–
FrustratedWithFormsDesignerJul 17 '12 at 14:10

Object-oriented programming does not really help you for this task. You do not need to remember the state. So, I think you should use functional style to generate SQL in this case. Better yet, use a good existing library.
–
JobAug 25 '12 at 15:09

7 Answers
7

Your second example has the benefit of no case statement, and no CommandType. This means you can a new statement type without having to modify a case statement. A case statement ought to cause an OOP programmer to pause a moment and consider whether the design should be changed to get rid of the case statement. A variable denoting a thing's type ought to cause the programmer to pause a long while.

However, the visitor pattern doesn't seem to be needed here. Simpler would be to use a class hierarchy, as you did, but then put the SQL generation inside that hierarchy rather than having it be a separate class:

The SqlCommand drives the process of creating the sql, pushing its details into the SqlBuilder. This keeps the SqlCommand and SqlBuilder decoupled, since the SqlCommands do not have to expose their entire state for the builder to use. This has the benefits of your second example, but without the complexity of the visitor pattern.

If it turns out that sometimes you need the sql built one way and sometimes another (perhaps you want to support more than one SQL back end), then you abstract the SqlBuilder class, like so:

I prefer the first one, but maybe that's just me - I like DRY, and like keeping things simple, and there's nothing simpler than a switch statement for this. (I always think that, unless you have a good reason to refactor the switch into a polymorphic class, this refactoring is mainly to show off how clever code can be).

You are writing a lot more code to achieve the same result, and that requires a lot of cut and paste - and I find c&p coding is one place that bugs are more likely to appear. I also think trying to read the code is more difficult as it skips through all those classes and methods. But ultimately, its just down to taste - they both do the same thing.

For your second question.. it depends, once you derive from a class you're responsible for checking all the plumbing works. For example, you do not derive from stl container classes because they don't have virtual destructors, and thus are not designed for inheritance. Unless you know the potential pitfalls of the class you're inheriting from, its much safer to contain the base class.

The other reason is just semantics, a CustomerList is not a List of Customers, so someone reading your code would have to think "is this the same as a List<> or have they modified the functionality?" If you do not make any modifications, then leave it as the underlying type, or use a typedef/using statement that at least just says "I've renamed this for readability only".

The downside to having more classes in the hierarchy is that when you have to debug things it's that much more complicated to figure out what's going on. The deeper your class hierarchy, and the more levels that have actual code, the harder it is to understand the code when debugging (because you always have to be mindful of where you are in the call chain at any point, and what actual object you are, etc, etc).

The real question is: Does this make your code overall more readable? If you'll only have that switch statement in one spot, then I'd say it's a win. Whereas, if you'd otherwise end up duplicating that switch statement all over the place, then the second form is probably a win.

@superM - I mention it because I've worked with code that I had a LOT of trouble with due to the depth of the class hierarchy. It's a problem even with a well-constructed hierarchy only 3 or 4 deep, but when you 8 or 10 deep full of bad ideas, well, you can imagine how that goes.
–
Michael KohneJul 17 '12 at 12:03

I know what you're talking about. I had to refactor my former colleague's code, who had left by that time. And she was really fond of such coding style.
–
superMJul 17 '12 at 12:06

Using the type system (in a statically type language) to express semantics is exactly what it is meant for. Consider a type for database IDs that you subclass for each table, say a CustomerId, an OrderId and so on. The benefit is that you never mistake a CustomerId for an OrderId. But they are clearly empty subclasses. See the Google IO talk on GWT by Ray Ryan at around 7:00 for an example.

I like second approach better since when one looks into InsertCommand, this reads as "THERE IS an insert command", which means that there should be another commands. When you read next, you see these commands and you totally understand the idea.

The first approach reads as "THERE IS a command... with some details in there...".

I wouldn't use the vistor pattern for just 3 types, it's extra complexity for no real gain, I wouldn't use it for less than a dozen types, instead using a switch statement.

But there is a third option -- a dictionary with a delegate. It is a bit of a compromise, and can be used for a small to medium sized type-list where you populate the dictionary statically in code, and can be used with a large number of types via reflection and a external list (config file, database table, etc).

Any time you encode information into the type system, you're doing it wrong (in mainstream languages like C++, java, and C#). It then requires you to use a visitor pattern, or a pile of type checks to extract that information which is fragile at best.

Further, in the mainstream languages it makes it very awkward to store/change that information. If you for example wanted to execute a command entered by the user in some SQL IDE.

In short, if you are storing information - just use a variable; that's what they're there for.

You don't normally need to extract that information. If you really need to, it feels like there are some problems with your design. Code like if(a instanceof X) method() can always be refactored to a.method() without need to get exact type of a.
–
loki2302Jul 17 '12 at 13:50

@loki2302 If you don't need to extract the information, there's no reason to subclass it. I'd love to see why people downvoted, since it's fairly clear from my perspective that it's an anti-pattern.
–
TelastynJul 17 '12 at 16:06

Your answer is pretty generic - you have also covered polymorphism since it also can be considered "encoding information to the type system".
–
loki2302Jul 17 '12 at 18:17

1

But that's the thing; polymorphism is behavior, not data. To get any sort of behavior change with an empty type you need some external code to be able to identify and switch on it. In languages without dynamic dispatch (or pattern matching in some cases) that is awkward at best.
–
TelastynJul 17 '12 at 18:36