One thing to note is that the Derived class does have a VTable, which could probably have been guessed. As you can see, I've split the Derived-specific members into a separate struct placed directly inside Derived, rather than inlining the members. This was done to make the code in the set_derived_member() method easier to explain; it looks like this:

The important part here is that the implementation of set_derived_member() makes no assumptions about the layout of the Derived object being passed in, with the exception of the VTable, which is expected to be at the start. The Derived-specific sub-part can be located anywhere in the object; the method always looks it up via the VTable. This has two implications:

a negative performance impact, as instead of accessing the variable via a fixed offset into the object, a VTable lookup is performed

in the case of multiple inheritance, duplication of the common base can be avoided for the diamond problem (as shown below)
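A minimal sketch of the mechanism described above, with every name invented for illustration: the VTable stores where the Derived-specific part lives, and the method finds that part through the VTable, assuming only that a VTable pointer sits at the very start of the object.

```cpp
#include <cassert>
#include <cstddef>

struct VTable {
    std::ptrdiff_t derived_part_offset; // offset of the Derived-specific part
};

struct DerivedPart { int derived_member; };

struct Derived {
    const VTable* _vtable;  // expected at the start
    int base_field;         // Base sub-part; its exact position is irrelevant
    DerivedPart _derived;   // Derived sub-part; found via the VTable
};

static const VTable derived_vtable = { offsetof(Derived, _derived) };

// Makes no assumption about the layout, except the VTable at the start.
void Derived_set_derived_member(void* _this, int value) {
    const VTable* vt = *static_cast<const VTable* const*>(_this);
    char* raw = static_cast<char*>(_this);
    DerivedPart* part =
        reinterpret_cast<DerivedPart*>(raw + vt->derived_part_offset);
    part->derived_member = value; // one extra indirection vs. a fixed offset
}
```

The extra indirection through the VTable is exactly the performance cost mentioned in the first implication.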

As you can see, in the final layout of DerivedMultiple there is only one Base part. The methods Derived1::foo() and Derived2::bar() have identical code; all they require is a VTable at the start of _this. Because DerivedMultiple satisfies this requirement, it can be passed directly to either of the two.
Let's analyze the code step by step:

When doing the pointer assignment, the compiler is not required to adjust the pointer at all, as all that is needed is a VTable at the start.
When accessing anything from Base, a VTable lookup is done to obtain a pointer to the Base part inside the object. The foo() call itself is trivial.
The code where we use Derived2 is pretty much the same:

The base is obtained the same way as inside the implementations of foo() and bar(). The important thing to note here is that making Base a virtual base class does not change this, except that the pointer to the VTable could then be reused.

In the case of multiple inheritance with a diamond hierarchy, two copies of the common base class can be avoided

The internal layout of a class is unpredictable; the compiler is free to rearrange sub-objects

Virtual inheritance may not solve the duplicate base class problem; this can happen if there is a complex mix of classes where some do not use virtual inheritance, so their layout has to be preserved

Friday, April 3, 2015

This is the second part, explaining how inheritance works in C++ under the hood.
If you haven't read the first part, I recommend at least a quick look, as it clarifies the approach I'm taking. You can find the first part here.
In this part I'll explain probably the most feared feature in C++ - multiple inheritance.

As you can see, when we call a method that comes from DerivedTwo, we don't pass a pointer to our object as the first argument! Instead, we pass a pointer to the place inside the object where the DerivedTwo subobject is located!
But now we have another question: what if we call foo() from inside bar()? How is CommonBase resolved when we have a pointer pointing somewhere inside the DerivedMultiple object?

OK, so this gives us two puzzles. First, when assigning a pointer to DerivedMultiple to a pointer to DerivedTwo, the pointer is automatically shifted to the subobject! Second, and most important, THERE IS NOTHING SPECIAL DONE TO RESOLVE CommonBase!
Yes, that's right - the two calls will access different CommonBase subobjects inside DerivedMultiple!
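Both puzzles can be observed directly in standard C++, without any imaginary compiler. This is a sketch with invented class names following the naming used above:

```cpp
#include <cassert>

struct CommonBase { int cb = 0; };
struct DerivedOne : CommonBase { int d1 = 0; };
struct DerivedTwo : CommonBase { int d2 = 0; };
struct DerivedMultiple : DerivedOne, DerivedTwo { int dm = 0; };

// Puzzle 1: the two base subobjects sit at different places inside the
// object, so at least one of the upcasts must silently shift the pointer.
bool upcasts_shift(DerivedMultiple& obj) {
    DerivedOne* one = &obj;
    DerivedTwo* two = &obj;
    return static_cast<void*>(one) != static_cast<void*>(two);
}

// Puzzle 2: nothing "resolves" CommonBase - there are two distinct copies,
// one inside DerivedOne and one inside DerivedTwo.
bool two_common_bases(DerivedMultiple& obj) {
    CommonBase* viaOne = static_cast<DerivedOne*>(&obj);
    CommonBase* viaTwo = static_cast<DerivedTwo*>(&obj);
    return viaOne != viaTwo;
}
```

Since the two CommonBase subobjects are distinct objects, the standard guarantees they have distinct addresses, which is exactly why the two calls in the text touch different data.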

Looks like I've been lying to you a bit when explaining how method overrides work :)
What actually happens here is:

the method address is obtained from the VTable as usual

because the object we have a pointer to can be involved in multiple inheritance, we cannot simply pass that pointer to the function - what if we have a pointer to some subobject inside, while the method expects a pointer to the actual object?

before the pointer is passed to the function, it goes through a small piece of compiler-generated code (commonly called a thunk) that consults the VTable and produces a valid address to pass to the function (not necessarily the beginning of the real object)

this mechanism is used every time an object's address is passed to a virtual method, because we never know what types are derived from a given class; the hierarchy can be a very complicated mixture of single and multiple inheritance
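The adjustment described in the list above can be triggered in plain C++ by overriding a method inherited from a non-first base. A sketch, with all names invented:

```cpp
#include <cassert>

struct Base1 { virtual int f() { return 1; } virtual ~Base1() = default; };
struct Base2 { virtual int g() { return 2; } virtual ~Base2() = default; };

// g() is inherited from the non-first base, so calling it through a Base2*
// requires the compiler-generated adjustment (the "thunk") to recover the
// address the method body expects for `this`.
struct Both : Base1, Base2 {
    int value = 42;
    int g() override { return value; } // needs the full object, not the Base2 part
};

// The caller only sees a Base2*, which points into the middle of the object.
int call_through_base2(Base2* p) {
    return p->g(); // dispatched via the VTable; `this` is re-adjusted on entry
}
```

The caller never knows whether the pointer it holds is to a complete object or to a subobject, which is why the adjustment has to happen on every virtual call.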

Hints for safe use of multiple inheritance

try to use only single inheritance plus interfaces; in C++ an interface would be a class that has nothing but statics and pure virtual methods

the biggest problems come from classes with fields and non-virtual methods; try to ensure that non-first base classes have none

make non-primary base classes as trivial as possible (ideally interfaces), preferably top-level classes (not derived from anything)

avoid the diamond; use virtual inheritance as soon as you notice one

be very very careful

Stay tuned for part III, which will have another complicated aspect - virtual inheritance!

Thursday, March 26, 2015

Inheritance in C++ is one of the most complex forms of inheritance there is. Understanding how it works and what hidden machinery is involved is useful (if not required) to avoid messing things up.
I'll try to explain it all in detail by examples.
Before we start, there are a few things to note:

Visibility (of both members and inheritance) has no effect here, so everything in all examples is public

The C++ code will be translated to C code to reveal what is done automatically by the compiler

The "C++ compiler" here is an imaginary one, in an attempt to keep things simple and clear

Namespaces and name mangling are ignored for simplicity (they have no impact on inheritance)

What differs from simple inheritance is that there is something called _vtable as the first member (the compiler is free to place it anywhere, but it is usual to place it first).
Another thing that changes significantly is how methods are called. Let's take this C++ code:

As you see, things now get a bit more complicated, because a pointer to the VTable is prepended before the parent part (it could be placed after it too, but I'm placing it this way because it will make multiple inheritance easier to understand later)! Let's see how it works!

So, as you see, when assigning to base, the pointer is automatically shifted by the compiler to point to the parent part! The pointer to the base class actually points not to the object, but inside it. This enables a method call as simple as if we had an object of SimpleBase.
Calling an inherited method on the real object is also different:

SimpleBase_foo(&object._parent);

In this call it is not the object that is passed as the parameter, but the subobject of the relevant type.
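The whole scheme above can be sketched as a hypothetical C-style translation (written as valid C++ so it compiles; every name is invented for illustration, and the VTable is left empty since only the layout matters here):

```cpp
#include <cassert>

struct SimpleBase { int base_field; };

// The inherited method, as the imaginary compiler would emit it in C.
void SimpleBase_foo(SimpleBase* _this) { _this->base_field = 7; }

struct VTable { /* pointers to virtual methods would live here */ };
const VTable derived_vtable = {};

struct Derived {
    const VTable* _vtable; // prepended before the parent part
    SimpleBase _parent;    // the base subobject
    int derived_field;
};

// The "upcast" is just taking the address of the subobject - this is the
// pointer shift the compiler performs silently.
SimpleBase* to_base(Derived* d) { return &d->_parent; }
```

Calling the inherited method then reads exactly like the line in the text: SimpleBase_foo(&object._parent).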

Tuesday, January 13, 2015

There are a lot of posts about what should or shouldn't be done when writing automated tests for software. Below is my list.

Tests are written for others first, only then for yourself

There are only a few cases where tests are written to check whether the code works. In most cases writing a test is not the most efficient way to check. Instead, tests are written primarily to catch regressions when unrelated changes are made. Since it's quite easy to break something you don't know about, tests are written to prevent others from breaking something they possibly don't even know exists.
As such, claims like "I don't need tests" are pretty much void if there is more than one developer on the project.

A test that hasn't failed at least once has not been proven to test anything

It should be obvious, but if a test has never been red, how do you know it actually tests something? Maybe the test is simply always green and will not catch any bug!
Tests are just like any other code: they have to be verified. The simplest way to do this is to introduce a bug in the code and run the test to see if it fails.
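As an illustration of what a never-red test looks like, here is a deliberately broken function (all names invented) with two tests against it:

```cpp
#include <cassert>

// Deliberate bug, to show which test catches it.
int add(int a, int b) { return a - b; } // BUG: should be a + b

// A vacuous test: the assertion is a tautology, so it can never go red.
bool vacuous_test() {
    int r = add(2, 2);
    return r == r; // always true - proves nothing
}

// An honest test: it goes red, exposing the bug.
bool real_test() {
    return add(2, 2) == 4;
}
```

The vacuous test stays green despite the bug; only introducing the bug and watching the honest test fail proves the test is worth anything.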

When code evolves, the test suite should evolve with it

The only case where code changes don't necessarily cause test changes is refactoring. In all other cases, if the code changes but the tests don't, the new features are not covered, so the test suite degrades.
A test suite is relevant only if it is kept in sync with the code it tests.

Tests should test the smallest possible feature set

This is easier said than done. Isolating different features from one another can be difficult. Testing features is only one part; ideally it should also be easy to find what got broken when a test fails. If a test depends on multiple features at the same time, its failure shows that one of these features is broken, but does not always tell which one.
There are two solutions here:

Make tests depend on only one feature, so that a test failure indicates a problem with that feature

Order tests accordingly. If a test depends on 3 different features, but 2 of them have been thoroughly tested before, a failure is most likely caused by the third

The first of these two options is preferred.

Testing against mocks is inadequate

It is a popular suggestion among unit-test proponents to mock everything in order to achieve single-feature isolation.
There is a pitfall in doing so: mocks are never the real thing! Testing against mocks only proves that the code works with those mocks! There is almost no software in the world that is 100% compliant with the standards it supports, or even with its own documentation (assuming the software has been around for a while). Yet people for some reason think it is possible to keep mocks 100% identical in behavior to the real thing they imitate.
Real implementations evolve over time and mocks have to be kept in sync. Sooner or later they diverge. For this reason integration and end-to-end tests are required, to make sure the code works with the actual implementations.

Unit testing is inadequate

It should be obvious: end users don't care whether your tests are green or not. They care whether the software works or not. Unit tests can prove that individual components work, but they say nothing about the behavior when those components are put together.

It's not important whether the code or the test is written first

TDD zealots claim otherwise, and they are wrong. While TDD can increase the coverage and quality of tests, it does not guarantee that!
What matters in the end is the quality of the code and the quality of the tests. When both are good, no one cares in what order they were produced.

Only testing your own code is risky

In part this is another argument against TDD...
One of the reasons for doing code review is that it is hard to spot problems in your own code. The same applies to tests - when someone else besides yourself drills into your code, the chance of catching bugs increases.

Only testing the correct behavior is inadequate

One common mistake made in automated tests is verifying only that the code behaves correctly under correct conditions. For complete testing the opposite should also be covered: the code should produce errors under incorrect conditions.
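A small sketch of testing both directions, using an invented helper around std::stoi:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Invented example function: parses a strictly positive integer,
// failing loudly on anything else.
int parse_positive(const std::string& s) {
    int v = std::stoi(s); // throws std::invalid_argument on garbage
    if (v <= 0) throw std::out_of_range("not positive");
    return v;
}

// The unhappy-path check: bad input MUST produce an error.
bool rejects_garbage() {
    try {
        parse_positive("abc");
    } catch (const std::invalid_argument&) {
        return true; // correct failure
    }
    return false; // accepted garbage - a bug the happy-path test can't see
}
```

A suite that only feeds the function "15" would never notice if the error path silently returned 0 instead of throwing.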

Monday, September 8, 2014

Maintaining backward compatibility is one of the most important values for every software library, tool or system used by other systems via its API. However, as a system evolves, maintaining compatibility gets harder, and sometimes it's not possible to improve it in a desired way because that would mean breaking compatibility. At those points a tough decision has to be made: maintain compatibility or break it.
The list below is not complete by any means, but it shows a few examples where I doubt that staying backward compatible was the right decision. I also add what I think the right decision was, and what we can learn from the mistakes made.

WinMain [dead parameter in rarely used function]

In the Windows API, a program's entry point is as follows:

int CALLBACK WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow);
Note the second argument: it is always NULL!
The point is that this argument had a meaning in 16-bit Windows, but that was completely removed in 32-bit Windows. So this parameter is in effect meaningless and is there just for backward compatibility. While that may seem to make sense at first, consider that Win16 and Win32 are not entirely compatible! Applications had to be migrated from one to the other. And an application has exactly one WinMain.
As a consequence, what you see here is short-term backward compatibility (Win16 died quite soon after Win32 appeared) at the cost of long-term API pollution. All for something as trivial as the application entry point (which could have been handled via a preprocessor macro).

WPARAM ["hungarian compatibility"]

In Windows the signature of the Window Procedure looks like this:

LRESULT WINAPI DefWindowProc(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam);

and the parameter in question is wParam - its datatype, to be exact.
The point here is that "W" stands for "WORD" in both the datatype and the parameter name. This was true in Win16 (WORD = 2 bytes), but not anymore since Win32 (the WORD datatype is still 2 bytes, but WPARAM is now 4 bytes).
There are two issues apparent here:

Hungarian notation is a bad idea, and one of the reasons is right in front of you (in case you haven't noticed: the parameter name LIES to you)

Generic datatypes (like int) are redefined as something else to make it possible to change them later. That was done in this case too, except that the name encoded the original type - and LIES to us now.

In Java:

Integer.parseInt(null) // throws NumberFormatException
Double.parseDouble(null) // throws NullPointerException

This inconsistency originated in old versions of Java and is kept for backward compatibility. It is documented behavior, so it's "a feature".
What this actually is, is called bug-compatibility. The funniest part is that both of these methods can throw NumberFormatException, so the fix is quite simple and would hardly break anything badly. I mean, if you handle the exception properly, it will just work; otherwise you probably have a quite buggy system already, and one bug more or less doesn't make a difference...
Most importantly, these two are very old. Double.parseDouble() dates back to Java 1.2; I have no such number for the other, but it is probably from around the same time. YOU REALLY REALLY COULD HAVE FIXED THIS BACK THEN! Instead, Sun maintained backward, sorry, bug-compatibility, just to see the bug getting harder to fix later.

Java generics vs. C# generics [focus on past, not on future]

Both languages started out with collections of Object, just to find out that this sacrificed a lot of type safety in exchange for more verbose code (a lose-lose situation, that is). How did they add generics later without breaking the languages' backward compatibility?
Java went the hard way by turning the existing non-generic collections into generic ones. They faced several issues:

both non-generic and generic collections should remain available (so that old code still compiles)

convertibility between generic and old non-generic variants (to mix old and new code)

It was easy and correct to default to Object for non-generic collections. Problems arose for generic collections that are more specific than Object. The solution was to make the generic argument syntactic sugar, available only at compile time; that is, the collection is still what it was before, and the casts are auto-added by the compiler. This was done because old (existing) collections always accepted anything, so if an exception were introduced for an incompatible type, existing code would be broken. A non-generic collection was made convertible to any generic collection (whatever the argument). That in turn added two new issues:

what happens if a non-generic collection is converted to a generic one with an incompatible argument?

how is type safety controlled in new code, when under the hood there is the old non-generic collection?

Java's creators went the easy way in both cases: they accepted ClassCastException for the first and completely forbade direct conversion of one generic collection to another (i.e. List<Integer> can't be directly cast to List<Number>).
What went wrong here? Four issues:

a generic collection can be passed to old non-generic code, which is free to insert anything - and that will only explode later, in the new code!

no generic type can be cast to another one, even if the arguments are compatible; you have to work around that via a cast through the non-generic collection

generics only exist at compile time; no runtime type checking exists

you can't force new code to be generic-only; it can still be used without generic arguments, in which case Object is assumed

What could they have done instead? Make collections aware of their generic argument and throw an exception when an incompatible object is inserted. That would have accomplished the following:

type safety of the generic collection - it would simply never contain incompatible objects

casting of references would simply work, as the protection would be there at runtime

passing a generic collection to old code would reveal bugs (an object of the wrong type inserted) or expose invalid assumptions about it ("oops, it's not a String-only collection")

Yes, this approach could break old code. But the alternative that was chosen made all new code suck. Looking forward, new code will slowly outnumber the old, and Java as a language will have inferior generics compared to what it could have had!

C# took a different approach here. It simply added generics as something completely new, not compatible with the old collections in any way. Not ideal, as interoperability between old and new code is troublesome. But looking forward, old code will die out. So IMO it's a better approach than that of Java.

C++ compatibility with C [not quitting in time]

So, C++ is designed to be compatible with C; that is, "a valid C program is a valid C++ program", as they say... Well, not really, for several reasons:

compatibility is lost with the first new keyword introduced (*cough* class *cough*) - what used to be a valid identifier no longer is

C++ has different linkage because of name mangling, which makes it incompatible with C. Worse, C libraries are now forced to add extern "C" markers under the __cplusplus define to make themselves consumable from C++

enums and structs have tag names in C, but these are real type names in C++
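The extern "C" dance mentioned in the list above looks like this in practice (the function name and body are invented; the definition would normally live in a .c file, but is inlined here to keep the sketch self-contained):

```cpp
#include <cassert>

// What every C header has to carry to stay usable from both C and C++:
#ifdef __cplusplus
extern "C" {
#endif

int c_library_function(int x); /* C linkage: no C++ name mangling applied */

#ifdef __cplusplus
}
#endif

// Invented definition, standing in for the C library's implementation.
extern "C" int c_library_function(int x) { return x * 2; }
```

Without the guard, a C++ translation unit would look for a mangled symbol that the C-compiled library never exported, and linking would fail.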

What could they do? Well, actually they did the right thing, just for far too long. If C++'s goal was to completely replace C, it failed to do that. And it's long past time to become an independent language and throw some old C junk away (well, you can introduce some constructs to access C from C++; we have so many of them that a few more don't really matter).
What breaking ties with C would achieve:

string literals can become real std::string objects with their functionality (like concatenation using "+")

arrays can be std::array by default (being assignable is the first win)

a lot of the standard C library could be wrapped by C++ functions that accept C++ types (imagine printf() accepting std::string)

forget extern "C", you could just have something like #cinclude for C headers

Lessons that can be learned

A compatibility break that is almost guaranteed to have a very small impact is worth doing (WinMain)

If you redefine some type via typedef, make the new type more generic, so you can change it later (i.e. "an integer of a size that is at least X")

Hungarian notation is a bad idea, and the full form of it is ten times so

Bugs should be fixed! A fix that breaks something of small importance will earn you a few rants from people who are the ones to ignore (I can't imagine a good developer complaining about a fixed bug, even if it broke something in his buggy code).

New code or a new file format will gradually outnumber the old by a large margin, so look forward, not backward

If you fail to maintain full compatibility, use that as an opportunity to break things for a better future

The number of "breaks" doesn't matter; what matters is the overall pain introduced by the compatibility break. So, if you broke something important, keeping minor things compatible won't help much.

Monday, July 14, 2014

From what I've seen so far, duplicate code is impossible to avoid in any large project. There are multiple reasons why duplicate code gets created, and while it is typically assumed that duplicate code is bad, this is not always the case.

Why duplicate code is bad

Duplicate bugs - it's obvious: if a bug is discovered in the code, the same bug exists everywhere that code is used, so there are many places to fix instead of one

Hard to maintain - pretty much the same as the previous point, but broader. In particular, you not only fix bugs, but also add features, optimizations and other improvements. Worse, duplicated code diverges over time, making it harder to spot.

What "justifies" code duplication

Easier to maintain - while we usually claim the opposite, this one has some truth in it. By copying code written by someone else you are free to change it in any way you want. Changing common code is harder and often requires agreement across multiple involved parties. Bust: it looks that way, but it makes the code base larger, which in turn makes it harder to maintain.

More freedom to change - common code has to remain common; that is, you can't add your specific features to it. The biggest problem here is that it's an organizational issue: if code is duplicated to gain more freedom to change it, that indicates a problem with management or company culture.

Faster to develop - everything that requires the involvement of multiple parties takes more time to do. Bust: a short-term gain; you usually lose in the long run (unfortunately, short-term gains are all that many managers care about).

How duplicate code happens

Incompetence - it's sad, but there are a lot of bad developers. Many of them write code via copy-paste, and, as always, abusing copy-paste results in duplicates. This is what is often assumed when talking about duplicate code, and yes, this is what we should fight.

Forgot to refactor - this one is trickier. It's like the first case, except that the developer is actually not bad. It's fine to use copy-paste in order to make things work; the problem is that you have to refactor at the end. Not forgetting to do that is the hardest part... There is a gray area between this and the first case. Code review might be an answer here.

Too much trouble - sometimes avoiding code duplication is more trouble than it's worth. A place for the common code might not even exist! Create a library just for a couple of functions? Don't forget that this brings the entire maintenance overhead of a library with it. There is also often such a thing as code ownership, and shared code is owned by someone else. In short, we avoid code duplication to reduce problems, not to add new ones. When that is not the case, duplicating code can be acceptable.

Created naturally - it's not impossible that two developers actually write almost identical code. In large projects with a lot of people this does happen, and it might take a while to discover that two guys from completely different teams wrote almost identical helper functions.

So, to summarize, next time before blaming someone for incompetence, have a second thought.

Sunday, May 4, 2014

In short: exceptions are good for system and critical errors (like out of memory). The simpler and more expected an error is, the less useful and more troublesome an exception becomes.

Error handling is hard. Not doing it properly comes back as mysterious failures where no one can understand what went wrong. Doing it properly is a pain in the ass, mostly because it takes a lot of boring coding when the stuff already works! Really, most of us probably just code the happy path first, prove it, and then go on handling all the possible not-so-happy cases. This is generally the right thing to do - what's the point of handling the errors when you're not yet sure the solution is right?

Sinking among ifs

That's the general idea behind exception handling. A typical example given to students looks like this:

The lines in bold are the "good code". Everything else is there for error handling. It seems very nice to write all the "good" function calls one after another and move the error handling code somewhere else - welcome, try-catch!
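A sketch of that classroom shape (the step names are invented): the "good" calls line up in a straight sequence, and all error handling is banished to the catch block.

```cpp
#include <stdexcept>
#include <string>

// Invented steps; open_file() throws on failure, the others are stubs.
void open_file(bool ok)  { if (!ok) throw std::runtime_error("open failed"); }
void read_header()       { /* ... */ }
void read_body()         { /* ... */ }

std::string load(bool file_ok) {
    try {
        open_file(file_ok); // the "good code", one call after another
        read_header();
        read_body();
        return "loaded";
    } catch (const std::exception& e) {
        // all the error handling lives here, out of the main flow
        return std::string("error: ") + e.what();
    }
}
```

Compare this to the nested-ifs version, where every call would be wrapped in its own check and the happy path sinks deeper with each step.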

Expected mistakes: the user hasn't filled in required fields? The specified file name contains invalid characters? Such errors are predictable, and applications should be ready for them.

Glitches: a string "15 " (with a trailing space) in 99% of cases means the integer 15, dammit.

The interesting thing here is that exactly the same error can belong to different groups depending on the exact situation. A failure while writing to a file can mean that the primary hard disk has just crashed and in a few seconds the entire computer will be unusable, or it can just mean that the user has unplugged a USB stick. And who said that a failure to open a file is fatal? No config - assume hard-coded defaults.

Opening file is so difficult

...
So, we are opening a configuration file that is not required to exist...

FILE *file = fopen(filename, "r");

Nice: NULL means it does not exist; otherwise it's something we can read!
What's the problem? You could also write it the opposite way:

It does the same thing. Does it? Congratulations, you've just introduced a full-moon bug! Files sometimes disappear, you know - they get deleted. That can happen at any point in time, for example right between the existence check and the opening... Fatal error, crash, or... well, that file was never required to be there in the first place? So now the code becomes:
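A sketch of the tolerant version the text arrives at (the function name is invented): just try to open, and treat absence of an optional config file - whether it never existed or vanished a microsecond ago - as a normal case, not an error.

```cpp
#include <cstdio>

// Returns true if a config file was found and read, false otherwise.
// There is no separate existence check, so there is no race to lose.
bool load_config(const char* filename) {
    std::FILE* file = std::fopen(filename, "r");
    if (file == nullptr) {
        // No config: fall back to hard-coded defaults. No crash, no full moon.
        return false;
    }
    // ... read the settings here ...
    std::fclose(file);
    return true;
}
```

The existence-check-then-open variant is a classic time-of-check-to-time-of-use race; asking fopen() directly makes the check and the use a single operation.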

How badly you can blow?

C once again. You call a function and you expect it to return. Is this guaranteed? No! The application might die inside, but we don't care. longjmp() can be called, but again we don't care - unless we made it ourselves.
Let's "upgrade" to C++. What can happen now? Yes, an exception can be thrown, and there are many types of them! Worse: new types of exceptions can be added in the future!
It's considered good practice to catch only the exceptions you care about and let the others propagate up the call stack. That's fine, but what about the new types of exceptions that might be added in the future? It looks like someone didn't design for the future...

Exception safety

There is an amazing thing about exception safety that I still can't explain. C++ is a language whose standard library throws something extremely rarely, yet a topic called "exception safety" is a staple of its books. When we come to Java and co., where exceptions are thrown here, there and everywhere, this is somehow forgotten...

obj.foo(x, y);

You can only guess how foo works with x and y, but there's one thing most seem to assume - all or nothing. If an exception is thrown out of foo(), you want the state of obj unchanged! A simple concept, but not so easy to get right.
Throw more exceptions and enjoy more full-moon bugs.
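One way to actually deliver that "all or nothing" guarantee is the copy-and-swap idiom: do the work on a copy, and commit with a non-throwing swap only on success. A sketch with invented names:

```cpp
#include <stdexcept>
#include <vector>

class Journal {
    std::vector<int> entries_;
public:
    // Strong guarantee: either both entries are added, or the state
    // is exactly as it was before the call.
    void add_pair(int a, int b) {
        std::vector<int> copy = entries_; // may throw - state untouched
        copy.push_back(a);                // may throw - only the copy changes
        if (b < 0) throw std::invalid_argument("negative entry");
        copy.push_back(b);
        entries_.swap(copy);              // does not throw - the commit point
    }
    std::size_t size() const { return entries_.size(); }
};
```

If an exception escapes add_pair() at any point before the swap, only the local copy is affected, so the caller observes the object exactly as it was.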

Exception specifications

This is something that pissed me off when I started learning Java. C++ has them too, but they are optional and no one seems to use them (except the standard library). Some even discourage them.
Looking at C#, they have thrown away specifications entirely.
Looking back at Java... ArrayIndexOutOfBoundsException, PersistenceException and multiple others are "unchecked" exceptions, so you don't need to declare them all over the place. Are the two I mentioned so "unexpected"?

Conclusions

Exception handling works well for critical errors. The less serious the error, the less efficient exception handling is. For simple errors, exceptions are more trouble than help.

Exceptions are designed to separate useful code from error handling code. When exception handling mechanisms appear inside nested code blocks, it's the first sign of exception misuse.

I also haven't mentioned that exceptions are expensive in terms of performance...