strdup versus strlen+malloc+memcpy benchmark

Why?

While doing (micro-)optimizations to multitail, I wondered whether my
wrapper around strdup (which does a strlen, a malloc, a check that malloc returned something valid,
and a memcpy) was much slower than a plain strdup.
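Roughly, such a wrapper could look like the following sketch. It is not the actual multitail code. The name `mystrdup` is hypothetical, and what happens when malloc fails (here: print a message and exit) is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: strlen + malloc + check + memcpy.
 * The real multitail wrapper may be named and behave differently. */
char *mystrdup(const char *in)
{
    /* strlen() on NULL is undefined, so assert the precondition. */
    assert(in != NULL);

    /* strlen() does not count the terminating '\0', so add 1 for it. */
    size_t len = strlen(in) + 1;
    char *out = malloc(len);

    /* Check that malloc returned something valid.
     * Assumption: on failure, report the problem and exit. */
    if (out == NULL)
    {
        fprintf(stderr, "mystrdup: out of memory\n");
        exit(EXIT_FAILURE);
    }

    /* Copy the string including its terminator. */
    memcpy(out, in, len);
    return out;
}
```

A caller uses it exactly like strdup: `char *copy = mystrdup(line);` followed later by `free(copy);`.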

How?

I wrote a little test program which can be found here: strdup-test.c.
For each graph I noted which compiler was used, together with the compilation flags.
I tried to keep the compilation flags as close as possible to the ones used to compile libc, although that should not matter much, as the code is little more than a few library (libc) calls.
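strdup-test.c itself is not reproduced here. A measurement of this kind could be sketched as below, assuming a POSIX system that provides `clock_gettime()` with `CLOCK_MONOTONIC`. The names `mystrdup`, `libc_strdup`, `dup_fn` and `bench` are hypothetical, and the real test program may measure things differently.

```c
#define _POSIX_C_SOURCE 200809L /* exposes strdup() and clock_gettime() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical wrapper: strlen + malloc + check + memcpy. */
char *mystrdup(const char *in)
{
    assert(in != NULL);
    size_t len = strlen(in) + 1; /* +1 for the terminating '\0' */
    char *out = malloc(len);
    if (out == NULL)
    {
        fprintf(stderr, "mystrdup: out of memory\n");
        exit(EXIT_FAILURE);
    }
    memcpy(out, in, len);
    return out;
}

/* Adapter so libc's strdup() has the same type as mystrdup(). */
char *libc_strdup(const char *in)
{
    return strdup(in);
}

typedef char *(*dup_fn)(const char *);

/* Storing each copy here is meant to make it harder for an
 * optimizing compiler to drop the malloc/free pair entirely. */
static char *volatile bench_sink;

/* Times `iterations` rounds of duplicate-then-free with `fn` on `str`
 * and returns the elapsed wall-clock time in seconds. */
double bench(dup_fn fn, const char *str, long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
    {
        char *copy = fn(str);
        if (copy == NULL)
        {
            fprintf(stderr, "bench: out of memory\n");
            exit(EXIT_FAILURE);
        }
        bench_sink = copy;
        free(copy);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

A driver would call `bench(mystrdup, s, n)` and `bench(libc_strdup, s, n)` with the same string and iteration count and compare the two times. Repeating each run a few times and using strings of different lengths makes the comparison less sensitive to noise.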

Results?

Apart from Linux, tests were also run on other platforms (out of curiosity).