Nick Piggin:
> It's not scaling but just single threaded performance. gcc turns memcmp
> into rep cmp, which has quite a long latency, so it's not appropriate
> for short strings.

Honestly speaking, I doubt how effective this 'long *' approach is (of course, this does not mean that your results with 'char *' are doubtful). But is the "rep cmp has quite a long latency" issue common to all x86 architectures, or specific to Westmere systems?
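For reference, below is a minimal sketch of the two comparison strategies being discussed: a byte-at-a-time ('char *') loop and a word-at-a-time ('long *') loop with a byte tail. It is not the code from the patch. The names cmp_bytes and cmp_words are hypothetical, and both functions assume an equality-only check (0 means equal) rather than memcmp's ordering result.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-at-a-time ('char *') compare: 0 if equal, 1 otherwise. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t len)
{
	while (len--) {
		if (*a++ != *b++)
			return 1;
	}
	return 0;
}

/*
 * Hypothetical word-at-a-time ('long *') compare: checks sizeof(unsigned long)
 * bytes per iteration, then falls back to bytes for the remaining tail.
 * memcpy() is used for the loads so unaligned names are handled portably.
 */
static int cmp_words(const unsigned char *a, const unsigned char *b, size_t len)
{
	while (len >= sizeof(unsigned long)) {
		unsigned long wa, wb;

		memcpy(&wa, a, sizeof(wa));
		memcpy(&wb, b, sizeof(wb));
		if (wa != wb)
			return 1;
		a += sizeof(wa);
		b += sizeof(wb);
		len -= sizeof(wa);
	}
	return cmp_bytes(a, b, len);
}
```

For very short names (shorter than one word), cmp_words degenerates to cmp_bytes, so any gain from the 'long *' approach would only appear once names span at least one full word.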