After some more investigation, I will try to answer my own question. As suggested by Kerrek SB, I looked at the generated assembly code. The bottom line seems to be that GCC 6.2 unrolls the loop implicit in std::none_of, but does not unroll the loops in the other three versions.

GCC 6.2:

std::none_of is unrolled 4 times -> ~30µs

The manual for, range for and iterator versions are not unrolled at all -> ~45µs

As suggested by Corristo, the result is compiler dependent, which makes perfect sense. Clang 3.9 unrolls all but the range for loop, though to varying degrees.
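
For readers who want to reproduce the comparison, here is a minimal sketch of what the four variants could look like. The element type, the is_zero predicate and the function names are assumptions made for illustration; they are not the code from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical predicate, chosen only for illustration.
static bool is_zero(int x) { return x == 0; }

// Variant 1: the standard algorithm.
bool none_of_algorithm(const std::vector<int>& v) {
    return std::none_of(v.begin(), v.end(), is_zero);
}

// Variant 2: manual index-based for loop.
bool none_of_manual_for(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (is_zero(v[i])) return false;
    }
    return true;
}

// Variant 3: range-based for loop.
bool none_of_range_for(const std::vector<int>& v) {
    for (int x : v) {
        if (is_zero(x)) return false;
    }
    return true;
}

// Variant 4: explicit iterator loop.
bool none_of_iterator(const std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ++it) {
        if (is_zero(*it)) return false;
    }
    return true;
}
```

Compiling a file like this with optimizations enabled and the -S flag (for example g++ -O3 -S) writes the assembly to a .s file, where you can check whether each loop body is repeated. The optimization level here is an assumption, and the exact output will differ between compiler versions.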