My guess is yes.
The compiler is extremely unlikely to convert the operands to single precision before the write to mu unless you pass some flag (e.g. a fast-math option), because doing the division in float could lose precision. So with the code the way it is, you will get actual double-precision math.
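A minimal sketch of the situation as I understand it (the names a, b, and mu here are assumptions, not necessarily what your code looks like):

```cpp
double a = 355.0, b = 113.0;

// The division itself happens in double precision (divsd on x86-64);
// only the store narrows the result to float (cvtsd2ss).
float mu = a / b;

// To actually get a single-precision divide (divss), convert the
// operands to float first:
float mu2 = (float)a / (float)b;
```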

A quick search suggests that on Intel the latency difference between a single-precision and a double-precision divide is between 7 and 15 cycles, depending on the radix of the divider.
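If you want to measure the gap on your own machine, here is a rough micro-benchmark sketch (iteration count and the volatile trick are illustrative, not tuned). Each divide depends on the previous result, so the loop time is dominated by divide latency:

```cpp
#include <chrono>
#include <cstdio>

template <typename T>
double ns_per_divide(T x, T d, int iters) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        x = x / d + T(1);   // serial dependency chain: each divide waits on the last
    auto t1 = std::chrono::steady_clock::now();
    volatile T sink = x;    // keep the result alive so the loop isn't optimized away
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / iters;
}

int main() {
    const int iters = 10'000'000;
    std::printf("float : %.2f ns per divide\n", ns_per_divide(1.0f, 3.0f, iters));
    std::printf("double: %.2f ns per divide\n", ns_per_divide(1.0,  3.0,  iters));
}
```

Compile with optimization on but fast-math off, so the divides stay real divide instructions rather than being replaced with reciprocal multiplies.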