Never mind: on re-reading, it is clear that it doesn't generate object code. I think it's a nifty project, and probably a good way to learn stuff, but if you put effort into writing optimized C++, I'd be very surprised to see this perform similarly.

Your ray tracer will need to be extremely fast before optimizing sqrt at this level will make much sense. Of course there are many cases where you can avoid sqrt altogether, and that makes sense, but I wouldn't go out of my way to avoid it. And with SSE on modern processors, the latency is about 10 ...
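For the "cases where you can avoid sqrt altogether", here is a minimal sketch of the two I'd expect to matter most in a ray tracer: comparing squared distances, and rejecting a ray-sphere miss on the sign of the discriminant before taking any root. It assumes C++17 (for `std::optional`) and a unit-length ray direction. The names `Vec3`, `closer` and `hitSphere` are made up for illustration and don't come from any code in this thread.

```cpp
#include <cmath>
#include <optional>

// Hypothetical minimal vector type, just for this sketch.
struct Vec3 { double x, y, z; };

inline Vec3 operator-(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// True if a is closer to p than b is. Squared distances order the same
// way as distances, so no sqrt is needed for the comparison.
inline bool closer(const Vec3& p, const Vec3& a, const Vec3& b) {
    const Vec3 pa = a - p;
    const Vec3 pb = b - p;
    return dot(pa, pa) < dot(pb, pb);
}

// Ray-sphere intersection. Assumes dir is normalized.
// Returns the nearest non-negative hit distance, or nullopt on a miss.
inline std::optional<double> hitSphere(const Vec3& origin, const Vec3& dir,
                                       const Vec3& center, double radius) {
    const Vec3 oc = origin - center;
    const double b = dot(oc, dir);
    const double c = dot(oc, oc) - radius * radius;
    const double disc = b * b - c;

    // Misses are decided here, without ever calling sqrt.
    if (disc < 0.0) return std::nullopt;

    // Only actual hits pay for the square root.
    const double s = std::sqrt(disc);
    const double t0 = -b - s;
    if (t0 >= 0.0) return t0;
    const double t1 = -b + s;
    if (t1 >= 0.0) return t1;
    return std::nullopt;  // sphere is entirely behind the ray origin
}
```

With the origin at (0, 0, 0), direction (0, 0, 1), and a unit sphere centered at (0, 0, 5), `hitSphere` reports a hit at distance 4. Move the center to (0, 3, 5) and it returns `std::nullopt` from the discriminant test alone. This is about as far as I'd go; the remaining sqrt on the hit path is rarely where the time goes.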

There has been work in both fixed-point hardware ray tracing and integer ray tracing. See e.g. Johannes Hanika's dissertation and the work by Gribble et al. (Note that just because people have done it doesn't mean it will be easy.)