Yet this is not fully framerate-independent; results are slightly different at 30fps and 60fps, and more importantly, spikes in the framerate cause lots of weird artifacts, causing developers to attempt to fix the situation by clamping delta_time, which is not ideal.

The exponentiation method

Here is one way to fix it: assume that the code works correctly at 60 fps. This means that each frame, velocity is effectively multiplied by 1 - D / 60.

I have published a first article about how to build a quaternion from two arbitrary direction vectors that will transform one of the directions into the other. That article was deliberately omitting the special case when the two vectors were facing away from each other, which required special treatment. So I wrote a second article explaining how to quickly pick an orthogonal vector.

Combining the results from the two articles should be straightforward. Here is the resulting, final function:

Whichever way you look at the sphere of unit vectors, a continuous tangent field will always have a zero point:

Most of the maths operations we usually do (adding, multiplying, taking square roots, etc.) are continuous. Fortunately, we are not limited to them: we can use a simple if() construct to create a discontinuity that will save us.

A very good implementation

Here is my personal implementation of the orthogonal vector problem. It is very good. It has excellent numerical stability. It only does one test. It is short. It is beautiful. I really hope you consider using it.

Many implementations, such as Ogre3D’s, have additional tests or operations to perform this task, and use epsilons and normalised vectors to avoid singularities.

But our only test is actually enough: if |x|>|z|, it means the largest component in absolute value in v is either x or y. So we build a vector using x and y. Otherwise, the largest component in absolute value is either y or z. So we build a vector using y and z. That way, we ensure that the length of our returned vector never, ever comes near zero.

Whether some given code will cause inefficient branching is often unclear. In our case, it is very likely that the ternary operator will actually be branchless, with some help of the compiler and the hardware.

That said, how about we try to get rid of the ternary operator, just for fun?

Going branchless for fun

Let’s see. We had these two candidate vectors, w1 and w2, which worked almost always except when specific values of v caused them to be zero. And whatever the constant k we may pick, a vector of the form w1 + kw2 will eventually take the value zero for some value of v, too, because of the hairy ball theorem.

Now here comes the trick. Instead of using a constant k, we use a function f(x,y,z). This is what our w vector looks like:

From now I shall assume that v is a unit vector.

What can cause w to be zero, then?

One necessary condition is y = 0. When y ≠ 0 we can chose anything we want for f(x,y,z), it’ll always give us a good orthogonal vector. This restricts the problem to the y = 0 circle, giving us the useful equality x² + z² = 1.

The other condition is x = z*f(x,y,z). So if we manage to build a function f such that f(x,y,z) never equals x/z, we win.

Using x² + z² = 1 we can plot all the possible values for x/z as a function of x. It will show us, for a given x, the values that fcannot take:

The almost vertical slopes you see go to infinity upwards and downwards. As expected, this prevents us from using a continuous function: it would obviously cross one of these walls at some point.

I find it highly unlikely that this second version will perform better than the branching one. However, I haven’t put much thought into it and maybe someone will come up with a much better solution using the ideas presented here. Have fun!

In this article I would like to explain the thought process behind the derivation of a widely used formula: finding a quaternion representing the rotation between two 3D vectors. Nothing really new, but hopefully a few ideas could be reused at other times.

This is more or less the code used by the ​Ogre3D engine in OgreVector3.h, except they perform an additional normalisation step on the final result. This is mathematically useless, but due to numerical stability issues, it is probably safe to do so nonetheless.

This final normalisation step is actually an opportunity to simplify the code even further.

Improving on Ogre3D

We are down to two square roots and four divisions, plus quite a few mul/adds. Depending on the platform that we are running on, it is possible to simplify even further and improve performance. For instance, on many SIMD architectures, normalising a quaternion can be very fast.

This is the code we get if we multiply every component of the quaternion by 2.f * half_cos and let normalize() do the rest of the job:

Isn’t it beautiful, considering the sin(), cos() and acos() ridden mess we started with?

This algorithm can be found all over the Internet, but I do not know who first came up with it. Also, a lot of 3D engines (both publicly available and slightly more private) could benefit from it.

Update (06/01/2014)

In the comments below, Michael Norel provides the following improvement to the non-unit version. Since the values d = dot(u, v) and w = cross(u, v) are computed no matter what, the value sqlength(u) * sqlength(v) could be computed in a different way, *i.e.* d * d + sqlength(w). The following code does at least three multiplications less:

Note: this is not an optimisation. It is just one more tool you should have in your toolbox when looking for optimisations. It may be useful.

This is the trick:

You can check for yourself that it is always true: when x > y, |x - y| is the same as x - y, etc.

What good is it for? There is often an implicit comparison in min or max. It might be interesting to replace it with a call to the branchless fabs.

Example usage

Consider the following code:

float a, b, c, d;/* ... */return(a > b)&&(c > d);

That kind of code is often used eg. in collision checks, where a lot of tests can be done. This code does two comparisons. On some architectures, this means two branches. Not always something you want.

The test condition is equivalent to:

(a - b >0)&&(c - d >0)

Now when are two given numbers both positive? That is if and only if the smallest is positive:

In my previous article about the Remez exchange algorithm I said I was working on a Remez exchange toolkit for everyone to play with. Though it’s far from being full-featured, I already use it and I believe it is already extremely useful. So I decided to release LolRemez 0.1 to the masses.

The right part is the Taylor series of sin around 0. It converges very quickly to the actual value of sin(a). This allows a computer to compute the sine of a number with arbitrary precision.

And when I say it’s magic, it’s because it is! Some functions, called the ​entire functions, can be computed everywhere using one single formula! Other functions may require a different formula for different intervals; they are the ​analytic functions, a superset of the entire functions. In general, Taylor series are an extremely powerful tool to compute the value of a given function with very high accuracy, because for several common functions such as sin, tan or exp the terms of the series are easy to compute and, when implemented on a computer, can even be stored in a table at compile time.

Approximating sin with Taylor series

This is how one would approximate sin using 7 terms of its Taylor series on the [-π/2, π/2] interval. The more terms, the better the precision, but we’ll stop at 7 for now:

If you are approximating a function over an interval using its Taylor series then either you or the person who taught you is a fucking idiot because a Taylor series approximates a function near a fucking point, not over a fucking interval, and if you don’t understand why it’s important then please read on because that shit is gonna blow your mind.

Error measurement

Let’s have a look at how much error our approximation introduces. The formula for the absolute error is simple:

And this is how it looks like over our interval:

You can see that the error skyrockets near the edges of the [-π/2, π/2] interval.

“Well the usual way to fix this is to split the interval in two or more parts, and use a different Taylor series for each interval.”

Oh, really? Well, let’s see the error on [-π/4, π/4] instead:

I see no difference! The error is indeed smaller, but again, it becomes extremely large at the edges of the interval. And before you start suggesting reducing the interval even more, here is the error on [-π/8, π/8] now:

I hope this makes it clear that:

the further from the centre of the interval, the larger the error

the error distribution is very unbalanced

the maximum error on [-π/2, π/2] is about 6.63e-10

And now I am going to show you why that maximum error value is pathetic.

It doesn’t look very different, right? Right. The values a0 to a6 are slightly different, but the rest of the code is strictly the same.

Yet what a difference it makes! Look at this error curve:

That new function makes it obvious that:

the error distribution is better spread over the interval

the maximum error on [-π/2, π/2] is about 4.96e-14

Check that last figure again. The new maximum error isn’t 10% better, or maybe twice as good. It is more than ten thousand times smaller!!

The minimax polynomial

The above coefficients describe a ​minimax polynomial: that is, the polynomial that minimises a given error when approximating a given function. I will not go into the mathematical details, but just remember this: if the function is sufficiently well-suited (as sin, tan, exp etc. are), then the minimax polynomial can be found.

The problem? It’s hard to find. The most popular algorithm to find it is the ​Remez exchange algorithm, and few people really seem to understand how it works (or there would be a lot less Taylor series). I am not going to explain it right now. Usually you need professional math tools such as Maple or Mathematica if you want to compute a minimax polynomial. The ​Boost library is a notable exception, though.

But you saw the results, so stop using Taylor series. Spending some time finding the minimax polynomial is definitely worth it. This is why I am working on a Remez framework that I will make public and free for everyone to use, modify and ​do what the fuck they want. In the meantime, if you have functions to numerically approximate, or Taylor-based implementations that you would like to improve, let me know in the comments! This will be great use cases for me.