I've created an algorithm that allows atan2 to be calculated through a lookup-table.

All values for Y and X are allowed. You can control the RAM-usage/accuracy by setting the bits used in both dimensions.

I used 7 bits per dimension - that means 2^7 (128) values for each dimension, so 128 * 128 = 16,384 entries in the lookup table. This results in an accuracy of 0.0078125 (1.0 / 128) in each dimension after the coords are normalized.
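To make the idea concrete, here is a minimal sketch of such a 2-D table, assuming 7 bits per dimension. The normalization scheme, class name, and quadrant handling are my own illustration, not Riven's actual code:

```java
// Sketch of a 2-D atan2 lookup table (illustrative, not the original code).
public final class Atan2Lut {
    private static final int BITS = 7;
    private static final int DIM = 1 << BITS;          // 128 values per dimension
    private static final float[] TABLE = new float[DIM * DIM];

    static {
        // Precompute atan2 for every (yi, xi) cell in the first quadrant.
        for (int yi = 0; yi < DIM; yi++) {
            for (int xi = 0; xi < DIM; xi++) {
                TABLE[yi * DIM + xi] = (float) Math.atan2(yi, xi);
            }
        }
    }

    /** Approximate atan2 for any finite x, y. */
    public static float atan2(float y, float x) {
        // Normalize both magnitudes into [0, 1] by the larger one,
        // then recover the sign/quadrant afterwards.
        float ax = Math.abs(x), ay = Math.abs(y);
        float invMax = 1f / Math.max(Math.max(ax, ay), Float.MIN_NORMAL);
        int xi = (int) (ax * invMax * (DIM - 1));
        int yi = (int) (ay * invMax * (DIM - 1));
        float angle = TABLE[yi * DIM + xi];            // angle in [0, PI/2]
        if (x < 0) angle = (float) Math.PI - angle;    // mirror into quadrants II/III
        return y < 0 ? -angle : angle;                 // mirror below the x-axis
    }
}
```

The quantization step is what gives the 1/128 accuracy per dimension mentioned above.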

Great work Riven... how about you do the same for other trigonometry ops? And how about a quick accuracy comparison for your 7-bit example: at which digit does it start to differ from Java's atan2? I don't really understand how it works yet (I've only glanced at it), so I can't follow your accuracy explanation.

I improved it. Now it uses O(N) memory instead of O(N^2), and because it eliminates unnecessary assignments, multiplications, additions and 2-dimensional addressing, it's a lot faster. I also increased the table size for more precision.

The theory:

I added the option to not only return results from -Pi to Pi, but any range you want.

private static final int SIZE = 1024;
private static final float STRETCH = (float) Math.PI;
// Output will swing from -STRETCH to STRETCH (default: Math.PI)
// Useful to change to 1 if you would normally do "atan2(y, x) / Math.PI"
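For illustration, here is a self-contained sketch of how a 1-D table along these lines can work: one atan table over the ratio min/max in [0, 1], plus octant fix-up. The class name and the exact fix-up logic are my assumptions, not Riven's actual code:

```java
// Sketch of a 1-D atan2 table: O(SIZE) memory instead of O(SIZE^2).
public final class FastAtan2 {
    private static final int SIZE = 1024;
    private static final float STRETCH = (float) Math.PI; // output range: [-STRETCH, STRETCH]
    private static final float[] ATAN = new float[SIZE + 1];

    static {
        for (int i = 0; i <= SIZE; i++) {
            // atan of a ratio in [0, 1], rescaled so PI maps to STRETCH.
            ATAN[i] = (float) (Math.atan((double) i / SIZE) * (STRETCH / Math.PI));
        }
    }

    public static float atan2(float y, float x) {
        float ax = Math.abs(x), ay = Math.abs(y);
        float angle;
        if (ax >= ay) {
            angle = ax == 0 ? 0 : ATAN[(int) (ay / ax * SIZE)];
        } else {
            // atan(t) = PI/2 - atan(1/t), so index by the inverse ratio.
            angle = STRETCH * 0.5f - ATAN[(int) (ax / ay * SIZE)];
        }
        if (x < 0) angle = STRETCH - angle;  // mirror into quadrants II/III
        return y < 0 ? -angle : angle;       // mirror below the x-axis
    }
}
```

Only one table index is computed per call, which is where the savings over the 2-D version come from.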

WARNING: micro-benchmarking heavily table-based code is very misleading, as the tables will remain cached in the benchmark, whereas that is unlikely under real usage patterns. Likewise, walking linearly through the test range will cause branch predictors to be "right" virtually 100% of the time, which will not be the case in real usage.

I just benchmarked my real application and it went from 20 fps to 70 fps. It's not only atan2; the 4th operation in the following sequence is atan2, and the other operations include loading an image (memory intensive).

Zom-B, your version is definitely more accurate and requires half the look-up table size, but it's also slower than Riven's version (~38% slower). Also, the JavaMath timing in your results looks suspiciously high; which JVM and what kind of CPU did you run the test on? Did you use the -server flag?

private static final int SIZE = 1024;
private static final float STRETCH = (float) Math.PI;
// Output will swing from -STRETCH to STRETCH (default: Math.PI)
// Useful to change to 1 if you would normally do "atan2(y, x) / Math.PI"

I forgot to re-seed the RNG between each test. So I decided to fix that, but rather than do the easy thing, I went with a big array of samples instead, so now it's 10 million iterations cycling over an array of 50,000 random floats. The results were interesting: FastMath's relative performance more than doubles past its previous benchmark.
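A harness in the spirit described above might look like this: pre-generate the random samples so the RNG is outside the timed loop and every implementation sees identical inputs. The sizes match the post, but the seed, names, and structure are my assumptions, not the poster's actual code:

```java
import java.util.Random;

// Illustrative benchmark harness: cycle over a pre-filled sample array.
public final class Atan2Bench {
    static final int SAMPLES = 50_000;
    static final float[] XS = new float[SAMPLES];
    static final float[] YS = new float[SAMPLES];

    static {
        Random rng = new Random(12345);        // fixed seed: identical data every run
        for (int i = 0; i < SAMPLES; i++) {
            XS[i] = rng.nextFloat() * 2f - 1f;
            YS[i] = rng.nextFloat() * 2f - 1f;
        }
    }

    /** Sums atan2 over the samples; returning the sum defeats dead-code elimination. */
    static double run(int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            int j = i % SAMPLES;
            sum += Math.atan2(YS[j], XS[j]);   // swap in the table version to compare
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        double sum = run(10_000_000);
        long t1 = System.nanoTime();
        System.out.println((t1 - t0) / 1e6 + " ms, sum = " + sum);
    }
}
```

With a fixed seed, the printed sum is also comparable across implementations, which ties into the accuracy point discussed below.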

I've also invented another, even more accurate but more limited microbenchmarking technique: call the function once per iteration in the first loop and 10 times in the other, then solve for the run time of one execution. This is not applicable to atan2 because you can't call it multiple times in a row and expect the performance to be flat (and calling random in between makes the technique useless anyway).
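The calibration idea above can be sketched as follows. The probed function here is Math.sqrt (a stand-in I chose, since the post says atan2 doesn't qualify), and the names are hypothetical:

```java
// Sketch of the 1-call vs 10-call calibration technique: the loop overhead
// appears in both timings and cancels when solving for the per-call cost.
public final class CallCostProbe {
    static volatile double sink;               // defeats dead-code elimination

    /** Estimated nanoseconds per call of the probed function. */
    static double nanosPerCall(int iterations) {
        double s = 0;
        long start1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            s += Math.sqrt(i);                 // 1 call per iteration
        }
        long t1 = System.nanoTime() - start1;

        long start10 = System.nanoTime();
        for (int i = 0; i < iterations; i++) { // 10 calls per iteration
            // Distinct arguments so the JIT cannot collapse them into one call.
            s += Math.sqrt(i) + Math.sqrt(i + 1) + Math.sqrt(i + 2) + Math.sqrt(i + 3)
               + Math.sqrt(i + 4) + Math.sqrt(i + 5) + Math.sqrt(i + 6) + Math.sqrt(i + 7)
               + Math.sqrt(i + 8) + Math.sqrt(i + 9);
        }
        long t10 = System.nanoTime() - start10;

        sink = s;
        // t1 = overhead + 1*c, t10 = overhead + 10*c  =>  c = (t10 - t1) / 9
        return (t10 - t1) / 9.0 / iterations;
    }
}
```

Note this assumes the function's cost is flat across back-to-back calls, which is exactly the limitation the post mentions.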

The only situation where I'd use large tables for an approximation is if the target is interpreted; data flow is too big an issue on modern architectures. sproingie's first test is the most reasonable so far, and it still shows the "fast" version in too good a light compared to reasonable usage: it succeeds in making branch predictors "stub their toes" and forces higher-level caches to refill, BUT the lower-level caches will very quickly hold all the data. The second test is a step backwards.

Note that if you think too deeply about this stuff it will really make your head hurt, because (for instance) for compiled versions under light temporal access patterns, the version that needs to load the least data (including code) will probably be the winner... and you also have to note that loading data to core A can have an impact on the performance of core B, which might be wanting to access the memory architecture as well.

If you need to reseed, then either the test is broken or random is. The only reason to reseed is to keep people from complaining about the test not being "fair"... and a couple of lines of code make it worth not having to explain some statistics.

Quote

So I decided to fix that, but rather than do the easy thing, I went with a big array of samples instead, so now it's 10 million iterations cycling over an array of 50,000 random floats. The results were interesting: FastMath's relative performance more than doubles past its previous benchmark.

So what you're doing is causing the hardware version to be drastically slowed down by introducing memory stalls (that code chunk has virtually nothing to do to hide the stall). The impact on the "fast" version will be less marked: out-of-order execution, hiding of stalls, etc. This seems to me to be a highly unlikely situation: input from linear arrays of random data. Even if the input did look somewhat like this, it seems like there would be other work happening at the same time (again, same points: out-of-order execution, hiding stalls, etc.).

Yes, dead code is useless, but don't forget legal transforms as well. So this bias computation could be off:

The reason I would need to re-seed is to get the same random data, so that the "sum" is meaningful beyond just keeping the loop from being optimized out, i.e. it lets me compare the accuracy of the two methods. Otherwise, I'd hope that 10 million randoms are distributed well enough.

As for the second, transforming linear arrays of numeric data is the bread and butter of visual algorithms. If you're computing thousands of normals, it's going to have pretty similar access patterns (different trig functions of course). It's rarely going to be as tight as a microbenchmark, but no one ever called microbenchmarks the last word.

As for StrictMath, it's being used to initialize the table to give it predictable behavior across platforms, that's all. Seems the most appropriate place to use it, where you're only paying the cost of using it once up front.
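As a small illustration of that point (names here are my own, not sproingie's actual code), the one-time cost lives entirely in the static initializer:

```java
// Table initialized via StrictMath so every platform computes
// bit-identical entries; lookups never touch StrictMath again.
public final class TableInit {
    static final int SIZE = 1024;
    static final float[] ATAN_TABLE = new float[SIZE + 1];

    static {
        for (int i = 0; i <= SIZE; i++) {
            ATAN_TABLE[i] = (float) StrictMath.atan((double) i / SIZE);
        }
    }
}
```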

Quote

.... so that the "sum" is meaningful beyond avoiding the loop being optimized out, i.e. to compare the accuracy of the two methods.

But the sum is useless for testing accuracy. The only thing it can tell you is whether your approximation is really broken, and nothing else.

Quote

As for the second, transforming linear arrays of numeric data is the bread and butter of visual algorithms. If you're computing thousands of normals, it's going to have pretty similar access patterns (different trig functions of course).

In which case there is virtually always other exploitable information.

Quote

As for StrictMath, it's being used to initialize the table to give it predictable behavior across platforms, that's all.

Bit-exact results for floats is crazy and was never the intent. Remember that in this case we're talking about singles, so the results will be bit-exact regardless.
