Can an i7 be faster with sin() instead of a lookup table, while maybe a Celeron (which is my current game development PC, with onboard graphics) can't?

Lookup tables are so 1990s. Think of the cache: processors have become lightning fast since then, while RAM speed has not kept pace.

Also, the line "i,m telling you : games are not playable with functions like sin() and cos() and sqrtf() ( i still need to get some fast sqrtf function by the way )." had me retro-chuckling.

actually, IME, lookup tables *can* be pretty fast, provided they are all kept small enough to mostly fit in the L1 or (at least) L2 cache.

for example, a 256-entry table of 16-bit items: probably pretty fast.

OTOH, a 16k/32k/64k entry table of 32 or 64 bit items... errm... not so fast.

as for sin/cos/sqrt/...

probably not worth worrying about, unless there is good reason.

the performance issues with these, however, are not so much with the CPU as with how certain compilers handle the C library math functions.

but, in most cases, this should not matter (yes, including in the game logic and renderer).

I would not personally recommend sin or cos tables as an attempt at a "general purpose" solution, as this is unlikely to gain much (and if done naively will most likely be slower, more so if int<->float conversions and similar are involved).

for special-purpose use cases, they can make sense, but generally in the same sort of contexts where one will not typically be using floats either.
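To make the "small table" case concrete, here is a minimal sketch of that special-purpose kind of table: 256 entries of 16-bit values (512 bytes), indexed by an 8-bit integer angle so that no int<->float conversion happens at lookup time. The names `SineTable256`, `sin_q15` and `cos_q15`, the "binary degrees" angle unit and the Q15 scaling are all assumptions made for illustration, not something taken from this thread.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-entry sine table: 256 * 2 bytes = 512 bytes, small
// enough to sit comfortably in L1 cache. Angles are 8-bit "binary
// degrees" (0..255 covers one full turn); values are Q15 fixed point,
// so 32767 stands for roughly 1.0.
struct SineTable256 {
    std::array<std::int16_t, 256> values{};

    SineTable256() {
        const double two_pi = 6.283185307179586;
        for (std::size_t i = 0; i < values.size(); ++i) {
            values[i] = static_cast<std::int16_t>(
                std::lround(std::sin(two_pi * i / 256.0) * 32767.0));
        }
    }
};

// Integer in, integer out: the 8-bit angle wraps around by itself.
inline std::int16_t sin_q15(const SineTable256& table, std::uint8_t angle) {
    return table.values[angle];
}

inline std::int16_t cos_q15(const SineTable256& table, std::uint8_t angle) {
    // cos(x) = sin(x + quarter turn); a quarter turn is 64 binary degrees.
    return table.values[static_cast<std::uint8_t>(angle + 64)];
}
```

Whether this beats a plain sin() call still depends on the surrounding code and on the table staying in cache, so it would need to be measured in the real program.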

I was looking for some specific info about what is called branching; it is still not clear to me:

If I use only the "if", and not the brackets after it, is it still branching?

And what if I only use the brackets like this, without the if:

{

// code here

}

Does that also count as branching?

greetings

If you have to ask questions like this, you're not really ready to do any low-level optimizations.

Also, going branchless isn't always a win. I've worked on optimization for some platforms where I actually got speed improvements by changing from heavily-optimized branchless floating-point math into the most basic, beginner-friendly if/else code possible. The previous optimizations had turned out to be very platform specific, and on some slower, simpler processors, branching wasn't relatively as bad as caching the extra instructions and performing redundant math.
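On the earlier question of what counts as a branch: it is the condition that can turn into a branch, not the braces. A minimal sketch follows; the function names are made up for illustration. Whether the compiler actually emits a jump or something branchless (such as a conditional move) is up to the optimizer, so the comments describe the source code, not guaranteed machine code.

```cpp
// The `if` is the (potential) branch, with or without braces.
inline int clamp_no_braces(int x) {
    if (x > 100) x = 100;      // a condition: possibly a branch
    return x;
}

inline int clamp_with_braces(int x) {
    if (x > 100) { x = 100; }  // exactly the same meaning as above
    return x;
}

// Braces on their own only open a scope. Nothing is tested, so there
// is nothing to branch on.
inline int scope_only(int x) {
    {
        int doubled = x * 2;   // `doubled` exists only inside this block
        x = doubled;
    }
    return x;
}
```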

Of course, I only even tried this because the code I modified had shown up in a profile as something I should look at. Now, there's usually some platform-specific thing you can do to speed up your math, but I always prefer to start from the simplest possible reference implementation, and that implementation should be kept around as a compile option. You can also use a reference implementation to test whatever faster math you create.
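A rough sketch of what "keep the reference implementation as a compile option and test against it" could look like. Everything here is assumed for illustration: the `USE_FAST_MATH` flag, the function names, and the choice of an alpha-max-plus-beta-min approximation as the "fast" version (a known trick that trades about 4% accuracy for skipping the square root). It is not the code from the project described above.

```cpp
#include <cmath>

// Reference implementation: the simplest code that is obviously correct.
inline float length_reference(float x, float y) {
    return std::sqrt(x * x + y * y);
}

// Hypothetical "fast" variant: alpha-max-plus-beta-min approximation,
// which avoids the square root at the cost of roughly 4% worst-case error.
inline float length_fast(float x, float y) {
    const float ax = std::fabs(x);
    const float ay = std::fabs(y);
    const float hi = ax > ay ? ax : ay;
    const float lo = ax > ay ? ay : ax;
    return 0.96043387f * hi + 0.39782473f * lo;
}

// USE_FAST_MATH is an assumed build flag; without it the reference
// implementation is what the rest of the program calls.
inline float vector_length(float x, float y) {
#ifdef USE_FAST_MATH
    return length_fast(x, y);
#else
    return length_reference(x, y);
#endif
}

// Largest relative error of the fast version against the reference,
// sampled over a grid of inputs from -8 to 8 in steps of 0.25.
inline float max_relative_error() {
    float worst = 0.0f;
    for (int ix = -32; ix <= 32; ++ix) {
        for (int iy = -32; iy <= 32; ++iy) {
            if (ix == 0 && iy == 0) continue;  // length 0: avoid dividing by it
            const float x = ix * 0.25f;
            const float y = iy * 0.25f;
            const float ref = length_reference(x, y);
            const float err = std::fabs(length_fast(x, y) - ref) / ref;
            if (err > worst) worst = err;
        }
    }
    return worst;
}
```

A unit test can then check that `max_relative_error()` stays under whatever tolerance the game can live with, on every platform the fast path is enabled for.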

Let me put it like this: I have tested all this: get the time, repeat 1000 times, then get the time again.
The test showed me the simplest if was faster than functions. It was a while ago; maybe I should test it again on my new PC?

Meaningless benchmarks will get you meaningless results.

You can't just test if statements vs. function calls and then apply those results everywhere in your code; you need to test each particular if statement against its equivalent function, as sometimes one will be better, but in other cases that won't be true.

You also need to do your tests in release mode with optimization enabled, in which case the compiler may inline your function call or even leave code out entirely if it detects that it isn't needed or used. You need to test real code samples, not artificial things like functions vs. if.
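As an illustration of the "compiler may leave code out entirely" point, here is a minimal timing sketch. The names `TimedResult` and `time_sin_pass` are made up for this example. The work produces a checksum that the caller has to use; if the result of a timed loop is never used, an optimizing build is allowed to delete the loop, and the "benchmark" then measures nothing.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

struct TimedResult {
    double checksum;      // result of the work, so the work is observable
    double milliseconds;  // wall-clock time for the whole pass
};

// Times one pass of std::sin over caller-supplied input data.
// Returning the checksum (and using it in the caller, e.g. printing it)
// keeps the optimizer from treating the loop as dead code.
inline TimedResult time_sin_pass(const std::vector<float>& angles) {
    const auto start = std::chrono::steady_clock::now();
    double checksum = 0.0;
    for (float a : angles) {
        checksum += std::sin(a);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {checksum, elapsed.count()};
}
```

Even with that precaution, this still times one loop in isolation with warm caches, so it only hints at what the same call costs inside the real game.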

1,000 items on screen isn't a big number; you should stop touting it as if you had some crazy, unusual performance needs.

VS Express 2005 is almost 10 years old; it's probably time to update. That being said, it's still smart enough to optimize many of the situations being discussed.

The reason you got downvoted is that you worry too much about meaningless micro-optimization. Those kinds of optimizations might have had their use in the 80s, maybe even the 90s to a much lesser extent, but they are all but useless nowadays. Your game won't run slower because you chose to use an if or a math function, I can guarantee you.

I did not learn programming at school, and I also don't know how to use a debugger.

Using a debugger is not hard, and as I always say, it's the programmer's best friend. I couldn't do much without a debugger, to be honest; all I would do is guess what's wrong, until I ragequit and punch my computer. Seriously though, this is really something you should learn to use, fast.

And I will do a test for every function, not just test one function and say it's faster or slower, of course.

Testing functions is, to put it bluntly, pointless. The only performance testing you should be doing is on the entire application. Develop your 1000 objects running around on the screen and benchmark that. If it isn't running fast enough, profile the code to see where it's spending most of its time. This will tell you what areas of code your application spends most of its time in, which tells you what areas of code you should focus on optimizing to get REAL performance increases.

If an application spends 1% of its time in a particular function, and you rewrite that function to execute 50% faster, you've gained no real increase in performance. If the application spends 50% of its time in a particular function and you rewrite it to execute 50% faster (not a very realistic scenario), you've gained a significant increase in performance.
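The arithmetic behind those two cases can be sketched as follows. This assumes "50% faster" means the function's own run time is cut in half; the helper names are made up for illustration.

```cpp
// Fraction of the total run time saved when a function that accounts
// for `share` of the run time has its own time cut by `reduction`.
// Both arguments are fractions in the range 0..1.
inline double total_time_saved(double share, double reduction) {
    return share * reduction;
}

// Overall speed-up factor (old time divided by new time) for the same change.
inline double overall_speedup(double share, double reduction) {
    return 1.0 / (1.0 - share * reduction);
}
```

With these, `total_time_saved(0.01, 0.5)` is 0.005 (half a percent of the frame), while `total_time_saved(0.5, 0.5)` is 0.25 (a quarter of the frame), which is the difference the post is pointing at.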

Seriously, it's entirely likely you're spending most of your time worrying about "optimizations" that are completely irrelevant on modern computers, even low end ones. The only way to really know what optimizations are truly worthwhile for a given application is by profiling. Without that information, you're largely shooting in the dark.

I suggest that if you're THAT interested in LEARNING optimization, start coding in assembly.

I'm talking about implementing matrix 4x4 concatenation using assembly, a transform & lighting pipeline in assembly, a DCT (Discrete Cosine Transform).
You may or may not write something faster than well-written C code, but the learning experience is rich.

For example, when I implemented my own matrix 4x4 concatenation in assembly, I learned about subtleties of the architecture of the code and language. My function was failing when I was doing concat4x4( &result, &matrixA, &matrixA ); but worked correctly when doing concat4x4( &result, &matrixA, &matrixB );

After lots of debugging, I realized my math was correct, but my code assumed that matrixA & matrixB weren't pointing to the same memory, and I was overwriting the values as I wrote them. I had to clone the matrix before I made the concat, unless I could guarantee the matrixA & matrixB arguments wouldn't overlap. That was far more valuable than any micro-optimization, and stuff like this can really hurt your application's performance. I was learning, by myself (and by accident), the concept of the restrict keyword.
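A minimal C++ sketch of the "clone first" fix, written so that any of the three pointers may refer to the same matrix. The `Matrix4` layout and the row-major convention are assumptions; only the `concat4x4( &result, &matrixA, &matrixB )` call shape comes from the post, and the original was assembly, not this code.

```cpp
// Assumed layout: 4x4 floats, m[row][column].
struct Matrix4 {
    float m[4][4];
};

// Writes a * b into *result. The product is built in a local temporary
// first, so the call stays correct even if `result`, `a` and `b` point
// at the same matrix. The price is one extra 64-byte copy.
inline void concat4x4(Matrix4* result, const Matrix4* a, const Matrix4* b) {
    Matrix4 tmp;
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a->m[row][k] * b->m[k][col];
            }
            tmp.m[row][col] = sum;
        }
    }
    *result = tmp;  // copy out only after every input value has been read
}
```

The alternative is to skip the temporary and document that the arguments must not overlap, which is the promise that C's restrict qualifier expresses to the compiler.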

Like I said, modern architectures are very complex. And your "lab" experiments of benchmarking code without context are useless (because neither branch predictors nor caches are stateless).
To picture how complex modern CPUs are, here's an example of an Intel i7 anomaly, where adding a call instruction actually made the program run faster.
One would think adding an instruction should make the program slower. Most likely the reason behind this is loads being blocked due to store forwarding. Considering you didn't know what a branch was, I won't explain what load-store forwarding is, as it is extremely low level (and I'm actually guessing; the reason behind the anomaly could be something else).

Let me put it like this: I have tested all this: get the time, repeat 1000 times, then get the time again.
The test showed me the simplest if was faster than functions. It was a while ago; maybe I should test it again on my new PC?

My tests also show me that the sun probably moves around the earth, that doesn't mean I'm right.

Is that a problem? I thought questions are never dumb. I skip learning everything that is not needed to get a result; if I need something I can always ask.

We value smart, well asked questions. But that attitude won't get you anywhere. We value MUCH more people who can solve problems by themselves and ask questions to others as a last resort, after you've exhausted all your other alternatives (trial and error, books, manuals, papers, other people's code i.e. open source, google and wikipedia).

But if you are defending your own business, of course you don't want to tell the competition how to get your games optimized,

This industry wouldn't be anywhere if people hadn't shared their experiences and explained to others what they did in detail. Many companies and individuals share their latest next-gen techniques at GDC and SIGGRAPH for everyone to reproduce (that includes big names such as CryTek, Unreal, Ubisoft, Naughty Dog, Eidos, Square Enix, Valve, Microsoft, Sony, AMD, NVIDIA, Intel), and that helped move the industry forward.
You're thinking about it backwards.

I suggest that if you're THAT interested in LEARNING optimization, start coding in assembly.

I'm talking about implementing matrix 4x4 concatenation using assembly, a transform & lighting pipeline in assembly, a DCT (Discrete Cosine Transform).
You may or may not write something faster than well-written C code, but the learning experience is rich.

interestingly, there is a fast way to implement DCT, a slow way to implement DCT, and a dead slow way to implement DCT.

I remember once reading a paper which was talking about ways to efficiently implement sin and cos, then one of the cases they gave for needing a fast cos operator was for "high speed DCT transforms used in video compression...".

seeing this line was a "WTF?! FFS!! LOLZ!" moment.

basically, this statement itself made it pretty obvious that they had probably never written a video codec.

(ADD: basically, they tend to sidestep the use of cosines altogether).

nevermind the rest of the paper was just an overly long way of saying "use a lookup table".

It is also important to test your benchmarks in a real world application...

Benchmarking a loop, function call, or anything similar in isolation gives you meaningless results. It doesn't tell you which is ACTUALLY faster. Just which happens to be faster in a random timing segment.

Furthermore, as far as optimization goes: It is better to optimize things at a high level over a low level the vast majority of the time. That is, you will see more significant performance gains through the change of a data structure or algorithm than you will with micro optimizations most of the time.

As a final note, doing less work often does not mean doing work FASTER. There are many, many cases where doing more work is faster than a similar approach that does less. This is especially true on the PS/Xbox platforms. A good example of this is the Battlefield 4/Frostbite culling changes from BF3. Prior to the change they were using a hierarchical culling system, which does less work (i.e. touches fewer objects); however, by switching to a brute-force culling system that checks visibility for all objects, they observed a significant speed boost. In other words, an algorithmic change resulted in a roughly threefold increase in performance. By then applying some basic data restructuring (to better localize the information necessary for culling), along with reworking some basic math operations to eliminate branches and apply SIMD, they were able to get another 35% increase in performance out of it.
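To give a feel for what "brute force over well-laid-out data" can mean, here is a small sketch of sphere-versus-plane culling on a structure-of-arrays layout. It is not Frostbite's code: the types `SphereSoA` and `Plane`, the function `cull_spheres` and the inward-facing plane convention are all assumptions made for this example, and there are no explicit SIMD intrinsics, just a loop shape that is simple enough for a compiler to vectorize.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Structure-of-arrays layout: each attribute is stored contiguously, so
// the culling loop streams through memory instead of hopping between objects.
struct SphereSoA {
    std::vector<float> x, y, z, radius;
};

// Plane nx*x + ny*y + nz*z + d = 0, with the normal pointing to the
// inside of the view volume (an assumed convention).
struct Plane {
    float nx, ny, nz, d;
};

// Tests every sphere against every plane: no hierarchy, no early out.
// visible[i] ends up 1 if sphere i is at least partly inside all planes.
inline std::vector<std::uint8_t> cull_spheres(const SphereSoA& s,
                                              const std::vector<Plane>& planes) {
    const std::size_t count = s.x.size();
    std::vector<std::uint8_t> visible(count, 1);
    for (const Plane& p : planes) {
        for (std::size_t i = 0; i < count; ++i) {
            const float dist =
                p.nx * s.x[i] + p.ny * s.y[i] + p.nz * s.z[i] + p.d;
            // The comparison result (0 or 1) is combined with a bitwise
            // AND instead of an if statement.
            visible[i] &= static_cast<std::uint8_t>(dist >= -s.radius[i]);
        }
    }
    return visible;
}
```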

Now, think of those numbers, they went from roughly 3.9ms to 1.14ms for culling, and then they reduced that by another 35% (roughly). Which gain was greater? The answer is quite obvious when you do the math (take the difference between the values) going from 3.9ms to 1.14ms is a gain of 2.76ms, while going from 1.14ms to .74ms is only .4ms. So clearly the algorithmic change won. Does that mean they shouldn't have applied SIMD and the complexity of re-ordering their data? No, not at all. They determined after profiling the initial results that they could do better and that extra half a millisecond is a significant amount of time. But it is quite clear which of the changes resulted in the greatest performance gain, and the micro-optimization of using SOA and SIMD was not it.


I test a few times, so it's not that random, just a bit random: not exactly the same every time, but close enough to see it's faster.

Because of things like the cache, a lookup table, for example, is a lot faster if you only test accessing the lookup table (in a loop, for example), since it will most likely be cached after the first access. If you instead test real production code that does more varied tasks, the lookup table can get pushed out of the cache frequently and performance will drop dramatically. You almost never get useful results by testing a single function in isolation these days.

I don't suffer from insanity, I'm enjoying every minute of it. The voices in my head may not be real, but they have some good ideas!

A good compiler will most likely inline functions such as min and should be able to insert constants for you in inlined code, so your examples result in pretty much the same code. The min function might even be faster once inlined, since it can be implemented without branches (and many CPUs are far better at basic arithmetic than they are at dealing with branches).

i.e. the function: int min(int a, int b) { return a*(a<b) + b*(a>=b); }, if called as x = min(x,100), could probably be inlined by the compiler to x = x * (x<100) + 100*(x>=100); which then outperforms your optimization attempt on any architecture that handles arithmetic better than branching, while the compiler is still free to use a branching implementation if the exact same code is compiled for an architecture that handles branching well (compared to the extra arithmetic).
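Spelled out as compilable code, the idea looks roughly like this. `min_branchless` is the expression from the post; `min_simple` and `clamp_to_100` are names added for this sketch. Which version ends up faster depends on the compiler and the target, so neither is presented as the better one.

```cpp
#include <algorithm>

// The branchless form from the post: exactly one of the two comparisons
// evaluates to 1, the other to 0.
inline int min_branchless(int a, int b) {
    return a * (a < b) + b * (a >= b);
}

// The straightforward form. An optimizing compiler is free to turn this
// into a conditional move or a branch, whichever suits the target.
inline int min_simple(int a, int b) {
    return std::min(a, b);
}

// Call site: the constant 100 is visible to the compiler once the call
// is inlined, so there is no need to inline it by hand.
inline int clamp_to_100(int x) {
    return min_simple(x, 100);
}
```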

There is very little to gain and very much to lose by blindly trying to optimize things by manually inlining them.

if your standard library has functions that perform badly on your target platform, provide your own functions but leave the inlining to the compiler (that way you can still replace the implementation easily if you have to port the code to a different architecture)

also, make sure you have all optimizations enabled in your build settings before you try to measure performance. (if you try to measure things with a debug build you will get very misleading results)

Edited by SimonForsman, 02 January 2014 - 10:37 AM.


High level: Finding better algorithms so you don't even have to call those things 1000s of times.

I don't know the code, so I'm just guessing.

If you find out that you spend 50% of the time in a function that is called millions of times your first idea might be to make the function faster, while it might be even better and easier to make sure the function isn't called that often in the first place.
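One common way of "not calling it that often" is to move the expensive call out of the inner loop. The sketch below is a guess at the kind of situation meant, not the poster's code: it assumes many points are rotated by the same angle, and the names `Point2` and `rotate_points` are made up.

```cpp
#include <cmath>
#include <vector>

struct Point2 {
    float x, y;
};

// Rotates every point by the same angle. sin and cos are evaluated once
// per call instead of once per point, so the cost of the math functions
// no longer grows with the number of points.
inline void rotate_points(std::vector<Point2>& points, float angle_radians) {
    const float s = std::sin(angle_radians);
    const float c = std::cos(angle_radians);
    for (Point2& p : points) {
        const float x = p.x * c - p.y * s;
        const float y = p.x * s + p.y * c;
        p.x = x;
        p.y = y;
    }
}
```

With a change like this, how fast sin() itself is stops mattering much, which is the point of looking at the algorithm first.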

Can you try Very Sleepy (http://www.codersnotes.com/sleepy)? It's a simple stochastic (sampling) profiler that you can use to measure your program's performance. It shows you where your program is spending most of its time.

If you like you can post screenshots of the results, so maybe we can discuss what's going on there.