When I tried commenting these lines out, the game worked almost perfectly (just some primitives were off, but that is solvable), and it proved to be a noticeable performance boost, as almost all sprites are drawn this way.

Another thing: most of the pictures are also not rotated, so if I made an optimised version of the function that doesn't account for rotation and used it in those cases, it might help as well.

I'm looking for tricks like this to push the performance to the limit. They can be nasty tricks, they can modify Allegro code, and they can be heavily customised.

If performance is critical, I would suggest manipulating the array yourself: it's a 4x4 array, but you only need to modify the 2x2 part for 2D, and those functions I mentioned probably multiply two 4x4 matrices together.
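
As a rough sketch of what writing the matrix entries by hand might look like: `Transform4`, `init_identity`, `set_2d_transform` and `apply_2d` are hypothetical names, not Allegro API, and the row/column convention is an assumption that should be checked against your Allegro version. The cosine and sine are passed in so the caller can cache them per sprite.

```c
/* Hypothetical stand-in for a 4x4 transform. Allegro's ALLEGRO_TRANSFORM
   also stores a float m[4][4], but verify the layout for your version. */
typedef struct {
    float m[4][4];
} Transform4;

/* Set the matrix to identity once, e.g. when the sprite is created. */
static void init_identity(Transform4 *t)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            t->m[i][j] = (i == j) ? 1.0f : 0.0f;
}

/* Overwrite only the six entries that matter in 2D: scale by (sx, sy),
   rotate by the angle whose cosine/sine are (c, s), then translate by
   (dx, dy). No 4x4 multiplication is involved.
   Assumed convention: x' = x*m[0][0] + y*m[1][0] + m[3][0]
                       y' = x*m[0][1] + y*m[1][1] + m[3][1] */
static void set_2d_transform(Transform4 *t, float c, float s,
                             float sx, float sy, float dx, float dy)
{
    t->m[0][0] = c * sx;   t->m[0][1] = s * sx;
    t->m[1][0] = -s * sy;  t->m[1][1] = c * sy;
    t->m[3][0] = dx;       t->m[3][1] = dy;
}

/* Apply the 2D part of the transform to a point, in place. */
static void apply_2d(const Transform4 *t, float *x, float *y)
{
    float ox = *x, oy = *y;
    *x = ox * t->m[0][0] + oy * t->m[1][0] + t->m[3][0];
    *y = ox * t->m[0][1] + oy * t->m[1][1] + t->m[3][1];
}
```

Besides the 2x2 block, the two translation entries are written as well, since a sprite usually needs a position too.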

EDIT: For sprites not using rotation and scaling, you can simply skip all the matrix stuff and al_use_transform.
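
To illustrate that split, here is a minimal sketch of a quad builder that takes a matrix-free path for plain sprites. The `Sprite` layout, the `transformed` flag and `build_quad` are made up for the example; a real renderer would feed the corners into whatever vertex format it uses.

```c
typedef struct { float x, y; } Vec2;

/* Hypothetical sprite record; the field names are illustrative only. */
typedef struct {
    float x, y, w, h;    /* destination position and size */
    float cos_a, sin_a;  /* cached cosine/sine of the rotation */
    float scale;         /* uniform scale factor */
    int   transformed;   /* 0 for the common unrotated, unscaled sprite */
} Sprite;

/* Fill out[0..3] with the corners in the order
   top-left, top-right, bottom-right, bottom-left. */
static void build_quad(const Sprite *sp, Vec2 out[4])
{
    if (!sp->transformed) {
        /* Fast path: four additions, no matrix and no trigonometry. */
        out[0].x = sp->x;          out[0].y = sp->y;
        out[1].x = sp->x + sp->w;  out[1].y = sp->y;
        out[2].x = sp->x + sp->w;  out[2].y = sp->y + sp->h;
        out[3].x = sp->x;          out[3].y = sp->y + sp->h;
        return;
    }

    /* General path: scale and rotate each corner about the top-left. */
    const float ox[4] = { 0.0f, sp->w, sp->w, 0.0f };
    const float oy[4] = { 0.0f, 0.0f, sp->h, sp->h };
    for (int i = 0; i < 4; i++) {
        out[i].x = sp->x + sp->scale * (ox[i] * sp->cos_a - oy[i] * sp->sin_a);
        out[i].y = sp->y + sp->scale * (ox[i] * sp->sin_a + oy[i] * sp->cos_a);
    }
}
```

The flag is decided once when the sprite is set up, so the per-frame cost of choosing the path is a single branch.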

I made a primitives addon replacement for held drawing that does not fiddle with transformations: https://github.com/SiegeLord/FastDraw. I found it to be faster than held drawing (3x as fast on my machine). I am implementing vertex buffers for Allegro and will try them for the same purpose. In a different test, they are 1.5x faster than al_draw_prim. So it might be the case that I'll get nearly 5x faster drawing than Allegro's default by using this approach.

One of the time-consuming tasks in our render preparation is sorting the sprites to be drawn for the isometric view. If I understand it correctly, the 2D is in fact rendered as 3D, so couldn't we use some trick to set the depth of the bitmaps by some formula and avoid sorting them? Then it would be sorted by the hardware almost for free.

kazzmir: We compile Allegro from source as part of the project (our Allegro is already modified), so yes.

ph03nix: This seems to be a good idea.

SiegeLord: This looks interesting! I will probably need to study how the internals and transformations work so that I don't ask potentially stupid questions in the future. Could it be extended to support rotation of bitmaps as well?

If I understand it correctly, the 2d is rendered in fact as 3d, couldn't we just use some trick, to set the depth of the bitmaps by some formula to avoid sorting of these? So it would be sorted by the hardware almost for free.

Draw the bitmaps on different Z values with depth buffering enabled. Don't ask me how to do that though, I barely know anything about DirectX and even less about OpenGL. I do know that you'll still have to sort any transparent bitmaps yourself.
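
For the "some formula" part, one possible sketch is below. It assumes a diamond-style isometric map where tiles with a larger x + y are nearer to the viewer, and a depth test where a smaller value wins; both are assumptions, and `iso_depth` and its parameters are hypothetical. How the value reaches the GPU depends on the backend and is not shown.

```c
/* Map a tile position and a layer inside the tile to a depth in [0, 1],
   where 0 is nearest and 1 is farthest (assumed "less" depth test).
   Assumption: larger tile_x + tile_y means nearer to the viewer. */
static float iso_depth(int tile_x, int tile_y, int layer,
                       int map_w, int map_h, int layers)
{
    /* Rank of this sprite in back-to-front order; layer breaks ties
       between sprites standing on the same diagonal. */
    float order = (float)(tile_x + tile_y) * (float)layers + (float)layer;

    /* Rank of the nearest possible sprite on the map. */
    float max_order = (float)(map_w + map_h - 2) * (float)layers
                    + (float)(layers - 1);

    if (max_order <= 0.0f)
        return 0.0f;  /* degenerate 1x1 map with a single layer */

    return 1.0f - order / max_order;
}
```

As noted in the reply, this only helps for opaque sprites; anything with partial transparency still has to be sorted and drawn back to front.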

OK, so after a whole day of profiling, digging and fiddling with the code, I achieved a 2.5x speed improvement of the rendering method (the call to al_draw_tinted_scaled_rotated_bitmap_region). Now I can go up to 40k sprites while keeping 60 FPS, and it stops there only because the sprite preparation and sorting for the render is now what slows it down the most, so after some other changes I think it could go to 60k or more.

Some of these optimisations were very custom and the result of tighter integration of our rendering method with Allegro, but I believe a big part of them could be applied to Allegro to make it generally faster. I might propose a patch later.

These changes were the most important:

* The backup could be easily removed as long as I initialised the identity transform in the drawing of primitives, and the overall gain is big.
* In the d3d drawing, it checks whether the VERTICAL/HORIZONTAL flip is active, but that is already dealt with in the Allegro method (and the flag is turned off), so these ifs are always false and can be removed.
* The blender was used on every sprite. I removed it completely because of the integration, but even a simple condition that checks whether it should be applied would help (it takes time).
* The internal quad drawing called al_get_current_transform and the colour-converting functions 4 times in a row, while it could just get the results once and reuse them (this really speeds things up a lot).
* The internal quad function could be integrated into the d3d drawing function.
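
The point about calling al_get_current_transform 4 times in a row can be sketched in isolation like this. Everything here is a stand-in: `get_current_transform` only mimics a per-call state lookup and counts how often it runs, and the two quad functions are hypothetical, not Allegro's internals.

```c
typedef struct { float m[4][4]; } Transform4;
typedef struct { float x, y; } Vec2;

/* Stand-in for the current transform kept in the drawing state. */
static Transform4 g_current = {{
    {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}
}};
static int g_lookups = 0;  /* counts state lookups, for illustration */

/* Mimics a state lookup such as al_get_current_transform(). */
static const Transform4 *get_current_transform(void)
{
    g_lookups++;
    return &g_current;
}

static void transform_point(const Transform4 *t, Vec2 *v)
{
    float x = v->x, y = v->y;
    v->x = x * t->m[0][0] + y * t->m[1][0] + t->m[3][0];
    v->y = x * t->m[0][1] + y * t->m[1][1] + t->m[3][1];
}

/* Before: the transform is looked up again for every vertex. */
static void transform_quad_per_vertex(Vec2 q[4])
{
    for (int i = 0; i < 4; i++)
        transform_point(get_current_transform(), &q[i]);
}

/* After: one lookup, reused for all four vertices. */
static void transform_quad_hoisted(Vec2 q[4])
{
    const Transform4 *t = get_current_transform();
    for (int i = 0; i < 4; i++)
        transform_point(t, &q[i]);
}
```

Both versions produce the same vertices; the second just pays for the lookup once per quad instead of once per vertex. The same hoisting would apply to the colour conversion when all four vertices share one tint.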

The custom optimisations were mainly:

* I created all the needed functions with the postfix "_optimised" and used some other tricks, like global transform and bitmap_target objects (not using the apply_transform methods etc., which also slow things down). I know it is ugly, but it helps.
* I reduced the number of method calls by taking the internals of the public Allegro draw method and the drawer method and using their code directly in my system draw routine, as well as merging some methods.
* All my sprites are sub-bitmaps (parts of atlases), so I could remove the ifs that check for sub-bitmaps.
* I removed all the branches we never use (non-accelerated drawing, drawing from the backbuffer and similar); smaller functions are better for cache hits.

* The backup could be easily removed as long as I initialised the identity transform in drawing of primitives, but the overall gain is big

I'm not sure I understand... does this work only if you don't use transformations, or will it also work if the user has a non-identity transformation set? If not, maybe we could detect the identity transform and take a "fast" path if it's active.
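
A rough sketch of such an identity check, using a hypothetical `Transform4` stand-in rather than Allegro's own type: `is_identity_2d` and `transform_vertices` are made-up names, and the exact float comparison is deliberate, since only a transform that was never modified should take the shortcut.

```c
typedef struct { float m[4][4]; } Transform4;

/* True if the six entries that affect 2D drawing are exactly identity.
   Exact comparison is intended: a "nearly identity" transform must
   still go through the general path. */
static int is_identity_2d(const Transform4 *t)
{
    return t->m[0][0] == 1.0f && t->m[0][1] == 0.0f &&
           t->m[1][0] == 0.0f && t->m[1][1] == 1.0f &&
           t->m[3][0] == 0.0f && t->m[3][1] == 0.0f;
}

/* Transform count (x, y) pairs stored as x0, y0, x1, y1, ... in place.
   Returns 1 if the fast path was taken, 0 otherwise. */
static int transform_vertices(const Transform4 *t, float *xy, int count)
{
    if (is_identity_2d(t))
        return 1;  /* fast path: the vertices are already final */

    for (int i = 0; i < count; i++) {
        float x = xy[2 * i], y = xy[2 * i + 1];
        xy[2 * i]     = x * t->m[0][0] + y * t->m[1][0] + t->m[3][0];
        xy[2 * i + 1] = x * t->m[0][1] + y * t->m[1][1] + t->m[3][1];
    }
    return 0;
}
```

The check costs six comparisons per call, so it would probably be better to cache the result as a flag whenever the transform is set, rather than testing it for every sprite.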

Quote:

* in d3d drawing, it checks if the VERTICAL/HORIZONTAL flip is active, but that is already dealt with in the allegro method (and the flag is turned off), so these ifs are always off and can be removed
* The internal quad drawing called al_get_current_transform and the color converting functions 4 times in a row, while it could just get it once and use it (it really speeds up a lot)

These two probably can be applied to the non-optimized Allegro functions, no?

Quote:

* The internal quad function could be integrated into the d3d drawing function