Usually when I have a question I can find an answer in the docs or on the forums because someone asked it before me, but now I'm kind of stumped. I've been browsing the forums looking for solutions for a while, and I'm nearly positive the error is in my code and not the system, because a few months ago I implemented something similar that had no performance issues at all using the same version of Allegro (5.0.8).

I'm drawing tiles to the screen from a tile atlas organized into an array of sub-bitmaps. It is extremely slow. Like, 15/16fps slow on Windows and 26fps slow on Arch. This is on a machine that can run Battlefield 3 on Ultra at 60fps, just to put into perspective exactly how wrong I went somewhere in my code.

x and y indicate position on the map; tihi abbreviates "tiles high" and tiwi "tiles wide" (the screen size measured in tiles). fg and bg are numbers, ids for which tile to draw, and scale is the size to draw them at.
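The actual loop isn't quoted here, but a minimal sketch of the kind of draw call being described, with the variable names above ('atlas' and 'map_bg' are my assumptions):

```cpp
#include <allegro5/allegro.h>
#include <cstdint>

// Minimal sketch of the loop being described (the real code isn't quoted
// here); 'atlas' is the array of 64x64 sub-bitmaps and 'map_bg' is a
// hypothetical 2D array of tile ids.
void draw_tiles(ALLEGRO_BITMAP *atlas[], uint8_t **map_bg,
                int x, int y, int tiwi, int tihi, float scale)
{
    for (int ty = 0; ty < tihi; ty++) {
        for (int tx = 0; tx < tiwi; tx++) {
            uint8_t bg = map_bg[y + ty][x + tx];
            al_draw_scaled_bitmap(atlas[bg],
                                  0, 0, 64, 64,            /* whole sub-bitmap */
                                  tx * 64 * scale, ty * 64 * scale,
                                  64 * scale, 64 * scale,  /* scaled size */
                                  0);
        }
    }
}
```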

No, using the draw region doesn't change performance in any measurable way. I will test the transformation method. Since all the tiles drawn in a given frame would be scaled the same way, I figure I can set it for the target bitmap and then unset it once everything is drawn?
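Something like this is what I'll try, I assume (a minimal sketch):

```cpp
#include <allegro5/allegro.h>

// Set a uniform scale once for the target bitmap, draw everything unscaled,
// then restore the identity transform at the end of the frame.
void draw_frame_scaled(float scale)
{
    ALLEGRO_TRANSFORM t;
    al_identity_transform(&t);
    al_scale_transform(&t, scale, scale);
    al_use_transform(&t);  // applies to the current target bitmap

    /* ... draw all tiles with plain al_draw_bitmap() ... */

    al_identity_transform(&t);
    al_use_transform(&t);  // unset once everything is drawn
}
```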

It seems you are passing the block objects by value instead of by reference or as a pointer, meaning you are making copies of the block objects. If you do this hundreds of times per iteration, it could cause your program to slow down (and even malfunction).
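In other words (hypothetical signatures, since the code itself isn't quoted here):

```cpp
struct block { /* fields unknown at this point in the thread */ };

// Pass-by-value: the whole object is copied on every call.
void draw_block_copy(block b);

// Pass by const reference or pointer instead: no copy is made.
void draw_block_ref(const block &b);
void draw_block_ptr(const block *b);
```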

The block struct currently only has one field, which is an unsigned 8-bit integer, so I'm not sure that's the issue, but I'm running a test now.
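i.e. the whole thing is roughly (field name mine):

```cpp
#include <cstdint>

struct block {
    uint8_t id;  // the single 8-bit field; copying the struct copies one byte
};
```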

To me that seems like the biggest reason for this thing to run slow. Use pointers anyway.

Also, make sure you don't call your render functions too much. Do not assume that you can simply render the entire scene at once just because it will get clipped to the display area. (That was already there; I hadn't noticed, sorry.) Anyway, even something as simple as rendering a 64x64 bitmap grid at 60fps causes big-time performance problems.
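For anyone else reading, the kind of clamping I mean, with made-up names for the camera and map dimensions:

```cpp
#include <algorithm>

void draw_tile(int tx, int ty);  // hypothetical per-tile draw call

// Clamp the loop to the on-screen tile range instead of walking the whole
// map; cam_x/cam_y, map_w/map_h, screen_w/screen_h are made-up names.
void draw_visible(int cam_x, int cam_y, int screen_w, int screen_h,
                  int map_w, int map_h, int tile_size)
{
    int first_tx = cam_x / tile_size;
    int first_ty = cam_y / tile_size;
    int last_tx  = std::min(map_w, first_tx + screen_w / tile_size + 1);
    int last_ty  = std::min(map_h, first_ty + screen_h / tile_size + 1);

    for (int ty = first_ty; ty < last_ty; ty++)
        for (int tx = first_tx; tx < last_tx; tx++)
            draw_tile(tx, ty);
}
```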

The bitmaps are 64x64 pixels natively, but I want their render size to be adjustable. Is it still hard to render them even when resized, given that their native resolution is 64x64? For context, it renders 30 tiles across and 10 tiles down (although I had hoped that once I figured this out, I could render a bigger space).

Rendering 300 64x64 bitmaps should not slow down to 15fps from 60, or be slow at all. Perhaps you are somehow converting the bitmaps to memory bitmaps. Try calling al_get_bitmap_flags on your tiles to see if they are memory bitmaps.
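E.g. something like this (a quick diagnostic sketch):

```cpp
#include <allegro5/allegro.h>
#include <cstdio>

// Query each tile's flags and report any that ended up as memory bitmaps.
void check_tiles(ALLEGRO_BITMAP *tiles[], int count)
{
    for (int i = 0; i < count; i++) {
        if (al_get_bitmap_flags(tiles[i]) & ALLEGRO_MEMORY_BITMAP)
            printf("tile %d is a memory bitmap!\n", i);
    }
}
```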

@ph03nix You got it. They are somehow becoming memory bitmaps. I don't know how though. I will update soon.

Update: the tile atlas was loaded after setting the new bitmap flags to ALLEGRO_VIDEO_BITMAP, but before creating the display. I missed something really obvious; sorry for wasting your time, guys. I can now display them at their natural size in a 100x200 array no problem, as well as export to a 500x500-tile image in under a second.
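For anyone who finds this later, the order that works, roughly (a trimmed sketch; error checking omitted):

```cpp
#include <allegro5/allegro.h>
#include <allegro5/allegro_image.h>

int main()
{
    al_init();
    al_init_image_addon();

    // The display must exist before loading; otherwise (on 5.0.x) the
    // bitmap silently ends up as a memory bitmap.
    ALLEGRO_DISPLAY *display = al_create_display(1920, 640);

    al_set_new_bitmap_flags(ALLEGRO_VIDEO_BITMAP);
    ALLEGRO_BITMAP *atlas = al_load_bitmap("tiles.png");  // hypothetical path

    /* ... create the sub-bitmap array, run the game loop ... */

    al_destroy_bitmap(atlas);
    al_destroy_display(display);
    return 0;
}
```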

I meant a grid of 64x64-sized bitmaps, arbitrary ones, which comes to rendering over 4000 bitmaps at once. Even such a relatively small tile grid turns out to be a huge deal. That was a real surprise to me: since my computer ran modern games at "ultra" just fine, I really expected it to munch the whole thing like it's nothing. And that's with a very plain loop; all there was is incrementing values and drawing "as is", with no computations going on. Now that I think of it, I probably had memory bitmaps too. Still, if you have a lot of small tiles you're more likely to be drawing too many of them, in terms of computations per tile. For that part I really suggest you use transformations, as you do a lot of computing within the loop only to do what a transformation does. Even if your video card renders it faster than you can supply it, that'll still save you some processing.

Modern hardware relies on preloading to run fast, because transferring data via the outdated PCI-E interface takes way longer than processing it internally. Thus transferring data is the bottleneck, and Allegro doesn't handle it well as of the 5.0.8 branch. Maybe I was missing something, but last time I checked, instead of using VBO preloading and rendering on demand, Allegro was passing vertex data to the video card via legacy functions, and it was doing that for every single bitmap blitting operation.

It was also handling transformation matrices by itself rather than making the video card do it.

It handles it fine. All bitmaps are loaded into video memory as textures, so they don't need to be transferred when drawing, and if you set up deferred/held drawing, the geometry is batched up and sent in as big a batch as possible to cut down on GPU driver calls and transfers to the GPU. IIRC it uses vertex arrays to transfer it, not "legacy functions" (I assume you mean glBegin/glVertex/glEnd). Held drawing is known to help performance significantly.
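i.e. (a minimal sketch; batching only holds while the parent texture stays the same, which is exactly what an atlas of sub-bitmaps gives you):

```cpp
#include <allegro5/allegro.h>

// Everything drawn between hold(true) and hold(false) is batched and sent
// to the GPU in as few calls as possible.
void draw_tiles_batched()
{
    al_hold_bitmap_drawing(true);
    /* ... all the al_draw_bitmap()/al_draw_scaled_bitmap() calls ... */
    al_hold_bitmap_drawing(false);  // flushes the batch
}
```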

Using a VBO for regular Allegro drawing routines isn't likely to work very well; VBO integration is something very specific to the app's own data and structure. I tried coming up with something that would defer all Allegro drawing, including blits and primitives. It still wouldn't use a VBO, but even with that, it was incredibly hard, or just downright impossible, to do right.

I think SiegeLord implemented deferred drawing for the primitives addon, and it didn't help performance at all.

It's a fact that Allegro's drawing model is suboptimal for GPUs, even with deferred drawing tricks... a more suitable API would be more complicated and probably even less well received than the current one. It would require persistent objects (e.g. a persistent bitmap, rectangle, etc.) and a scene graph, going completely against the A4 immediate drawing model that A5 essentially replicated.

The manual transformation bit is largely a red herring, incidentally. I've implemented a "fast" drawing library that avoids the (for tilemaps) unnecessary transformation pre-application, and it wasn't magically faster... the vagaries of GPU/driver performance drowned out any gain from that optimization. E.g. on my old GPU it had no effect (or even a paradoxical slowdown), while on my new GPU it's twice as fast.

Anyway, Allegro 5.1 has some preliminary support for VBOs in the primitives addon, so in principle somebody who wants to squeeze more performance out of their GPU could use that.
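For example (a sketch against the 5.1 primitives addon; the API there is preliminary and may change):

```cpp
#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>

// Setup (once): upload the vertices to the GPU. Assumes 'verts' holds
// n ALLEGRO_VERTEXes already filled in.
ALLEGRO_VERTEX_BUFFER *make_buffer(ALLEGRO_VERTEX *verts, int n)
{
    return al_create_vertex_buffer(NULL, verts, n, ALLEGRO_PRIM_BUFFER_STATIC);
}

// Per frame: draw straight from GPU memory, no per-frame vertex transfer.
void draw_buffer(ALLEGRO_VERTEX_BUFFER *vb, ALLEGRO_BITMAP *texture, int n)
{
    al_draw_vertex_buffer(vb, texture, 0, n, ALLEGRO_PRIM_TRIANGLE_LIST);
}
```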

It's a fact that Allegro's drawing model is suboptimal for GPUs, even with deferred drawing tricks...

Sure, but it's still going to be more than fast enough for a lot of people's uses.

Speaking of scene-graph-type things, I started a 2D canvas lib. It's probably going to be a lot faster than using a crapload of Allegro primitive calls. But I never bothered to test it, and I haven't worked on it in a while.

Quote:

Anyway, Allegro 5.1 has some preliminary support for VBOs in the primitives addon, so in principle somebody who wants to squeeze more performance out of their GPU could use that.

How would a VBO help over a vertex array when you have to fire all of the data at the GPU every time?

Thomas, I would rather suggest uploading vertex data at bitmap load time, and then rendering the VBO vertices associated with a given bitmap. Batching or not, Allegro gathers a vertex list and then passes it to the video card; that's the legacy approach. Instead, gathering should only involve setting up an array of preloaded, bitmap-associated vertex indices to render for the texture in use.

I'd implemented an array that would hold render flags for all loaded bitmaps, so the whole gathering step narrows down to marking certain indices for rendering and finding contiguous runs to render more vertices at once. Of course that won't do, since a given bitmap can be rendered more than once; I hadn't thought it through. So on second thought, I'd implement a bunch of functions alongside the main rendering functions specifically to handle VBO drawing, with persistent data, etc. Rather than telling the video card "render this long-ass freshly gathered vertex list with this texture", it should tell the video card "keep this long-ass vertex list generated at bitmap load and remember it, because I'll be asking you to render vertices X through Y with texture Z". Calling a VBO render function is way faster than transferring a bunch of triangles, in terms of data transmission. Texture preloading was considered an industry standard about two decades ago, so that doesn't really count. Still, I haven't looked at the code properly, so if it already does precisely that, then I'm sorry for putting it like this.

Thomas, I would rather suggest uploading vertex data at bitmap load time

I'm not sure how that helps? You don't even know where you're going to be drawing it at that point, or how many times.

Quote:

What can possibly be hard with a) uploading vertices to the video card and b) calling a VBO render function later on?

Let's say you have 1000 objects that have to be rendered in a certain order; how are you going to tell the GPU to render things in the right order? We're talking 2D here, but where objects have a proper z-index.

I'm not sure how that helps? You don't even know where you're going to be drawing it at that point, or how many times.

Neither do 3D games know at what position, or how many times, they will draw their 3D models, but that doesn't stop anyone from using VBOs. You must not be getting the principle and idea of a VBO if it raises this kind of question for you. Anyway, as for using a VBO specifically for 2D bitmaps, here's my idea of it: when you load a bitmap (or create a sub-bitmap), you upload to the video card its 4 vertex positions relative to its origin (normally the center), with texture coordinates also specified, which would be 0 and 1 for a "full" bitmap and somewhere in between for a sub-bitmap. And there you go: you're all set to render your preloaded bitmaps with a VBO. Simply call the render function for vertices 1-4 if you want to render bitmap 1, 5-8 for bitmap 2, etc. The right texture should be enabled, of course, so the existing function that gathers as many same-texture bitmaps in a row as possible would be handy. Then again, it may be faster to brute-force render, setting up a new texture every time it needs to change, rather than computing queues like that, so it should be an option enabled with a flag. Or preferably, Allegro should estimate by itself whether the target machine needs a software queue or can handle it in hardware just fine.
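A sketch of that idea using the 5.1 vertex-buffer calls (again, this is my proposal, not something Allegro does today):

```cpp
#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>

// Four vertices per bitmap, uploaded once at load time; bitmap i is drawn
// by rendering vertices [i*4, i*4 + 4) as a triangle fan with its texture
// enabled. The current transform is what actually places it on screen.
void draw_preloaded(ALLEGRO_VERTEX_BUFFER *vb, ALLEGRO_BITMAP *texture, int i)
{
    al_draw_vertex_buffer(vb, texture, i * 4, i * 4 + 4,
                          ALLEGRO_PRIM_TRIANGLE_FAN);
}
```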

Quote:

Let's say you have 1000 objects that have to be rendered in a certain order; how are you going to tell the GPU to render things in the right order?

First obvious thought: orthographic projection and a depth buffer. More thoughtful idea: render as normal, with the depth buffer disabled, but rather than passing new vertices over and over, simply ask the video card to render the already-uploaded vertices.

You must not be getting the principle and idea of a VBO if it raises this kind of question for you.

Or you completely missed my point. You upload vertex data for N bitmaps. Doesn't really do much when you're drawing N*500. The real gain comes when you can put all of your geometry in the VBO.

Quote:

First obvious thought: orthographic projection and a depth buffer. More thoughtful idea: render as normal, with the depth buffer disabled, but rather than passing new vertices over and over, simply ask the video card to render the already-uploaded vertices.

Please go implement that and let me know how it goes. Start with the Allegro 5 primitive APIs (the bitmap drawing and primitive shape drawing APIs), and make sure everything draws in the correct order based on the order of the calls, taking blending into account.

And please note, people are going to be calling these functions every frame.

Implying that building an array of N * 500 * 4 vertices every frame and passing it to the video card would be any more efficient. Even if it's not rendered faster, it saves CPU time. Yes, uploading the entire drawable thing would be better, but that's not quite possible, and not used in practice: even 3D games with large open spaces split their maps into small chunks and render them as needed rather than rendering the entire thing at once, so your argument (rendering the whole thing as a single mesh) is not really valid anyway.

Quote:

Please go implement that and let me know how it goes.

Let me check my list: data structures, UTF-32 internal strings (what kind of moron came up with the idea of storing Unicode strings internally as UTF-8?), networking... now I'll just add "VBO" to the list and that'll be it for now. Also, that sounded to me like an excuse not to do it.

----

Note that the whole point of using a VBO is to cut down the data transmission rate and CPU computation overhead (it also saves a tiny bit of RAM), so that the video card doesn't wait while you prepare your data; this allows model rendering to be as simple as calling a single function that unfolds into a small bunch (possibly just one) of low-level calls to the video card.

A5 is still a 2D library. Why do we need Allegro to be a 3D library? Isn't that what Ogre is for? And as for passing vertices, you can do that yourself with OpenGL; Allegro doesn't prevent you from doing anything as far as I know. I don't see what the complaints are all about. If you have such great ideas for optimizing the Allegro library, by all means do so, but stop expecting everyone else to implement your ideas as if they didn't have anything else to do with their time. Stop blabbing and start coding.