Where do you draw the line for inlining something (macros) vs. making it a subroutine? For example, what about a 16-bit addition (which is ~20 cycles or so)? Are you willing to pay 12 cycles (jsr/rts) for it just for the sake of reducing code size, or do you use a macro? Where is the cutoff?

You're forgetting about the extra code required for parameter passing if it's a subroutine. That can easily tip the scales in favor of inlining the code when it comes to short snippets like that.

Ah. To answer the more specific question, I usually only make things subroutines when they need ,x or ,y. (So making two objects collide, reading collision data.) I always take the byte hit rather than the cycle hit on 16-bit math. My games do a lot of it, because they scroll in both directions.

I'll also make subroutines if it's a generic thing that's likely to make branching over it go out of range.

Thanks for all the answers. The IF_XXX macros are very interesting. I also find beq/bmi/bpl are often hard to follow.

I should have realized that such an open-ended question would send the discussion in many directions. I'll try to be more specific.

Where do you draw the line for inlining something (macros) vs. making it a subroutine? For example, what about a 16-bit addition (which is ~20 cycles or so)? Are you willing to pay 12 cycles (jsr/rts) for it just for the sake of reducing code size, or do you use a macro? Where is the cutoff?

It depends of course on the performance requirements at that point, and what straightlining would cost in terms of memory. But consider the following from my web page on macros:

As you write an assembly-language program, you may see repeating patterns. If it's exactly the same all the time, you can make it a subroutine. That incurs a 12-clock performance penalty for the subroutine call (JSR) and return (RTS), but program memory is saved because the code for the subroutine is not repeated over and over.

There will be other times, however, where the repeating pattern is the same but internal details are not, so you can't just use a JSR. The differences from one occurrence to another might be an operand, a string or other data, an address, a condition, etc. It would be helpful to be able to tell the assembler, "Do this sequence here; except when you get down to this part, substitute-in such-and-such," or, "under such-and-such condition, assemble this alternate code." That's where it's time for a macro.
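To put the 12-clock penalty in perspective, here is a rough sketch of how large the call overhead is relative to the work being done. The JSR and RTS timings (6 cycles each) are standard 6502 figures; the function name and the `setup_cycles` parameter are made up for illustration, and the body cycle counts passed in are just the ballpark numbers from this thread.

```python
JSR_CYCLES = 6  # JSR absolute on the 6502
RTS_CYCLES = 6  # RTS on the 6502


def call_overhead_percent(body_cycles, setup_cycles=0):
    """Extra cycles paid for calling a routine, as a percentage of its body.

    body_cycles:  cycles the routine itself takes when inlined.
    setup_cycles: hypothetical extra cycles spent loading parameters
                  before the JSR (0 if the routine needs none).
    """
    extra = JSR_CYCLES + RTS_CYCLES + setup_cycles
    return 100.0 * extra / body_cycles


# A ~20-cycle 16-bit add pays a large relative penalty...
print(call_overhead_percent(20))
# ...while a 200-cycle routine barely notices it.
print(call_overhead_percent(200))
# Two immediate loads (2 cycles each) as parameters make the short case worse.
print(call_overhead_percent(20, setup_cycles=4))
```

For short snippets the fixed call cost dominates, which is the usual argument for a macro there; for longer routines it fades into the noise.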

Thanks for all the answers. The IF_XXX macros are very interesting. I also find beq/bmi/bpl are often hard to follow.

I should have realized that such an open-ended question would send the discussion in many directions. I'll try to be more specific.

Mat, at this point you cannot blame yourself. Trust me, there is no forum on the internet that loves a tangent more than this one. You can start with the most specific of cases, strictly defining what you mean, and it will still end up 5 tangents away. I mean, you can ask about tile maps on the NES and end up talking about the pros and cons of the Megadrive using a Z80 as a sound processor (OK, to be fair it has never quite gone that far, but the 'to the MD' part has happened).

bleubleu wrote:

Where do you draw the line for inlining something (macros) vs. making it a subroutine? For example, what about a 16-bit addition (which is ~20 cycles or so)? Are you willing to pay 12 cycles (jsr/rts) for it just for the sake of reducing code size, or do you use a macro? Where is the cutoff?

-Mat

For when and why, you basically have to (mentally) profile the code.

Do I add this value to a lot of things? No, inline it.
Do I add this value to a few things that are within 256 of each other, in a lot of places? Yes, make it a function.
Do I add this value a lot of times per frame but only in 2 places? No, inline it.

At some point making a function pays off, i.e. you add 3 bytes per call plus a 1-byte static cost for the RTS, and eventually it starts to pay off, depending on how big the function is. However, you might need to set up an LDX and/or LDY and/or LDA, in which case the cost gets higher and you need more and more calls to get a payoff. Something about the size of a 16-bit add rarely pays off in the ZP case; outside ZP it starts to get worth it a bit more.

Your limits may also affect the cost analysis. For example, in a 4K game, if it saves you 2 bytes, it's worth it. If you are using compression, inline it; the compressor will be able to instance it at a lower cost. Have you gone over a bank and want to get back into a bank?
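The byte-count payoff described above can be sketched as a small calculation. This is only a rough model with stated assumptions: the subroutine body is taken to be the same size as the inline version, the function name is made up, and the 6 bytes of parameter setup is a hypothetical figure (for example three 2-byte immediate loads). The 13 and 19 byte sizes are a CLC plus two LDA/ADC/STA triples in zero page and absolute addressing respectively.

```python
JSR_BYTES = 3  # JSR absolute
RTS_BYTES = 1  # one RTS at the end of the subroutine


def break_even_calls(body_bytes, setup_bytes=0):
    """Smallest number of call sites at which a subroutine is smaller than inlining.

    body_bytes:  size of the snippet when inlined (assumed equal to the
                 subroutine body, which is a simplification).
    setup_bytes: hypothetical bytes spent loading parameters before each JSR.
    Returns None if the subroutine never wins on size.
    """
    per_call = JSR_BYTES + setup_bytes
    saved_per_call = body_bytes - per_call
    if saved_per_call <= 0:
        return None  # each call site costs at least as much as the inline code
    fixed = body_bytes + RTS_BYTES  # paid once for the subroutine itself
    # Need n * saved_per_call > fixed, so take the next integer above the ratio.
    return fixed // saved_per_call + 1


ZP_ADD16 = 13   # CLC + 2 x (LDA zp, ADC zp, STA zp)
ABS_ADD16 = 19  # CLC + 2 x (LDA abs, ADC abs, STA abs)

print(break_even_calls(ZP_ADD16))                  # fixed operands, no setup
print(break_even_calls(ZP_ADD16, setup_bytes=6))   # with parameter loads
print(break_even_calls(ABS_ADD16, setup_bytes=6))  # absolute addressing
print(break_even_calls(ZP_ADD16, setup_bytes=10))  # setup as big as the body
```

Under these assumptions the zero-page add needs more call sites than the absolute one before it pays off once parameters are involved, and with enough setup it never does. Cycle cost is not part of this model at all.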

For code size, I have not had problems with it, thanks to various optimizations (both those that improve speed and those that improve size), use of unofficial opcodes, etc. I also use tail call optimization, and have modified the ppMCK driver to use it. Tail call optimization improves both speed and size. The level data and so on usually takes up more space than the code, even though the data is compressed.
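For anyone unfamiliar with the term: on the 6502, tail call optimization usually means turning a JSR that is immediately followed by an RTS into a single JMP. Below is a toy sketch of that rewrite over a list of (mnemonic, operand) pairs. It is not how ppMCK or any real assembler does it; the routine names are hypothetical, and it assumes nothing branches directly to the RTS being removed. The byte and cycle figures are the standard ones for JSR absolute, RTS, and JMP absolute.

```python
# mnemonic -> (bytes, cycles) for the three instructions involved
COST = {"JSR": (3, 6), "RTS": (1, 6), "JMP": (3, 3)}


def tail_call_optimize(code):
    """Replace each 'JSR target' immediately followed by 'RTS' with 'JMP target'."""
    out = []
    i = 0
    while i < len(code):
        op, arg = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if op == "JSR" and nxt is not None and nxt[0] == "RTS":
            out.append(("JMP", arg))  # callee's own RTS returns to our caller
            i += 2                    # skip the now-redundant RTS
        else:
            out.append((op, arg))
            i += 1
    return out


def cost(code):
    """Total (bytes, cycles) of the listed instructions only, not the callees."""
    return (sum(COST[op][0] for op, _ in code),
            sum(COST[op][1] for op, _ in code))


# Hypothetical routine that calls two others and returns.
before = [("JSR", "update_player"), ("JSR", "update_enemies"), ("RTS", None)]
after = tail_call_optimize(before)

print(after)
print(cost(before), cost(after))
```

Each rewritten pair saves 1 byte and 9 cycles, which is why it helps both size and speed.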

I use macros mainly to auto fill tables and deal with the banking and so on. I may also sometimes use macros for short stuff that doesn't need to be in a subroutine and that can benefit from improved speed, although sometimes the macro won't do because the optimization crosses the borders of the macro. Some macros for this purpose may take parameters. For example, in a Z-machine implementation, one macro contains a short (and fast, due to the mapper) sequence of instructions to read the next byte of the interpreted program; its parameters control which registers to use and which instruction loads the byte, so that you do not have to move it to another register afterward, save registers beforehand, etc.

I think the ratio of code to data depends a lot on the scope of a game. The bigger your game is, the less of it will be code, proportionally. (There are some rare exceptions to this, e.g. games with a lot of procedural generation, but I think it's more or less universally true for NES.)

My answer to OP's question though is, unhelpfully and probably unsurprisingly, "as careful as I need to be." I've made a bunch of ROMs where code size didn't matter at all (i.e. my goals were drastically smaller than the available size), and several where it mattered a lot. The answer to this question absolutely depends on the situation it is applied to.

This hits the nail on the head. Sometimes you have to optimize for space in one part of your code, and optimize for speed (at the expense of space) in another, in the same codebase.

More than either of those, though, my default is to optimize for ease-of-development. That means optimizing for readability and getting-it-done. I don't tend to optimize for size or speed until I have to.

Yes, I would say part of effective budgeting is leaving yourself room to optimize later if you need to.

If you know what you're doing, in a lot of cases you can work faster by making the code inefficient, which at the same time gives you space that you can reclaim with more work later. If you placed your bets wisely, you won't need to do that work later anyway, but otherwise it will help tremendously when you do run out of space and have this intentional buffer of inefficient code to take up. Like most things, this takes experience to be able to do well.

The other thing is that it's very worth estimating your needs up front. Try and guess how much code you need, how much data you need, how much ROM space you have, etc. Plan this out roughly up front. You won't know all the details but try to guess. Revise your estimates periodically as you go along (or better yet, have your tools automatically generate some stats for you). It's a lot easier to deal with the space crunch if you can see it coming earlier on.
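As one possible way to have your tools generate stats, here is a minimal sketch that estimates free space per bank from raw PRG data. It assumes the linker pads the unused end of each bank with a single fill byte (0xFF here) and that banks are 16 KiB; both are assumptions to adjust for your own setup, and the function name is made up. It only counts trailing padding, so scattered gaps are missed and data that happens to end in the fill byte is miscounted as free.

```python
def bank_usage(prg, bank_size=16 * 1024, fill=0xFF):
    """Return a list of (bank_index, used_bytes, free_bytes) tuples.

    prg:       raw PRG-ROM bytes (no file header).
    bank_size: assumed bank size in bytes.
    fill:      assumed padding byte written by the linker.
    """
    stats = []
    for start in range(0, len(prg), bank_size):
        bank = prg[start:start + bank_size]
        used = len(bank.rstrip(bytes([fill])))  # drop trailing padding only
        stats.append((start // bank_size, used, len(bank) - used))
    return stats


# Synthetic two-bank image: bank 0 has 100 bytes of content, bank 1 is full.
prg = bytes([1] * 100) + bytes([0xFF] * (16384 - 100)) + bytes([2] * 16384)
for index, used, free in bank_usage(prg):
    print(f"bank {index}: {used} used, {free} free")
```

Running something like this as part of the build makes it easier to see a space crunch coming.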

...and in your planning, leave yourself some extra space! It's easier to add a little more at the end than it is to try and scale back a project that's over budget.

If Zelda is a 128kB game, wouldn't that be like 3 or 4 banks of code? That must be a bankswitching nightmare. Why does Zelda need so much more ROM space than Super Mario Bros?

It was ported from the FDS, and my guess is 128K was the next size up they could find for ROM chips. As for overall size, the enemy mechanics are a good deal more complicated than what SMB deals with, and there's a fair amount of text. Also, it's a CHR-RAM game, so all of the tiles are stored in the PRG.

Loose memories from running it through IDA a bunch suggest that there might be even more than 25% unused space, but it's been a while since I looked at it.

The bankswitching isn't that much of a mess, but they also have a sizeable chunk of code they copy over to the SRAM area, as the 3 saves don't need the full 8KB.
