In 6502 assembly, I am looking for a routine that will multiply an 8-bit number (0-127 only in this case, for my tiling system) by 16, leaving a 16-bit result. (The remaining bit, bit 7, is used for other purposes.)

Although my low-to-moderate 6502 skills would probably let me write this routine, I doubt the result would be efficient.

If the original 8-bit number is in location 128 and the result is stored in 129 and 130, does anyone know a good way to do this?

The multiply by 16 can be hard-coded, it doesn't have to be a general 16 bit mathematics function.
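For reference, the obvious hard-coded version treats the result as a 16-bit value and shifts it left four times, using ASL on the low byte and ROL to carry the bits into the high byte. The sketch below models that routine in Python so the arithmetic and the standard NMOS 6502 cycle counts (shown in the comments) can be checked. It assumes location 129 receives the low byte and 130 the high byte, which the question doesn't actually specify, and `mul16_asl_rol` is just an illustrative name.

```python
# Python model of a straightforward 16-bit shift on a 6502 (a sketch, not from the thread).
# Assumption: location 129 receives the low byte and 130 the high byte.
#
#   LDA #0     ; 2 cycles
#   STA 130    ; 3   clear the high byte
#   LDA 128    ; 3
#   ASL A      ; 2 \  repeated 4 times: 4 * (2 + 5) = 28
#   ROL 130    ; 5 /
#   STA 129    ; 3
#              ; total: 39 cycles


def mul16_asl_rol(mem):
    """Run the routine above on mem (dict of address -> byte); return the cycle count."""
    mem[130] = 0                      # LDA #0 / STA 130
    cycles = 2 + 3
    a = mem[128]                      # LDA 128
    cycles += 3
    for _ in range(4):
        carry = a >> 7                # ASL A: bit 7 of A moves into carry
        a = (a << 1) & 0xFF
        mem[130] = ((mem[130] << 1) | carry) & 0xFF  # ROL 130: carry enters bit 0
        cycles += 2 + 5
    mem[129] = a                      # STA 129
    cycles += 3
    return cycles


mem = {128: 100}
print(mul16_asl_rol(mem), mem[129] | (mem[130] << 8))  # 39 1600
```

The ROL on a zero-page location is the expensive part here (5 cycles, four times over), which is what the faster suggestions in this thread try to avoid.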

Could any suggestions please be in standard 6502 syntax, not MADS or any other assembler-specific notation?

An alternative, faster method uses shifts in both directions. Depending on what value you're multiplying by, you might save cycles with a table lookup, but shifting 4 times is only 8 cycles, so there's no saving from a table lookup in this case.

Multiply value in A by 16 (cumulative cycle count included). Y register used as temp storage:
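The listing for this post did not survive, so what follows is a reconstruction of the general idea only, not the original code. The low byte of value × 16 is the value shifted left four times, and the high byte is the same value shifted right four times, so no ROL into a second byte is needed. It is modelled in Python with the 6502 instructions and cumulative cycle counts in the comments; `mul16_both_ways` is an illustrative name, and 129 = low byte / 130 = high byte is again an assumption.

```python
# Python model of the "shift both ways" idea (reconstruction, not the original listing).
# The value arrives in A; Y keeps a copy. Assumption: 129 = low byte, 130 = high byte.
#
#   instruction     cycles  cumulative
#   TAY             ; 2      2    save the input in Y
#   ASL A           ; 2      4
#   ASL A           ; 2      6
#   ASL A           ; 2      8
#   ASL A           ; 2     10    A = low byte of value * 16
#   STA 129         ; 3     13
#   TYA             ; 2     15    get the input back
#   LSR A           ; 2     17
#   LSR A           ; 2     19
#   LSR A           ; 2     21
#   LSR A           ; 2     23    A = high byte of value * 16
#   STA 130         ; 3     26


def mul16_both_ways(a, mem):
    """Run the routine above with the input in A; return the cycle count."""
    y = a                             # TAY
    cycles = 2
    for _ in range(4):
        a = (a << 1) & 0xFF           # ASL A: the top four bits fall off
        cycles += 2
    mem[129] = a                      # STA 129
    cycles += 3
    a = y                             # TYA
    cycles += 2
    for _ in range(4):
        a >>= 1                       # LSR A: the top four bits end up as the high byte
        cycles += 2
    mem[130] = a                      # STA 130
    cycles += 3
    return cycles


mem = {}
print(mul16_both_ways(100, mem), mem[129], mem[130])  # 26 64 6
```

Because the high byte is just the top nibble of the input, this works for any 8-bit value, not only 0-127. If the value starts in location 128 rather than in A, a second LDA 128 (3 cycles) can replace the TAY/TYA pair (4 cycles) and leaves Y untouched.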

This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough"

For 8 and 16 bit processors I think you're right. A dev who has spent years counting cycles and collecting efficient algorithms is going to get far tighter code than a compiler. Besides, the Motorola and Motorola-inspired products were designed with human low-level coders in mind, whereas modern 32 and 64 bit processors are designed to be targeted by compilers. By the time a coder can consistently outdo a compiler by enough to be worth it, the next generation of products, with different constraints and optimizations, is out. The sheer scale of today's code also makes it impractical to write everything in assembly. At best, one can profile the code and surgically shave cycles with bits of assembly where they have the highest payoff. There isn't much use in coding a menu in assembly; recoding the part of a renderer where 60% of the processor time is spent is another matter altogether. The art lies in knowing where meticulous assembly is best applied.

Hand-written assembly is still applicable to modern CPUs, but you have to recognize where the speed comes from in that case: global allocation of registers across the entire program. If you work out what an app needs to do and reserve registers for tasks that are maintained across the whole project, you can double the speed compared to any compiler. That's the one thing compilers still suck at: global optimization. Anything other than a tight-loop calculation can be left to a compiler. When I write an assembly app, the first thing I do is make a list of all the registers and what I expect them to hold at different points in the program. That not only lets you maintain global registers, it also lets you avoid saving and restoring registers where it isn't needed (at some points in the program, registers that the ABI normally requires you to preserve may safely be treated as volatile if you know what every register is being used for).

If, as you say, bit 7 of the value in 128 is used for something else and should not be part of the calculation, you don't even need to mask it out before using it as an index: just ignore it when building the tables, so the upper 128 entries repeat the lower 128. If you do mask it out before using it as an index (lda 128, and #127, tax), you can reduce the tables to 128 bytes each.
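A sketch of the table approach, modelled in Python with the 6502 code and cycle counts in the comments. `TAB_LO`/`TAB_HI` are hypothetical names for two page-aligned 256-byte tables, and 129 = low byte / 130 = high byte is an assumption. The tables are built with bit 7 of the index ignored, so the raw byte from 128 can be used as the index directly.

```python
# Python model of the table-lookup approach. TAB_LO / TAB_HI are hypothetical names
# for two page-aligned 256-byte tables. Assumption: 129 = low byte, 130 = high byte.
#
#   LDX 128          ; 3
#   LDA TAB_LO,X     ; 4  (no page-crossing penalty if the table is page-aligned)
#   STA 129          ; 3
#   LDA TAB_HI,X     ; 4
#   STA 130          ; 3
#                    ; total: 17 cycles
#
# Bit 7 of the index is ignored when the tables are built, so entries 128-255
# simply repeat entries 0-127.
TAB_LO = bytes(((i & 0x7F) << 4) & 0xFF for i in range(256))
TAB_HI = bytes((i & 0x7F) >> 4 for i in range(256))


def mul16_table(mem, masked=False):
    """Look up value * 16; with masked=True use LDA 128 / AND #127 / TAX instead of LDX 128."""
    if masked:
        x = mem[128] & 0x7F           # LDA 128 / AND #127 / TAX: 3 + 2 + 2 cycles
        cycles = 7
    else:
        x = mem[128]                  # LDX 128: 3 cycles
        cycles = 3
    mem[129] = TAB_LO[x]              # LDA TAB_LO,X / STA 129: 4 + 3 cycles
    mem[130] = TAB_HI[x]              # LDA TAB_HI,X / STA 130: 4 + 3 cycles
    cycles += 14
    return cycles


mem = {128: 0x80 | 100}               # bit 7 set for "other purposes"
print(mul16_table(mem), mem[129], mem[130])  # 17 64 6
```

With the mask in place only entries 0-127 are ever read, which is why the tables can shrink to 128 bytes each, at the cost of 4 extra cycles.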

Yes, bit 7 isn't in use at the moment, and by the time the value gets to this stage it won't be set at all. You've all given some great suggestions and, to be honest, I'm finding it difficult to choose one particular method. Memory is tight, yes, but I can find 256 bytes. Speed isn't a major issue: I've used Rybags' method so far, and even with Altirra at 1% speed it's still quick (4x4 tiles, 6 tiles across, 4 down, and a status bar for 4 rows). I wouldn't mind using the illegal opcode method, but I want to make my code as portable as possible. Do all of the standard released A8 machines support these illegal opcodes? I'm not too bothered if it doesn't work on somebody's hacked-together Atari with a non-standard architecture.

I think some of the mnemonics are different in MADS. They're listed in the manual, but I had to put at least one opcode in using a .byte statement when I was experimenting with them, so perhaps a couple are missing.

I think it is a really bad idea to code anything with illegal opcodes. It won't be long before 65816s are much more common than they are now, and you'd just be killing yourself in that part of the market. If you need speed, the 65816 is your friend. Don't isolate yourself on the 6502C.

By the way: if you are doing things that wrap past $FFFF to $0000, like LDA $FF30,X with a large enough X, bear in mind that a 65816 with linear memory will access $100xx at that point, not $00xx as the 6502 does.
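A small Python model of the difference for the LDA $FF30,X example, picking X = $E0 so that the sum crosses $FFFF. The function names are illustrative, and it assumes the 65816's data bank register is 0, as it is after reset.

```python
# Sketch of absolute,X address calculation on the two CPUs.
# Assumption: the 65816's data bank register (DBR) is 0, as after reset.


def effective_address_6502(base, x):
    """6502: the 16-bit sum wraps past $FFFF back to $0000."""
    return (base + x) & 0xFFFF


def effective_address_65816(base, x, dbr=0):
    """65816: DBR supplies bits 16-23 and the carry from the 16-bit sum
    runs on into the next bank instead of wrapping."""
    return ((dbr << 16) + base + x) & 0xFFFFFF


# LDA $FF30,X with X = $E0
print(hex(effective_address_6502(0xFF30, 0xE0)))   # 0x10
print(hex(effective_address_65816(0xFF30, 0xE0)))  # 0x10010
```

Code that never lets an indexed address cross $FFFF behaves the same on both CPUs, so the wrap case is the one to audit.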