Write a benchmark program that repeatedly calls your routine with a variety of input values that it is likely to encounter.
The benchmark should set up an interrupt that fires at a regular interval. Install an interrupt handler that counts the number of interrupts that occur before the benchmark completes.
The line drawing benchmark is a good example of this.
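
To turn the interrupt count into something comparable, a rough conversion to cycles per call can be sketched like this. This is only a sketch: the 1 MHz clock and the 100 Hz interrupt rate are assumptions (adjust them to whatever your benchmark actually sets up), and the function names are made up for illustration.

```python
def cycles_per_call(ticks, calls, cpu_hz=1_000_000, tick_hz=100):
    """Estimate average cycles per call from an interrupt tick count.

    cpu_hz and tick_hz are assumed values, not measured ones.
    The result includes the overhead of the benchmark loop itself.
    """
    cycles_per_tick = cpu_hz / tick_hz
    return ticks * cycles_per_tick / calls


def resolution(calls, cpu_hz=1_000_000, tick_hz=100):
    """Uncertainty per call caused by being off by one tick."""
    return (cpu_hz / tick_hz) / calls


if __name__ == "__main__":
    # Example: 250 ticks counted while calling the routine 1000 times.
    print(cycles_per_call(250, 1000), "cycles per call, +/-", resolution(1000))
```

The more calls the benchmark makes, the smaller the per-call error from the coarse tick, which is why the routine is called repeatedly.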

However, if you want to maximize the speed of an assembly routine, you will probably have to count clock cycles for each instruction in key parts of the routine. Then you can compare different bits of code at the clock cycle level to see if you can improve them. Once you have an optimization in place you can verify the speed improvement with the benchmark.
The line drawing routines were optimized in this way as well.

<edit>
See the thread "Line routine - Second attempt" in the 6502 assembly coding area (this area) and download the code for it.

Set up a program that calls the routine.
Run Oriculator, hit F2 to enter the debugger, and set 2 breakpoints; one at the start and one at the end of the routine.
Hit F2 to exit the debugger.
Load and run the program. You'll end up back in the debugger at the start of the routine.
Hit F9 to reset the cycle counter.
Hit F2 to exit the debugger.
Execution returns to the debugger at the second breakpoint, with the number of cycles executed shown in the cycle counter.

Godzil wrote: Do you want an editor tool, or would a tool that you run, like the assembler, and that gives you this information be acceptable?

Anything is fine
The idea, for instance, is to copy/paste an ASM listing and get the speed of each instruction (the speed of the whole code can't be evaluated unless it is executed, of course, since it varies with data, loops and so on).
I'm beginning to think that a simple Excel sheet could do it.

That would be a nice addition in Oric Explorer too

I don't know of any tool that does this, but I could quite easily make one that works from a binary file, and I may be able to extend it to work from an assembly file.

Would it be possible to add a cycle-count feature to even one of the simpler onboard assemblers/disassemblers? I just checked ASMOS and it doesn't have it, but it's written in assembly and I haven't figured out how to disassemble .TAP files yet. Maybe one of the BASIC assemblers has room for a cycle-count field?

It depends on whether you want an accurate or nearly accurate cycle count, or just the generic value. For the latter, a simple 256-byte table indexed by opcode is more than enough, but a more precise count needs a deeper understanding of the code, as there are a lot of special cases.
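
As a rough illustration of the table approach, here is a minimal sketch that walks a binary and prints the generic cycle count per instruction. It only covers a handful of opcodes (a real table would have all 256 entries), and the `OPCODES` layout and the `annotate` name are made up for this example.

```python
# opcode -> (mnemonic, length in bytes, base cycles, note on extra cycles)
# Partial table for illustration only; values are the documented NMOS 6502 timings.
OPCODES = {
    0xA2: ("LDX #imm", 2, 2, ""),
    0xA9: ("LDA #imm", 2, 2, ""),
    0xA5: ("LDA zp", 2, 3, ""),
    0xAD: ("LDA abs", 3, 4, ""),
    0xBD: ("LDA abs,X", 3, 4, "+1 if page crossed"),
    0x85: ("STA zp", 2, 3, ""),
    0x8D: ("STA abs", 3, 4, ""),
    0x9D: ("STA abs,X", 3, 5, ""),
    0xE8: ("INX", 1, 2, ""),
    0xCA: ("DEX", 1, 2, ""),
    0xD0: ("BNE rel", 2, 2, "+1 if taken, +1 more if page crossed"),
    0x60: ("RTS", 1, 6, ""),
}


def annotate(code):
    """Return (offset, mnemonic, base cycles, note) for each instruction."""
    rows = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            raise ValueError(f"opcode ${op:02X} at offset {pc} not in table")
        name, length, cycles, note = OPCODES[op]
        rows.append((pc, name, cycles, note))
        pc += length  # skip the operand bytes
    return rows


if __name__ == "__main__":
    # LDX #8 / LDA $4000,X / STA $5000,X / DEX / BNE -9 / RTS
    program = bytes.fromhex("a208bd00409d0050cad0f760")
    for offset, name, cycles, note in annotate(program):
        print(f"{offset:04X}  {name:<10} ; {cycles} {note}")
```

This gives only the generic value per instruction; it says nothing about how often each instruction runs.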

I think this is a really difficult task if you want meaningful values. It is not only that some instructions take more cycles when a page boundary is crossed, or that branch instructions take a different number of cycles depending on whether the branch is taken, but also that loops are the dominant cycle-eaters.
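
To make these special cases concrete, here is a small sketch of the branch timing rule (2 cycles if not taken, 3 if taken, 4 if taken across a page boundary) and of how it feeds into the cost of a simple counted loop. The function names and the example addresses are hypothetical, and the loop is assumed to end with a single backward branch.

```python
def branch_cycles(branch_addr, offset, taken):
    """Cycles for a 6502 relative branch located at branch_addr.

    offset is the signed displacement (-128..127) from the address
    of the instruction that follows the branch.
    """
    if not taken:
        return 2
    next_pc = (branch_addr + 2) & 0xFFFF
    target = (next_pc + offset) & 0xFFFF
    crossed = (next_pc & 0xFF00) != (target & 0xFF00)
    return 3 + (1 if crossed else 0)


def loop_cycles(body_cycles, iterations, branch_addr, offset):
    """Total cycles of a loop whose body ends in one backward branch.

    body_cycles excludes the branch itself. The branch is taken on
    every pass except the last one.
    """
    taken = branch_cycles(branch_addr, offset, True)
    not_taken = branch_cycles(branch_addr, offset, False)
    return iterations * body_cycles + (iterations - 1) * taken + not_taken


if __name__ == "__main__":
    # 11-cycle body (e.g. LDA abs,X / STA abs,X / DEX), 8 passes.
    print(loop_cycles(11, 8, 0x0609, -9))  # branch stays within one page
    print(loop_cycles(11, 8, 0x06FE, -9))  # branch crosses a page boundary
```

Even in this trivial case the total depends on where the code is assembled and on the iteration count, neither of which is visible from a single instruction.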

With very well structured (and simplistic) code, you can guess where the loops are and evaluate their impact (with respect to the number of times they are performed), but that is not always the case. You will need to analyze the code and evaluate the possible paths, which is not an easy task.

I remember when people helped with the line routine for 1337 that the timing of each instruction was put as a comment just to the right of the code; something like ; 3+1 meaning it takes 3 cycles at least and 4 cycles at most. At given points a cumulative sum was performed and this was quite helpful to identify where cycles could be saved.
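
Summing such comments by hand gets tedious, so here is a minimal sketch of a script that does it. It assumes each timing comment sits at the end of the line in the form `; 3` or `; 3+1`; the regular expression, the function name and the sample listing are all made up for illustration.

```python
import re

# Matches a trailing "; N" or "; N+M" timing comment.
TIMING = re.compile(r";\s*(\d+)(?:\+(\d+))?\s*$")


def sum_timings(listing):
    """Return (minimum, maximum) total cycles from annotated lines."""
    low = high = 0
    for line in listing.splitlines():
        match = TIMING.search(line)
        if not match:
            continue  # no timing comment on this line
        base = int(match.group(1))
        extra = int(match.group(2) or 0)
        low += base
        high += base + extra
    return low, high


if __name__ == "__main__":
    sample = """
loop
    lda table,x   ; 4+1
    sta $bb80     ; 4
    dex           ; 2
    bne loop      ; 2+2
"""
    print(sum_timings(sample))
```

The result is the best and worst case for one straight pass through the lines, not for the loop as a whole.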

Maybe an editor which could add this information when requested could be handy, but I don't see an easy way of doing it.

There is one tool which was used for optimizing 6502 asm code (opt65) which I think is included in the OSDK. It may do some work counting cycles, etc. Maybe it is worth a look.