by Vasily Koudymov

When programming the PIC16 family of microcontrollers, it is sometimes necessary
to do absolutely nothing for a certain number of cycles, thereby causing a
real world delay of some amount of time. This can be useful if one is programming
a clock or frequency generator, for example, and it is often easier to implement
a delay loop than to use a built-in TMR timer.

This document will use the following PIC16 assembly instructions:

decfsz f,d:

Although f refers to a memory location, when describing operations
performed upon f it is easier to write something like [ f = (f -
1) ] to mean that the value at the address called f will now equal the
original value at address f with one subtracted from it. I mention this
because a particularly pedantic reader might say that I am in error when I
say, "The new value of f will be (f - 1)". Although I will say that, it
stands for the longer idea; it simply makes the operations much easier to
follow. Let's return to the original topic.

The decfsz instruction performs the operation (f - 1). If d is substituted
with the character 'W', the result (f - 1) will be placed in the Working
Register, leaving f unchanged. If d is substituted with the character 'F',
the result will be placed back into f such that [ f = (f - 1) ].

With regards to the cycle time of the decfsz instruction, it goes as follows:
one cycle if (f - 1) is not equal to 0, in which case the next instruction
is executed, and two cycles if (f - 1) is equal to 0, in which case the
instruction immediately after is discarded. This is irrespective of whether
d is 'W' or 'F'.

goto k:

This instruction jumps to the address k in program memory. Although the goto
statement is looked down upon in higher level languages, in assembly
languages it is necessary to complete most programs. What is great about
Microchip's assembler, which we will use to assemble our PIC16 code, is that
it allows relative addresses for k rather than only absolute addresses.
For example, 0x1234 is an absolute address which refers to the address 0x1234
in the PIC16's program memory.

Please do not confuse program memory with RAM. RAM is where the file
registers are located, and they can be modified at runtime. Program memory,
however, holds the actual program code which you upload to the
microcontroller. For the purposes of this document, program memory is read
only while the program is running.

In contrast, relative addresses work as follows: if the character '$' is
substituted for k, it refers to the address of the goto instruction itself
(resulting in an infinite loop). If $+1 is substituted for k, then it goes
to the instruction immediately below the goto. If k is replaced with $-5, it
goes to the instruction 5 instructions above the goto. In all cases, goto
takes two cycles to execute.

In the above case, movlw 0x00 is skipped over due to the
goto, and it takes 3 cycles altogether.

Let's combine the decfsz and goto instructions to create the simplest
loop:

decfsz aa, F
goto $-1

We will refer to this as a one stage delay loop, but what does it do?
So long as the decfsz instruction does not produce a value of zero,
it takes 1 cycle to execute, and the goto after it, which takes 2 cycles,
is executed next. In sum, this segment of code takes 3 cycles each time
decfsz does not produce zero. When it produces zero, decfsz discards the
goto instruction and takes 2 cycles. So in short: 3 cycles if not resulting
in zero, 2 cycles if resulting in zero.

Let's plug in some values for aa:

aa = 1 yields zero, making the total 2 cycles.

aa = 2 does not yield zero, so the first pass takes 3 cycles; the loop then
runs again with aa = 1, which takes 2 cycles, making a total of 5 cycles.

aa = 3 does not yield zero on the first pass, which takes 3 cycles, and then
follows the pattern established by aa = 2, making a total of 8 cycles.

Or, in general [3(aa - 1)] + 2 cycles for this delay loop, because eventually
there will be one value for aa which yields zero, while all of the rest do
not. As such, if we had aa = 32, 31 of the values will take 3 cycles, and
1 of the values will take 2 cycles. We can simplify this function as follows:

[3(aa - 1)] + 2
3aa - 3 + 2
3aa - 1
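
As a cross-check on the formula, here is a minimal Python sketch that steps
through the loop one pass at a time using the cycle costs given above (1 cycle
for decfsz without a skip, 2 cycles for goto, 2 cycles for decfsz with a skip).
The function name one_stage_cycles is made up for this illustration, and the
model is only a cycle counter, not a PIC16 simulator.

```python
def one_stage_cycles(aa: int) -> int:
    """Count cycles for the loop:  decfsz aa,F / goto $-1  (illustrative model)."""
    aa &= 0xFF                    # aa is an 8-bit file register
    cycles = 0
    while True:
        aa = (aa - 1) & 0xFF      # decfsz decrements with 8-bit wraparound
        if aa == 0:
            return cycles + 2     # result is zero: the goto is skipped, 2 cycles
        cycles += 1 + 2           # decfsz (1 cycle) followed by goto (2 cycles)


if __name__ == "__main__":
    for aa in (1, 2, 3, 32):
        # simulated count next to the closed form 3aa - 1
        print(aa, one_stage_cycles(aa), 3 * aa - 1)
```

For aa = 1, 2, 3, and 32 the simulated count agrees with 3aa - 1.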

What about when aa is initialized to zero?

When aa = 0, it will be decremented by one first, which yields 255, a
non-zero value. Effectively, initializing aa = 0 is like initializing
aa = 256. Note that 256 is not a valid value for an 8 bit number, so in order
to initialize aa = 256, one has to use aa = 0.

What are the limits of the delay loop?

Given that aa can effectively be set from aa = 1 to aa = 256 (by way of
initializing aa = 0), the minimum and maximum number of cycles which can
be generated using this delay loop are found by plugging those values into
the derived equation:

min: 3(1) - 1 = 2 cycles

max: 3(256) - 1 = 767 cycles

Knowing this range will become important when we want to figure out how many
variables we need in a delay loop. In order to simplify its presentation,
the minimum and maximum number of cycles for a delay loop will be written
as [2~767], which indicates that using the loop with that specified
range allows for at least 2 cycles and at most 767 cycles.

If we want to make a delay loop that includes more cycles, we would simply
add another variable as follows:

decfsz aa,F
goto $-1
decfsz bb,F
goto $-3

Before we delve into what this specific code does, we will introduce new
notation. cyc() will be used to refer to a delay loop. cyc(aa) means
that the delay loop has only one eight-bit variable in it (called aa), and
we refer to it as a one stage loop, its output being a number of cycles.
cyc(aa,bb) means that the loop involves two variables (called aa and
bb respectively), and we refer to it as a two stage loop. This pattern continues
onward to however many variables you use. From this point onward, the following:

cyc(aa) = 3aa - 1 u [2~767]

will be taken to mean a one stage delay loop with the formula (3aa - 1) which
has one 8 bit variable called aa within it, and which can generate between
2 and 767 cycles inclusive. Note that the range for all input variables
(such as aa) is actually [0~255], with 0 equating to 256.

With regards to what the above code is, it is a two stage loop. If time is
taken to analyze the code, the following pattern will emerge:

[3(aa - 1) + 2] + { [(max of one stage loop) + 3](bb - 1) + 2}

Explaining the reason for this pattern is beyond the scope of this document,
but to actually derive it, it would be prudent to sit down with an empty sheet
of paper and go through a few iterations of a two stage loop. Now to actually
plug in the values for the above derivation:
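
Substituting the one stage maximum of 767 cycles into the pattern gives
770(bb - 1) for the second term, so the whole expression simplifies to
3aa + 770bb - 769, with a range of [4~197119]. That simplification is worked
out here from the pattern above rather than quoted, so the following Python
sketch counts the cycles of the two stage loop directly as a check. The
function name two_stage_cycles is hypothetical, and the model assumes only the
instruction timings already stated (3 cycles per non-zero pass, 2 cycles on
the final skip).

```python
def two_stage_cycles(aa: int, bb: int) -> int:
    """Count cycles for: decfsz aa,F / goto $-1 / decfsz bb,F / goto $-3."""
    aa &= 0xFF
    bb &= 0xFF
    cycles = 0
    while True:
        # inner stage: decfsz aa,F / goto $-1
        while True:
            aa = (aa - 1) & 0xFF
            if aa == 0:
                cycles += 2       # decfsz skips the goto
                break
            cycles += 3           # decfsz (1) + goto (2)
        # outer stage: decfsz bb,F / goto $-3
        bb = (bb - 1) & 0xFF
        if bb == 0:
            return cycles + 2     # final skip ends the loop
        cycles += 3               # jump back; aa is now 0, i.e. 256 passes


def two_stage_formula(aa: int, bb: int) -> int:
    """Closed form, treating an initial value of 0 as 256."""
    aa = 256 if aa == 0 else aa
    bb = 256 if bb == 0 else bb
    return 3 * aa + 770 * bb - 769


if __name__ == "__main__":
    for aa, bb in ((1, 1), (2, 3), (0, 0)):
        print(aa, bb, two_stage_cycles(aa, bb), two_stage_formula(aa, bb))
```

The pairs (1, 1) and (0, 0) correspond to the minimum and maximum of the two
stage loop.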

These formulas are great, so long as you copy and paste the code each time
you need a delay. However, in the real world, it's more efficient and often
much easier to use subroutines. A subroutine in assembly language is analogous
to a function in a higher level language. Instead of typing out your code
each time you need a delay, you can use subroutines. Here's how they appear
in code:

; this is an excerpt from the main code section of an assembly program
call delay ; 2 cycles: call the subroutine named 'delay', much like calling a function
; the actual code for a one stage delay loop subroutine
delay:
decfsz aa,F ; these two lines take 3aa - 1 cycles (the one stage formula)
goto $-1
return ; 2 cycles: return to the caller

Or, in general:

; this is an excerpt from the main code section of an assembly program
call delay ; 2 cycles, call the delay subroutine
; the declaration for the delay subroutine
delay:
<code for an N-stage loop>
return

You may notice that before, we were just using the code for an N-stage loop;
however, when we turn it into a subroutine, we add on four more cycles (two
for the call and two for the return). This applies to any number of stages.
In order to adjust the N-stage delay loop formulas for these additional
cycles, we add four to both the formula and the limits, such that:

Although the formula is becoming more and more complete, there is still one
last step before it is finished. It involves initializing a loop with
specific values so as to obtain the desired number of cycles.

It is best to initialize the loop within the subroutine, as it makes the code
less clunky and makes it easier to use conditional code (such as btfss,
decfsz, and incfsz). The reason is that if the loop is initialized after the
call statement, that is, inside the subroutine, code like this can be used:

btfss STATUS,Z ; check if the previous operation yielded zero
call delay ; delay for some amount of time

while in contrast, this would not be possible if the loop was initialized
outside of the subroutine:

As you see by these two examples, only in the first example is the delay
conditional. The second example is not equivalent to the first, as the only
part which is conditional is the clrf aa. "If the previous operation does
not yield zero then do not clear aa" is not equivalent to "call the delay
routine if the previous operation does not yield zero".

As to initializing within subroutines, there are two choices: static delay
loops and variable delay loops.

From here, it should be noticed that for static delay loop subroutines, if
the loop involves N variables, it will require 2N cycles to initialize
them. What this translates to is that a one stage loop takes 2 cycles to
initialize, a two stage loop takes 4 cycles, and a three stage loop takes
6 cycles. Similarly to the adjustments required in turning simple delay
loops into subroutines, we must also add these additional cycles to the
formulas as follows:
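
Putting the pieces together, a static N-stage subroutine costs the bare loop
plus 4 cycles for call and return plus 2N cycles for initialization. A rough
Python sketch of that bookkeeping for one to three stages follows. The name
static_delay_cycles and the COEFFS tuple are illustrative only, and the
per-stage coefficients 3, 770, and 197122 are the ones that appear in this
document's formulas.

```python
COEFFS = (3, 770, 197122)   # cycles added per extra count of aak, bbk, cck


def static_delay_cycles(*consts: int) -> int:
    """Total cycles for a static N-stage delay subroutine, N = 1..3.

    Counts the bare loop, call + return (4 cycles), and 2 cycles of
    initialization per variable. A constant of 0 is treated as 256.
    """
    n = len(consts)
    if not 1 <= n <= 3:
        raise ValueError("this sketch only covers one to three stages")
    values = [256 if k == 0 else k for k in consts]
    if any(not 1 <= v <= 256 for v in values):
        raise ValueError("each constant must fit in 8 bits")
    # bare loop: each stage adds its coefficient per non-final pass, plus 2 for its final skip
    bare = sum(c * (v - 1) for c, v in zip(COEFFS, values)) + 2 * n
    return bare + 4 + 2 * n


if __name__ == "__main__":
    print(static_delay_cycles(1), static_delay_cycles(0))              # one stage limits
    print(static_delay_cycles(1, 1), static_delay_cycles(0, 0))        # two stage limits
    print(static_delay_cycles(1, 1, 1), static_delay_cycles(0, 0, 0))  # three stage limits
```

With these additions the formulas work out to cyc(aak) = 3aak + 5 over
[8~773], cyc(aak,bbk) = 3aak + 770bbk - 761 over [12~197127], and
cyc(aak,bbk,cck) = 3aak + 770bbk + 197122cck - 197879 over [16~50463241].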

Here aak, bbk, and cck are the values you need to initialize only once.
Afterward, whenever you call the delay subroutine, it initializes the loop
with aa = aak, bb = bbk, cc = cck, or mnemonically aa(variable) = aa(constant).
Should you desire to change the number of cycles to delay for, all that
needs to be changed are the constant values.

With regards to the number of cycles that variable delay loops take, it is
exactly the same as static delay loops, thereby making the formulas (with
the exception of the constant values playing a role):

As aak can only be a whole integer, we assign 198 to it. We absolutely do
not round up or down; we always truncate. What we do with the fractional
part, 0.3333, is multiply it by 3 to convert it back to how many cycles are
needed in addition to what the loop can supply. Essentially, we are finding
the remainder:
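
For the 600 cycle example with cyc(aak) = 3aak + 5, the division and remainder
can be sketched in Python as below. The helper name solve_one_stage is an
assumption made for illustration; it truncates the quotient and reports how
many cycles short the loop falls.

```python
from typing import Tuple


def solve_one_stage(target_cycles: int) -> Tuple[int, int]:
    """Return (aak, cycles_short) for the formula cyc(aak) = 3*aak + 5."""
    # move the constant to the other side, then divide by the coefficient
    aak, cycles_short = divmod(target_cycles - 5, 3)
    if not 1 <= aak <= 256:
        raise ValueError("target is outside what a one stage loop can reach")
    return aak, cycles_short


if __name__ == "__main__":
    aak, short = solve_one_stage(600)
    # the loop itself supplies 3*aak + 5 cycles; 'short' must be padded separately
    print(aak, short, 3 * aak + 5)
```

For 600 cycles this gives aak = 198 with 1 cycle left over, since
3(198) + 5 = 599.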

We round only at the very last step, to the nearest integer. This remainder
tells us how far short of our desired 600 cycles we are if we use the value
198 for aak. As 0.9999 rounded to the nearest integer is 1, we are short
1 cycle, which indicates that the delay loop only generates 599 cycles. In
practice, after the final division and before multiplying by three, a 0.3333
excess indicates that the loop is one cycle short, while a 0.6666 excess
indicates that it is two cycles short. To remedy this deficiency, we
recommend the following:

Note that for the second solution, embedding a one cycle null operation (nop)
within the delay subroutine, outside the loop itself, will add one more cycle
to this subroutine, making the formula change from:

cyc(aak) = 3aak + 5 u [8~773]

to this:

cyc(aak) = 3aak + 5 + 1 u [8 + 1~773 + 1]

which simplifies to:

cyc(aak) = 3aak + 6 u [9~774]

Therefore, in case you decide to embed the nop instruction within your delay
loop subroutine, make certain to modify the formula as well.

Before we continue, I will now explain the procedure for finding the remainder
on a calculator:

We want to take 50002 and divide it by 35. We require both the quotient and
the remainder and begin by dividing this expression in our calculators:

50002/35 = 1428.628571

This makes the quotient 1428, and now to find the remainder:

1428.628571 - 1428 = 0.628571

We take this and multiply it by the number we divided by:

0.628571 x 35 = 22

Therefore, the final answer is:

1428 remainder 22

or in the shorthand we will use (where ex is excess) throughout this document:

1428 ex 22
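
The same calculator procedure can be written as a short Python sketch. The
function name quotient_and_excess is invented here; the built-in divmod gives
the same pair directly and is shown for comparison.

```python
def quotient_and_excess(n: int, d: int):
    """Mimic the calculator steps: divide, truncate, scale the fraction back."""
    result = n / d                       # 50002 / 35 = 1428.628571...
    quotient = int(result)               # truncate, never round
    fraction = result - quotient         # 0.628571...
    excess = round(fraction * d)         # round only at the very last step
    return quotient, excess


if __name__ == "__main__":
    print(quotient_and_excess(50002, 35))   # calculator-style steps
    print(divmod(50002, 35))                # integer arithmetic, same pair
```

Using divmod avoids floating point entirely, which may be preferable for very
large cycle counts.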

With this knowledge in mind, we can now solve for a three stage delay loop
for 600 cycles:

cyc(aak,bbk,cck) = 3aak + 770bbk + 197122cck - 197879 u [16~50463241]

The procedure should be fairly intuitive if followed. Its rules are nearly
the same as above, but remember to bring the constant (integer at the very
end) to the other side, and to start with the largest coefficient in division:
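
One possible way to follow that procedure for the 600 cycle target is sketched
below in Python. The function name solve_three_stage is hypothetical. The
sketch simply divides by each coefficient in turn, as described, and rejects
any result where a constant falls outside 1 to 256 rather than trying to
repair it.

```python
def solve_three_stage(target_cycles: int):
    """Return (aak, bbk, cck, cycles_short) for
    cyc = 3*aak + 770*bbk + 197122*cck - 197879."""
    remaining = target_cycles + 197879          # bring the constant across
    cck, remaining = divmod(remaining, 197122)  # largest coefficient first
    bbk, remaining = divmod(remaining, 770)
    aak, cycles_short = divmod(remaining, 3)
    for name, k in (("aak", aak), ("bbk", bbk), ("cck", cck)):
        if not 1 <= k <= 256:
            raise ValueError(name + " out of range; this simple division does not cover this target")
    return aak, bbk, cck, cycles_short


if __name__ == "__main__":
    aak, bbk, cck, short = solve_three_stage(600)
    supplied = 3 * aak + 770 * bbk + 197122 * cck - 197879
    print(aak, bbk, cck, short, supplied)
```

For 600 cycles this works out to 198479 / 197122 = 1 ex 1357, then
1357 / 770 = 1 ex 587, then 587 / 3 = 195 ex 2, so cck = 1, bbk = 1,
aak = 195, and the loop is 2 cycles short.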

We must also discuss the topic of picking stages. Clearly, with the three
formulas we have derived before, any of them will work for a 600 cycle loop;
however, in practice it is better to use the loop with the fewest variables
where possible, as it conserves memory. Consider that a one stage loop requires
two bytes of RAM for aa and aak, and a two stage loop requires four bytes of
RAM for aa, aak, bb, and bbk. As you can see by the pattern, a three stage
loop will require six bytes of RAM, and in general an N-stage loop will require
2N bytes of RAM. In addition, to initialize the constants aak, bbk, and cck,
it takes two cycles per variable, so for a three stage loop this means
6 cycles. This may be wasteful if your application barely fits into the
microcontroller.

Another concern when using delay loops is offset errors. You may be tempted
to use a 5000 cycle delay loop if you need your instruction to execute every
5000 cycles; however, this would create an offset error, since your
instruction would take at least 1 cycle, so that it would now execute
every 5001 cycles.

In order to avoid this, pad your time critical routine or set of instructions
so that it always takes the same number of cycles to process, and set the
delay loop to be equal to your desired value minus the number of cycles your
instructions or routine takes.

Sometimes, it is important to use delay loops to create real time delays.
What this means is that sometimes it is necessary to calculate how many
seconds a delay loop will take. To do this, we include the following:

The PIC16 architecture is odd in that each instruction cycle takes four
oscillator periods. When you use a crystal oscillating at 4.000 MHz, the
number of instructions per second with single-cycle instructions (such as
nop, movlw, bcf, etc.) is therefore 1.000 MIPS (million instructions per
second). The number of MIPS can be found by taking the frequency in
megahertz and dividing it by 4, and the number of instructions per second
(IPS) can be found by multiplying this answer by 1,000,000. Thus:

[ips] = [xtal frequency in MHz] * 1,000,000 / 4

[ips] = [xtal frequency in MHz] * 250,000

Now that we know how many instructions per second it executes, we can figure
out how many seconds a number of instructions will take by dividing the number
of cycles by the number of instructions per second.
