Failure and Run-Time Error Recovery

This chapter describes a variety of useful hardware
features of the 68HC11F1:

The processor’s external hardware interrupt, /IRQ, may be used by external devices to request immediate service.

Three nonmaskable interrupts cause a hardware reset: the external reset, the COP, and the clock monitor. The main reset is
activated on power-up or when the /RESET pin is pulled low for more than 4
machine cycles. Enabling the computer operating properly (COP) circuit sets up
a watchdog timer that resets the processor unless a special register is periodically
updated. This provides a means of recovering from crashes in an embedded
application. Use of the COP feature requires installation of an autostart
routine which services the COP. The clock monitor backs up the COP by
resetting the machine if the system clock fails.

STOP and WAI instructions are available to put the CPU in low power modes with different degrees of power savings.

Finally, an on-board jumper allows selection of the standard operating mode or the special cleanup mode.

External Hardware Resets

The main reset interrupt of the 68HC11 processor is
activated upon power-up or when the active-low /RESET signal is pulled low.
The processor does not distinguish between a power-on reset and a reset caused
by a low level on the /RESET input pin; both result in the same hardware
initialization and software restart sequence.

The /RESET line is normally held high by a pull-up
resistor. You can pull the /RESET line low by pushing on the reset switch.
Moreover, any peripheral device can reset the processor by driving the /RESET
signal low for at least 2 microseconds using an open-collector output.

The active-low /RESET signal is controlled by the power
monitor circuitry. On power-up, the monitor asserts the reset signal until the
positive supply has stabilized above 4.5 Volts.

Internal Resets

The 68HC11 resets itself when a failure condition is
detected by either the computer-operating-properly (COP) or the clock monitor
circuit. When either of these failure conditions occurs, the processor drives
the /RESET line low for 4 machine cycles to reset itself and any
peripherals that are connected to the /RESET line. The processor then
determines which failure (COP or clock monitor) caused the reset, and branches
to the associated service routine. QED-Forth initializes the interrupt vectors
for the COP and clock monitor to perform the standard restart sequence, and the
programmer may change the vectors if desired (see the “Special Reset-Type Interrupts” section in the
“Interrupts and Register Initializations” chapter of the QED Software Manual).
The operation of the COP and clock monitor is described in the following
sections.

Crashes

A computer “crashes” when it executes a set of
instructions that it is not supposed to. This can cause the processor to write
over memory locations that are not write-protected. The processor may get into
an infinite loop of legal instructions (in which case it will not respond to
your commands), or it may eventually execute an “illegal opcode”. Illegal
instructions are detected by the processor’s illegal opcode trap and result in
a restart, in which case you will see the QED-Forth startup message on your
terminal, or execution of the autostart program, if present.

The best response to a crash during program development is
to push the reset button. This initializes all of the registers and performs a
restart. In most cases a “warm restart” will be performed, which should allow
you to continue programming with access to all of the words that you have
defined. In other cases, the state of the user area or the dictionary may be
corrupted. If QED-Forth detects the corruption, it will automatically execute
a “cold restart”; otherwise you can execute COLD, which performs the cold restart.
The cold restart re-initializes all of the user variables that control
QED-Forth’s operation.

To clarify the discussion of crashes, some terms must be
defined. A “reset” is an initialization process invoked by the hardware of the
68HC11, while a “restart” is an initialization process controlled by software.

A reset can be caused by any of four events:

power is applied to the processor

the reset button is pushed

the clock monitor detects a clock failure

the computer operating properly (COP) circuit detects a
failure

The hardware of the 68HC11 is configured by a set of
registers that reside at locations 8000H through 805FH. (These hardware
registers should not be confused with the programming registers D, X, Y, etc.)
The reset initializes essentially all of the registers, and then initiates an
interrupt response sequence. The interrupt calls a specified response program
whose address is stored in an interrupt vector near the top of memory. The
power-on and reset-button resets share the same interrupt vector at FFFE. The
clock monitor and COP resets are re-vectored to addresses in EEPROM where the
programmer can install customized service routines, if desired. All of these
service routines are initialized to perform the default restart routine.

A “restart” is an initialization process performed by
software. After a (hardware-invoked) reset, the 68HC11 calls a restart routine
which re-initializes some of the registers to accommodate QED-Forth, and
initializes other memory locations including all or part of the user area. A
restart can also be invoked solely via software, by executing the kernel words
COLD or WARM. When the illegal opcode trap detects an illegal instruction, it
calls a restart routine, but does not perform a hardware reset. Note that a
reset always results in a restart, but that a restart can be performed without
a reset.

COLD is the most comprehensive software-invoked
initialization command. Executing COLD after a crash usually puts the machine
into a well-known state by completely initializing the user area which controls
QED-Forth’s operation. But COLD does not initialize all of the registers.
Therefore, in crashes where the contents of key hardware registers are
corrupted, it may be necessary to perform a hardware reset by pushing the reset
button or powering the machine off and on again.

There are two types of restart: cold and warm. A cold
restart initializes all of the parameters used by the QED-Forth system. These
parameters are stored in the “user area”, which is a 256-byte block of memory
in the common RAM. All of the memory management pointers, format variables to
control numeric conversion, quantities that enable the compilation of local
variables, and many other system values are stored in the user area. COLD
initializes these to default values. COLD also initializes several vital
interrupt vectors so that they will perform the startup sequence if they are
invoked. These vital interrupts --clock monitor, computer operating properly,
and illegal opcode trap-- were discussed in the last chapter.

A warm restart, on the other hand, assumes that most of
the user variables have already been properly initialized. A warm restart
initializes only a few of these parameters, including stack pointers (it clears
the stacks) and some multitasking variables (it makes sure that a single task
is running and that it has control of the serial port).

A warm restart preserves the prior number base (whatever
you had set it to before the restart occurred) while a cold restart always sets
the base to decimal. A warm restart preserves the user’s memory map and
QED-Forth’s ability to find user defined words, while a cold restart sets a
default memory map and forgets all words except those in the original kernel.

The default restart program decides whether to perform a
cold or a warm restart by checking a location in the user area to see if a
specified pattern (1357H) is stored there. If the correct pattern is present,
the restart program assumes that the user area is already properly initialized,
so it performs a warm restart. If the location does not contain the proper
value, the restart program assumes that some event (perhaps a crash) has
corrupted the user area, so a cold restart is executed to force the system to a
known state.

Because the Handheld’s common RAM is battery backed
(except for the 1K of RAM at B000H-B3FFH on the 68HC11 itself), the user area
(including the location where the startup pattern is stored) maintains its
contents even when the Handheld is powered down. Thus a warm restart will be performed
most of the time when you turn on the Handheld. This is convenient: it means
that access to the words you defined, your memory map, and the contents of the
user area are not altered by removal of power. It also means that pushing the
reset button and powering the machine off and on again have similar effects,
except that powering the machine off loses the contents of the 1K of RAM on the
68HC11 at addresses B000H-B3FFH.

If a crash over-writes the user area, the next restart
will be a cold restart. QED-Forth signals a cold startup by printing a
COLDSTART statement before the QED-Forth V4.4x startup message is printed. If
the crash does not corrupt the startup pattern in the user area, a warm restart
is performed, and you can continue debugging. In most cases, all of
the words that you defined are still accessible. If the machine is
behaving in an unpredictable manner, however, it may be necessary to reset the
machine and perform a cold restart to establish a known initialized state.

Recovery Tricks

Some crashes may be difficult (but not impossible!) to
recover from. For example, if the name area of the dictionary is corrupted,
QED-Forth may not be able to find even the most basic commands in the
dictionary. If every command you give is met with the ? error message, try
executing COLD. The FIND word in the interpreter is programmed to always
recognize the word COLD, even if the dictionary is corrupted.

If All Else Fails, Use the Special Cleanup Mode

These recovery techniques may not work if you have a buggy
autostart word or a major crash. If typing COLD or pressing the reset button
does not greet you with the standard “QED-Forth V4.4x” prompt, you may need to
use the special cleanup mode to restore your system to a proper state. This
involves installing Jumper J1 and then pressing the reset button. The special
cleanup procedure places the Handheld in the same state it was in when it was
shipped from the factory.

The Computer Operating Properly (COP) Watchdog Timer

In many embedded control applications, it is important
that processor crashes be detected quickly so that the system can rapidly be
returned to a proper operating condition. The Computer Operating Properly
subsystem, also known as a “watchdog timer” or “COP”, provides this
capability. It gives the programmer a way to force a processor reset if an
application program crashes or gets lost. When enabled, the COP resets the
processor if the application program fails to periodically update a specified
register within a predetermined time-out period. The COP time-out period is
programmable to any of four values between 8 msec and 0.5 seconds.

To use the COP, design and debug an application program
that, in addition to performing all of its normal tasks, periodically writes a
2-byte pattern to the COP reset (COPRST) register as described below. The
specified pattern must be written before the COP “times out”. Then install the
application as an autostart routine using the QED-Forth word AUTOSTART or
PRIORITY.AUTOSTART, and enable the COP.

If the application program ever allows the time-out period
to be exceeded without writing the specified pattern, the COP resets the
processor. Presumably the pattern will not be properly written if the
processor crashes for any reason, so the COP provides a way of automatically
resetting the processor to recover from crashes. Then, because the application
program has been installed as an autostart routine, the application is
automatically restarted when the COP forces a reset.

Be Careful with the COP

Before enabling the COP, make sure that a debugged
application program that properly updates the COPRST register has been
installed as an AUTOSTART or PRIORITY.AUTOSTART routine.
If the startup program is improperly designed so that it is unable to service
the COP on time, the COP will reset the machine, thereby invoking the startup
program again, and leading to an infinite series of COP resets.

If you find yourself in this situation you can return the
Handheld to its “pristine” state by entering the special clean-up mode: install
Jumper J1 and then press the reset button to resume normal operation with the
COP disabled and any autostart routine removed.

The COP feature should prove trouble-free as long as the
application program is:

fully debugged;

capable of updating the COPRST register in a timely fashion; and,

installed as an autostart routine.

Configuring the COP

Three bits are used to configure and enable/disable the
COP. They are named CR0, CR1, and NOCOP. CR0 and CR1 are located in the
OPTION register. These bits determine the amount of time which can elapse
between updates of the COPRST register by the application program. If the
time-out period is exceeded, the COP forces a reset. The four available
time-out periods are:

Table 7‑1 COP Time-out Period

CR1   CR0   Time-out Period
0     0     8.192 ms
0     1     32.768 ms
1     0     131.07 ms
1     1     524.29 ms
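
The entries in Table 7‑1 can be checked against the standard 68HC11 COP prescaler, which divides the E clock by 2^15 and then by 1, 4, 16, or 64 as selected by CR1 and CR0. The worked check below is a sketch that assumes an E-clock of 4 MHz (a 16 MHz crystal), since the table itself does not state the clock rate. The symbols T_COP, f_E, and n are introduced here only for illustration.

```latex
T_{\mathrm{COP}} = \frac{2^{15}}{f_E}\times 4^{\,n},
\qquad n = 2\,\mathrm{CR1} + \mathrm{CR0}

f_E = 4\ \mathrm{MHz}:\quad
n=0:\ 8.192\ \mathrm{ms},\quad
n=1:\ 32.768\ \mathrm{ms},\quad
n=2:\ 131.072\ \mathrm{ms},\quad
n=3:\ 524.288\ \mathrm{ms}
```

With an 8 MHz crystal (E-clock of 2 MHz) each period doubles, giving approximately 16.384 ms, 65.536 ms, 262.14 ms, and 1.049 s.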

The CR1 and CR0 bits in the OPTION register may be
modified only during the first 64 cycles after a reset. The QED-Forth word
INSTALL.REGISTER.INITS makes it easy to specify a value that will be
automatically stored into the OPTION register after every reset; consult its
glossary entry for details.

The third control bit is called NOCOP and is located in
the CONFIG register. The Handheld is shipped with this bit set so that the COP
is disabled. To enable the COP, clear this bit. The CONFIG register’s
contents are non-volatile, and so are maintained even after the processor has
been powered down.

Servicing the COP

Servicing the COP is accomplished by writing 55H and AAH
to the COPRST register. Although the order of the writes is important, the
number of intermediate instructions between them is inconsequential. The two
writes must be performed before the time-out period has elapsed. Once AAH has
been stored, the COP will need to be serviced again before the next time-out period
has elapsed.
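
A word that performs this two-write sequence might look like the following minimal sketch. It assumes that COPRST sits at 803AH (the 68HC11's COPRST offset of 3AH added to the 8000H register base described earlier). The name COP.SERVICE matches the routine referred to in this chapter, but this definition is an illustration and not necessarily the one supplied with QED-Forth.

```forth
HEX
803A REGISTER: COPRST   \ assumed address: register base 8000H + offset 3AH

: COP.SERVICE ( -- )
   55 COPRST C!         \ first write of the required two-byte sequence
   AA COPRST C!         \ second write restarts the COP time-out period
;
```

Other instructions may execute between the two writes, but the order of the writes must be preserved.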

Now select a time-out period within which you can guarantee timely updating
of the COPRST register, and code a properly working autostart word that
periodically calls the COP.SERVICE routine to avoid a COP time-out.
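
The overall shape of such an autostart word can be sketched as an endless loop that services the COP on every pass. MY.APPLICATION and DO.ONE.PASS are hypothetical names. DO.ONE.PASS stands in for the real work of the application and is defined as an empty word only so that the sketch compiles. COP.SERVICE is assumed to be already defined as a word that writes 55H and then AAH to COPRST.

```forth
: DO.ONE.PASS ( -- )    \ hypothetical placeholder for the application's work
;

: MY.APPLICATION ( -- ) \ hypothetical top-level word to be autostarted
   BEGIN
      DO.ONE.PASS       \ must complete in less than the COP time-out period
      COP.SERVICE       \ write 55H then AAH to COPRST
   AGAIN
;
```

If any single pass can take longer than the selected time-out period, either choose a longer period or call COP.SERVICE from within the long-running code as well.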

You are now ready to configure and enable the COP. To set the time-out period,
the configuration bits CR1 and CR0 in the OPTION register must be set using
INSTALL.REGISTER.INITS. To enable the COP, the NOCOP bit in the CONFIG
register must be cleared, and this requires a special procedure. Briefly, the
CONFIG register’s contents are so important that they cannot be modified
unless a special bit called PTCON (protect CONFIG) in the BPROT (block
protection) register is cleared. To clear the protection bit,
INSTALL.REGISTER.INITS must be used. Once this protection bit is cleared, the
CONFIG register may be modified using the (EEC!) command, because the
register is implemented as a non-volatile EEPROM byte. Once enabled, the COP
will be active on all subsequent power-up restarts until the NOCOP bit in the
CONFIG register is explicitly set.

After the COP is
enabled, the next reset activates it. You must ensure that a proper AUTOSTART
or PRIORITY.AUTOSTART routine has been installed before the COP becomes
active. The following 6 steps show how to set up and activate the COP.

1. Define some useful register and time-out constants, and calculate an
appropriate value for the CR0 and CR1 bits in the OPTION register to set up
the desired time-out period. This can be accomplished using the following
words:

HEX

\ define time-out constants assuming an 8 MHz crystal frequency:
0 CONSTANT 16.384MS
1 CONSTANT 65.536MS
2 CONSTANT 262.14MS
3 CONSTANT 1.049SEC

\ Now define all needed register names.
803F REGISTER: CONFIG   \ CONFIG contains bit that enables the COP
8039 REGISTER: OPTION   \ OPTION contains CR0 and CR1 time-out bits
8035 REGISTER: BPROT    \ BPROT register holds CONFIG protection bit

\ TMSK2 and BAUD contents are needed by INSTALL.REGISTER.INITS
8024 REGISTER: TMSK2    \ define name for TMSK2 register
802B REGISTER: BAUD     \ define name for BAUD register

\ define a word to calculate the desired contents of the OPTION register
: OPTION.CONTENTS ( time.out.constant -- option.register.contents )
   OPTION C@ FC AND     \ clear CR1 and CR0 bits
   OR                   \ set CR1 and CR0 as specified
;

For example, to calculate an appropriate OPTION register value for
implementing a 1.049 second time-out period, execute the following:

1.049SEC OPTION.CONTENTS

2. Use INSTALL.REGISTER.INITS to install the proper values for the OPTION and
BPROT registers. We have just calculated the desired contents of the OPTION
register. Bit 4 of the BPROT register is the PTCON (protect CONFIG) bit; it
must be cleared so that we can write to the CONFIG register to enable the COP.
Both OPTION and BPROT must be initialized during the first 64 machine cycles
after each reset, and the QED-Forth word INSTALL.REGISTER.INITS can accomplish
this. It stores the desired values for the 4 special registers OPTION, TMSK2
(lowest 2 bits only), BPROT, and BAUD in EEPROM, and automatically installs the
specified values in the registers after each reset.

The following command sequence installs the desired contents of the registers
to establish a 1.049 second time-out period for the COP (assuming an 8 MHz
crystal frequency), and allows writes to the CONFIG register:

1.049SEC OPTION.CONTENTS   \ put OPTION contents on data stack
TMSK2 C@                   \ specify current values of PR0 & PR1
BPROT C@ 0F AND            \ preserve bits 0-3, clear PTCON bit
BAUD C@                    \ specify current baud rate
INSTALL.REGISTER.INITS     \ install register initialization values

Push the reset button to put the new register values into effect.

3. Install your application program (a QED-Forth word) as an AUTOSTART routine
in EEPROM or as a PRIORITY.AUTOSTART routine in page 4 memory (see the
“Autostarting” section of the “Program Development Techniques” chapter in the
QED Software Manual). The application program must periodically execute
COP.SERVICE to update the COPRST register before the time-out period has
elapsed. Execute:

CFA.FOR <name of your application program> AUTOSTART

or

CFA.FOR <name of your application program> PRIORITY.AUTOSTART

Your autostart word will now be executed after every power-on, reset, or
abort. This ensures the COP will always be properly serviced.

4. Enable the COP by clearing the NOCOP bit of the CONFIG register. This is
accomplished by executing:

CONFIG C@ FB AND CONFIG DROP (EEC!)

which clears bit 2 (NOCOP) while preserving the other bits in the CONFIG
register. Note that (EEC!) is used because the CONFIG register is implemented
as an EEPROM byte in the hardware register area.

5. Before resetting the machine to activate the COP, it is advisable to again
write-protect the CONFIG register. This is accomplished by repeating the
INSTALL.REGISTER.INITS as described in step 2 with the exception that the PTCON
(protect CONFIG) bit in the BPROT register is set instead of cleared:

OPTION C@              \ keep current value of OPTION
TMSK2 C@               \ keep current values of PR0 & PR1 in TMSK2
BPROT C@ 10 OR         \ set PTCON bit in BPROT
BAUD C@                \ keep current value of BAUD
INSTALL.REGISTER.INITS

6. Now reset your machine. The COP will be enabled and your autostart routine
will be executed automatically.

Although the COP
subsystem requires special care during installation and implementation, it provides
an ability to recover from crashes that is necessary for many applications.
Another feature that helps to ensure proper operation of the processor is the
clock monitor.

A COP (Computer
Operating Properly) service routine has been defined and is pre-installed upon
every COLD restart if INIT.VITAL.IRQS.ON.COLD has been executed. When enabled,
the COP feature resets the processor if the application program fails to update
the COP register within a specified time; this provides a means of
"bullet-proofing" application programs. If the programmer has
enabled the COP and if a COP timeout occurs, the pre-defined COP service
routine stores the hexadecimal value 1357 at address 83FA before performing the
standard startup sequence. The application program can check this flag after
every startup to detect if a COP timeout has occurred. The application
program is responsible for resetting this flag to zero.
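
An application might test and clear this flag near the start of its autostart word along the following lines. This is a sketch only: COP.TIMEOUT.OCCURRED? is a hypothetical name, and the flag is addressed as location 83FA in common memory with an explicit page of 0, on the assumption that the fetch and store words take an address\page pair as the register names in this chapter do.

```forth
HEX
: COP.TIMEOUT.OCCURRED? ( -- flag )  \ hypothetical name
   83FA 00 @ 1357 =      \ true if the COP service routine left its marker
   0 83FA 00 !           \ reset the flag so that later startups are not misread
;
```

The returned flag could be used, for example, to log the event or to choose a safe recovery path before the main loop begins.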

Listing 7‑0 Using the COP Watchdog Timer

\ Copyright 2002 Mosaic Industries, Inc.

\ This code demonstrates a simple method to set up the COP (watchdog timer)
\ in a production version of the QED Board.
\ It also provides a method of disabling the COP.
\ Note that a factory cleanup operation can be used to quickly disable the COP.

\ THREE CONSECUTIVE HARDWARE RESETS are required to install and secure the COP
\ after the production PROM is first installed into the board.

\ Note: you can simultaneously “lock down” (make unwritable) some or all
\ of the contents of EEPROM by changing the value written to BPROT
\ during the UNLOCK.CONFIG operation.

\ Note: it’s best to leave the COP response vector initialized to
\ point to the standard reset routine as it is by default;
\ see INIT.VITAL.IRQS.ON.COLD. Otherwise the HC11’s 64-cycle timing
\ restrictions for special registers may not be met.
\ COP performs a hardware init of all processor registers.

\ The information provided here is believed to be reliable;
\ however, Mosaic Industries assumes no responsibility for inaccuracies or omissions,

\ Note that we must still service the COP during the disabling process
COP.ENABLED?                   \ only uninstall if cop is enabled
IF RTI/COP.SERVICE
ENABLE.REAL.TIME.INTERRUPT     \ this is the COP service routine
ENABLE.INTERRUPTS              \ enable RTI: service COP until it’s disabled
UNINSTALL.COP                  \ may take full effect upon the 3rd reset
ENDIF
;

: COP.ON ( -- )
CFA.FOR TOP.COP.WORD PRIORITY.AUTOSTART     \ use this to install COP
;

: COP.OFF ( -- )
CFA.FOR TOP.NOCOP.WORD PRIORITY.AUTOSTART   \ use this to uninstall COP
;

4 PAGE.TO.FLASH
STANDARD.MAP
SAVE

The Clock Monitor

The clock monitor provides a second level of security by
monitoring the main system clock and resetting the processor if the clock
signal disappears or oscillates too slowly. The clock monitor does not
initiate a reset as long as the E-clock frequency is greater than 200 kHz (the
E-clock frequency is one quarter the frequency of the on-board crystal). A
reset is always triggered at E-clock frequencies below 10 kHz, and may be
triggered at frequencies as high as 200 kHz.

The clock monitor is primarily used as a backup for the
COP. The COP relies on the clock’s presence for reliable operation, and the
clock monitor can ensure that the processor is safely reset if the clock
fails.

Enabling the clock monitor is accomplished by setting the
CME (clock monitor enable) bit in the OPTION register. This bit may be set or
reset at any time. A second bit named FCME (force clock monitor enable) is
also involved. When the FCME bit is in its default state of 0, the bit has no
effect, and when FCME is set, the clock monitor feature cannot be disabled
until a reset occurs. We will assume that FCME is 0, and that the CME bit
controls the clock monitor. See MC68HC11F1 Technical Data Manual, p.5-3 for
further details. Note also that if the clock monitor is enabled, a STOP
assembly instruction will trigger a reset because it stops the clock, as
discussed in the “Low Power Modes” section below.
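
Setting and clearing CME can be sketched as follows. The sketch assumes that CME is bit 3 of the OPTION register (mask 08H), as in the standard 68HC11 register layout, and that OPTION is at 8039H as defined in the COP setup steps. CLOCK.MONITOR.ON and CLOCK.MONITOR.OFF are hypothetical names.

```forth
HEX
8039 REGISTER: OPTION   \ same definition as in the COP setup steps

: CLOCK.MONITOR.ON ( -- )
   OPTION C@ 08 OR OPTION C!    \ set CME (bit 3), leave other bits unchanged
;

: CLOCK.MONITOR.OFF ( -- )
   OPTION C@ F7 AND OPTION C!   \ clear CME (bit 3), leave other bits unchanged
;
```

Calling CLOCK.MONITOR.OFF before a STOP instruction and CLOCK.MONITOR.ON afterward avoids the unwanted reset, provided FCME is 0.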

Low Power Modes

The 68HC11F1 has two low power modes. These modes are
enabled by assembly instructions STOP and WAI (wait). The STOP command puts
the CPU into its lowest power-consumption mode by stopping all clocks, thereby
stopping all processing (MC68HC11F1 Technical Data Manual, p.5-17). If the
clock monitor is enabled, a reset will be triggered when the clocks stop due to
a STOP instruction. To use a STOP instruction when the clock monitor reset is
enabled, disable the monitor before the STOP instruction, and re-enable it
after returning from the STOP.

Pulling either /RESET or /IRQ low wakes the processor up
after a STOP instruction. Pulling the reset line low awakens the CPU and
performs the standard reset startup sequence. For the CPU to be awakened by
the /IRQ line going low, the I bit in the CCR register must be clear so that
interrupts are globally enabled. When /IRQ goes low and the I bit is clear,
execution resumes with the /IRQ handler and then continues with the code following the
STOP instruction.

The STOP instruction is executed as a NOP unless the S bit
in the CCR is cleared. After clearing the S bit, any occurrence of a STOP
instruction puts the CPU into its lowest power mode. After each reset or
restart, QED-Forth leaves the S bit in the CCR in its default set position,
meaning that the STOP mode is disabled.

The following
routines illustrate how to enable and disable the STOP instruction via the S
bit. They also provide a general purpose word that can be called to enter the
low power mode. This code relies on some definitions that were presented in
the previous section.
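
One possible form of such words is sketched below. ENABLE.STOP, DISABLE.STOP, and ENTER.STOP.MODE are hypothetical names. The S bit is bit 7 of the CCR. The operand-mode-mnemonic order used in `7F IMM ANDA` is an assumption about the QED-Forth assembler's postfix syntax and should be checked against the assembler documentation before use.

```forth
HEX
CODE ENABLE.STOP ( -- )      \ hypothetical name
   TPA                       \ copy the CCR into accumulator A
   7F IMM ANDA               \ clear bit 7 (the S bit); operand syntax assumed
   TAP                       \ copy A back into the CCR
   RTS
END.CODE

CODE DISABLE.STOP ( -- )     \ hypothetical name
   TPA
   80 IMM ORAA               \ set bit 7 (the S bit); operand syntax assumed
   TAP
   RTS
END.CODE

CODE ENTER.STOP.MODE ( -- )  \ hypothetical name
   STOP                      \ acts as a NOP unless the S bit is clear
   RTS
END.CODE
```

A typical sequence would be to disable the clock monitor, execute ENABLE.STOP and then ENTER.STOP.MODE, and re-enable the clock monitor once the processor has been awakened.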

WAI Low Power Mode

The WAI instruction also puts the 68HC11F1 in a low power
mode. However, clocks are not disabled in the wait mode, so power consumption
is greater than in the STOP mode. After a WAI instruction, the machine state is
stacked and processing stops. Power savings can be increased by setting the I
bit in the CCR and disabling the COP. Further savings can be achieved by
disabling the on-chip subsystems, including executing A/D8.OFF to turn off the
A/D (MC68HC11F1 Technical Data Manual, pp.5-17...5-18).

The WAI low power state can only be exited by an unmasked
interrupt or by pulling the /RESET pin low. When an unmasked interrupt occurs
(for example, /IRQ goes low, the COP is not serviced, or a clock monitor failure
or reset occurs), the appropriate interrupt handler is executed and then processing
continues with the instructions following the WAI. Implementing the WAI low
power mode is accomplished by simply executing WAI.

For example:

CODE ENTER.WAI.MODE ( -- )
   WAI     \ execute WAI
   RTS     \ return
END.CODE

Summary of Low Power Modes

In sum, power can be saved by putting the CPU in a low
power mode while processing is not required. The 68HC11F1 has two low power
modes with different degrees of savings. Both modes are terminated by an unmasked
interrupt or a reset. While the WAI instruction can be called without any preparation,
the STOP instruction must be enabled by clearing the S bit of the CCR register.

Operating Modes of the 68HC11F1 CPU

The 68HC11F1 microcontroller has four operating modes:
expanded nonmultiplexed, special test, single chip, and special bootstrap modes
(M68HC11 Reference Manual, Section 3 and MC68HC11F1 Technical Data Manual,
pp.4-1...2). The standard operating mode is expanded nonmultiplexed, meaning
that the processor has access to expanded memory beyond its on-chip memory, and
that the address and data lines are not multiplexed together (as they are on
other members of the 68HC11 family). The Handheld also makes use of the
special test mode, renaming it the “special cleanup” mode. This mode makes it
possible to rapidly recover from any programming error that causes repeated
machine crashes. The single chip mode takes away the ability of the processor
to address external memory, and special bootstrap allows startup code to be
inserted into the processor; these two modes are not used on the Handheld.

The processor’s operating mode is determined by the states
of two pins named MODA and MODB (refer to the schematic in Appendix C). On the
Handheld, MODA is always high and MODB may be pulled LOW by installing Jumper
J1; this invokes the special cleanup mode. When Jumper J1 is not installed,
the board is in the standard operating mode.

Special Cleanup Mode

The Special Cleanup Mode is useful if a buggy startup
routine has been installed (using the AUTOSTART or PRIORITY.AUTOSTART words) or
if invalid register initializations have been specified (for example, using the
INSTALL.REGISTER.INITS word). To recover from these problems, simply enter the
special cleanup mode by installing Jumper J1 and pressing the reset button.
This completely re-initializes the system software to its “pristine” state, and
displays the QED-Forth startup message at your terminal.