ABSTRACT: The notion of a truly reversible computer offers many potential applications. One such application is reversible debugging, also known as bi-directional debugging. Modern debuggers allow one to pause the execution of a program, generally for the purpose of trying to identify an error within the program. In the early stages of attempting to locate the cause of an error, the debugging process would be more intuitive if one could undo the effects of the program on the state of the environment. That is, execute the program in reverse. The potential benefit of reversible debugging is that less time is spent on debugging, which is a process that is normally overly tedious due to the inefficient manner in which it is performed. Further, some programs, such as those that are nondeterministic and operate in real time embedded environments, cannot be debugged by traditional means. The use of state change recording, or history logging, is one means by which such programs could be debugged. This thesis describes a set of techniques that, when used together, provide an efficient means of recording state changes. A case study is included that demonstrates the use of these techniques within a fully functional MIPS simulator implemented in Java. However, the techniques described are hardware and language independent. These techniques rely on both state change recording and program re-execution. State changes are recorded incrementally during each cycle of execution. Periodically, the current set of state changes is accumulated into a checkpoint structure.

Summary:

ABSTRACT (cont.): A stack of checkpoints is maintained, allowing the executing program to be reversed indefinitely. The primary concern of such history logging techniques is the unbounded growth of history information. A checkpoint culling algorithm is used to bound the size of the state history to a logarithmic function, as well as maintain a linear run time for undo operations.

The techniques and issues regarding how and when to create checkpoints are

discussed in earlier chapters. While these techniques are sufficient to support reversible

debugging, the growth of the state history information remains linearly unbounded. This

chapter discusses the Checkpoint Culling Process (CCP), which is a technique used to

maintain logarithmic growth of state history information.

Should the size of the state history information consume all available space, or at

least a significant portion thereof, there are a number of obvious actions that can be taken:

(1) Signal an error to the user and indicate that additional storage capacity is needed. The

user would then need to increase the capacity of the storage medium, such as main

memory, and restart the program. (2) Erase all current state history information and start

recording over from the current program position. (3) Erase some of the state history

information (such as oldest first), allowing for additional state changes to be recorded.

The third action described above is essentially what the CCP approach does, except

it does not wait until the available space has been exhausted. The idea is analogous to

what people tend to remember. That is, people tend to forget details of events that took

place long ago, while remembering more details of recent events. With regard to

checkpoints, it is more likely that only recent checkpoints will actually ever be used.

The Checkpoint Culling Process is applied following the creation of each

checkpoint. This process uses an algorithm that maintains recent checkpoints, but

gradually removes old checkpoints. The Checkpoint Culling Algorithm1 is described in

Figure 5.1.

Algorithm CullCheckpoints(C, t, z) {
    // C is the current set of checkpoints
    // t is the current cycle index
    // z is the cull rate (2, 3, 4, ...)
    if (t ≤ 0) then
        return
    for (0 ≤ k < [log_z t]) {
        low = max[0, (t - z^(k+1)) + 1]
        high = t - z^k
        Mark all checkpoints in C whose time index is between
        low and high (inclusive).
        Remove all marked checkpoints from C, except the one
        that has the earliest time index.
        Unmark the checkpoint that was not removed.
    }
}

Figure 5.1: Checkpoint Culling Algorithm
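
The culling step in Figure 5.1 can be sketched in Java as follows. This is a minimal illustration rather than the JIMS implementation: checkpoints are reduced to their bare time indices, one checkpoint is assumed to be created per index, the bracket around log_z t is read as a floor (an assumption, since the bracket style is ambiguous here), and the names CullSketch, floorLog, cullCheckpoints, and simulate are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class CullSketch {

    // Integer floor(log_z(t)) for t >= 1 and z >= 2, computed without
    // floating point so that exact powers of z are handled correctly.
    static int floorLog(long t, int z) {
        int k = 0;
        long power = z;                 // invariant: power == z^(k+1)
        while (power <= t) {
            power *= z;
            k++;
        }
        return k;
    }

    // One culling pass over the set of checkpoint time indices.
    static void cullCheckpoints(TreeSet<Long> checkpoints, long t, int z) {
        if (t <= 0) {
            return;
        }
        int regions = floorLog(t, z);   // assumption: [log_z t] read as floor
        long zk = 1;                    // z^k for the current region
        for (int k = 0; k < regions; k++) {
            long low = Math.max(0, t - zk * z + 1);
            long high = t - zk;
            // "Mark" every checkpoint whose index lies in [low, high].
            List<Long> marked =
                new ArrayList<>(checkpoints.subSet(low, true, high, true));
            // Remove all marked checkpoints except the earliest one.
            for (int i = 1; i < marked.size(); i++) {
                checkpoints.remove(marked.get(i));
            }
            zk *= z;
        }
    }

    // Creates a checkpoint at every index 0..lastIndex, culling after each.
    static TreeSet<Long> simulate(long lastIndex, int z) {
        TreeSet<Long> checkpoints = new TreeSet<>();
        for (long t = 0; t <= lastIndex; t++) {
            checkpoints.add(t);
            cullCheckpoints(checkpoints, t, z);
        }
        return checkpoints;
    }

    public static void main(String[] args) {
        System.out.println(simulate(16, 2)); // prints [0, 1, 9, 13, 15, 16]
    }
}
```

Under these assumptions the regions for successive k are disjoint, so the order in which they are processed does not matter. With z = 2, the initial checkpoint and the two most recent indices survive every pass.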

The CullCheckpoints algorithm provides a logarithmic bound on the size of

the state history. Table 5.1 shows which checkpoints would be culled out over time,

using the CullCheckpoints algorithm with a z value of 2.0. Larger values of z

result in a slower rate of state history growth, though they increase the average distance

between culled checkpoints.

1 Dr. Michael P. Frank and I formulated this algorithm in personal discussions during the
development of this thesis.

COLUMN I of Table 5.1 shows a diagram that represents the current set of

checkpoints created over time. A dash (-) represents a checkpoint that was removed,

while an X represents a checkpoint that has been retained. Notice that over time

(increasing values of t), new X marks are added while at certain times earlier X marks are

removed. Furthermore, in each row, the initial state checkpoint and the two previously

constructed checkpoints are always present. This suggests that some optimization can be

made by not checking to see if those checkpoints should be removed during the culling

process. COLUMN II simply shows a graph representing the size of the state history

information over time. That is, it is the same as the previous column, except the dash

entries are removed since they do not occupy any memory. Notice that the graph grows

logarithmically. COLUMN III shows the set of low and high values calculated by the

algorithm. As specified by the algorithm, all existing checkpoints within each of these

regions are removed, except the checkpoint that occurs the earliest.

Typically, checkpoints are not created during every cycle (as suggested by Table

5.1). If the average distance between each checkpoint remains constant, then the value of

t can instead be the cycle counter. Alternatively, each checkpoint can be marked with a

static index value. For example, the first checkpoint created is marked with index 0, the

second checkpoint created is marked with index 1, the third checkpoint created is marked

with index 2, and so on. As checkpoints are removed, these index values do not change.

With this approach, the value of t is not a cycle index, but instead a checkpoint

index.

The remainder of this discussion describes the motivation behind the Checkpoint

Culling Algorithm. Let t be the current cycle index, u be the number of steps to be

undone, and f(t, u) represent the worst-case number of steps required to go backwards u

steps from cycle t. Keep in mind that, most often, u << t. That is, the number of desired

undo cycles is much smaller than the total number of previously executed cycles. The

definition of f(t,u) depends on what state history information is available, as described

by the following four cases.

CASE 1: No state history recorded. Since there is no state history information
available, we would need to perform re-execution in order to undo u steps. We
would start from the initial state (perhaps by resetting the environment and
reloading the program), then re-execute t - u cycles forward. Therefore, f(t,u)
is O(t - u). Since u << t may mean that u = o(t), f(t,u) = Θ(t), which is a
poor time complexity since t is often very large. On the other hand, the space
complexity is not affected in this case.

CASE 2: Complete incremental history recording. The state changes for each
and every cycle of execution are recorded, thus we can readily reverse u steps by
applying the available state history in reverse. Therefore, f(t,u) is O(u), which
is reasonable since often u << t. However, this case requires Θ(t) space to store
the incremental state changes (see Chapter 2).

CASE 3: Complete checkpoints recorded at constant intervals. Checkpoints
are created periodically, at some relatively constant time interval. In this case,
f(t, u) is O(1). Undoing u steps involves first finding the checkpoint whose
cycle index is closest to (but does not exceed) t - u. The state is set to said
checkpoint, then we perform (at most) a constant number of steps to reach the
actual desired cycle time. While a constant time complexity is very desirable, this
technique requires Θ(ts) space (where s is any checkpoint size).

CASE 4: Checkpoint culling. Checkpoints are created as in the previous case,
but the Checkpoint Culling Algorithm is applied following the creation of each
checkpoint. This acts as a compromise between the time-inefficiency of case 1
(re-execution) and the space-inefficiency of case 2 (complete history recording).
The result is that f(t,u) is O(u) with a space complexity of O(log t).

The Checkpoint Culling Algorithm enforces the following invariant: there is

always at least one checkpoint within each range from low = (t - z^(k+1)) + 1 to high = t - z^k,

for all values of k (i.e., 0 ≤ k < [log_z t]). This is clear by inspecting the culling algorithm

or observing Table 5.1 (which models the execution of the algorithm).

Suppose the user desires to undo u steps (u ≤ t) and that this invariant holds. If u

is less than the current number of incremental state change records, if any, then the undo

operation is O(u) as described in case 2 above. Otherwise, the undo operation requires

re-execution, which (as will be shown) also has a time complexity of O(u).

For any given target index r = t - u, there exists a pair of checkpoints whose time

indices surround r. We can call these checkpoints C_low and C_high, such that t_Clow ≤ r ≤ t_Chigh

(where t_X represents the time index of X). The number of cycles to be re-executed

would be e = r - t_Clow < k·u (for some constant k). That is, the distance from C_low to the target

cycle r is always less than some constant times u, or O(u). Figure 5.2 demonstrates

these arguments.

[Figure 5.2 diagram: a timeline with checkpoint positions labeled 8n, 7n, 6n, 5n, 4n, 3n, 2n, 1n, marking existing checkpoints, checkpoints that have been culled out, C_low and C_high around the target cycle B, the re-execution distance e, the undo distance u, and the region where u <= |S|.]

A = Current Cycle Index
B = Target Cycle Index (same as r)
e = Necessary Re-Execution Distance to Reach r
n = Average Cycle Interval Between Checkpoint Creation
r = Target Cycle Index
|S| = Size of State History Record Vector
t = Total Execution Time of Program
u = Some User Defined Undo Distance

Figure 5.2: State History Model With Checkpoint Culling
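
The claim that e is O(u) can be made concrete with a short derivation. This is a sketch, not the thesis's own proof. It assumes the invariant stated earlier, reads the bracket around log_z t as a floor with K = ⌊log_z t⌋, and measures t, u, and e in the same units as the time index used by the algorithm. The symbol c_k (some checkpoint in region k) is introduced here for illustration.

```latex
% Invariant (assumed): for every k with 0 \le k < K there is a checkpoint c_k
% with  t - z^{k+1} + 1 \le c_k \le t - z^{k}.
\begin{align*}
\text{Choose the smallest } k \ge 0 \text{ with } z^{k} \ge u
  &\;\Longrightarrow\; c_k \le t - z^{k} \le t - u = r, \\
e \;\le\; r - c_k
  &\;\le\; (t - u) - (t - z^{k+1} + 1) \;=\; z^{k+1} - u - 1, \\
z^{k-1} < u \ \ (k \ge 1)
  &\;\Longrightarrow\; z^{k+1} < z^{2} u
  \;\Longrightarrow\; e < (z^{2} - 1)\,u .
\end{align*}
```

For k = 0 (that is, u = 1) the second line already gives e ≤ z - 2. If the chosen k is not below K, no such region exists and the initial checkpoint at index 0 serves as C_low; then e = t - u, and since t < z^(K+1) < z^2·u the same bound holds. Because C_low is the closest checkpoint at or before r, it is at least as close as c_k, so the constant in e < k·u can be taken as z^2 - 1 under these assumptions.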

Clearly, the advantage of the Checkpoint Culling Algorithm is to reduce the state

history growth rate from a linear function to a logarithmic function while also keeping the

undo time linear. The runtime of the algorithm itself should not pose any significant

overhead. In the JIMS project, the algorithm was implemented such that the vector of

checkpoints needed to be scanned only once, giving a time complexity of O(n), where n is

the number of checkpoints.

State history recording generally requires a massive amount of readily available

The assembly process typically requires multiple passes. If any labels are

defined, then two passes are required. If any pseudo or synthetic instructions are used,

then two passes are also required. Most programs use both labels and pseudo/synthetic

instructions. Among other benefits, labels are used to give assembly programs much

greater clarity. The pseudo instructions are as listed above, and are essentially macros

that represent a sequence of more fundamental instructions. Synthetic instructions,

however, are more difficult to explain.

[Table fragment, pseudo instruction descriptions: Equal Immediate; Greater Than or Equal; Greater Than or Equal Unsigned; Greater Than; Greater Than Unsigned; Less Than or Equal; Less Than or Equal Unsigned; Not Equal; Not Equal Immediate. If Equal Zero; If Greater Than or Equal; If Greater Than or Equal Unsigned; If Greater Than; If Greater Than Unsigned; If Less Than or Equal; If Less Than or Equal Unsigned; If Less Than; If Less Than Unsigned.]

Synthetic instructions are inserted into the program during the assembly process.

They are intended to correct assembly code that is otherwise incorrect or meaningless. A

typical example is the use of the LW (Load Word) instruction. Often, assembly code

includes an instruction such as "LW $t0, x", where it is intended to load register $t0

with the value at the memory address designated by label x. However, used in this form,

the LW instruction can only load from the first 16-bit region of the address space (that is,

$0000 to $FFFF). Typically, program data does not reside in this region. More than

likely, the label x refers to some 32-bit address. To resolve the problem, the assembler

can use a synthetic instruction and modify the specified LW instruction as follows:

LUI $at, hi16(x)
LW $t0, lo16(x)($at)
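
The split of a 32-bit address into the two 16-bit fields used above can be sketched as follows. The class and method names are hypothetical, and the sketch is not taken from the JIMS assembler. It assumes that hi16 compensates for the fact that MIPS sign-extends the 16-bit offset of LW. Whether the JIMS hi16 operator applies this adjustment is not stated in the text.

```java
final class HiLoSketch {

    // Low half of the address, as it would be placed in the LW offset field.
    static int lo16(int address) {
        return address & 0xFFFF;
    }

    // High half for LUI. Adding 0x8000 first bumps the high half by one
    // whenever bit 15 of the low half is set, which cancels the negative
    // value that results when LW sign-extends the offset.
    static int hi16(int address) {
        return ((address + 0x8000) >>> 16) & 0xFFFF;
    }

    // Recombines the halves the way LUI followed by LW would:
    // (hi << 16) plus the sign-extended 16-bit offset.
    static int effectiveAddress(int hi, int lo) {
        return (hi << 16) + (short) lo;
    }

    public static void main(String[] args) {
        int x = 0x10018004; // hypothetical address of label x
        System.out.printf("LUI $at, 0x%04X%n", hi16(x));
        System.out.printf("LW  $t0, 0x%04X($at)%n", lo16(x));
        System.out.printf("effective address: 0x%08X%n",
                effectiveAddress(hi16(x), lo16(x)));
    }
}
```

Without the adjustment in hi16, any label whose low half has bit 15 set would resolve to an address 0x10000 bytes too low.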

While both pseudo and synthetic instructions provide tremendous convenience for

programmers, inserting instructions increases the distance between labels. As a result,

jump targets and branch distances can be difficult to pre-determine. This is another

motivation for the use of labels, which allow the assembler to automatically determine

this information using two passes. The first pass translates any pseudo instructions and

inserts any necessary synthetic instructions. At the same time, symbolic assembly

instructions are converted into their machine code form. Literal label and target addresses

are determined during the second pass, and instructions that use these labels are translated

into their final machine code form. The final machine code form is then output to a file,

to be used by the simulator.
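
The effect of inserted instructions on label addresses can be illustrated with a small two-pass sketch. It is a simplified model, not the JIMS assembler: the source is a list of strings, a label is any line ending in a colon, the only expansion rule is the LW case shown above, machine code generation is omitted, and the names TwoPassSketch, expand, and resolveLabels as well as the base address are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoPassSketch {
    static final int WORD = 4; // every MIPS instruction occupies 4 bytes

    // Pass 1: rewrite "LW reg, label" as LUI + LW, leaving other lines alone.
    static List<String> expand(List<String> source) {
        List<String> out = new ArrayList<>();
        for (String line : source) {
            String s = line.trim();
            if (s.startsWith("LW ") && !s.contains("(")) {
                String[] parts = s.substring(3).split(",");
                String reg = parts[0].trim();
                String label = parts[1].trim();
                out.add("LUI $at, hi16(" + label + ")");
                out.add("LW " + reg + ", lo16(" + label + ")($at)");
            } else {
                out.add(s);
            }
        }
        return out;
    }

    // Pass 2: walk the expanded program and record each label's address.
    static Map<String, Integer> resolveLabels(List<String> expanded, int baseAddress) {
        Map<String, Integer> table = new LinkedHashMap<>();
        int address = baseAddress;
        for (String s : expanded) {
            if (s.endsWith(":")) {
                table.put(s.substring(0, s.length() - 1), address);
            } else {
                address += WORD;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "start:", "LW $t0, x", "ADD $t1, $t0, $t0", "done:", "J start");
        List<String> expanded = expand(source);
        System.out.println(expanded);
        System.out.println(resolveLabels(expanded, 0x00400000));
    }
}
```

In this example the label done lands 12 bytes past start rather than 8, because the LW line grew into two instructions during the first pass. This is why label addresses are fixed only after expansion.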

The JIMS assembler uses a proprietary object code format, which is easier for

debugging purposes. One can readily view the object code and verify manually if it is

correct or not. This format is described in a separate document associated with JIMS.

shift amount, etc.). This data is stored in an InstructionDecodeBuffer which is

later sent to the InstructionExecuter during the execution stage.

For the most part, the Simulator layer is responsible for managing state change

records and checkpoints (such as determining when to create a checkpoint, etc.). The

state history data itself is actually stored in the StateHistoryBuffer and

CheckpointBuffer modules. However, the Simulator does not modify state

values directly.

All state values are guarded by the State sub-layer. As a result, the

Simulator actually makes a request to change a state value by using the State layer.

The request is always granted; however, this policy allows the State layer itself to

record all changes in state values, which facilitates the creation of state records (used for

state history recording). For JIMS, state values include main memory, registers, and the

registers of coprocessor 0 and coprocessor 1 (which are each represented accordingly as

separate modules below the State layer).
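
The guarding idea can be sketched with a toy register file. This is only an illustration of the pattern: the class GuardedStateSketch and its methods are hypothetical and do not reflect the actual JIMS State interface, which also covers main memory and the coprocessor registers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class GuardedStateSketch {
    private final int[] registers = new int[32];

    // One record per write: { register index, value before the write }.
    private final Deque<int[]> changeLog = new ArrayDeque<>();

    int read(int index) {
        return registers[index];
    }

    // Every write goes through this method, so the previous value can be
    // captured before it is overwritten.
    void write(int index, int value) {
        changeLog.push(new int[] { index, registers[index] });
        registers[index] = value;
    }

    // Restores the most recently overwritten value; returns false when
    // no change records remain.
    boolean undoLastWrite() {
        if (changeLog.isEmpty()) {
            return false;
        }
        int[] record = changeLog.pop();
        registers[record[0]] = record[1];
        return true;
    }

    int recordedChanges() {
        return changeLog.size();
    }

    public static void main(String[] args) {
        GuardedStateSketch state = new GuardedStateSketch();
        state.write(8, 5);
        state.write(8, 7);
        state.undoLastWrite();
        System.out.println(state.read(8)); // prints 5
    }
}
```

Because callers never touch the array directly, recording cannot be bypassed, which is the property the State layer relies on.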

Highlights of Simulator Source Regarding State History Recording

The remainder of this appendix includes highlights of the main Simulator

source code. These highlights demonstrate the implementation of the reversible

execution feature of JIMS. The functions highlighted, followed by a brief description,

are as follows:

performCycle(): Used to perform one cycle of execution for the JIMS simulator. Notice that
the check that determines if a checkpoint should be created is near the beginning of this function.
Furthermore, notice how the state change record is handled at the end of the function.
bTimeToMakeCheckpoint(): Uses the WUA approach described in Chapter 4 to determine
if a checkpoint should be created during the current cycle.
makeCheckpoint(): A wrapper for doCreateCheckpoint(), though notice the steps that are
done following the creation of the checkpoint.
iGetNextCheckpointByteSize(): Estimates the size of the next checkpoint, as described
in Chapter 3.
doCreateCheckpoint(): Performs the actual work of creating a checkpoint. Notice how the
code is very similar to the iGetNextCheckpointByteSize() function.
performCheckpointCulling(): Applies the Checkpoint Culling Algorithm to the current
set of checkpoints.
iPerformUndo(long lUndoDistance): Shows how state change records, checkpoints,
and re-execution are all used together to provide reversible execution.

private void performCycle() throws SimulatorException {
    // This is the primary method responsible for simulating
    // the CPU cycles. This method should only be called
    // from the iPerformStep(int) method.

    if (bHistoryRecordingEnabled)
        if (bTimeToMakeCheckpoint()) {
            makeCheckpoint();
            // Since we just performed a checkpoint, we do not need to
            // record state changes for this cycle. This means
            // that stateHistoryBuffer should remain null.
            performCheckpointCulling();

        } else {
            if (lCycleIndex > lLastCheckPointCycleIndex)
                // We only want to record state changes if the current
                // cycle is past (greater than) that of the last checkpoint.
                // NOTE:
                // If (lCycleIndex == lLastCheckpointCycleIndex) then
                // a checkpoint was created on the last cycle.
                // If (lCycleIndex < lLastCheckpointCycleIndex) then
                // there is a bug in the system.

private boolean bTimeToMakeCheckpoint() {
// This method implements the heuristic that determines
// when a checkpoint is to be created.
// If it is determined that a checkpoint should be created,
// then this method returns TRUE. Otherwise, it returns FALSE.

// Get the memory value at the current address, and add
// it to the string buffer.
byte value = state.loadByte(iAddress);
sb.append(Utility.sAsHexPadded(value, 2) + " ");

if (!e.hasMoreElements()) {
    // No more addresses to process.
    break;
}

// Increment the address counter. We expect the
// next address to match this incremented value.
iAddress++;

// Get the next address.
i = ((Integer)e.nextElement()).intValue();

if (i != iAddress) {
    // The new address does not match the address that
    // we expected (i.e. we have a disjoint set).
    // Add the current string buffer value to the
    // command response, and prepare a new string buffer.
    cpb.addStateValue(sb.toString());

// Since z was already declared, but is no longer used,
// we re-use it also as a flag to indicate when we first
// encounter a checkpoint in this region (rather than using
// one more variable, like a boolean).
z = -1;

do {

    // Get the time index of the checkpoint specified
    // by the current checkpoint index.
    CheckpointBuffer cp = (CheckpointBuffer)vCheckpoint.elementAt(iChkIndex);
    int iTimeIndex = (int)cp.lGetCycleIndex();

    if ( (iTimeIndex >= iLower) && (iTimeIndex <= iUpper) ) {

        if (z == -1) {
            // This is the first checkpoint encountered in this
            // region. Keep it and increment the checkpoint
            // index to the next checkpoint.
            iChkIndex++;
            z = 0; // Clear the "first encountered" flag

        } else {
            // We already encountered the first checkpoint in this
            // region, therefore this checkpoint is marked
            // to be deleted. Go ahead and do so.
            vCheckpoint.removeElementAt(iChkIndex);
        }
    } else {
        // We have encountered a checkpoint whose time index
        // is outside this region. This means there are no
        // more checkpoints that can be in this region, so
        // move to the next region.
        break;
    }
} while (true);
}// end for k == K down to 1
}// end method performCheckpointCulling()

if ( (iStateHistorySize == 0) && (iCheckpointSize == 0) ) {
    // There is no history information. Either we are in
    // the initial state, or all the history information
    // was cleared. Either way, we have no information
    // to perform the undo with.
    return -1;
}

long lTargetCycle = lCycleIndex - lUndoDistance;
// The target cycle is what execution cycle we are trying
// to get to, which is simply the current cycle
// minus the number of cycles we want to undo.

if (lTargetCycle < 0) {
    // We can not go to a state that is earlier than
    // the initial state.
    lTargetCycle = 0;
}

if ( (iStateHistorySize == 0) || (lUndoDistance > iStateHistorySize) ) {

    // There is no state history information. Or the distance
    // of the undo is larger than the size of the state history
    // buffer. Regardless, perform the undo using the last checkpoint.

    if (lUndoDistance > iStateHistorySize) {
        // Clear the state history, since all of the state changes
        // recorded in it will not be used. An earlier
        // checkpoint will be used instead.
        clearStateHistoryBuffer();
    }

if (iCheckpointSize > 0) {
    // Set the state to that of the last useful checkpoint
    // that was created.

    CheckpointBuffer cpb = null;
    long lCheckpointCycleIndex = -1;
    // Find a checkpoint buffer to start from.
    try {
        do {
            cpb = (CheckpointBuffer)vCheckpoint.elementAt(iCheckpointSize-1);
            lCheckpointCycleIndex = cpb.lGetCycleIndex();
            if (lCheckpointCycleIndex > lTargetCycle) {
                // The checkpoint represents a state at a point in time
                // AFTER the desired target cycle. It would be better
                // to examine the next checkpoint.

                vCheckpoint.removeElementAt(iCheckpointSize-1);
                iCheckpointSize--;
                if (iCheckpointSize <= 0) {
                    // This is a safety check. If no more checkpoints
                    // exists, then go back to the reset state.
                    state.reset();
                    cpb = null;
                    break;
                }
            } else
                break;
        }
        while (true);

        if (lCheckpointCycleIndex == lTargetCycle) {
            // The previous cycle (lTargetCycle) was when the
            // checkpoint was created. We can remove the checkpoint,
            // since if the user goes back further, the top checkpoint
            // is no longer useful. If the user steps forward,
            // the checkpoint will be recreated.
            vCheckpoint.removeElementAt(iCheckpointSize-1);
            iCheckpointSize--;
        }

    } catch (Exception e) {
        cpb = null;
    }

// We found no suitable checkpoint buffer. Either an error occurred,
// or there is no more suitable undo information available.
if (cpb == null) {
return -1; // NO MORE STATE HISTORY CHANGE
}

// Set the state back to the reset state. All memory and
// register values should be defaulted to have the value 0.
state.reset();

lCycleIndex = lCheckpointCycleIndex;
// Set the current cycle count to the time index stored
// in the checkpoint buffer.

// Apply all of the state settings stored in the checkpoint.
Vector v = cpb.vGetStateRecord();

// CRITICAL ERROR (index out of range, null pointer
// indicates either a bug in the simulator, or
// perhaps an out of memory condition).
return -4;

}// end while (performing cycles until reach iTargetCycle)
// *************************************************************

} else {

// There is undo information in the state history buffer.
// These are performed first, before checkpoints.

do {

    StateHistoryBuffer shb = null;
    // Get the last state history buffer instance.
    try {
        shb = (StateHistoryBuffer)vStateHistory.elementAt(iStateHistorySize-1);
        vStateHistory.removeElementAt(iStateHistorySize-1);
        iStateHistorySize--;
    } catch (Exception e) {
        shb = null;
    }
    if (shb == null) {
        // Either an error occurred, or there was no state history
        // buffer information (meaning there is no undo information).
        return -1; // NO MORE STATE HISTORY CHANGE
    }

    // Apply the state values stored in the change buffer.
    // These must be applied in reverse, so that the
    // state is returned to its earliest state per this
    // change buffer.
    Vector v = shb.vGetStateHistory();
    for (int i = v.size()-1; i >= 0; i--) {
        String s = (String)v.elementAt(i);
        int x = iApplyStateSetting(s);
        if (x != 0) {
            // This is most likely the result of an
            // internal bug in the simulator.
            return -2;
        }
    }

    // It is assumed that each state history element represents
    // one cycle of execution (because at least $PC will change
    // on every cycle). Therefore, undoing this state history
    // change decreases the cycle count by one. But to be certain,
    // use the cycle index stored in the SHB instead.
    lCycleIndex = shb.lGetCycleIndex();

    // Decrease the size of the state history by the size of the
    // state history buffer that was just applied. This is for
    // performance reasons, so that the size of the history buffer
    // does not need to be directly re-calculated each cycle.
    iStateHistoryByteSize -= shb.iGetSize();

} while (lCycleIndex > lTargetCycle);

// Disable any exception that was caused by the previous command.
// If we don't, then the instruction will be undone, but a pending
// exception will be waiting (e.g. for input) when it shouldn't be.
iExceptionCode = EXCEPTION_NONE;
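
The interplay of the three mechanisms in iPerformUndo can be seen more easily on a toy machine. The following is a condensed sketch under stated assumptions, not a restatement of the JIMS code: the whole machine state is a single long, checkpoints are taken at a fixed hypothetical interval instead of by the heuristic of Chapter 4, no culling is performed, and every name (UndoSketch, step, undo, CHECKPOINT_INTERVAL) is invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

final class UndoSketch {
    static final int CHECKPOINT_INTERVAL = 4; // hypothetical fixed interval

    long cycle = 0; // current cycle index
    long acc = 1;   // the toy machine's entire state

    // Previous values of acc, one per cycle executed since the last checkpoint.
    final Deque<Long> history = new ArrayDeque<>();
    // Full state snapshots keyed by the cycle index at which they were taken.
    final TreeMap<Long, Long> checkpoints = new TreeMap<>();

    UndoSketch() {
        checkpoints.put(0L, acc); // checkpoint of the initial state
    }

    // Executes one deterministic cycle; records history only when asked to.
    void step(boolean record) {
        if (record) {
            history.push(acc);
        }
        acc = acc * 3 + cycle;
        cycle++;
        if (record && cycle % CHECKPOINT_INTERVAL == 0) {
            checkpoints.put(cycle, acc);
            history.clear(); // incremental records restart after a checkpoint
        }
    }

    // Undoes u cycles and returns how many cycles had to be re-executed
    // (0 when the incremental records alone were sufficient).
    long undo(long u) {
        long target = Math.max(0, cycle - u);
        long distance = cycle - target;
        if (distance <= history.size()) {
            for (long i = 0; i < distance; i++) {
                acc = history.pop(); // apply records newest-first
                cycle--;
            }
            return 0;
        }
        // Otherwise fall back to the closest checkpoint at or before target.
        Map.Entry<Long, Long> cp = checkpoints.floorEntry(target);
        checkpoints.tailMap(target, false).clear(); // later ones are stale
        history.clear();
        cycle = cp.getKey();
        acc = cp.getValue();
        long reExecuted = 0;
        while (cycle < target) {
            step(false); // re-execute forward without recording
            reExecuted++;
        }
        return reExecuted;
    }

    public static void main(String[] args) {
        UndoSketch m = new UndoSketch();
        for (int i = 0; i < 10; i++) {
            m.step(true);
        }
        long reExecuted = m.undo(3);
        System.out.println("cycle=" + m.cycle + " reExecuted=" + reExecuted);
    }
}
```

In the usage shown in main, ten cycles leave checkpoints at cycles 0, 4, and 8 and two incremental records. Undoing three cycles therefore exceeds the records and is served from the checkpoint at cycle 4 followed by three re-executed cycles.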