================================================================================
Note 9.0 LSI-11/73 Cache Concepts No replies
JAWS::KAISER 307 lines 25-MAR-1985 09:18
--------------------------------------------------------------------------------
+---------------+ +-----------------+
| d i g i t a l | | uNOTE # 009 |
+---------------+ +-----------------+
+----------------------------------------------------+-----------------+
| Title: Cache Concepts and the LSI-11/73 | Date: 02-JUL-84 |
+----------------------------------------------------+-----------------+
| Originator: Charlie Giorgetti | Page 1 of 6 |
+----------------------------------------------------+-----------------+
The goal is to introduce the concept of cache and its particular
implementation on the LSI-11/73 (KDJ11-A). This is not a detailed
discussion of the different cache organizations and their impact on
system performance.
What Is A Cache?
-----------------
The purpose of having a cache is to simulate a system having a large
amount of moderately fast memory. To do this the cache system relies on
a small amount of very fast, easily accessed memory (the cache), a
larger amount of slower, less expensive memory (the backing store), and
the statistics of program behavior.
The goal is to store some of the data and its associated addresses in
the cache and all of the data at its usual addresses (including the
currently cached data) in the backing store. If it can be arranged that
most of the time when the processor needs data it is located in fast
memory, then the program will execute more quickly, slowing down only
occasionally for main memory operations. The placement of data in the
cache should not be a concern to the programmer but is a consequence of
how the cache functions.
Figure 1 is an example of a memory organization showing a cache with
backing store. If the data needed by the microprocessor (uP) can be
found in the cache then it is accessed much faster due to the local data
path and faster cache memory than by having to access the backing store
on the slower system bus.
                 (Diagram: the uP accesses the cache over the fast CPU
                 internal buses; a System Bus Interface connects to the
                 slower system bus, which carries the memory and I/O
                 options that make up the backing store.)
                      Figure 1 - Cache With Backing Store
A cache memory system can only work if it can successfully predict most
of the time what memory locations the program will require. If a
program accessed data from memory in a completely random fashion, it
would be impossible to predict what data would be needed next. If this
was the case a cache would operate no better then a conventional memory
system.
Programs rarely generate random addresses. The next memory address
referenced is often very near the current address accessed. This is the
principle of program locality: the next address generated is in the
neighborhood of the current address. This behavior helps make cache
systems feasible.
Program locality is not a law that programs must obey, but a statement
of how many programs behave. Many programs execute code in a linear
fashion or in loops, which makes the next address predictable. Jumps and
context switching, by contrast, give the appearance of random address
generation. Predicting which word a program will reference next is never
completely successful, so the rate of correct "guesses" is a statistical
function of the size and organization of the cache and the behavior of
the program being executed.
The measure of cache performance is a statistical evaluation of the
number of memory references found versus not found in the cache. When
memory is referenced and the address is found in the cache, this is known
as a hit; when it is not, it is termed a miss. Cache performance is
usually stated in terms of the hit ratio or the miss ratio, where these
are defined as:
Number of Cache Hits
Hit Ratio = ---------------------------------
Total Number of Memory References
Miss Ratio = 1 - Hit Ratio
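As a quick illustration, the two ratios can be computed from a pair of
counts (the numbers below are invented for the example, not measured
LSI-11/73 figures):

```python
# Hypothetical run: 10,000 memory references, 9,200 found in the cache.
cache_hits = 9200
total_refs = 10000

hit_ratio = cache_hits / total_refs    # 9200/10000 = 0.92
miss_ratio = 1 - hit_ratio             # 0.08

print(f"Hit ratio:  {hit_ratio:.2f}")
print(f"Miss ratio: {miss_ratio:.2f}")
```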
The LSI-11/73 Cache Implementation
----------------------------------
The cache organization chosen must be one that can be implemented within
the physical and cost constraints of the design.
The LSI-11/73 implements a direct map cache. A direct map organization
has a single unique cache location for a given address and this is where
the associated data from backing store are maintained. This means an
access to cache requires one address comparison to determine if there is
a hit. The significance of this is that a small amount of circuitry is
required to perform the comparison operation. The LSI-11/73 has an 8
KByte cache. This means that there are 4096 unique address locations,
each of which stores two bytes of information.
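The direct mapping can be sketched in a few lines: the word address
modulo the number of entries selects the single cache location an
address can occupy, so addresses 8 KBytes apart collide on the same
entry (a toy model; `cache_entry` is an invented name, not KDJ11-A
hardware):

```python
CACHE_ENTRIES = 4096  # 8 KBytes / 2 bytes per entry

def cache_entry(addr):
    """Direct mapping: drop bit 0 (byte select), then take the word
    address modulo the number of entries to pick the one entry."""
    return (addr >> 1) & (CACHE_ENTRIES - 1)

# Two addresses 8 KBytes apart map to the same cache entry;
# only the tag stored with the entry tells them apart.
print(cache_entry(0o000100))   # entry 32
print(cache_entry(0o020100))   # entry 32 again
```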
The cache not only maintains the data from backing store but it also
includes other information that is needed to determine if its content is
valid. These are parity detection and valid entry checking. The
following diagram shows the logical layout of the cache and the purpose
of each field and its associated address.
Binary Cache
Entry Address P V TAG P1 B1 P0 B0
+---+---+-------------+---+----------+---+----------+
000000000000 | | | | | | | |
+---+---+-------------+---+----------+---+----------+
000000000001 | | | | | | | |
+---+---+-------------+---+----------+---+----------+
000000000010 | | | | | | | |
+---+---+-------------+---+----------+---+----------+
. .
. .
. .
+---+---+-------------+---+----------+---+----------+
111111111101 | | | | | | | |
+---+---+-------------+---+----------+---+----------+
111111111110 | | | | | | | |
+---+---+-------------+---+----------+---+----------+
111111111111 | | | | | | | |
+---+---+-------------+---+----------+---+----------+
Figure 2 - LSI-11/73 Cache Layout
The Cache Entry Address is the address of one of 4096 entries within the
cache. This value has a one-to-one relationship with a field in each
address that is generated by the processor (described in the next
section on how the physical address accesses cache).
Each field has the following meaning:
Tag (TAG) - This nine bit field contains information that is
compared to the address label, described in the next section on
how the physical address accesses cache. When the physical
address is generated, the address label is compared to the tag
field. If there is a match it can be considered a hit provided
that there is entry validation and no parity errors.
Cache Data (B0 and B1) - These two bytes are the actual data
stored in cache.
Valid Bit (V) - The valid bit indicates whether the information
in B0 and B1 is usable as data if a cache hit occurs. The valid
bit is set when the entry is allocated during a cache update
which occurs as a result of a miss.
Tag Parity Bit (P) - Even parity calculated for the value
stored in the tag field.
Parity Bits (P0 and P1) - P0 is even parity calculated for the
data byte B0 and P1 is odd parity calculated for the data byte
B1.
When the processor generates a physical address, the on-board cache
control logic must determine if there is a hit by looking at the unique
location in cache. To determine what location to check, the cache
control logic considers each address generated as being made up of three
unique parts. The following are the three fields of a 22-bit address
(in an unmapped or 18-bit environment the label field is six or four
bits shorter, respectively):
 21 20 19 18 17 16 15 14 13   12 11 10 09 08 07 06 05 04 03 02 01   00
+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+ +--+
|          LABEL           | |               INDEX               | |  |
+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+ +--+
                                                                   BYTE
                                                                  SELECT
 Figure 3 - Components of a 22-bit Address For Cache Address Selection
Each field has the following meaning:
Index - This twelve bit field determines which one of the 4096
cache data entries to compare with for a cache hit. The index
field is the displacement into the cache and corresponds to the
Cache Entry Address.
Label - Once the location in the cache is selected, the nine bit
label field is compared to the tag field stored in the cache
entry under consideration. If the address label and the tag
field match, the valid bit is set, and there is no parity error,
then a hit has occurred.
Byte Select Bit - This bit determines if the reference is on an
odd or even byte boundary. All Q-bus reads are word only so
this bit has no effect on a cache read. Q-bus writes can access
either words or bytes. If there is a word write the cache will
be updated if there is a hit. If there is a miss a new cache
entry will be made. If there is a byte write, the cache will
only be updated if there is a hit. A miss will not create a new
entry on a byte write.
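The three-way split described above can be sketched with a few shifts
and masks (a toy model; `split_address` is an invented name):

```python
def split_address(addr):
    """Split a 22-bit physical address into the three cache fields:
    bit 0 is the byte select, bits 12..1 are the 12-bit index
    (the Cache Entry Address), bits 21..13 are the 9-bit label
    that is compared against the stored TAG field."""
    byte_select = addr & 1
    index = (addr >> 1) & 0o7777       # 12 bits
    label = (addr >> 13) & 0o777       # 9 bits
    return label, index, byte_select

# Highest even word in the 22-bit space: label and index all ones.
label, index, byte_sel = split_address(0o17777776)
print(oct(label), oct(index), byte_sel)   # 0o777 0o7777 0
```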
The LSI-11/73 direct map cache must update the backing store on a memory
write. The LSI-11/73 uses the write-through method: writes to the
backing store occur concurrently with cache writes, so the backing store
always contains the same data as the cache.
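A minimal sketch of the write-through policy on word accesses (a toy
model with invented names; the real updates happen in the KDJ11-A
hardware, and parity and byte writes are omitted here):

```python
class WriteThroughCache:
    """Toy direct-mapped, write-through cache: 4096 word entries,
    each holding a (tag, word) pair."""
    ENTRIES = 4096

    def __init__(self, backing_store):
        self.backing = backing_store           # word-addressed dict
        self.lines = {}                        # entry -> (tag, word)

    def write_word(self, addr, word):
        entry = (addr >> 1) % self.ENTRIES
        tag = addr >> 13
        self.lines[entry] = (tag, word)        # update or allocate entry
        self.backing[addr] = word              # write-through to memory

    def read_word(self, addr):
        entry = (addr >> 1) % self.ENTRIES
        tag = addr >> 13
        line = self.lines.get(entry)
        if line is not None and line[0] == tag:
            return line[1]                     # hit: fast local copy
        word = self.backing[addr]              # miss: fetch from memory
        self.lines[entry] = (tag, word)        # and allocate the entry
        return word

mem = {0o1000: 0}
cache = WriteThroughCache(mem)
cache.write_word(0o1000, 0o1234)
# Backing store and cache agree immediately after the write.
print(mem[0o1000] == cache.read_word(0o1000))   # True
```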
Features Of The LSI-11/73 Cache
-------------------------------
The LSI-11/73 direct map cache has a number of features that assist in
the performance of the overall system in addition to the speed
enhancement as a result of faster memory access. These features consist
of the following:
o Q-bus DMA monitoring
o I/O page reference monitoring
o Memory management control of cache access
o Program control of cache parameters
o Statistical monitoring of cache performance
The LSI-11/73 cache control logic monitors the Q-bus during DMA
transactions. When an address that has its data stored in cache is
accessed during DMA, the cache and backing store contents might no
longer be the same. This is an unacceptable situation. The cache
control logic invalidates a cache entry if the address is used during
DMA. This also includes addresses used during Q-bus Block Mode DMA
transfers.
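The invalidation step can be modeled in a few lines (a toy model with
invented names; the real monitoring is done by the on-board cache
control logic, not software):

```python
def dma_monitor(lines, dma_addr):
    """On a Q-bus DMA write to dma_addr, invalidate the matching cache
    entry so the cache never disagrees with the backing store."""
    entry = (dma_addr >> 1) % 4096
    tag = dma_addr >> 13
    line = lines.get(entry)
    if line is not None and line[0] == tag:
        del lines[entry]                 # i.e. clear the valid bit

lines = {32: (0, 0o1234)}    # entry 32 holds tag 0, data word 0o1234
dma_monitor(lines, 0o100)    # DMA writes address 0o100 -> entry 32
print(32 in lines)           # False: entry invalidated
```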
Memory references to the I/O page are not cached since that data is
volatile, meaning its contents can change without a Q-bus access;
caching such references would leave stale data in the cache.
There are other situations for which using the cache to store
information for faster access is not desirable. One is a device that
does not reside in the I/O page but can still change its contents
without a bus reference, such as dual-ported memory.
Another situation is partitioning and tuning an application for
instruction code execution versus data being manipulated. In this case
the instruction stream may execute many times over for different data
values. Speed enhancement can be obtained if the instructions are
cached while the data is not cached. By forcing the data never to be
cached it cannot replace instructions in the cache.
The memory management unit (MMU) of the LSI-11/73 can assist in this
situation. Pages of memory allocated for data can be marked to bypass
the cache and therefore not affect instructions that loop many times.
The cache and the MMU work together to achieve the goal of increased
system performance.
The dynamics of cache operation are under program control through use of
the Cache Control Register (CCR), an LSI-11/73 on-board register. This
register can "turn" the cache on or off, force cache parity errors for
diagnostic testing, and invalidate all cache entries. The details of
the CCR are described in the KDJ11-A CPU Module User's Guide (part
number EK-KDJ1A-UG-001).
During system design or at run-time the performance enhancements
provided by the cache system can be monitored under program control.
This is accomplished by using another LSI-11/73 on-board register, the
Hit/Miss Register (HMR). This register tracks the last six memory
references and indicates if a hit or miss took place. The details of
the HMR are also described in the KDJ11-A CPU Module User's Guide.
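The HMR behaves like a 6-bit shift register of recent outcomes; a toy
model (the bit ordering here is an assumption for illustration, not the
documented HMR encoding; see the KDJ11-A CPU Module User's Guide for
the real bit definitions):

```python
def update_hmr(hmr, hit):
    """Shift the newest outcome (1 = hit, 0 = miss) into bit 0 of a
    6-bit history register, discarding the oldest outcome."""
    return ((hmr << 1) | (1 if hit else 0)) & 0o77

hmr = 0
for outcome in (True, True, False, True, True, True):
    hmr = update_hmr(hmr, outcome)
print(oct(hmr))   # 0o67 = binary 110111: one miss, five hits
```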
Summary
-------
Caches are a mechanism that can help improve overall system performance.
The dynamics of a given cache are dictated by its organization and the
behavior of the programs running on the machine. The LSI-11/73 cache is
designed to be flexible in use, simple in implementation, and effective
in enhancing application performance.
More detailed discussions on how caches work and other cache
organizations can be found in computer architecture texts that have a
discussion of memory hierarchy.