Processor chip having on-chip circuitry for generating a programmable external clock signal and for controlling data patterns

US 5734877 A

Abstract

Techniques for matching the speed of a microprocessor to potentially slower external system components. A master clock signal is communicated to a clock generator on the processor chip. The clock generator provides at least one external clock signal, which is communicated to various portions of the system. The clock generator includes programmable clock division circuitry that allows the external clock signal to be generated at any selected one of a plurality of fractions of the master clock frequency. The data pattern (the particular cycles in a sequence during which the processor outputs a data word as part of a multiple-data-word sequence) is programmable independently of the external clock programming.

Claims(5)

What is claimed is:

1. In a computer system having a processor chip, an external system, and a master clock generator off said processor chip, said master clock generator providing a master clock signal, the improvement comprising:

means for communicating said master clock signal to a first input terminal of said processor chip;

means, on said processor chip and responsive to said master clock signal and to an incoming synchronization signal on a second input terminal of said processor chip, for generating an internal clock signal at a frequency that is a predetermined multiple of said master clock frequency;

means, on said processor chip and responsive to said internal clock signal, for generating at a first output terminal of said processor chip an external clock signal, designated SyncOut, at the master clock frequency;

means, off said processor chip, for communicating an indication of a desired frequency divisor to said processor chip;

means, on said processor chip and responsive to said internal clock signal, for generating at least one external clock signal at a fraction of said master clock frequency specified by said desired frequency divisor; and

a feedback connection off said processor chip coupling said SyncOut signal between said first output terminal and said second input terminal that feeds back said SyncOut signal to said means for generating an internal clock signal, whereupon said means for generating an internal clock signal causes said SyncOut signal, as received at said second input terminal, to be synchronized with said master clock signal;

wherein said means for generating at least one external clock signal generates a pair of external clock signals in a predetermined phase relationship with each other.

2. A computer system comprising:

a processor chip having a master clock input terminal, a synchronization input terminal, a synchronization output terminal, and a system clock output terminal;

system logic off said processor chip having a system clock input terminal;

a master clock generator off said processor chip, said master clock generator providing a master clock signal at a master clock frequency to said master clock input terminal;

a phase-lock loop (PLL) on said processor chip that receives said master clock signal and provides a PLL signal at a frequency that is an integral multiple of said master clock frequency, said PLL being further responsive to a synchronization signal at said synchronization input terminal;

a programmable clock divider on said processor chip that receives said PLL signal and generates (a) at least one external clock signal at said system clock output terminal at a frequency equal to the frequency of said PLL signal divided by a desired one of a plurality of divisors, and (b) a signal, designated SyncOut, at said synchronization output terminal at said master clock frequency, said SyncOut signal being synchronized with said at least one external clock signal at said system clock output terminal;

a connection off said processor chip coupling said at least one external clock signal between said system clock output terminal of said processor chip and said system clock input terminal of said system logic to allow said system logic to operate synchronously with respect to said at least one external clock signal; and

a feedback connection off said processor chip coupling said SyncOut signal between said synchronization output terminal of said processor chip and said synchronization input terminal of said processor chip that feeds back said SyncOut signal to said PLL, whereupon said PLL causes said SyncOut signal, as received at said synchronization input terminal, to be synchronized with said master clock signal;

wherein said at least one external clock signal includes a pair of external clock signals in a predetermined phase relationship with each other.

3. The computer system of claim 2 wherein said second circuit connection is characterized by a particular delay, which causes said at least one external clock signal to be advanced by an amount equal to said particular delay relative to said master clock signal.

4. The computer system of claim 2 wherein:

said first circuit connection includes a first buffer characterized by a particular delay; and

said second circuit connection includes a second buffer characterized by said particular delay, which causes said at least one external clock signal to be advanced by an amount equal to said particular delay relative to said master clock signal, thereby causing said at least one external clock signal after said first buffer to be synchronized with said master clock signal and compensating for said delay caused by said first buffer.

5. The computer system of claim 2, and further comprising a non-volatile memory off said processor chip containing a representation of said desired one of said plurality of frequency divisors.

Description

This application is a continuation of application Ser. No. 08/353,169, filed Dec. 8, 1994, now abandoned, which is a continuation of application Ser. No. 07/942,675, filed Sep. 9, 1992, now abandoned.

BACKGROUND OF THE INVENTION

The present invention relates generally to integrated circuit devices (chips), and more specifically to techniques for coordinating the timing of a microprocessor chip and an external system.

In a personal computer or workstation comprising a microprocessor chip and external (or "off-chip") system components (system logic, memory, peripheral controllers, etc.), timing signals ("clock signals," or simply "clocks") for various parts of the system are typically derived from a single off-chip master oscillator. Computer designers have long recognized that different parts of the system operate at different speeds, and various data buffering, caching, and interrupt strategies have been devised to prevent overall operation speed from being dragged down to that of the slowest subsystem.

While increases in processor speeds have been accompanied by increases (perhaps more modest) in external component speeds, a gap remains. While performance problems relating to mismatches in speed can in principle be minimized by using the fastest components, considerations such as cost and availability may dictate otherwise.

SUMMARY OF THE INVENTION

The present invention provides techniques for matching the speed of a microprocessor to potentially slower external system components.

According to one aspect of the invention, a master clock signal is communicated to a clock generator on the processor chip. The clock generator provides at least one external clock signal, which is communicated to various portions of the system. The clock generator includes programmable clock division circuitry that allows the external clock signal to be generated at any selected one of a plurality of fractions of the master clock frequency.

In a specific embodiment, the clock generator includes a phase-lock loop ("PLL") that generates an internal signal at four times the master frequency, and the programmable clock division circuitry allows the frequency to be divided by 4, 6, or 8 so that an external signal is available at the master clock frequency, 2/3 the master clock frequency, or 1/2 the master clock frequency. In this embodiment there are two external clocks (RClock and TClock), skewed 90° to each other, one for use as a receive clock by registers that sample processor outputs, the other for use as a transmit clock by registers that drive processor inputs. There is also an internal clock (SClock) at the same frequency and in phase with TClock, which is used by the processor to clock its I/O latches (system interface). The particular clock divisor for RClock, TClock, and SClock is typically determined by the system designer, taking into account the various properties of the system. A representation of the divisor is preferably read into a configuration register at boot time.

According to a further aspect of the invention, the data pattern (the particular cycles in a sequence during which the processor outputs a data word as part of a multiple-data-word sequence) is programmable independently of the external clock programming. In the specific embodiment, the processor can output data words at various average rates between the system clock rate and 1/4 that rate, and according to different data patterns. A representation of the data pattern is also read into the configuration register at boot time. The programmable data pattern provides the system designer an additional degree of freedom.

A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system incorporating the system timing improvements of the present invention;

FIG. 2 is a block diagram of circuitry on the processor chip for generating internal and external clocks;

FIGS. 3A and 3B are timing diagrams illustrating the programmable clock division according to the present invention; and

FIG. 4 is a block diagram of the processor chip and external memories.

DESCRIPTION OF SPECIFIC EMBODIMENTS

FIG. 1 is a block diagram of a computer system 10 incorporating the system timing improvements of the present invention. In accordance with known practice, the system includes a microprocessor chip 15, an off-chip master clock generator 17, and additional off-chip system elements. In a specific embodiment, the microprocessor (often referred to simply as the "processor" or as the "chip") is a CMOS superpipelined RISC processor (MIPS R4000). The system elements are shown as including a control gate array device ("gate array") 20, a memory 22, and external reset logic 23. The processor is also shown as coupled to an external secondary cache 25 and a configuration ROM 26, which in the specific embodiment is a serial ROM. Associated with the system elements are various registers, the significance of which will be discussed below. The processor communicates with the external elements via a number of buses and control lines, including a 64-bit system address/data bus 27 (SysAD), a system command/data identifier bus 28 (SysCmd), and a 128-bit secondary cache data bus 29 (SCData).

All processor and system timing is derived from master clock generator 17, which provides a signal, called MasterClock, to the processor. In a particular embodiment, MasterClock is a 50-MHz signal. The frequency of the master clock signal is often referred to as 1x. Certain aspects of the present invention relate to the programmability of the timing and the timing structure of communications between the processor and the external elements. In the specific embodiment, configuration information is read in from the configuration ROM at boot time.

FIG. 2 is a block diagram of clock generation circuitry 30 on processor chip 15. The clock generation circuitry can be considered to include an internal clock generator 32 and an external clock generator 35. The internal clock generator generates a number of clock signals for use by the processor circuitry, while the external clock generator generates a number of clock signals, at least some of which are made available to the off-chip system elements. All the generated clock signals have, by virtue of the way they are generated, a 50% duty cycle.

Certain aspects of internal clock generator 32 are the subject of commonly-owned co-pending U.S. patent application Ser. No. 07/933,467, filed Aug. 21, 1992, titled CLOCK DISTRIBUTION SYSTEM FOR AN INTEGRATED CIRCUIT DEVICE, now U.S. Pat. No. 5,317,601, issued May 31, 1994, which is incorporated by reference. For present purposes, it suffices to note that the internal clock generator includes a phase-lock loop ("PLL") 40, a clock divider 42, and an initial synchronization circuit 45, all located in generally a single region of the chip, and a plurality of remote synchronization circuits 50 located at a corresponding plurality of distributed regions on the chip. The PLL receives MasterClock at a first input and generates a signal at four times the master clock frequency. The clock divider divides this signal and provides signals at the master clock frequency, two times the master clock frequency, and four times the master clock frequency. The signals at these frequencies, referred to as 1x, 2x, and 4x, and a pair of additional signals PLL1x and Sync4x are synchronized and distributed to the remote locations, where they are again synchronized to Sync4x. The signal at 2x, referred to as PClock (pipeline clock), has every other rising edge aligned with the rising edge of MasterClock and is used by the processor's internal registers and latches (but not the system interface). The synchronized PLL1x signal from one of the remote locations is fed back as SyncPLL to a second input of PLL 40 for overall synchronization.

External clock generator 35 includes a PLL 60, a clock divider 62, an initial synchronization circuit 65, a multiplexer 67, a 90° phase shifter 68, and a final synchronization circuit 70. The PLL receives MasterClock at a first input and generates an internal signal at 4x. The clock divider divides this signal by factors of 4, 6, and 8, and provides signals at 1x, 2/3x, and 1/2x. These signals are synchronized and one of the three is selected at the multiplexer on the basis of a desired external clock divisor (programmable at boot time). The selected signal and a version thereof that is phase-shifted by a quarter cycle are again synchronized to generate an internal signal SClock, an external signal TClock in phase with SClock, and an external signal RClock that leads TClock by a quarter cycle, all three at the same selected frequency (1x, 2/3x, or 1/2x). An additional signal 1x(ND) is used to generate two external signals SyncOut and MasterOut, both at 1x, regardless of the selected frequency. SyncOut (at 1x) is fed back via an external circuit board trace to a SyncIn input pin on the chip and then to a second input of PLL 60 for overall synchronization.
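The arithmetic of the external clock division can be illustrated with a short Python sketch. It assumes the 50-MHz MasterClock of the particular embodiment; the function and constant names are hypothetical and serve only to model the 4x PLL followed by division by 4, 6, or 8.

```python
from fractions import Fraction

PLL_MULTIPLIER = 4          # PLL 60 generates an internal signal at 4x
DIVISORS = (4, 6, 8)        # divisors provided by clock divider 62


def external_clock_options(master_hz):
    """Map each divisor to the resulting SClock/TClock/RClock frequency in Hz."""
    pll_hz = Fraction(master_hz) * PLL_MULTIPLIER
    return {d: pll_hz / d for d in DIVISORS}


if __name__ == "__main__":
    for divisor, hz in external_clock_options(50_000_000).items():
        print(f"divide by {divisor}: {float(hz) / 1e6:.2f} MHz")
```

For a 50-MHz MasterClock the three selectable frequencies are 50 MHz (1x), about 33.33 MHz (2/3x), and 25 MHz (1/2x).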

The clock generation circuitry optionally includes cycle-down circuitry 72 that cooperates with the clock division circuitry to provide the ability to divide the internal and external clock signals by fixed divisors during machine operation without destroying the state of the machine. In the specific embodiment, division by 2, 4, 8, or 16 is possible, depending on the state of a CycleDown signal (a bit in the processor's status register) and the value of a CycleDown divisor (also programmable at boot time). This is referred to as operation in the cycle-down mode.

In the cycle-down mode, the internal clock signals at 1x, 2x, and 4x are distributed at the further divided-down frequencies, but the Sync4x and PLL1x signals are distributed without being divided. Similarly, the signals 1x, 2/3x, and 1/2x are further divided down by the CycleDown divisor to generate divided-down SClock, TClock, and RClock, but the 1x(ND) signal is not divided so that MasterOut and SyncOut are generated at the MasterClock frequency. The primary benefit of a cycle-down feature is that the processor and system can be slowed down and consume less power when they are not being actively used. The cycle-down feature is not part of the invention, and the discussions that follow will assume that the CycleDown signal is not asserted so that there is no division by the CycleDown divisor. The cycle-down feature may impose tighter timing constraints, and thus may require that steps be taken to synchronize the two PLLs.
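The effect of the cycle-down mode on the external clocks can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, and the CycleDown signal is represented as a plain boolean rather than a status register bit.

```python
from fractions import Fraction

CYCLE_DOWN_DIVISORS = (2, 4, 8, 16)   # divisors named in the text


def cycled_down_hz(selected_hz, cycle_down_asserted, divisor):
    """Return the SClock/TClock/RClock frequency, divided only in cycle-down mode."""
    if divisor not in CYCLE_DOWN_DIVISORS:
        raise ValueError(f"unsupported CycleDown divisor: {divisor}")
    hz = Fraction(selected_hz)
    return hz / divisor if cycle_down_asserted else hz


def sync_out_hz(master_hz, cycle_down_asserted):
    """SyncOut and MasterOut derive from 1x(ND), which is never divided."""
    return Fraction(master_hz)
```

With a selected external clock of 25 MHz and a CycleDown divisor of 4, the sketch yields 6.25 MHz for SClock, TClock, and RClock while SyncOut stays at the MasterClock frequency.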

FIG. 3A is a timing diagram showing the various internal and external signals described above for the case where the selected signal is at 1x so that SClock, RClock, and TClock have the same frequency as MasterClock. Also shown are the intervals during which SysAD may be driven by the processor and the intervals during which the processor may receive data on SysAD. FIG. 3B is a corresponding timing diagram for the case where the selected signal is at 1/2x so that SClock, RClock, and TClock have a frequency that is 1/2 that of MasterClock.

The notations tDM, tDO, tDS, and tDH in the timing diagrams refer to the data output minimum, data output maximum, data setup, and data hold times, respectively. Data provided to the processor must be stable a minimum of tDS (3 ns) before the rising edge of SClock and held valid for a minimum of tDH (2 ns) after the rising edge of SClock. This setup and hold time is required for data to propagate through the processor's input buffers and meet the setup and hold time requirements of the processor's input latches. Data provided by the processor becomes stable a minimum of tDM (2 ns) after the rising edge of SClock and a maximum of tDO (10 ns) after the rising edge of SClock. This drive-off time is the sum of the maximum delay through the processor's output drivers and the maximum clock-to-Q delay of the processor's output registers. These numbers are for a 50-MHz master clock frequency assuming a 50-pF capacitive load.
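The timing figures above can be turned into a simple check. The sketch below is illustrative only: the function names are hypothetical, and the stable-window calculation reflects one reading of the text, namely that processor outputs may change between tDM and tDO after a rising edge of SClock and are otherwise stable.

```python
T_DS_NS = 3.0   # data setup time before the rising edge of SClock
T_DH_NS = 2.0   # data hold time after the rising edge of SClock
T_DM_NS = 2.0   # data output minimum delay after the rising edge
T_DO_NS = 10.0  # data output maximum delay after the rising edge


def output_stable_window_ns(sclock_period_ns):
    """Length of the interval in which processor outputs are guaranteed stable.

    Outputs may change between T_DM and T_DO after an edge, so they are
    stable from T_DO after one edge until T_DM after the next edge.
    """
    return sclock_period_ns - T_DO_NS + T_DM_NS


def input_meets_timing(stable_before_edge_ns, held_after_edge_ns):
    """True if externally driven data satisfies the setup and hold times."""
    return stable_before_edge_ns >= T_DS_NS and held_after_edge_ns >= T_DH_NS
```

At a 20-ns SClock period (1x of a 50-MHz MasterClock), this reading gives external sampling registers a 12-ns window in which processor outputs are stable.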

The general properties and uses of the various clock signals can be summarized as follows.

MasterOut is generated at 1x and has its edges aligned with MasterClock. MasterOut is provided for use in clocking external logic that must cycle at the MasterClock frequency, such as external reset logic 23. SyncOut is generated at 1x and is aligned with MasterClock. The feedback connection between the SyncOut and SyncIn pins makes it possible to compensate for output driver delays and input buffer delays in aligning SyncOut with MasterClock.

SClock is generated at 1x, 2/3x, or 1/2x and is used by the processor to sample data at the system interface and to clock data into the processor's system interface output registers. The processor can thus support a data rate of one 64-bit doubleword onto or off of the SysAD bus per SClock cycle.

The RClock and TClock signals are generated at the same frequency as SClock and are used by the external system components shown in FIG. 1, as will now be described. Gate array 20 is shown as having a sampling register 80 and a staging register 82, for transfers from the processor, a register 85 for transfers to the processor, and a control register 87 for controlling transfers to the processor. The gate array buffers RClock and TClock internally. Associated with memory 22 are a sampling register 90 and a memory output register 92.

RClock is used by the gate array as a receive clock, and the buffered version is used to clock registers that sample processor outputs, such as sampling register 80. The buffered version of TClock is used to clock staging register 82, and is used as the global system clock for the gate array's internal circuitry and as the clock for all registers that drive processor inputs. TClock is also used to clock registers 90 and 92, which are not part of the gate array. Since registers 90 and 92 do not buffer the clocks, it is preferred to buffer RClock and TClock (externally with respect to the processor and gate array) using buffers 93 and 94. Insertion of a matched buffer 95 into the feedback path allows the PLL to null out buffer delays.
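The way a matched buffer in the feedback path nulls out buffer delay can be modeled with a few lines of Python. This is an idealized sketch: it assumes the PLL aligns SyncIn with MasterClock exactly, ignores jitter and trace delay, and uses a hypothetical function name and hypothetical delay values.

```python
def buffered_clock_skew_ns(feedback_delay_ns, clock_buffer_delay_ns):
    """Skew of the buffered TClock relative to MasterClock, in nanoseconds.

    The PLL aligns SyncIn with MasterClock, so SyncOut (and the external
    clocks synchronized with it) leave the chip early by the feedback
    delay. The external clock buffer then adds its own delay back.
    """
    advance_at_pin_ns = feedback_delay_ns
    return clock_buffer_delay_ns - advance_at_pin_ns


if __name__ == "__main__":
    print(buffered_clock_skew_ns(0.0, 4.0))  # no buffer in the feedback path
    print(buffered_clock_skew_ns(4.0, 4.0))  # matched buffer in the feedback path
```

In this model an unmatched 4-ns clock buffer leaves the buffered clock 4 ns late, while a matched 4-ns buffer in the feedback path brings the skew to zero.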

FIG. 4 is a block diagram of the processor chip and certain of its interfaces, showing the manner that configuration information is used to control system timing. As can be seen, processor 15 includes an internal primary data cache 100, which communicates with the external memory elements (main memory 22 and secondary cache 25) via an internal system address/data bus 105 (ISysAD). I/O latches 107 and 110 couple ISysAD bus 105 to the external memory buses (SysAD bus 27 and SCData bus 29). Cache control logic 120 controls the data flow between the primary cache and the external memory elements. The figure is not intended to be a comprehensive view of the processor, and only a few of the control signals, relevant to the timing, are shown.

A configuration register 130 is loaded from configuration ROM 26 at boot time and provides a number of control bits specifying the fundamental operational modes of the processor. The particular mechanism for transferring the configuration information is not part of the invention, and will not be described in detail. It is noted, however, that on reset (power-on or cold reset), the processor reads a 256-bit serial stream on a ModeIn line at a bit rate equal to the MasterClock frequency divided by 256. The information includes such parameters as byte ordering, secondary cache configuration, and package type, which do not directly relate to the invention, and other parameters such as clock divisors and cache data patterns, which do directly relate to the invention. The specific bit definitions are set forth in Table 1.
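The time needed to read the configuration stream follows directly from the figures above. The sketch below uses hypothetical names and assumes the 50-MHz MasterClock of the particular embodiment for the example value.

```python
MODE_BITS = 256           # length of the serial stream on ModeIn
MODE_CLOCK_DIVISOR = 256  # bit rate is the MasterClock frequency divided by 256


def mode_stream_cycles():
    """MasterClock cycles needed to shift in the whole configuration stream."""
    return MODE_BITS * MODE_CLOCK_DIVISOR


def mode_stream_seconds(master_hz):
    """Time to read the stream at the given MasterClock frequency."""
    return mode_stream_cycles() / master_hz
```

The stream occupies 65,536 MasterClock cycles, or about 1.31 ms at 50 MHz.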

As described above, SClock, RClock, and TClock are generated at a frequency that is a selected fraction of the PClock frequency. This selection is shown in FIG. 2 as being effected by multiplexer 67, with a notation that the signal at the multiplexer's select input specifies the selected or desired fraction. As can be seen in Table 1, bits 15-17 are encoded to provide a representation of the desired external clock divisor, and these bits are used to provide the selection signal. The selected clock divisor remains fixed once it is read into the configuration register.

A representation of the CycleDown divisor is also loaded into the configuration register at boot time (being encoded at bits 47-49) and remains fixed. However, as mentioned above, the frequency division for the external clock signals is fixed during operation while the division of internal and external clocks associated with the cycle-down feature is controlled by a bit in the status register, which bit can be set and reset during operation. The cycle-down circuitry operates to ensure that all the clock signals have their rising edges aligned when the frequencies shift so that the machine state is preserved.

An additional field in the configuration register is the transmit data pattern, a representation of which is encoded at bits 11-14. The data pattern refers to the set of SClock cycles within a sequence during which the processor outputs data words (actually 64-bit doublewords) in a multiword transfer onto the SysAD bus. As can be seen in Table 1, the possible data patterns range from a doubleword every cycle (pattern D, or equivalently DD) to a doubleword every fourth cycle (pattern DxxxDxxx). The letter D refers to a data cycle while the letter x refers to an unused cycle. These data patterns are primarily used for transfers from the secondary cache to main memory, and the need to limit the rate of data transfers could arise from bandwidth limitations on either external memory. In the particular embodiment illustrated, the secondary cache outputs two doublewords (128 bits) at a time while the internal and external system buses are 64 bits wide. Thus there is appropriate data staging associated with I/O latch 110.
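The three configuration fields named so far (bits 11-14, 15-17, and 47-49) can be extracted from the stream with ordinary bit masking. The sketch below assumes the 256-bit stream has already been assembled into a Python integer with bit 0 as the least significant bit; that ordering and all names are assumptions. Because Table 1 is not reproduced here, the sketch returns raw field values and does not decode them.

```python
# (low bit, high bit) positions taken from the text; names are hypothetical
FIELDS = {
    "transmit_data_pattern": (11, 14),
    "external_clock_divisor": (15, 17),
    "cycle_down_divisor": (47, 49),
}


def extract_field(config, low_bit, high_bit):
    """Return the unsigned value held in bits low_bit..high_bit inclusive."""
    width = high_bit - low_bit + 1
    return (config >> low_bit) & ((1 << width) - 1)


def decode_raw_fields(config):
    """Return the raw (undecoded) value of each timing-related field."""
    return {name: extract_field(config, lo, hi) for name, (lo, hi) in FIELDS.items()}
```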

The processor imposes the data patterns by asserting ValidOut* (seen by the main memory) only on the cycles designated by the specified data pattern, and by controlling the Addr and OE lines to the secondary cache appropriately. It should be understood that the processor can only read from or write to one external memory at a time. Thus a transfer from the secondary cache to main memory would entail latching 128 bits from the secondary cache during one cycle, and writing the data during later cycles, 64 bits on a given cycle, to main memory over the designated pattern of cycles.
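A data pattern can be modeled as a repeating string of D and x characters. The following sketch, with hypothetical function names, lists the SClock cycles on which a doubleword would be driven (that is, the cycles on which ValidOut* would be asserted) and computes the average data rate of a pattern.

```python
from fractions import Fraction


def valid_out_cycles(pattern, doublewords):
    """SClock cycle indices on which a doubleword is driven.

    `pattern` is a string of 'D' (data cycle) and 'x' (unused cycle) that
    repeats until `doublewords` doublewords have been sent.
    """
    if "D" not in pattern or set(pattern) - {"D", "x"}:
        raise ValueError(f"bad data pattern: {pattern!r}")
    cycles = []
    cycle = 0
    while len(cycles) < doublewords:
        if pattern[cycle % len(pattern)] == "D":
            cycles.append(cycle)
        cycle += 1
    return cycles


def average_rate(pattern):
    """Average doublewords per SClock cycle for a repeating pattern."""
    return Fraction(pattern.count("D"), len(pattern))
```

For example, four doublewords sent with pattern DDxx occupy cycles 0, 1, 4, and 5, and the average rates of patterns D and DxxxDxxx are 1 and 1/4 doubleword per cycle, matching the range stated in the summary.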

The possible data patterns include a number of patterns wherein the doublewords are output at a uniform rate (patterns D, DxDx, DxxDxx, and DxxxDxxx) and a number of patterns having two doublewords output on successive cycles followed by one or more cycles with no data out (patterns DDx, DDxx, DDxxx, DDxxxx, and DDxxxxxx). If the limitation is on the bus speed or on the memory's ability to accept data, pattern DxDx or pattern DxxDxx might be appropriate. A pattern such as DDxx could be appropriate where two interleaved memory elements each require three cycles after accepting data before being ready to accept data again. In such a case, the processor could output the data on two successive cycles, and free up the second memory sooner.
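The interleaved-memory example can be checked numerically. The sketch below assumes that successive doublewords go to successive memory elements in round-robin order; the text does not spell out the interleaving scheme, so this assignment and the function name are assumptions.

```python
def bank_idle_gaps(pattern, banks=2, repeats=3):
    """Minimum number of idle cycles each bank sees between accepted doublewords.

    Assumes successive doublewords go to successive banks in round-robin
    order. Banks that accept fewer than two doublewords are omitted.
    """
    last_cycle = {}
    min_gap = {}
    word = 0
    for cycle in range(len(pattern) * repeats):
        if pattern[cycle % len(pattern)] != "D":
            continue
        bank = word % banks
        if bank in last_cycle:
            gap = cycle - last_cycle[bank] - 1
            min_gap[bank] = min(gap, min_gap.get(bank, gap))
        last_cycle[bank] = cycle
        word += 1
    return min_gap
```

Under this assumption, pattern DDxx gives each of the two memory elements three idle cycles between doublewords, as does pattern DxDx. The difference is that DDxx delivers the second doubleword on cycle 1 instead of cycle 2.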

It is noted that the processor, while it can accept data at the SysAD interface at the rate of one doubleword per cycle, should not receive data destined for the secondary cache at a data rate faster than that at which the secondary cache can accept data. Therefore, an external system should send data at a rate that accommodates the secondary cache's write cycle time. Since the secondary cache is organized as a 128-bit RAM array, the processor will operate most efficiently if the data is delivered to the SysAD interface as two doublewords on successive cycles, possibly followed by a number of unused cycles as necessary to allow proper writing to the secondary cache. This type of data pattern is established by the system for processor reads from the SysAD bus, and is not to be confused with the data patterns, initially supplied by the system at boot time, that are used by the processor to limit the data rate of processor writes onto the SysAD bus.

In conclusion, it can be seen that the present invention provides a high degree of flexibility in system design. The programmable clock division for the external clocks RClock and TClock and the internal system clock SClock allows a wide range of system components and configurations. At the same time, the programmable data patterns accommodate certain speed limitations without requiring that the entire system timing be slowed down.

While the above is a complete description of a specific embodiment of the invention, various modifications, alternative constructions, and equivalents may be used. For example, the particular clock divisors could be changed, and a different number could be provided to select from. Additionally, while the configuration information is shown as being provided via a serial interface, there are other possibilities. Therefore, the above description and illustrations should not be taken as limiting the scope of the invention which is defined by the claims.