1. Stage One - Unpipelined -> Must demonstrate this in demo 1

To start, you should do a single-cycle, non-pipelined implementation. Figure 4.24 on page 329 is a good place to start.

For this stage, use the Single cycle perfect memory. Since you will need to fetch an instruction and also read or write data in the same cycle, use two memories -- one for instructions and one for data.
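The need for two memories can be illustrated with a small behavioral sketch. It is written in Python purely for illustration; the three-instruction toy ISA, the `step` function, and the memory contents are all hypothetical and are not the course ISA or the provided memory module. The point is only that every cycle touches the instruction memory once, and a load or store touches the data memory in that same cycle.

```python
# Toy single-cycle model. The ISA (load/store/addi) is made up for this sketch.
def step(pc, regs, imem, dmem):
    """Execute one instruction in a single 'cycle' and return the next PC."""
    op, a, b = imem[pc]        # instruction memory port: fetch
    if op == "load":           # data memory port: read in the same cycle
        regs[a] = dmem[b]
    elif op == "store":        # data memory port: write in the same cycle
        dmem[b] = regs[a]
    elif op == "addi":         # no data memory access
        regs[a] = regs[a] + b
    return pc + 1

imem = [("load", 0, 4), ("addi", 0, 1), ("store", 0, 5)]
dmem = {4: 41, 5: 0}
regs = [0, 0]
pc = 0
while pc < len(imem):
    pc = step(pc, regs, imem, dmem)
```

With a single shared memory, the fetch and the load or store would compete for the same port within one cycle, which is why the two are kept separate here.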

2. Stage Two - Pipelined -> Must demonstrate this in demo 2

After you have completed the single cycle implementation, you will next implement a pipelined version of the architecture. A good starting point is Figure 4.65 on page 384 of COD fourth edition. Continue to use the single cycle memory.

Be sure that the non-pipelined version is functional before you try the pipelined version. While designing the non-pipelined version, make design choices that will ease the conversion to the pipelined version.

3. Stage Three: Memory Design

The next few steps will improve our memory and make it more realistic.

3.1 Aligned memory

At this step, replace the original single-cycle memory with the Aligned single cycle memory. This is a very similar module, but it has an "err" output that is generated on unaligned memory accesses. Your processor should halt when an error occurs. Verify your design.
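As a rough illustration of the behavior described above, the following Python sketch models a memory that raises "err" on an unaligned access and a driver that halts on the first error. The 2-byte word size, the class name `AlignedMemory`, and the `access`/`run` interfaces are assumptions made for this sketch; check the provided module for the real alignment rule and port names.

```python
WORD_BYTES = 2  # assumption: 2-byte words, so aligned addresses are even

class AlignedMemory:
    """Hypothetical behavioral model, not the provided module."""
    def __init__(self):
        self.words = {}

    def access(self, addr, wr=False, data_in=0):
        """Return (data_out, err). err is True on an unaligned address."""
        if addr % WORD_BYTES != 0:
            return 0, True            # unaligned: flag the error, do nothing
        if wr:
            self.words[addr] = data_in
            return 0, False
        return self.words.get(addr, 0), False

def run(mem, addresses):
    """Issue reads in order and halt at the first err.
    Returns how many accesses completed before halting."""
    completed = 0
    for addr in addresses:
        _, err = mem.access(addr)
        if err:
            break                     # the processor halts on error
        completed += 1
    return completed
```

A test program that deliberately performs one unaligned load is a quick way to confirm that the halt path is actually exercised.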

3.2 Stalling memory (Suggested Finish by 04/24)

At this step, replace the single cycle memory with the Stalling memory. This is a very similar module, but has stall and done signals similar to the cache you built. Your pipeline will need to stall to handle these conditions. Verify your design.

Instruction memory: First replace your instruction memory module with this stalling memory and keep your data memory module the same (i.e. the aligned perfect memory from the previous step). Verify your design. This will be easier to debug, as only one module's behavior has changed.

Data memory: Now replace only your data memory module with this stalling memory, and revert your instruction memory module to the aligned perfect memory. Verify your design. This will be easier to debug, as only one module's behavior has changed.

Instruction and Data memory: Now change both instruction and data memories to the stalling memory design. Verify your design.
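The stall handling on the instruction side can be sketched as follows. This is a Python behavioral model under stated assumptions: a fixed `latency` per read, a `tick` method that returns `(data, done, stall)`, and a fetch loop that simply holds the PC. The real stalling memory may have a variable latency and different port names, so treat this only as an outline of the control idea.

```python
class StallingMemory:
    """Hypothetical model: every read takes `latency` cycles to complete."""
    def __init__(self, contents, latency):
        self.contents = contents
        self.latency = latency
        self.remaining = 0
        self.busy = False

    def tick(self, addr):
        """Advance one cycle with a read request for addr.
        Returns (data, done, stall)."""
        if not self.busy:              # start a new access
            self.busy = True
            self.remaining = self.latency
        self.remaining -= 1
        if self.remaining == 0:        # access finishes this cycle
            self.busy = False
            return self.contents[addr], True, False
        return None, False, True       # still working: ask the pipeline to stall

def fetch_all(mem, n):
    """Fetch n instructions, holding the PC while the memory stalls.
    Returns (fetched instructions, total cycles)."""
    pc, cycles, fetched = 0, 0, []
    while pc < n:
        data, done, stall = mem.tick(pc)
        cycles += 1
        if done:
            fetched.append(data)
            pc += 1                    # advance only when done is asserted
        # on stall: keep the PC unchanged and send a bubble down the pipeline
    return fetched, cycles
```

For example, `fetch_all(StallingMemory([10, 20, 30], 3), 3)` returns `([10, 20, 30], 9)`: each fetch costs three cycles, and the PC never advances on a stalled cycle.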

3.3 Four-banked memory

The simple memory module is still highly idealized. Real memory systems are pipelined and use multiple banks. You are provided with a four-banked memory module that models a simple DRAM controller.

I suggest that you do not interface this four-banked memory directly to your processor. Just use it to fetch blocks for your cache. This sub-step does not require verification.
Make sure your design continues to compile. Proceed to stage 4.
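To see why banking suits block fetches for a cache, consider the following Python sketch. It assumes word-interleaved banks selected by the low-order bits of the word address, a 2-byte word, and a bank that stays busy for four cycles after a request. All three values, and the names `bank_of` and `issue_cycles`, are assumptions for illustration; the provided module's documentation defines the actual bank mapping and timing.

```python
NUM_BANKS = 4
WORD_BYTES = 2         # assumption: 2-byte words
BANK_BUSY_CYCLES = 4   # assumption: a bank is busy this long per request

def bank_of(addr):
    """Assumed mapping: consecutive words go to consecutive banks."""
    return (addr // WORD_BYTES) % NUM_BANKS

def issue_cycles(addresses):
    """Return the cycle at which each request can be issued, allowing at most
    one new request per cycle and waiting whenever the target bank is busy."""
    free_at = [0] * NUM_BANKS
    cycle = 0
    issued = []
    for addr in addresses:
        b = bank_of(addr)
        cycle = max(cycle, free_at[b])   # wait for the bank if necessary
        issued.append(cycle)
        free_at[b] = cycle + BANK_BUSY_CYCLES
        cycle += 1
    return issued
```

Under these assumptions, `issue_cycles([0, 2, 4, 6])` gives `[0, 1, 2, 3]` because the four consecutive words of a block land in four different banks, while `issue_cycles([0, 8, 16, 24])` gives `[0, 4, 8, 12]` because every request hits the same bank.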

4. Stage Four - Cache Design: -> Must demonstrate in Cache Demo

Implement a functioning memory system that uses caches as described on the cache design page.

5. Stage Five - Direct-Mapped Cache:

Replace your processor's memory modules with the cache modules.

Again, follow an incremental approach like we did for the stalling memory.

Instruction memory: First replace your instruction memory module with the mem_system module you developed, and make your data memory module perfect (i.e. the aligned perfect memory from the earlier step). Verify your design. This will be easier to debug, as only one module's behavior has changed.

Data memory: Now, replace your data memory module alone with the mem_system module developed, revert your instruction memory module back to the perfect memory. Verify your design. This will be easier to debug, as only one module's behavior has changed.

Instruction and Data memory: Now change both instruction and data memories to the mem_system design. Verify your design.
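When debugging the processor with the direct-mapped cache in place, it helps to predict by hand which accesses should hit and which should miss. The Python sketch below shows the generic direct-mapped lookup. The field widths (3 index bits, 3 offset bits) and the names `split` and `DirectMappedCache` are assumptions for illustration only; use the parameters from the cache design page for your actual mem_system.

```python
INDEX_BITS = 3    # assumption: 8 cache lines
OFFSET_BITS = 3   # assumption: 8-byte blocks

def split(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

class DirectMappedCache:
    """Hypothetical hit/miss model; it tracks tags only, not data."""
    def __init__(self):
        self.valid = [False] * (1 << INDEX_BITS)
        self.tags = [0] * (1 << INDEX_BITS)

    def access(self, addr):
        """Return True on a hit. On a miss, install the block and return False."""
        tag, index, _ = split(addr)
        if self.valid[index] and self.tags[index] == tag:
            return True
        self.valid[index] = True       # miss: the new block replaces the old one
        self.tags[index] = tag
        return False
```

With these widths, addresses 0x40 and 0x80 map to the same line with different tags, so alternating between them misses every time. A short test program with that access pattern is a useful check that conflict misses stall the pipeline correctly.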