Alchitry

This tutorial will cover how DRAM (Dynamic Random Access Memory), or more specifically SDRAM (Synchronous DRAM), works and how you can use it in your projects. We will be using the SDRAM Shield.

What is RAM?

It is first important to understand what RAM is in general before diving into a specific type. RAM is simply a large block of memory that you can access more or less at random very quickly. It provides temporary storage for your design for things like images, video, or sampled data. In some applications it can even be used to store the instructions and data for a processor.

Notice the word temporary I used. This is because RAM is a volatile form of memory. That means without power, the contents of the memory will be lost.

RAM is organized into banks, rows, and columns. I like to think of RAM as a set of notebooks where each notebook is a bank, each page is a row, and each line is a column. Each bank, or notebook, can be accessed independently of the other banks. Each bank is made up of many rows and each row has many columns. To access a specific piece of data you must specify all three pieces of information: the bank, row, and column.
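To make the notebook analogy concrete, here is a small Python sketch of how a flat address could be split into a bank, row, and column. The bit widths (4 banks, 8,192 rows, 1,024 columns) and the ordering of the fields are assumptions chosen for illustration, not something taken from the controller, so check your chip's datasheet for the real geometry.

```python
# Illustrative geometry only (an assumption, not taken from the controller):
# 4 banks, 8192 rows per bank, 1024 columns per row.
BANK_BITS, ROW_BITS, COL_BITS = 2, 13, 10

def split_address(addr):
    """Split a flat address into (bank, row, column), with the column in the low bits."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col

def join_address(bank, row, col):
    """Inverse of split_address: pack (bank, row, column) back into a flat address."""
    return (bank << (COL_BITS + ROW_BITS)) | (row << COL_BITS) | col

if __name__ == "__main__":
    # Address 1025 is one full row (1024 columns) plus one column.
    print(split_address(1025))
```

Putting the column in the low bits means consecutive addresses stay on the same page of the notebook for as long as possible, which is usually what you want.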

The actual protocol required to access data depends on the type of RAM being used. However, all RAM breaks out a very similar interface. You generally have an address input, which specifies the row and column, a bank select input, which specifies the bank, a data input/output, which is used for reading and writing data, and a few control signals.

How DRAM works

So now that you know that any type of RAM is used to store large amounts of data, how does it actually store this data?

The basic storage element behind DRAM is the capacitor. Just as a basic refresher, a capacitor is a device that is able to store a charge. You can think of them much like a balloon. Just as you can fill a balloon with some air, you can fill a capacitor with some charge.

The basic cell in DRAM looks like the following.

There is simply a capacitor that stores a charge, and a transistor that allows charge to either be put into the capacitor or taken out.

These cells are arranged into a large 2D array of rows and columns. These are the same rows and columns from before.

When you write data to DRAM, charge is placed on capacitors that should have a value of 1, but no charge is placed on capacitors that have a value of 0.

When you read data from DRAM, the charge on the capacitor is measured using a circuit called a sense amplifier. If the sense amplifier detected charge on the capacitor then it outputs a 1, otherwise it assumes the cell was a 0.

There are two main problems with the fundamental design of DRAM. First, to read the charge from the capacitor, the charge must be drained. This causes all reads to be destructive. Once you read a piece of data from DRAM, the value is no longer being stored in the memory array. To deal with this, the data must be written back into the array when you are done with it. This is called precharging.

To make the interface to DRAM a bit more efficient, an entire row is read into a buffer in the DRAM. The process of reading a row into that buffer is referred to as opening or activating the row. Once a row is open, data can be read or written to any columns in that row without having to open it again.

However, only one row per bank can be open at a time. To read from a different row in the same bank, you must first precharge the current row, then open the new row.

The second fundamental flaw of DRAM, and the reason it is called dynamic RAM, is that capacitors leak charge. That means that once a charge is stored on a capacitor, it will start losing that charge. This happens either through the transistor connected to it, or through the capacitor itself. What this means for your data is that, if neglected, the values stored will be lost.

The fix to this problem is to periodically refresh each row. A refresh consists of simply reading a row then writing it back into the array. This process ensures that the capacitors retain their charge.

The amount of time a row can go between refreshes depends on the DRAM. The SDRAM chip on the SDRAM Shield must be refreshed every 64ms.

Generally, SDRAM will be able to perform the refresh operation for you. However, you still must tell it when to refresh.

DRAM vs SDRAM

The difference between these two types of RAM is that SDRAM is synchronous and DRAM is not. All this means is that the SDRAM uses a clock while DRAM does not. The benefits to SDRAM are that inputs and outputs are synchronized to whatever it is connected to, in our case the FPGA, as well as some speed benefits due to pipelining.

SDRAM is much more common than plain DRAM.

It is also worth noting that DDR (Double Data Rate) RAM, usually heard in the context of computers, is a form of SDRAM.

DRAM vs SRAM

The difference between DRAM and SRAM is a bit more interesting. SRAM operates fundamentally differently than DRAM. It doesn't store data on capacitors, but instead uses two inverters back to back.

This solves the two problems discussed earlier about destructive reads and forgetting the value. However, this comes at a price, literally. SRAM is much more expensive than DRAM due to the fact that the technology is much less dense. Each cell in SRAM is much larger than each cell of DRAM, meaning you can't pack nearly as many into the same area.

SRAM is, however, faster and uses less power than DRAM. Because of this, it is still used frequently in digital systems for things like caches. Modern CPUs have something like 8-16MB of very fast SRAM cache, but the computer can have 1000x that much (8+GB) DRAM.

The Controller

Create a new project based on the Base Project. We now need to add the SDRAM controller to our project. In the Component Selector, select Controllers/SDRAM Controller. We also need the pin definitions for the SDRAM Shield, so also check off Constraints/SDRAM Shield. Add these to your project.

Open up the sdram.luc file and take a look at it. It can be helpful to have the datasheet for the SDRAM chip open.

You can declare structs inside your module, but they are then local to your module and can only be used internally (not in port definitions).

A struct definition consists of the struct keyword, followed by the name of the struct, followed by the list of the struct's members. A member declaration consists of a name, an optional struct type, and an optional array size.

We have the canonical clock and reset inputs. We then have a bunch of IO condensed into 4 lines by using structs. The connection to the SDRAM chip consists of an output and an inout. The interface from the controller to the rest of the FPGA is broken into an input and an output. Take a look at the struct definitions for details of their contents.

Commands

The SDRAM chip accepts a series of commands that we define as constants for easier use.

The relations of these states can be summed up in the state diagram shown below. The WAIT state wasn't shown for clarity.

When the board is powered on (or reset) the FSM starts in the INIT state. SDRAM requires a bit of initialization before you can read and write to it. This is also covered in the datasheet (page 42) for those curious.

After the board is initialized, it sits in the IDLE state until one of two things happens: either it's time to perform a refresh or there is a pending operation.

First, let's talk about the refresh. To manage the refreshing, there is a timer that tells the controller to send another refresh operation. The SDRAM requires 8,192 refresh commands to be sent every 64ms. That means you can either send a refresh command every 7.813µs or all 8,192 commands in a batch every 64ms. To provide a more uniform interface, this controller sends the refresh commands evenly spaced. This limits the maximum amount of time the controller will be busy doing refreshes. In some applications where you need very fast burst speeds, but have some known down time, performing burst refreshing can be better.
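The arithmetic behind that refresh timer is easy to check. This is a quick Python sketch, not the controller's actual code, that works out how many clock cycles fit between evenly spaced refresh commands. The function name is made up for this example.

```python
def refresh_interval_cycles(clock_hz=100_000_000, window_ms=64, commands=8192):
    """Clock cycles between evenly spaced refresh commands, rounded down.

    Rounding down makes each refresh slightly early rather than slightly late,
    so all the commands still fit inside the refresh window.
    """
    window_cycles = clock_hz * window_ms // 1000  # cycles in the whole window
    return window_cycles // commands

if __name__ == "__main__":
    # 64 ms / 8192 = 7.8125 us, which is 781.25 cycles at 100 MHz.
    print(64_000 / 8192, "us between refreshes")
    print(refresh_interval_cycles(), "cycles at 100MHz")
```

At 100MHz this comes out to 781 whole cycles between refresh commands.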

When a read or write command is pending, the controller first checks to see if the row is open. If the requested row is already open, life is great, it simply reads or writes to the row. If the row isn't open then it first opens the row before performing the operation. The worst case is if there is already another row open. In this case the other row must be precharged, before the controller can open the new row and perform the operation.
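The three cases above (row already open, no row open, wrong row open) can be sketched in a few lines of Python. This is only a model of the decision, assuming the controller remembers one open row per bank. The command names are labels I picked for the example, not the constants from the controller.

```python
def plan_access(open_rows, bank, row):
    """Return the commands needed to reach (bank, row) and update open_rows.

    open_rows maps a bank number to the row currently open in that bank.
    A bank with no entry has no open row.
    """
    steps = []
    current = open_rows.get(bank)
    if current is not None and current != row:
        steps.append("PRECHARGE")   # worst case: close the other row first
        current = None
    if current is None:
        steps.append("ACTIVATE")    # open the requested row
        open_rows[bank] = row
    steps.append("READ_OR_WRITE")   # the row is open, do the actual access
    return steps

if __name__ == "__main__":
    rows = {}
    print(plan_access(rows, 0, 5))  # nothing open yet
    print(plan_access(rows, 0, 5))  # same row again
    print(plan_access(rows, 0, 6))  # different row in the same bank
```

Accesses that stay in one row only pay the precharge and activate cost once, which is why sequential access patterns are so much faster than random ones.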

Each of these operations has some number of cycles the SDRAM requires to complete (the reason for the WAIT state). These sometimes vary with the clock frequency (in other words, they have a set amount of real time). This controller assumes a clock rate of 100MHz. This is important for other reasons as well that will be discussed a little later. All of these delays and timing specifications can be found in the datasheet (many of them are on pages 27-28).
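To see why the cycle counts depend on the clock, here is a small Python sketch that converts a delay given in nanoseconds into a whole number of clock cycles. The 20ns and 66ns figures in the example are placeholders I made up, not values from the datasheet, so look up the real ones for your chip.

```python
def ns_to_cycles(delay_ns, clock_mhz=100):
    """Smallest whole number of clock cycles that covers delay_ns."""
    # Integer ceiling division: delay_ns * clock_mhz / 1000, rounded up.
    return -(-(delay_ns * clock_mhz) // 1000)

if __name__ == "__main__":
    # Placeholder delays, NOT datasheet values.
    for delay_ns in (20, 66):
        print(delay_ns, "ns ->", ns_to_cycles(delay_ns), "cycles at 100MHz")
```

The delay always rounds up, because waiting one cycle too long is harmless while waiting one cycle too short violates the timing.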

This mostly sums up how the controller works. If you want an even deeper understanding, you need to take a look at the rest of the code in the controller as well as the SDRAM datasheet.

However, there is some advanced voodoo magic going on in the controller code that is worth mentioning.

Dealing with the Hardware

When you start interfacing with a relatively high speed external device, you start having to deal with FPGA specific details. There are two hardware related issues addressed in the controller. The first is that the FPGA can't route a clock signal directly to an output pin. This is because the clock and the general logic of an FPGA use separate routing resources and there isn't a way for the clock signal to move back into the general routing system. However, we can use an ODDR2 primitive to compensate for this.

// The ODDR2 is used to output the FPGA clock to
// an output pin because a clock can't be directly
// routed as an output.
xil_ODDR2 oddr (
  #DDR_ALIGNMENT("NONE"),
  #INIT(0),
  #SRTYPE("SYNC")
);

This is the instantiation of an ODDR2 module. If you look at the project files, you will notice there is no xil_ODDR2.luc file. This is because this isn't really a module, but rather an FPGA primitive. ODDR2, or Output Double Data Rate 2, is a primitive that is generally used to output data on both the rising and falling edges of the clock (hence, double data rate). However, in this case we are using the ODDR2 to simply output our clock signal. You can't output the clock signal directly due to how the FPGA is structured internally. So instead, you can use the ODDR2 with the data pins wired to 0 or 1.

When C0 has a rising edge, D0 is output until C1 has a rising edge. At that point D1 is output. Notice that in our case C1 is actually just the clock inverted. That means D0 is output when the clock rises and D1 is output when the clock falls.

You may be thinking now "Ok... but if D0 is output when the clock rises, shouldn't D0 be 1 and D1 be 0?" Very good, my young padawan. That is exactly right if you wanted the output to be the same as the clock. However, we don't want this. We want the output clock to be our clock inverted!

Why the *&^$# would we want the clock to be inverted? Wouldn't that mean that the SDRAM would read its inputs and change its outputs on our falling edge? Oh wait... that's exactly what we want! We want this because it gives each device half a clock cycle for its outputs to become stable before the other device reads them. This all has to do with satisfying the setup and hold times of both devices. If you don't know what that means, check out the External IO Tutorial.
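If the D0/D1 business is still fuzzy, this tiny Python model may help. It is a deliberately simplified picture of the ODDR2 (no real timing, just levels): the pin shows D0 while the clock is high and D1 while it is low. The function name is made up for the example.

```python
def oddr2_output(clk_levels, d0, d1):
    """Very simplified ODDR2 model.

    With C1 wired to the inverted clock, D0 is driven after each rising edge
    (while the clock is high) and D1 after each falling edge (while it is low).
    """
    return [d0 if clk else d1 for clk in clk_levels]

if __name__ == "__main__":
    clk = [1, 0, 1, 0, 1, 0]
    print("clock   :", clk)
    print("D0=1,D1=0:", oddr2_output(clk, 1, 0))  # copy of the clock
    print("D0=0,D1=1:", oddr2_output(clk, 0, 1))  # inverted clock
```

Wiring D0 to 0 and D1 to 1 is all it takes to get the inverted clock out of the pin.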

Timing is the other hardware related issue we need to account for and we will use another FPGA primitive, the IODELAY2, to deal with it.

// The IODELAY2 is used to delay the clock a bit
// in order to align the data with the clock edge.
// These settings assume a 100MHz clock and the
// SDRAM Shield being stacked next to the Mojo.
xil_IODELAY2 iodelay (
  #IDELAY_VALUE(0),
  #IDELAY_MODE("NORMAL"),
  #ODELAY_VALUE(100),
  #IDELAY_TYPE("FIXED"),
  #DELAY_SRC("ODATAIN"),
  #DATA_RATE("SDR")
);

As you may have guessed from the name, the IODELAY2 block provides a delay to inputs and outputs. In this case we are using it to delay the clock being output to the SDRAM. There are a lot of features of these primitives that aren't being used here. However, if you want to check them out in their full glory, take a look at the UG381 document from Xilinx (ODDR2 starts on page 62 and IODELAY2 starts on page 74).

We need the delay because simply inverting the clock doesn't quite ensure timing is met. We need to shift it a little more.

The important values here are DELAY_SRC, which is set to make the IODELAY2 delay an output, and ODELAY_VALUE, which sets how much we want to delay the signal.

The actual amount of delay that is given per step of ODELAY_VALUE is a bit fuzzy and will actually vary over temperature and voltage in the Spartan 6 chip. However, with a 100MHz clock, using a delay of 100 (maximum is 255) ensures that the setup and hold times are being met. This delay was found empirically by running lots of tests checking for read/write errors.

The last piece of the puzzle is making sure that the input and output registers are packed into IOBs, or Input/Output Blocks.

The dff type has a parameter, IOB, that, when set to 1, will mark that flip-flop to be packed into an IOB.

What the heck is an IOB? An IOB is simply a flip-flop that is embedded in the pin of the FPGA. They aren't in the typical FPGA fabric, but rather right at the inputs and outputs.

We want to make sure these registers are packed into IOBs to ensure that there are no additional delays due to the signal needing to propagate through the FPGA.

To make sure these registers are actually packed into the IOB, their output/input can't connect to anything other than the top level output/input. If you tried to read these signals in some other part of your design, the tools would be forced to pull the flip-flop out of the IOB, possibly messing up timing. This is why it is important that these signals go directly to the top level inputs/outputs.

Xilinx Primitives

At the time of writing this, the IODELAY2 and ODDR2 are the only primitives currently supported by the Mojo IDE. All the supported primitives can be found by typing xil and the auto-complete will list the known modules (the primitives are always prefixed with xil_). More primitives will be added over time.

This sums up how the controller works, but now we need to use it for something.

Using the Controller

What good is a fancy SDRAM controller if we don't even use it? NO GOOD, that's what! To demonstrate how to use the controller we are going to create a tester. Our module will write a bunch of stuff to the RAM then read it back to make sure the contents are still there and correct.

There is one big problem with creating a tester like this. What do we write to the RAM? It has to be something easily generated because we don't have enough memory to memorize all the values. If we did we wouldn't be using the SDRAM. We could use part of the address, but this causes a very artificial pattern that can fail to detect some problems.

Instead we will use a pseudo-random number generator. The key word there is pseudo, which in layman's terms translates to not-really-a-random number generator. This is something that generates random-looking numbers that are actually completely predictable. That's a great property for us because we need to be able to regenerate the exact same sequence of 8,388,608 numbers to verify our write.

From the components library add Math/Pseudo-random Number Generator to your project.

This algorithm is called Xorshift and it simply is a ported version of one presented on Wikipedia.

This module will generate a new number each time next is high. It can be reset to start the sequence over. If the value of seed changes, the sequence will be different.

This type of number generator is great for hardware because it only uses XOR and shift operations, both of which are really cheap. However, it isn't a super great random number generator and should not be used for crypto purposes, where merely looking random isn't good enough.
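For reference, here is a Python sketch of the 32-bit xorshift128 variant (Marsaglia's xor128, which appears on the Wikipedia page). The default state constants are the ones from Marsaglia's example. How the Lucid module folds its seed input into the state isn't shown here, so treat the seeding as an assumption. The class and method names are just for this example.

```python
MASK32 = 0xFFFFFFFF  # keep every value to 32 bits, like a hardware register

class Xorshift128:
    """Software model of a 32-bit xorshift128 generator."""

    def __init__(self, x=123456789, y=362436069, z=521288629, w=88675123):
        self._initial = (x, y, z, w)
        self.reset()

    def reset(self):
        """Restart the sequence from the initial state."""
        self.x, self.y, self.z, self.w = self._initial

    def next(self):
        """Advance one step and return the new 32-bit value."""
        t = (self.x ^ (self.x << 11)) & MASK32
        self.x, self.y, self.z = self.y, self.z, self.w
        self.w = (self.w ^ (self.w >> 19) ^ t ^ (t >> 8)) & MASK32
        return self.w

if __name__ == "__main__":
    gen = Xorshift128()
    first_run = [gen.next() for _ in range(4)]
    gen.reset()
    second_run = [gen.next() for _ in range(4)]
    print(first_run)
    print(first_run == second_run)  # the sequence repeats exactly after a reset
```

The only operations are shifts and XORs, and a reset gives you the identical sequence again, which is exactly what the tester relies on.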

The Memory Interface

Before we get into our tester module, we need to understand the interface used for reading and writing the SDRAM. Take a look at memory_bus.luc.

The interface consists of a master and a slave. The slave in this case is the SDRAM controller (the one receiving commands) and we will play the role of the master by issuing commands.

Whenever we want to issue a command, we need to first make sure that slave.busy is 0. This indicates that the controller can accept a new command.

To issue a write command we set master.write to 1, master.addr to the address we want to write to, master.data to the value we want to write, and finally master.valid to 1 to indicate a new command.

To perform a read we set master.write to 0, master.addr to the address to read, and master.valid to 1. The value of master.data is ignored. We then need to wait for slave.valid to be 1. When it is 1, slave.data is the value we requested. Note that slave.busy may go back to 0 before the read is actually complete. This is because the busy flag only says when the controller can accept a new request, not necessarily when it is idle. If you issue multiple read requests, they will be processed in the order they are received.
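Here is a toy Python model of that handshake to show how reads can overlap and still come back in order. It is not how the real controller is built: this made-up slave never asserts busy and answers every read after a fixed latency (an assumption for the example), but the master side follows the rules described above.

```python
from collections import deque

class BusModel:
    """Toy slave: takes one command per clock and answers reads after a fixed latency."""

    def __init__(self, read_latency=3):
        self.mem = {}
        self.pending = deque()  # in-flight reads as [cycles_left, data]
        self.read_latency = read_latency
        self.busy = 0           # this toy slave is always ready for a command

    def tick(self, valid=0, write=0, addr=0, data=0):
        """Advance one clock cycle and return (slave_valid, slave_data)."""
        for entry in self.pending:
            entry[0] -= 1
        out = (0, 0)
        if self.pending and self.pending[0][0] <= 0:
            out = (1, self.pending.popleft()[1])  # oldest read finishes first
        if valid and not self.busy:
            if write:
                self.mem[addr] = data
            else:
                self.pending.append([self.read_latency, self.mem.get(addr, 0)])
        return out

if __name__ == "__main__":
    bus = BusModel()
    bus.tick(valid=1, write=1, addr=4, data=0xAB)
    bus.tick(valid=1, write=1, addr=5, data=0xCD)
    # Issue two reads back to back, then idle until the data shows up.
    print(bus.tick(valid=1, addr=4), bus.tick(valid=1, addr=5))
    print(bus.tick(), bus.tick(), bus.tick())
```

Notice that the second read is accepted before the first one has returned any data, and the results still arrive in the order the reads were issued.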

Our tester has two states, WRITE and READ. We start in the WRITE state and fill up the RAM with random stuff. Once the RAM is full, we reset the number generator and move to the READ state.

In the READ state we read each value back and generate the same sequence of numbers again. If the values we read back don't match the number in our sequence, we increment the error counter. The error counter is set up to saturate at 127 errors, so if there are a ton of errors it will simply max out.
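The whole write-then-verify loop fits in a few lines of Python. This is a software sketch of the idea, not a translation of the Lucid tester: the memory is just a dict, the generator uses Marsaglia's default xorshift128 state, and InvertingMemory is a made-up faulty RAM used to show the error counter saturating.

```python
def xorshift128():
    """Yield the 32-bit xorshift128 sequence from its default starting state."""
    x, y, z, w = 123456789, 362436069, 521288629, 88675123
    while True:
        t = (x ^ (x << 11)) & 0xFFFFFFFF
        x, y, z = y, z, w
        w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & 0xFFFFFFFF
        yield w

def run_test(memory, words, max_errors=127):
    """Fill memory with the sequence, read it back, and count mismatches."""
    gen = xorshift128()
    for addr in range(words):            # WRITE state
        memory[addr] = next(gen)
    gen = xorshift128()                  # "reset" the number generator
    errors = 0
    for addr in range(words):            # READ state
        if memory[addr] != next(gen) and errors < max_errors:
            errors += 1                  # saturates at max_errors
    return errors

class InvertingMemory(dict):
    """Hypothetical broken RAM that returns every stored word with all bits flipped."""
    def __getitem__(self, addr):
        return ~super().__getitem__(addr) & 0xFFFFFFFF

if __name__ == "__main__":
    print(run_test({}, 1000))                 # healthy memory
    print(run_test(InvertingMemory(), 1000))  # every read is wrong
```

A healthy memory gives 0 errors, while the broken one maxes out the counter at 127 no matter how many words fail.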

We need to be able to see what our tester is doing so we will use the LEDs to show the status. We hook up leds[7] to the state (so we know when it's reading or writing) and the rest to the error counter.

Generating the Clock

If you've been paying attention (you have, haven't you?) you probably noticed that the SDRAM controller says it assumes a clock of 100MHz. However, the Mojo's clock is only 50MHz. Whatever will we do? Luckily the FPGA has a super rad circuit called a PLL that lets you generate new clocks. Even more rad is that there are tools to help us set it up.

We are going to be using the Core Generator tool from Xilinx. Support for this tool is built into the Mojo IDE, so simply click Project->Launch CoreGen.

Under FPGA Features and Design/Clocking double click on Clocking Wizard.

You're a clocking wizard Harry!

Change the name to just clk_wiz because the default is UGLY. Also uncheck Phase alignment (we don't care about that) and set the primary input clock to 50MHz.

On the next page you shouldn't have to change anything as CLK_OUT1 is already set to generate 100MHz.

On page 3, uncheck everything because again, we don't care.

Skip page 4 and on page 5, remove the 1 from the signal names. We only have one input and one output so why bother labeling them 1?

Finally, click Generate.

Once it finishes generating the core, you can close all the CoreGen windows. The core should automagically (it's a word, trust me) be under the Cores section of your project.

The Top Level

Now that we have all the pieces we need to hook it all up.

If you take a look at the sdram_shield.ucf file we added in the beginning of the tutorial, you'll notice that there are only two signals defined.

You should be able to build your project now. Stack your SDRAM Shield onto your Mojo and load the project! If everything went well, you should see the left-most LED blinking and the other 7 off (no errors). Each time the LED blinks, 32MB of data was written and read back from the SDRAM!