Alchitry

In this tutorial we will be making some modifications to the Hello World! project from the last tutorial so make sure you have read the ROMs and FSMs tutorial (Alchitry, Mojo) first. We will be personalizing the greeter so that it first asks for your name and then prints "Hello NAME" where NAME is the name you entered. To do this we will need some form of memory and in this case we will use a single port RAM.

With the project open from the last tutorial, you can make a copy to edit for this tutorial by going to File->Clone Project. Enter a new name in the dialog that pops up and click Create.

The RAM

We need to add the RAM component to our project. Go to Project->Add Components... and under Memory check off Simple RAM.

Note that this component is written in Verilog instead of Lucid. This is because the tools that actually build your project can be very picky when it comes to deciding if something is a block of RAM or not. By using this module we can ensure that our RAM is properly recognized as RAM. This is important because FPGAs actually have dedicated block RAM (also known as BRAM). If your RAM is big enough, the tools will use BRAM to implement it instead of the FPGA fabric. Using BRAM is both substantially faster and smaller than the FPGA fabric.

A single port RAM like this works much the same as the ROM from the last tutorial. However, we now have the option to write to an address instead of only reading. To write to an address, we simply supply the address and data to write then set write_en to 1. The data at that address will then be updated to whatever write_data is.
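The read and write behavior described above can be sketched as a small software model. This is Python rather than Lucid or Verilog and is only an illustration: the class name SimpleRamModel and its tick() method are made up for this sketch, and the one-cycle read delay (read_data shows what was stored at the address presented on the previous clock edge, read before any write lands) is an assumption about the component, not something stated in this tutorial.

```python
class SimpleRamModel:
    """Hypothetical software model of a single port RAM (not the real component)."""

    def __init__(self, size=8, depth=32):
        self.size = size            # bits per entry
        self.mem = [0] * depth      # one integer per entry
        self.read_data = 0          # value captured at the last clock edge

    def tick(self, address, write_data=0, write_en=0):
        """Model one rising clock edge."""
        # Assumption: the old contents are read before the write takes effect.
        self.read_data = self.mem[address]
        if write_en:
            # Keep only the low `size` bits, as a RAM entry of that width would.
            self.mem[address] = write_data & ((1 << self.size) - 1)


ram = SimpleRamModel(size=8, depth=32)
ram.tick(address=3, write_data=ord("A"), write_en=1)  # write "A" to address 3
ram.tick(address=3)                                   # read address 3 back
print(chr(ram.read_data))  # A
```

The single address input is what makes this a single port RAM: in any given cycle you can read or write one location, not two different ones.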

The parameters SIZE and DEPTH are used to specify how big we want the RAM to be. SIZE specifies how big each entry is. In our case we will be storing letters and a letter is 8 bits wide so SIZE will be set to 8. DEPTH is used to specify how many entries we want. This will be the maximum name length we can accept.

The Greeter (revisited)

Just like the last tutorial we will have a greeter module. The interface to this module is exactly the same as before but it is now a bit more mannered and will greet you personally.

Like most tutorials, I'll post the entire module here and then break it down.

No More ROM

So unlike last tutorial, we aren't going to use an explicit ROM module. This is because some convenient features of Lucid allow us to easily use constants with strings as ROMs. Let us take a look at the constant declaration.

Here we are using a function called $reverse(). This function takes a constant expression and reverses the order of the top most dimension of the array. Since strings are 2D arrays with the top most dimension being the letter order, this is exactly the same as typing the string backwards like we did in the last tutorial. It is just a little bit cleaner and easier to deal with.

Because strings are 2D arrays, we can simply use HELLO_TEXT[i] to access the ith letter of it.

Note that we are using the @ symbol in place of a name. This will signal to our design where to insert the name that was recorded.
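To make the indexing concrete, here is a rough Python model of what the reversal buys us. It assumes, as the "typing the string backwards" remark implies, that index 0 of an unreversed string constant is its right-most letter. The helper names lucid_string and reverse_top and the sample text "Hi @!" are hypothetical and are not the constant used in the greeter.

```python
def lucid_string(text):
    """Model a string constant: index 0 is the right-most letter (assumption)."""
    return list(reversed(text))


def reverse_top(array):
    """Model of $reverse(): flip the order of the top most dimension."""
    return list(reversed(array))


SAMPLE_TEXT = reverse_top(lucid_string("Hi @!"))  # hypothetical sample text

print(SAMPLE_TEXT[0])          # H
print(SAMPLE_TEXT.index("@"))  # 3
```

After the reversal, counting up from index 0 walks through the text in reading order, which is what a counter driving the serial port needs.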

Modules and DFFs

Just like before we have an FSM state. This will store the current state of our module. IDLE is where we will start and it will initialize everything. PROMPT will print the prompt asking for your name. LISTEN will listen to you type your name and echo it back. Finally, HELLO will greet you personally.

We need counters to keep track of which letter of each ROM we are currently on.

Let us take a look at hello_count. We need it to be wide enough so that we can index all the letters in HELLO_TEXT. We can get how many letters there are in the string by using the WIDTH attribute. Because HELLO_TEXT is a multi-dimensional array (2D in this case), WIDTH will be an array with one entry per dimension. The first entry of WIDTH is the number of indices in the first dimension of HELLO_TEXT. This is the number of letters. So we simply use HELLO_TEXT.WIDTH[0]. Note that the second dimension has a width of 8 since each letter is 8 bits wide.

We can then use the $clog2() function as before to make sure it is large enough to store values from 0 to HELLO_TEXT.WIDTH[0]-1.
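As a quick check of the arithmetic, the ceiling of log2 can be modeled in Python as below. The function name clog2 simply mirrors the Lucid function, and the letter count of 14 is a made-up example, not the real length of HELLO_TEXT.

```python
def clog2(n):
    """Ceiling of log2(n) for n >= 1: the bits needed to count from 0 to n - 1."""
    return (n - 1).bit_length()


letters = 14  # hypothetical number of letters in a string
width = clog2(letters)
print(width)                     # 4
print(2 ** width - 1 >= letters - 1)  # True, the counter can reach the last index
```

A 4 bit counter covers 0 to 15, so it can index all 14 letters, while 3 bits (0 to 7) would not be enough.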

Next take a look at name_count. This will be used to index into the RAM. We can set this width to be whatever we want, but the size of the RAM will grow exponentially with it. I set it to 5, which allows for a name of up to 2^5, or 32, letters. We will play with this towards the end of the tutorial.

We need the size of the RAM to match the size of name_count.

simple_ram ram (#SIZE(8), #DEPTH($pow(2,name_count.WIDTH)));

Here we are using the function $pow() which takes two constants and returns the first to the power of the second. In this case, name_count.WIDTH is 5, so the depth is 2^5 = 32. By using name_count.WIDTH instead of typing in 5 or 32 directly, we ensure that if we change the width of name_count then everything will still work.
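The relationship between the counter width and the RAM size can be tabulated with a few lines of Python. The helper name ram_geometry is hypothetical; the widths 5, 8 and 10 are the ones this tutorial mentions, and the entry size of 8 bits matches one letter per entry.

```python
def ram_geometry(addr_bits, entry_bits=8):
    """Return (depth, total_bits) for a RAM addressed by an addr_bits wide counter."""
    depth = 2 ** addr_bits          # same arithmetic as $pow(2, name_count.WIDTH)
    return depth, depth * entry_bits


for addr_bits in (5, 8, 10):
    depth, total_bits = ram_geometry(addr_bits)
    print(addr_bits, depth, total_bits)
# 5 32 256
# 8 256 2048
# 10 1024 8192
```

Each extra bit on name_count doubles the depth, which is why the RAM size grows exponentially with the counter width.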

The FSM

The IDLE and PROMPT states should look very familiar from the last tutorial, so we will jump to the LISTEN state.

Here we wait until new_rx is 1. This signals that we have a new byte to process and that the data on rx_data is valid. We then write rx_data into our RAM. We are writing to the address specified by name_count.q as ram.address is set to this in the beginning of the always block.

We also need to send the character we received back so that you can see your name as you type it. We simply set new_tx to 1 and tx_data to rx_data. Note that we aren't checking tx_busy so it is possible this byte will be dropped. However, in practice you can't type fast enough for this to be an issue. If you wanted to make this more robust you would need to buffer the received letters and send them out only when tx_busy was 0.
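The more robust approach mentioned above can be sketched in Python as a first-in, first-out queue between the receiver and the transmitter. This is only a behavioral illustration: the class EchoBuffer and its method names are hypothetical, and a hardware version would need a fixed depth and a policy for what happens when it is full, which this sketch ignores.

```python
from collections import deque


class EchoBuffer:
    """Hypothetical model: hold received letters until the transmitter is free."""

    def __init__(self):
        self.pending = deque()

    def on_rx(self, byte):
        """Call when new_rx is 1: remember the received letter."""
        self.pending.append(byte)

    def on_clock(self, tx_busy):
        """Return the letter to send this cycle, or None if nothing is sent."""
        if not tx_busy and self.pending:
            return self.pending.popleft()
        return None


buf = EchoBuffer()
buf.on_rx("a")
buf.on_rx("b")
print(buf.on_clock(tx_busy=True))   # None, transmitter is busy so nothing is dropped
print(buf.on_clock(tx_busy=False))  # a
print(buf.on_clock(tx_busy=False))  # b
```

The letters come out in the order they arrived, and none are lost while tx_busy is 1.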

The if statement is used to know when to stop. We have two conditions to stop on. The first is if we simply run out of space. To check for this we use &name_count.q. The & operator here ANDs all the bits of name_count.q together into a single bit. This tells us if all the bits of name_count.q are 1. The second condition is that the user pressed the enter key. We want to accept "\n" or "\r" as a stop character so we check for both.
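The stop test can be modeled in Python as follows. The function names and_reduce and should_stop are made up for this sketch, and the default width of 5 matches the name_count width used above.

```python
def and_reduce(value, width):
    """Model of the & reduction: 1 only when all `width` bits of value are 1."""
    return int(value == (1 << width) - 1)


def should_stop(name_count, rx_data, width=5):
    """True when the RAM is full or the received letter is a line ending."""
    return bool(and_reduce(name_count, width)) or rx_data in ("\n", "\r")


print(and_reduce(0b11111, 5))   # 1
print(and_reduce(0b11110, 5))   # 0
print(should_stop(4, "\r"))     # True
print(should_stop(4, "a"))      # False
print(should_stop(31, "a"))     # True
```

With a 5 bit counter, all bits are 1 only at address 31, the last entry in the 32 entry RAM.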

When we are moving onto the next state, notice that we reset name_count. This is so that we can start printing the name from the beginning.

// HELLO: Prints the hello text with the given name inserted
state.HELLO:
  if (!tx_busy) { // wait for tx to not be busy
    if (HELLO_TEXT[hello_count.q] != "@") { // if we are not at the sentry
      hello_count.d = hello_count.q + 1;    // increment to next letter
      new_tx = 1;                           // new data to send
      tx_data = HELLO_TEXT[hello_count.q];  // send the letter
    } else {                                // we are at the sentry
      name_count.d = name_count.q + 1;      // increment the name_count letter

      if (ram.read_data != "\n" && ram.read_data != "\r") // if we are not at the end
        new_tx = 1;                                       // send data

      tx_data = ram.read_data;              // send the letter from the RAM

      // if we are at the end of the name or out of letters to send
      if (ram.read_data == "\n" || ram.read_data == "\r" || &name_count.q) {
        hello_count.d = hello_count.q + 1;  // increment hello_count to pass the sentry
      }
    }

    // if we have sent all of HELLO_TEXT
    if (hello_count.q == HELLO_TEXT.WIDTH[0] - 1)
      state.d = state.IDLE; // return to IDLE
  }

In this state, we are going to use two counters, hello_count and name_count. First we will start by sending each letter of HELLO_TEXT. However, once we hit the "@" letter we will send all the letters in our RAM. Once that is done, we will finish sending the rest of HELLO_TEXT.

Once everything has been sent, we return to the IDLE state to await another key press to start it all over again.
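The way the two counters interleave in the HELLO state can be traced with an untimed Python model. It ignores tx_busy and the RAM read delay and treats each loop iteration as one pass through the state. The function name render_greeting, the template text and the RAM contents are all hypothetical and only serve to exercise the logic.

```python
def render_greeting(template, ram, addr_bits=5):
    """Untimed model of the HELLO state: expand "@" into the stored name."""
    out = []
    hello_count = 0
    name_count = 0
    last_addr = (1 << addr_bits) - 1           # address where &name_count is 1
    while True:
        # The hardware compares the old counter value, so sample it first.
        at_last_letter = hello_count == len(template) - 1
        if template[hello_count] != "@":        # ordinary letter of the text
            out.append(template[hello_count])
            hello_count += 1
        else:                                   # at the sentry: send the name
            letter = ram[name_count]
            end_of_name = letter in ("\n", "\r") or name_count == last_addr
            if letter not in ("\n", "\r"):
                out.append(letter)              # line endings are not echoed
            name_count = (name_count + 1) & last_addr
            if end_of_name:
                hello_count += 1                # step past the sentry
        if at_last_letter:
            return "".join(out)


stored = list("Ada\r") + ["\0"] * 28            # hypothetical RAM contents
print(repr(render_greeting("Hello @!\r\n", stored)))  # 'Hello Ada!\r\n'
```

While the sentry is selected, hello_count holds still and name_count does the stepping. Once the stored line ending (or the last address) is reached, hello_count moves on and the rest of the text is sent.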

The Top Level

The top level file is exactly the same as last time since the interface to our greeter module is the same.

Building the Project

You should now be all set to build the project. Once the project has built successfully, load it onto your board and open up the serial port monitor to test it out. Note that you have to send it a letter to get it to prompt you for your name.

Notice that the moment you type 32 letters it cuts you off and says hello.

The rest of the tutorial is based on the Mojo and ISE.

For the Cu, iCEcube 2 shows similar (although somewhat simplified) statistics. It also seems to pack the RAM into BRAM even when it is only 32 entries deep. Making the RAM 1024 entries deep (by making name_count 10 bits wide) will cause it to use two BRAMs. This is listed under Device Utilization Summary in the build output.

For the Au, Vivado also shows similar statistics. Look for the table labeled Report Cell Usage in the synthesis output. You will notice it is using eight of something called RAM32X1S. This is a 32x1 bit RAM, and eight of them fit our small 32x8 RAM perfectly! It isn't BRAM but a special slice that can be configured as a tiny RAM (or generic logic). If you increase the RAM size, you'll notice it switches to using RAMB18E1, which is a larger, more flexible BRAM. See this document for more info.

Once you've played with it a bit, look back at the output from the build. If you scroll up a bit from the bottom you should find something that looks like the following.

Device Utilization Summary:
Slice Logic Utilization:
Number of Slice Registers: 96 out of 11,440 1%
Number used as Flip Flops: 96
Number used as Latches: 0
Number used as Latch-thrus: 0
Number used as AND/OR logics: 0
Number of Slice LUTs: 163 out of 5,720 2%
Number used as logic: 157 out of 5,720 2%
Number using O6 output only: 123
Number using O5 output only: 12
Number using O5 and O6: 22
Number used as ROM: 0
Number used as Memory: 5 out of 1,440 1%
Number used as Dual Port RAM: 0
Number used as Single Port RAM: 4
Number using O6 output only: 0
Number using O5 output only: 0
Number using O5 and O6: 4
Number used as Shift Register: 1
Number using O6 output only: 1
Number using O5 output only: 0
Number using O5 and O6: 0
Number used exclusively as route-thrus: 1
Number with same-slice register load: 0
Number with same-slice carry load: 1
Number with other load: 0
Slice Logic Distribution:
Number of occupied Slices: 60 out of 1,430 4%
Number of MUXCYs used: 16 out of 2,860 1%
Number of LUT Flip Flop pairs used: 174
Number with an unused Flip Flop: 84 out of 174 48%
Number with an unused LUT: 11 out of 174 6%
Number of fully used LUT-FF pairs: 79 out of 174 45%
Number of slice register sites lost
to control set restrictions: 0 out of 11,440 0%
A LUT Flip Flop pair for this architecture represents one LUT paired with
one Flip Flop within a slice. A control set is a unique combination of
clock, reset, set, and enable signals for a registered element.
The Slice Logic Distribution report is not meaningful if the design is
over-mapped for a non-slice resource or if Placement fails.
IO Utilization:
Number of bonded IOBs: 22 out of 102 21%
Number of LOCed IOBs: 22 out of 22 100%
Specific Feature Utilization:
Number of RAMB16BWERs: 0 out of 32 0%
Number of RAMB8BWERs: 0 out of 64 0%
Number of BUFIO2/BUFIO2_2CLKs: 0 out of 32 0%
Number of BUFIO2FB/BUFIO2FB_2CLKs: 0 out of 32 0%
Number of BUFG/BUFGMUXs: 1 out of 16 6%
Number used as BUFGs: 1
Number used as BUFGMUX: 0
Number of DCM/DCM_CLKGENs: 0 out of 4 0%
Number of ILOGIC2/ISERDES2s: 0 out of 200 0%
Number of IODELAY2/IODRP2/IODRP2_MCBs: 0 out of 200 0%
Number of OLOGIC2/OSERDES2s: 0 out of 200 0%
Number of BSCANs: 0 out of 4 0%
Number of BUFHs: 0 out of 128 0%
Number of BUFPLLs: 0 out of 8 0%
Number of BUFPLL_MCBs: 0 out of 4 0%
Number of DSP48A1s: 0 out of 16 0%
Number of ICAPs: 0 out of 1 0%
Number of MCBs: 0 out of 2 0%
Number of PCILOGICSEs: 0 out of 2 0%
Number of PLL_ADVs: 0 out of 2 0%
Number of PMVs: 0 out of 1 0%
Number of STARTUPs: 0 out of 1 0%
Number of SUSPEND_SYNCs: 0 out of 1 0%

This tells you how much of the FPGA your design is using. The two most important numbers are typically the slice register and slice LUT usage. You can see in our case we are using about 2% of the space in the FPGA!

The reason we are looking at this is to see how the RAM was implemented in the FPGA. Remember the FPGA has blocks of RAM that we can use? These are shown under Specific Feature Utilization. RAMB16BWER and RAMB8BWER are the two types of BRAM we can use. But wait! We aren't using any! This is because our RAM is too small to warrant its own BRAM.

If we go back to where name_count is declared and make it bigger, we can increase the RAM size.

dff name_count[8]; // 8 allows for 2^8 = 256 letters

If you build the project again with the bigger RAM, you will get the following.

Device Utilization Summary:
Slice Logic Utilization:
Number of Slice Registers: 90 out of 11,440 1%
Number used as Flip Flops: 90
Number used as Latches: 0
Number used as Latch-thrus: 0
Number used as AND/OR logics: 0
Number of Slice LUTs: 144 out of 5,720 2%
Number used as logic: 140 out of 5,720 2%
Number using O6 output only: 107
Number using O5 output only: 12
Number using O5 and O6: 21
Number used as ROM: 0
Number used as Memory: 1 out of 1,440 1%
Number used as Dual Port RAM: 0
Number used as Single Port RAM: 0
Number used as Shift Register: 1
Number using O6 output only: 1
Number using O5 output only: 0
Number using O5 and O6: 0
Number used exclusively as route-thrus: 3
Number with same-slice register load: 2
Number with same-slice carry load: 1
Number with other load: 0
Slice Logic Distribution:
Number of occupied Slices: 48 out of 1,430 3%
Number of MUXCYs used: 16 out of 2,860 1%
Number of LUT Flip Flop pairs used: 152
Number with an unused Flip Flop: 70 out of 152 46%
Number with an unused LUT: 8 out of 152 5%
Number of fully used LUT-FF pairs: 74 out of 152 48%
Number of slice register sites lost
to control set restrictions: 0 out of 11,440 0%
A LUT Flip Flop pair for this architecture represents one LUT paired with
one Flip Flop within a slice. A control set is a unique combination of
clock, reset, set, and enable signals for a registered element.
The Slice Logic Distribution report is not meaningful if the design is
over-mapped for a non-slice resource or if Placement fails.
IO Utilization:
Number of bonded IOBs: 22 out of 102 21%
Number of LOCed IOBs: 22 out of 22 100%
Specific Feature Utilization:
Number of RAMB16BWERs: 0 out of 32 0%
Number of RAMB8BWERs: 1 out of 64 1%
Number of BUFIO2/BUFIO2_2CLKs: 0 out of 32 0%
Number of BUFIO2FB/BUFIO2FB_2CLKs: 0 out of 32 0%
Number of BUFG/BUFGMUXs: 1 out of 16 6%
Number used as BUFGs: 1
Number used as BUFGMUX: 0
Number of DCM/DCM_CLKGENs: 0 out of 4 0%
Number of ILOGIC2/ISERDES2s: 0 out of 200 0%
Number of IODELAY2/IODRP2/IODRP2_MCBs: 0 out of 200 0%
Number of OLOGIC2/OSERDES2s: 0 out of 200 0%
Number of BSCANs: 0 out of 4 0%
Number of BUFHs: 0 out of 128 0%
Number of BUFPLLs: 0 out of 8 0%
Number of BUFPLL_MCBs: 0 out of 4 0%
Number of DSP48A1s: 0 out of 16 0%
Number of ICAPs: 0 out of 1 0%
Number of MCBs: 0 out of 2 0%
Number of PCILOGICSEs: 0 out of 2 0%
Number of PLL_ADVs: 0 out of 2 0%
Number of PMVs: 0 out of 1 0%
Number of STARTUPs: 0 out of 1 0%
Number of SUSPEND_SYNCs: 0 out of 1 0%

Notice that we are using a RAMB8BWER now. Also notice that the number of registers and LUTs we are using went down. This is because we are using the BRAM instead of the general fabric.

This is why it is important to use the simple_ram component that implements the template the tools look for. If we used a different coding style the tools may not recognize that it could use BRAM and we could quickly fill up the FPGA with a large RAM that would otherwise take very little space.