Wednesday, July 21, 2010

I've used Altera a few times in the last 10 years. I've written a few FPGA designs, used Quartus, SignalTap, and the MegaWizard. But I've never had the chance to really compare Xilinx and Altera. Now I've finally had the opportunity to do a side-by-side comparison of Xilinx's and Altera's PCI Express cores.

Here it is: I compared Xilinx's and Altera's PCIe root complexes to see which one simulates faster. The first challenge was to generate the cores and learn how to instantiate them. I targeted Virtex-6 and Stratix IV as the FPGAs, and used the vendor-provided cores, not third-party ones. I started with Xilinx.

Xilinx's instantiation wizards are a bit ugly, but they generate the core well enough. For Xilinx I started by checking whether there were any compile-time defines I needed, and it looked like I didn't need anything special. So I found the top-level file for the core and started to manually instantiate it in a basic top_tb.v file. I then compiled, just to see what I'd need to get the simulation moving. Most (if not all) of the files I needed were in one directory, and I also needed the secureip and unisims libraries. Once I got the simulation to load, I went into my top_tb and added the reference clock as needed. I then saw the tx p/n ports start wiggling as expected. For Xilinx it was easy to see that I needed a parameter to put the core into simulation mode - a mode which shortens certain timeouts in the physical layer training sequence. At this point I connected my endpoint and saw the link come up very quickly.
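The Xilinx-side testbench described above looks roughly like this. This is only an illustrative sketch: the instance name `pcie_core`, the port names, and the exact simulation-mode parameter (shown here as `PL_FAST_TRAIN`) should all be checked against the wrapper the wizard actually generated.

```verilog
// Hypothetical sketch of the Xilinx-side top_tb.v - module, port, and
// parameter names are illustrative, not the actual generated wrapper's.
`timescale 1ns / 1ps

module top_tb;

  reg  sys_clk_p, sys_clk_n;           // 100 MHz differential refclk
  reg  sys_rst_n;
  wire [3:0] pci_exp_txp, pci_exp_txn; // assumed x4 serial lanes

  // 100 MHz reference clock (10 ns period)
  initial sys_clk_p = 1'b0;
  always #5 sys_clk_p = ~sys_clk_p;
  always @(*) sys_clk_n = ~sys_clk_p;

  // Simple power-on reset release
  initial begin
    sys_rst_n = 1'b0;
    #100 sys_rst_n = 1'b1;
  end

  // Core instance; the simulation-mode parameter that shortens the
  // LTSSM timeouts is passed here (parameter name assumed).
  pcie_core #(
    .PL_FAST_TRAIN ("TRUE")
  ) dut (
    .sys_clk_p   (sys_clk_p),
    .sys_clk_n   (sys_clk_n),
    .sys_reset_n (sys_rst_n),
    .pci_exp_txp (pci_exp_txp),
    .pci_exp_txn (pci_exp_txn)
    // ... remaining TRN/CFG ports tied off or left open
  );

endmodule
```

When compiling, the secureip and unisims libraries mentioned above get passed to the simulator's library search path (e.g. ModelSim's -L switch) rather than compiled into the work library.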

Now on to Altera.

Altera's instantiation wizards are much nicer, and they also generate the core as expected. I began the same way as with Xilinx, by checking whether I needed any special compile-time defines. Like Xilinx, there weren't any. What did surprise me was the use of defparams, but that's not a problem because it's inside their code and doesn't affect me. I then took the top-level module and began instantiating it in a new top_tb.v. Altera's core required me to compile altera_mf, sgate, 220model, stratixiv_hssi_atoms, stratixiv_pcie_hip, and some others which I can't recall at the moment. I also had to compile a number of files from the core's tree. Once I had all the modules needed to load the simulation, I was able to run. Altera's core loads up with a large number of mismatched port connections. Altera didn't bother to connect many output ports in many places (presumably because they weren't using them), and they also connected some ports with incorrect sizes. This fills the ModelSim console with a pile of warnings. At this point I added the clock to top_tb.v and set the lowest bit of test_in to 1, as appropriate for simulation. I expected the tx p/n pins to start wiggling quickly; instead, they only started wiggling after a very long time. I didn't bother to connect the endpoint.
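The Altera-side testbench amounts to something like the sketch below. Again, this is an assumption-laden illustration: the instance name `altera_pcie_hip`, the test_in width, and the port names are stand-ins for whatever the MegaWizard actually emitted; only the "bit 0 of test_in enables simulation mode" step comes from the text above.

```verilog
// Hypothetical Altera-side top_tb.v sketch; names and widths assumed.
// The altera_mf, sgate, 220model, stratixiv_hssi_atoms, and
// stratixiv_pcie_hip libraries must be compiled first and passed to
// the simulator (e.g. vsim -L) for this to load.
`timescale 1ps / 1ps

module top_tb;

  reg refclk = 1'b0;
  always #5000 refclk = ~refclk;   // 100 MHz at a 1 ps timescale

  reg  [39:0] test_in = 40'd0;     // width assumed
  wire [3:0]  tx_out_p, tx_out_n;  // assumed x4 serial lanes

  initial test_in[0] = 1'b1;       // simulation mode, per the core's docs

  altera_pcie_hip dut (            // module name assumed
    .refclk   (refclk),
    .test_in  (test_in),
    .tx_out_p (tx_out_p),
    .tx_out_n (tx_out_n)
    // ... the many clocks, resets, and test_out ports left unconnected,
    // which is exactly what floods the ModelSim console with warnings
  );

endmodule
```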

Now for the important stuff:

Altera's simulations are slow as molasses!

Altera's PCIe core runs many times slower in simulation than Xilinx's or Lattice's. I have run all three, and I was shocked. The truth is, at first I thought Altera's core wasn't working, until I realized I just had to give it more time. That is how slow it was: at least 2x, if not closer to 4x, slower than either of the other two vendors I've worked with.

Altera's core is confusing and unintuitive!

I was trying to find the signal that says the transaction layer is ready - something like Lattice's dl_up or Xilinx's trn_dl_up. There is no such signal in Altera's core. To see that Altera's core is up, you must watch the ltssm status. The core provides multiple clocks that you need to drive, many different resets, test_in and test_out arrays, and in general lots of signals that are shockingly unclear.
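Watching the ltssm status for link-up can be sketched as a small monitor module like the one below. The L0 encoding shown (`5'h0F`) is an assumption for illustration - the actual state values must be taken from the core's user guide.

```verilog
// Minimal LTSSM watcher - a stand-in for the dl_up-style flag the core
// lacks. The 5'h0F == L0 encoding is assumed; verify against the docs.
module ltssm_monitor (
  input            clk,    // one of the core's user-side clocks
  input      [4:0] ltssm   // LTSSM state exported by the core
);

  localparam L0 = 5'h0F;   // assumed encoding for the link-up state
  reg seen = 1'b0;

  always @(posedge clk)
    if (ltssm == L0 && !seen) begin
      seen <= 1'b1;
      $display("%t: LTSSM reached L0 - link is up", $time);
    end

endmodule
```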

Altera's port list is very old style, and separates the ports based on input and output!

The top-level port list is old Verilog style. I don't have any problem with that, of course. What is disconcerting is that the list has two sections, inputs and outputs. This is completely unhelpful. Xilinx's core has the ports grouped by function: Common, TX, RX, CFG ... This is how I would have liked Altera to do it. Perhaps there's a reason for this, but it seems to follow the same paradigm of providing very unfriendly RTL.

Altera's provided simulation is a mess!

I tried to understand the paradigm of Altera's PCIe core by looking at the provided simulation, and I can only describe myself as appalled. Their method of resetting the simulation is some sort of counter, after which they start things running. I was combing through this looking for Altera's recommended way of seeing whether the transaction layer is up. There was nothing to find.
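The counter-style reset described above boils down to a pattern like this. To be clear, this is a reconstruction of the general idiom, not Altera's actual testbench code; all names and counts are invented.

```verilog
// The "count cycles, then release reset and start things" pattern the
// provided simulation appears to use - names and thresholds invented.
`timescale 1ns / 1ps

module reset_by_counter;

  reg clk = 1'b0;
  always #5 clk = ~clk;

  reg [7:0] rst_cnt = 8'd0;
  reg       reset_n = 1'b0;

  always @(posedge clk) begin
    if (rst_cnt != 8'hFF)
      rst_cnt <= rst_cnt + 8'd1;
    if (rst_cnt == 8'hF0)
      reset_n <= 1'b1;   // release reset after an arbitrary cycle count
  end

endmodule
```

Nothing in this pattern exposes whether the transaction layer is actually up - it just waits a fixed time and hopes, which is presumably why no dl_up-style check was anywhere to be found.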

I finally gave up on Altera - not because I couldn't get it working, but because it wasn't nearly worth the effort. Their core is a huge mess. I'm sure it works well enough in synthesis, but I wouldn't bother trying to run it in simulation. Even when using their own ModelSim script to run the simulation, you get the ModelSim warnings about unconnected ports. They should be embarrassed to release something like this. If only the IP team that provides these cores could learn from the GUI team. Very nice UI, horrible RTL.

I'm sure I could go on and on, but suffice it to say that for the advanced user who likes to read the generated code, Xilinx is much better. For anyone who wants to simulate efficiently, Xilinx is also much better. I have never been so shocked by a vendor before, especially one purported to be the most user-friendly of the bunch. Both Lattice and Xilinx win against Altera.

As a side note about Altera, it seems these kinds of issues existed long before their PCI Express core. I have spoken with some very long-time Altera users, and it is well known that their FIFOs simulate horrifically slowly, and that the user side of their old PCI core was a mess too. Perhaps it's time for Altera to heed this call and take the necessary steps to remedy the situation.