Posted
by
Soulskill
on Wednesday January 08, 2014 @03:33PM
from the somewhere-between-zero-and-lots dept.

dryriver writes "We have developed a graphics algorithm that got an electronics manufacturer interested in turning it into hardware. Here comes the problematic bit... The electronics manufacturer asked us to describe how complex the algorithm is. More specifically, we were asked 'How many (logic) gates would be needed to turn your software algorithm into hardware?' This threw us a bit, since none of us have done electronics design before. So here is the question: Is there a piece of software or another tool that can analyze an algorithm written in C/C++ and estimate how many gates would be needed to turn it into hardware? Or, perhaps, there is a more manual method of converting code lines to gates? Maybe an operation like 'Add' would require 3 gates while an operation like 'Divide' would need 6 gates? Something along those lines, anyway. To state the question one more time: How do we get from a software algorithm that is N lines long and executes X number of total operations overall, to a rough estimate of how many gates this algorithm would use when translated into electronic hardware?"

it would only make sense to reuse the same adder circuit for each addition, instead of making a separate adder circuit for each operation.then you'd add control logic to move the data to adder circuits, multiplier circuits, etc.then essentially what you have is a microprocessor.then you just turn that microprocessor into the simplest one possible. which is basically a queue and a stack, and a few elementary logic operations. you can do operations a bit at a time.so the number of logic gates your program needs it the number to make a queue and a stack, and a few elementary logic operations, and that's probably on the order of about 40 gates.

if you only need a estimation, use something like bamboo from PandA to convert your C Code to Verilog. Then synthesize this code for a FPGA. In the summery you should find how many logic cells would be used as well as how many digital gates in an asics are necessary. This value is only a estimation, but for your question, this should work.

Oh, one more thing about "C to Gates" compilers. In the industry I have not seen one in actual use, but they do supposedly exist. However, they would only work in a limited domain.

For example, if you have C++ that does simple control or DSP-type stuff, then it might work (cannot vouch for the quality of the results). On the other hand, if you get one of these compilers and try feeding it the source code for the Apache web server or the Quake engine source code, you are completely screwed.

If your application is, say, a novel type of network filter that inspects and does something to Ethernet packets, you have to figure out how to interface your design with a real Ethernet SerDes.. which is a *LOT* different than opening up something in the "/dev/" directory. If your application is robotics, then you also need to get data into and out of the chip. How exactly is this done? How fast does the logic need to run? Is it speech processing? If so, then this will involve a lot of straight-forward DSP. If you constrain the design to tell it how fast the data needs to flow through, you should be able to get a reasonable estimate. Does your application need a lot of memory? If so, you might need some type of RAM controller. DRAM controllers can be hairy to work with, and you also have to consider latency and throughput.

In theory, C to gates can work quite well, ***for a limited subset of applications***.

HOWEVER: as others have pointed out, anybody who needs to know the answer to this question should be qualified to answer it for themselves.

Actually, that depends on what the 24,000 transistors are doing. Let's assume that you stupidly did a divide using Verilog "/". This implies a one-cycle divide which might well take that many transistors. The problem is that you would not likely be able to get this to work in real life. With so many levels of logic, your timing would be pure crap. Plus you might have fanout and congestion issues that would further limit your timing. So you could get a divide in one clock cycle, but limit yourself to a clock speed of 10 MHz, for example.

Once you get past about 10 or 12 levels of logic (in my opinion), it is time to re-code, no matter what your clock speed is. If you can't get the job done in 12 level, it is time to re-think your approach. Register re-timing can certainly be useful, but it is much better to do the job right in RTL, the way God intended. Register re-timing can make later steps more complicated (including formal verification).

It's also ill-formed (to the point of being almost meaningless) in the sense that the smallest number of gates for a given algorithm is probably going to be to implement some kind of low-end processor which then runs the algorithm as code.

What they really wanted to ask was "what's the best price/performance option for executing this algorithm, given the following expected parameters and an initial production run size of X".