How to design this datapath unit for DSP using VHDL/Verilog?

I want to design an arithmetic datapath unit for digital signal processing
using VHDL and/or Verilog.

The inputs are 5 elements (either sequential or parallel), each 8 bits wide.
The unit needs to multiply each of these 5 inputs by a predefined constant
matrix (10x10, floating-point values scaled and rounded to integers). The
output will be a 10x10 matrix that is the sum of the five resulting matrices,
with each element 12 bits wide. So for each element of the matrix, I can have
a MAC unit. The internal computation will be 16 bits.

If I put a MAC on each element, I will have a purely parallel architecture,
but I would need 100 16-bit MAC units, which would consume too many
resources.

I am considering a parallel-serial architecture that outputs one row at a
time, i.e. 10x12 bits, so the output will be produced row by row.

I also need to consider how to streamline the datapath operation. Since
there will be a non-stop stream of 5-element inputs, the output will also
stream non-stop. So after one row has been output, that row can be reused
for computing/storing the results for the next 5 input elements.

I am OK so far, but thinking further leaves me confused: how do I do the
sequential timing control (how to decide what to do in which cycle)? Do I
need pipelining? How do I design the architecture? I know pipelining
theoretically from a one-semester course, but now that I have to implement
one, I am totally lost...
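
As described above, each output element is the sum of the five inputs, each multiplied by the matching element of its constant matrix, and the result is emitted one row at a time. A minimal Python sketch of that arithmetic follows. It is a behavioural model, not RTL. The constant values, the unsigned interpretation of the 8-bit inputs, the coefficient range and the 16-to-12-bit truncation (dropping the 4 LSBs) are all assumptions, since the post does not specify them.

```python
import random

ROWS, COLS, N_INPUTS = 10, 10, 5
ACC_BITS, OUT_BITS = 16, 12  # widths taken from the post


def make_constants(seed=0, coeff_max=51):
    """Hypothetical stand-in for the five predefined 10x10 integer matrices.

    coeff_max=51 is an assumption chosen so that 5 * 255 * 51 = 65025
    still fits in an unsigned 16-bit accumulator.
    """
    rng = random.Random(seed)
    return [[[rng.randint(0, coeff_max) for _ in range(COLS)]
             for _ in range(ROWS)]
            for _ in range(N_INPUTS)]


def mac_element(xs, consts, r, c):
    """Accumulate x_k * C_k[r][c] over the five inputs for one element."""
    acc = 0
    for k in range(N_INPUTS):
        acc += xs[k] * consts[k][r][c]
        assert 0 <= acc < (1 << ACC_BITS), "16-bit accumulator overflow"
    # Assumed output scaling: keep the top 12 of the 16 accumulator bits.
    return acc >> (ACC_BITS - OUT_BITS)


def rows_out(xs, consts):
    """Yield the 10x10 result one row (ten 12-bit elements) at a time."""
    for r in range(ROWS):
        yield [mac_element(xs, consts, r, c) for c in range(COLS)]


if __name__ == "__main__":
    consts = make_constants()
    for row in rows_out([10, 20, 30, 40, 50], consts):
        print(row)
```

A model like this can serve as a golden reference for an HDL testbench: whatever parallel-serial split is chosen, the hardware should reproduce the same rows in the same order.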


In article <bipblj$53j$>, walala <> wrote:
>Dear all,
>
>I want to design an arithmetic datapath unit for digital signal processing
>using VHDL and/or Verilog.
>
>The inputs are 5 elements (either sequential or parallel), each 8 bits wide.
>The unit needs to multiply each of these 5 inputs by a predefined constant
>matrix (10x10, floating-point values scaled and rounded to integers). The
>output will be a 10x10 matrix that is the sum of the five resulting matrices,
>with each element 12 bits wide. So for each element of the matrix, I can have
>a MAC unit. The internal computation will be 16 bits.
>
>Hence, for each set of 5 inputs x1, x2, x3, x4, x5, the output matrix is
>
>Y = x1*C1 + x2*C2 + x3*C3 + x4*C4 + x5*C5, where Y, C1, C2, C3, C4, C5 are
>matrices.

What is your throughput requirement and what technology are you using?

That will determine the amount of parallelism that you need.

If the requirement is low enough, then only one MAC unit will be required.

Next, you must define the timing of the inputs. If they are serial, then
it's easy: stuff the data into the MAC unit. Being pipelined (right?),
the MAC unit will output the answer N clocks later.

If you have more parallelism in your input data than you want in your
MAC units, then you will need to buffer the data. This circuit will be
easy to design once you define the timing requirements.
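
A cycle-level Python sketch of the single-MAC case described above: operands are fed in serially and each finished sum appears a fixed number of clocks later. The class name, the `first`/`last` control flags and the latency of 3 are hypothetical, chosen only for illustration. This is a behavioural model, not synthesizable RTL.

```python
from collections import deque


class PipelinedMac:
    """Behavioural model of one MAC whose result emerges `latency` clocks later."""

    def __init__(self, latency=3):
        self.acc = 0
        # Delay line standing in for the pipeline registers (None = bubble).
        self.pipe = deque([None] * latency, maxlen=latency)

    def clock(self, x, c, first, last):
        """Advance one clock; return a finished sum or None."""
        # `first` restarts the accumulation for a new output element.
        self.acc = (0 if first else self.acc) + x * c
        out = self.pipe[0]  # value that entered `latency` clocks ago
        # Only the final accumulation of an element is passed down the pipe.
        self.pipe.append(self.acc if last else None)
        return out


def run_serial(xs, coeffs_per_element, latency=3):
    """Feed (x_k, c_k) pairs serially, one pair per clock, and collect sums."""
    mac = PipelinedMac(latency)
    results = []
    for coeffs in coeffs_per_element:
        for k, (x, c) in enumerate(zip(xs, coeffs)):
            out = mac.clock(x, c, first=(k == 0), last=(k == len(xs) - 1))
            if out is not None:
                results.append(out)
    # Idle clocks to flush the results still inside the pipeline.
    for _ in range(latency):
        out = mac.clock(0, 0, first=True, last=False)
        if out is not None:
            results.append(out)
    return results


if __name__ == "__main__":
    print(run_serial([1, 2, 3, 4, 5], [[1, 1, 1, 1, 1], [2, 0, 0, 0, 1]]))
```

With one such unit, each output element costs five clocks, so the approach only suits the low-throughput case mentioned above.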


The required output throughput is 33-50 MHz, i.e. it should output 33
million to 50 million 12-bit elements per second,

and each set of 5 inputs corresponds to 10x10 = 100 such 12-bit output
elements.

The technology I am going to use is 0.25 um.
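
These figures fix the arithmetic workload. Each output element needs 5 multiply-accumulates, so an output rate of R elements per second needs 5R MAC operations per second, and each set of 5 inputs has to last for 100 output elements. A small Python sketch of that calculation follows; the clock frequencies tried are assumptions, since none is given here, and the function names are hypothetical.

```python
MACS_PER_OUTPUT = 5          # one multiply-accumulate per input element
OUTPUTS_PER_INPUT_SET = 100  # 10x10 result matrix


def mac_units_needed(output_rate_hz, clock_hz):
    """Minimum number of MAC units if each unit does one MAC per clock."""
    mac_ops_per_second = MACS_PER_OUTPUT * output_rate_hz
    # Integer ceiling division, avoids floating-point rounding.
    return -(-mac_ops_per_second // clock_hz)


def input_set_rate(output_rate_hz):
    """How many 5-element input sets are consumed per second."""
    return output_rate_hz / OUTPUTS_PER_INPUT_SET


if __name__ == "__main__":
    for rate in (33_000_000, 50_000_000):
        for clk in (50_000_000, 100_000_000):  # assumed clock frequencies
            print(f"rate={rate} clk={clk} "
                  f"units={mac_units_needed(rate, clk)} "
                  f"input sets/s={input_set_rate(rate):.0f}")
```

Under these assumptions the workload is 165 to 250 million MACs per second, far less than 100 parallel MAC units would provide, and the inputs change only once per 100 outputs.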

I think the inputs are naturally serial, but again, I am not sure how to do
the parallel-serial partition of the internal MACs, or how to pace the
outputs.

It seems the inputs are faster than the outputs, so maybe I should make the
input wait after it has been fed into the unit?

Can you give some further advice on how to design this architecture and how
to do the timing? I find it really difficult. Could you also point me to
some resources?

Thanks very much,

-Walala

"David Jones" <> wrote in message
news:7N14b.5257$...
> What is your throughput requirement and what technology are you using?
>
> That will determine the amount of parallelism that you need.
>
> If the requirement is low enough, then only one MAC unit will be required.
>
> Next, you must define the timing of the inputs. If they are serial, then
> it's easy: stuff the data into the MAC unit. Being pipelined (right?),
> the MAC unit will output the answer N clocks later.
>
> If you have more parallelism in your input data than you want in your
> MAC units, then you will need to buffer the data. This circuit will be
> easy to design once you define the timing requirements.

Can we assume the inputs are all present at once (parallel)? Since there are
only 5 inputs (5x8 = 40 bits), is that a reasonable assumption?
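
If the five inputs are assumed to arrive together, one possible schedule is to capture them in a 40-bit holding register and keep them there for 100 clocks while a counter steps through the matrix elements. The Python sketch below models that schedule at one output element per clock. The function name, the one-element-per-clock rate and the use of five multipliers per element are assumptions for illustration, not something stated in the thread.

```python
ROWS, COLS, N_INPUTS = 10, 10, 5


def stream_outputs(input_sets, consts):
    """Yield (cycle, row, col, value), one output element per clock.

    `consts[k][r][c]` is element (r, c) of the k-th constant matrix.
    """
    cycle = 0
    for xs in input_sets:
        held = tuple(xs)  # models the 40-bit register holding all 5 inputs
        assert len(held) == N_INPUTS
        for count in range(ROWS * COLS):  # element counter 0..99
            r, c = divmod(count, COLS)    # counter doubles as ROM address
            # Five products summed in the same clock (assumed adder tree).
            value = sum(x * consts[k][r][c] for k, x in enumerate(held))
            yield cycle, r, c, value
            cycle += 1


if __name__ == "__main__":
    ones = [[[1] * COLS for _ in range(ROWS)] for _ in range(N_INPUTS)]
    for item in list(stream_outputs([[1, 2, 3, 4, 5]], ones))[:3]:
        print(item)
```

In this model the only control state is the element counter: a new input set is loaded whenever it wraps, which is one way to pace the inputs against the outputs. A row-at-a-time variant would count over 10 rows and compute ten elements per clock instead.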

"walala" <> wrote in message
news:biqil7$kf7$...
> Hi David,
>
> Thanks for your answer!
>
> The required output throughput is 33-50 MHz, i.e. it should output 33
> million to 50 million 12-bit elements per second,
>
> and each set of 5 inputs corresponds to 10x10 = 100 such 12-bit output
> elements.
>
> The technology I am going to use is 0.25 um.
>
> I think the inputs are naturally serial, but again, I am not sure how to
> do the parallel-serial partition of the internal MACs, or how to pace the
> outputs.
>
> It seems the inputs are faster than the outputs, so maybe I should make
> the input wait after it has been fed into the unit?
>
> Can you give some further advice on how to design this architecture and
> how to do the timing? I find it really difficult. Could you also point me
> to some resources?
>
> Thanks very much,
>
> -Walala
