How parallelism in project management, synthesis and processing resources can accelerate FPGA-based design

The headline figures for the latest release of the Synplify Pro and Synplify Premier FPGA synthesis software tools say that they can deliver up to 3X faster runtime and a 10% improvement in timing quality of results. But the way those numbers are achieved is, in some ways, as interesting and useful for planning projects as the results themselves.

A number of features have been added to Synplify and others enhanced, each of which adds an incremental lift to productivity that is then consolidated in the whole.

This article discusses some of those features in terms of their performance and how they lift productivity in project management.

There are two main themes. First, how to divide and conquer complex projects to speed them on their way. Second, how – either within the main project or a series of subprojects – to exploit new tools and multi-processing, at the desktop or on a server farm.

For those who are focused on FPGA, we will look at some key technology features that have been added to and/or evolved within Synplify. These include a much greater exploitation of multi-processing. This allows for the synthesis, mapping, partitioning and place & route stages to be completed in hours, even for large designs.

In this article, we will look at three specific techniques:

Hierarchical project management (HPM), which enables the subdivision of major projects into discrete, manageable and incremental units.

Distributed Synthesis, which exploits multi-processing and other features.

The Common Distributed Processing Library (CDPL), a Synopsys library that enables users to take the greatest advantage of their licenses and computing resources.

HPM makes a useful starting point because it shows how to split up an in-progress design so that efficiencies such as multi-processing can then be applied to it.

Hierarchical project management for FPGA synthesis

The management of any major design project needs to consider Brooks’ Law, which states that adding manpower to a project that is already late makes it later.

Fred Brooks said that delays will lengthen for two main reasons.

The first is ‘ramp-up’ time, the time it takes for new team members to become comfortable with the technology and challenges when they join a project.

The second is ‘combinatorial explosion’. This is where more people – and by extension, more tasks – creates more communication channels and, for a design, more risk of a domino effect, in which a change to one part of the design ripples through others being undertaken by separate engineering teams.
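Brooks’ point about combinatorial explosion can be made concrete with a line of arithmetic: among n engineers there are n(n−1)/2 pairwise communication channels, so the overhead grows quadratically, not linearly, with team size. A quick illustration in Python:

```python
def channels(n: int) -> int:
    """Number of pairwise communication channels among n team members."""
    return n * (n - 1) // 2

# Doubling the team from 5 to 10 people more than quadruples the channels.
print(channels(5))   # → 10
print(channels(10))  # → 45
print(channels(20))  # → 190
```

This is why splitting a large team into small, loosely coupled subproject teams pays off: five teams of four have far fewer channels among them than one team of twenty.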

HPM directly responds to combinatorial explosion. It is a ‘divide and conquer’ strategy enabled within Synplify, based on splitting up tasks into discrete blocks, or ‘subprojects’. Consider Figure 1:

Using HPM, Tom, the project architect, has created five ‘subprojects’ that comprise the target design and assigned them to four engineers – Ivan owns one subproject that applies to two instances of the same module.

Each subproject can be worked on independently and in parallel, and the architect has ample flexibility in deciding where to set the boundaries and what form they take.

For example, a subproject can be either instance- or module-based. Subprojects can also be either ‘top-down’, passing the post-compile RTL netlist result (SRS) consisting of generic, technology-independent primitives up to the top-level project; or ‘bottom-up’, passing the final post-map gate-level netlist (EDIF) consisting of technology-specific primitives up to the top-level project.

The advantages here go beyond the division of labor. Each subproject holder can use tools that allow for incremental development, both express features (e.g. Fast Compile) and features that confine reiterations to the altered portion of the design (e.g. Compile Points).

Similarly, Tom the architect can pull subprojects back up to the main level for a synthesis or constraint check, both during and at the end of the design. Where part of a subproject is still undergoing editing, a ‘completed’ version of the work to date can be passed up to the top level while a separate file is still being edited by the subproject holder. Alternatively, if a subproject is not ready when Tom wishes to attempt an initial synthesis, he can turn off that portion of the design and run the others.

The GUI makes this process as easy and transparent as possible (Figure 2).

Figure 2 The HPM GUI (Source: Synopsys)

Runtime acceleration with Distributed Synthesis

A recently added feature of Synplify is Distributed Synthesis, which gives designers a way to accelerate runtimes, reduce memory requirements and exploit a server-farm environment. This is achieved by dividing the design into many RTL partitions to enable parallel processing, resynthesizing only those parts of the design that have changed, and harnessing the available computing resources, whether on a single machine or distributed across multiple machines.

Distributed Synthesis works within a revised version of the traditional Synopsys bring-up flow, which has three main steps: Compile, Pre-Map and Map. With Distributed Synthesis, the new flow has four main steps: Compile, On Demand Constraints (ODC), Global Prep, and Map (both ODC and Global Prep here replace different aspects of Pre-Map) (Figure 3).

Figure 3 Four steps to bring up a Distributed Synthesis job (Source: Synopsys)

The ODC step formalizes the process, so that when a constraint is applied to an object, that constraint loads only the relevant group. This saves not just time but also memory.
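The ‘load only the relevant group’ behavior is, in essence, lazy loading. As a rough sketch of the idea (the class and names below are illustrative, not Synplify’s actual implementation), constraint groups can be parsed only on first reference, so unreferenced groups cost neither parse time nor memory:

```python
class OnDemandConstraints:
    """Lazily load constraint groups so that only the groups actually
    referenced cost parse time and memory (illustrative sketch only)."""

    def __init__(self, loaders):
        self._loaders = loaders  # group name -> function that parses that group
        self._cache = {}

    def get(self, group):
        if group not in self._cache:          # parse on first reference only
            self._cache[group] = self._loaders[group]()
        return self._cache[group]
```

A design with hundreds of constraint groups then pays only for the groups a given partition actually touches.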

The Distributed Synthesis features within the new release of Synplify show most clearly how the use of multi-processing can lead to significant overall productivity improvements.

However, we should first briefly note what some of those other new features are, and how they also extract benefits from multi-processing and/or reduce the degree to which any design or subprojects within it need to be rechecked or resynthesized. Four stand out.

Continue on Error: Rather than stopping a compile at each error, this feature can be turned on so that Synplify continues to check a design to the end and produces a full list of problems within the portion being checked.

Compile Points: This is the ability, during mapping and in a multi-processing environment, to define subsections of your RTL – at main project or subproject level – so that when each is incrementally changed, the rest of the design does not need to be resynthesized (even at the subproject level, there are likely to be several iterations). Compile Points can be applied automatically, manually, or as a mixture of both, with manual points set for particularly sensitive areas of the project.

Fast Synthesis: This skips some optimizations for area and timing (timing constraints, meanwhile, are only used for timing analysis). It is useful for an initial flow check, to ensure that your code is fully synthesizable, has properly applied constraints, and is fully annotated for place and route (P&R). It can synthesize a design 3-4X faster than a full run.

Constraints Checker: This can be used at both the main and the sub-project level to check the constraint file (.fdc) after entry and editing. It can be run without the need for full mapping and confined to an area that needs to be checked or rechecked.
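The payoff of Compile Points is incremental rebuild: only subsections whose RTL has actually changed need to be resynthesized. A minimal sketch of the bookkeeping involved (the function and data shapes here are hypothetical, not Synplify internals) compares a content digest per compile point against the last synthesized state:

```python
import hashlib

def digest(src: str) -> str:
    """Content fingerprint of one compile point's RTL source."""
    return hashlib.sha256(src.encode()).hexdigest()

def stale_compile_points(points: dict, cache: dict) -> list:
    """Return the compile points whose source changed since the cached run.

    points: compile-point name -> current RTL source text
    cache:  compile-point name -> digest of the last synthesized source
    Only the returned points need resynthesis; cache is updated in place.
    """
    stale = [name for name, src in points.items() if cache.get(name) != digest(src)]
    for name in stale:
        cache[name] = digest(points[name])
    return stale
```

On the first run everything is stale; after an edit to one block, only that block comes back for resynthesis.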

Looking in more detail at Distributed Synthesis, there is an option for designers to enable a single portion of it: Distributed Compile. In many ways, Distributed Compile (Figure 4) can be seen as extending the advantages previously offered by Compile Points.

The objective is to partition the target RTL or IP, ideally breaking it up automatically into Compile Groups, so that either the architect or the subproject holder can recompile the design as efficiently as possible as it evolves.

Figure 4 Distributed compilation in Synplify (Source: Synopsys)

Distributed Compile means that when there is a change to the RTL, previously formed Compile Groups that are unaffected do not need to be recompiled.

Distributed Compile also leverages multi-processing more aggressively, at the desktop and across server farms, to the point where it can cut run times by up to 50%.

The objective of Distributed Synthesis is to limit the size of synthesis iterations so that the tool only resynthesizes those parts of the design that have been changed. The re-synthesis time is further reduced by harnessing the available computing resources.
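Once the design is split into independent partitions, ‘harnessing the available computing resources’ reduces to dispatching those jobs to a pool of workers. A toy sketch of the dispatch pattern (worker-pool semantics only; the real tool distributes jobs across machines, not threads, and `synthesize` here is a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize(partition: str) -> str:
    # Stand-in for a real synthesis job on one RTL partition.
    return partition + ": mapped"

def run_partitions(partitions, workers=4):
    """Run independent partition jobs in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(synthesize, partitions))
```

Because the partitions are independent, the wall-clock time approaches the longest single job rather than the sum of all jobs.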

Distributed Compile and Distributed Synthesis can make the best use of multi-processing partly because they use Synopsys’ Common Distributed Processing Library (CDPL), which is used beyond FPGA synthesis and offers more generic benefits for design managers.

Common Distributed Processing Library

CDPL can help when the progress of a project is limited by the performance of the machine on which the work on it is done.

Figure 5 Outline of CDPL (Source: Synopsys)

CDPL is a library common to many Synopsys tools and helps users find the best available computing resources. This may mean using another machine at your location, or portioning out the work to multiple machines on a server farm (Figure 5).

The key limitations in finding the best configuration are the number of machines to which the team has access, and the number of tool licenses it holds. Otherwise, CDPL capability is simple to set up (Figure 6).
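Those two limits can be captured in one line of arithmetic: the usable degree of parallelism is the smaller of the compute slots available and the licenses held. A hypothetical helper, purely to make the constraint concrete:

```python
def max_parallel_jobs(machines: int, cores_per_machine: int, licenses: int) -> int:
    """Usable parallelism: capped by both compute slots and license count."""
    return min(machines * cores_per_machine, licenses)

# A 4-machine farm with 8 cores each, but only 10 licenses, runs 10 jobs at once.
print(max_parallel_jobs(4, 8, 10))  # → 10
```

In other words, adding machines beyond the license count buys nothing, and vice versa; capacity planning should keep the two roughly in balance.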

Figure 6 Setting up a CDPL facility (Source: Synopsys)

Given the importance of time-to-bring-up in efficient FPGA project management, consider this graph (Figure 7) showing the combination of CDPL with Distributed Synthesis for a 1.8 million LUT design. The run time improvement is in the region of 80%.
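To put that figure in perspective, a fractional runtime reduction converts to a speedup factor of 1/(1 − reduction), so an 80% improvement is roughly a 5X speedup:

```python
def speedup(reduction: float) -> float:
    """Convert a fractional runtime reduction into a speedup factor."""
    return 1.0 / (1.0 - reduction)

# speedup(0.8) ≈ 5.0 – an 80% runtime cut means jobs finish five times faster.
```

A job that previously ran overnight can therefore complete within a working afternoon.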

Conclusion

Today’s designs are so complex that few tools can offer to increase productivity by virtue of one new function or algorithm. The tools must improve on numerous fronts, and work within an ecosystem that enables users to get as much as possible out of their computing resources. The sum of all the incremental improvements can still add up to a great deal, as demonstrated by the new release of Synplify.

Another aspect of design efficiency is ‘human parallelism’, the ability to enable many people to work together efficiently. Brooks’ Law shows that just adding resources to a problem may not solve it – instead, complex tasks need to be divided so that they can be worked on by small teams of people with the right skills.

Design managers know this, and it is therefore only fair for them to ask how well their tools enable them to do it. With its HPM capabilities, Synplify answers that question, so that not only parallel machine processing but, perhaps even more importantly, human parallelism becomes another viable increment toward greater overall efficiency.

Author

Cheong Tse is a staff corporate applications engineer for FPGA-based synthesis software tools at Synopsys. He has 20 years of experience in FPGAs. Before joining Synopsys, he worked at Lattice Semiconductor as a software engineer and at Teradyne as a hardware design engineer. He holds a BSEE from the University of California at Berkeley.