We present a practical FPGA-based accelerator for solving Boolean Satisfiability problems (SAT). Unlike previous efforts for hardware accelerated SAT solving, our design focuses on accelerating the Boolean Constraint Propagation (BCP) part of the SAT solver, leaving the choices of heuristics such as branching order, restarting policy and learning and backtracking to software. Our novel approach uses an application-specific architecture instead of an instance-specific one to avoid time-consuming FPGA synthesis for each SAT instance. By careful pipelining and avoiding global signal wires, our design is able to achieve much higher clock frequency than that of previous work. Our co-processor can load SAT instances in milliseconds, can handle SAT instances with tens of thousands of variables and clauses using a single FPGA, and can easily be scaled-up to handle more clauses by using multiple FPGAs. Our evaluation using a cycle-accurate simulator shows that the FPGA co-processor can achieve 3.7-38.6x speed up on BCP compared with state-of-the-art software SAT solvers. Single FPGA implementation of the co-processor only consumes 7.16 watt power in the worst case, which is an order of magnitude lower than modern CPU.

A modern SAT solvers and SW/HW partition

Our main goal is to leverage hardware acceleration to build a practical SAT solver that performs better than pure software solvers for real world SAT instances. Due to the continuous improvements for software SAT solvers in branching, learning and restarting, it is impossible to completely map a software SAT solver into hardware and build a practical hardware SAT solver as previously attempted . Moreover, due to the huge research investment in so called “chaff like” solvers, it would be difficult for algorithms that significantly deviate from it to be competitive performance wise.

Therefore, we focus our effort on accelerating only Boolean Constraint Propagation (BCP), which accounts for 80%?90% of the CPU runtime in a highly optimized SAT solver. Literature continues to provide optimizations for software BCP implementations, but the principle behind BCP (unit implication) has not changed, thus making it a good target for hardware acceleration. Moreover, most SAT solvers have clean interfaces between the implication engine and the rest of the solvers, providing easy integration of the BCP co-processor into different solvers. The left figure provides the CPU time breakdown of the SAT solver from software profiling and the resulting partitioning we used in implementing our hybrid SAT solver (using both a general purpose CPU and a dedicated (reconfigurable) SAT co-processor).

Modern SAT solvers have tuned the BCP engine so well that each implication only takes a couple of thousand CPU cycles for typical SAT instances. Since the CPU is running in the Gigahertz range, the co-processor must have a high clock frequency and each implication on average can at most take tens of cycles to achieve reasonable speedup. This is very challenging for an FPGA implementation, due to the inherently slower speed of FPGAs compared with CPUs and ASICs. Global signals must be avoided and the co-processor must be fully pipelined. Most SAT instances encountered in the industrial applications contain thousands of variables, therefore, the implication time must be independent of the variable count . Finally, for any practical solutions, the hardware resource requirement must be roughly proportional to the size of the instance being solved.

Due to the large amount of time required for logic synthesis and placement and routing, we avoided developing a BCP instance-specific SAT solver. Instead, we load SAT instances into an application-specific BCP co-processor. We leverage the Block RAM (BRAM) in modern FPGAs to store instance specific data. This approach not only reduces the instance loading overhead, but also makes the latency of the co-processor predictable, which simplifies the design of the interface with the host machine. Our design is mainly bounded by the BRAM capacity of the FPGA. In our current design, we target the Xilinx Virtex 5 LX110T FPGA, which can handle up to 64K variables and 64K clauses. Using the largest Xilinx FPGA (Virtex 5 LX330T), the capacity can be extended to 64K variables and 144K clauses, by instantiating more inference engines. For even larger SAT instances, the BCP co-processor design can be easily expanded to utilize multiple FPGAs.