Not all FPRASs are equal: demystifying FPRASs for DNF-counting

Abstract

The problem of counting the number of solutions of a DNF formula, also called #DNF, is a fundamental problem in artificial intelligence with applications in diverse domains ranging from network reliability to probabilistic databases. Owing to the intractability of the exact variant, efforts have focused on the design of approximate techniques for #DNF. Consequently, several Fully Polynomial Randomized Approximation Schemes (FPRASs) based on Monte Carlo techniques have been proposed. Recently, it was discovered that hashing-based techniques, too, lend themselves to FPRASs for #DNF. Despite significant improvements, the complexity of the hashing-based FPRAS is still worse than that of the best Monte Carlo FPRAS by polylog factors. Two questions were left unanswered in previous works: Can the complexity of the hashing-based techniques be improved? How do the various approaches compare empirically? In this paper, we first propose a new search procedure for the hashing-based FPRAS that removes the polylog factors from its time complexity. We then present the first empirical study of the runtime behavior of different FPRASs for #DNF. Our study paints a nuanced picture. First, we observe that no single algorithm outperforms all others across all classes of formulas and input parameters. Second, we observe that the algorithm with one of the worst time complexities solves the largest number of benchmarks.

Notes

Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions. Moshe Y. Vardi and Aditya A. Shrotri’s work was supported in part by NSF grant IIS-1527668 and the NSF Expeditions in Computing project “ExCAPE: Expeditions in Computer Augmented Program Engineering”. Kuldeep S. Meel’s work was supported in part by NUS ODPRT Grant R-252-000-685-133, AI Singapore Grant R-252-000-A16-490, and the Sung Kah Kay Assistant Professorship Fund.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

To obtain a concrete algorithm from the framework described in Algorithm 2, we need to instantiate the sub-procedures SampleHashFunction, GetLowerBound, GetUpperBound, EnumerateNextSol, ExtractSlice and ComputeIncrement for a particular counting problem. We now show how SymbolicDNFApproxMC [13], which uses Row Echelon XOR hash functions and the concepts of Symbolic Hashing and Stochastic Cell-Counting, can be obtained through such instantiations. We then prove that substituting the ReverseSearch procedure for BinarySearch improves the complexity of the resulting algorithm by polylog factors.

SampleHashFunction

One can directly invoke the procedure SampleBase described in Algorithm 4 of [13] with minor modifications. This is shown in Algorithm 7. Note that the hash function (A, b, y) so obtained belongs to the Row Echelon XOR family.
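For intuition, a hash function of this shape can be sampled as follows. This is a minimal illustrative sketch, not the paper’s SampleBase: the names sample_rex_hash and apply_hash, and the choice of placing the identity block in the first m columns, are assumptions made for illustration.

```python
import random

def sample_rex_hash(n, m, seed=None):
    """Sample a hash function h(x) = Ax XOR b over GF(2) together with a
    random target y, where A = [I | R] is in row echelon form: an m x m
    identity block followed by a random m x (n - m) block R."""
    rng = random.Random(seed)
    A = [[1 if j == i else 0 for j in range(m)] +
         [rng.randint(0, 1) for _ in range(n - m)]
         for i in range(m)]
    b = [rng.randint(0, 1) for _ in range(m)]
    y = [rng.randint(0, 1) for _ in range(m)]
    return A, b, y

def apply_hash(A, b, x):
    """Evaluate h(x) = Ax XOR b over GF(2), bit by bit."""
    return [(sum(a * xi for a, xi in zip(row, x)) % 2) ^ bi
            for row, bi in zip(A, b)]
```

An assignment x lies in the cell picked out by the hash function exactly when apply_hash(A, b, x) equals the target y.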

Extracting a prefix slice

Procedure ExtractSlice required for ReverseSearch is shown in Algorithm 8. If flip is false, ExtractSlice returns the result of the procedure Extract (described in [13]) directly. Otherwise, the p-th bit of y is negated before being passed to Extract.
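The behavior of ExtractSlice can be sketched as follows, under the simplifying assumption that the hash function is stored as plain lists (A, b, y); the helper name extract_slice and this representation are illustrative and stand in for the Row Echelon XOR data structure of [13].

```python
def extract_slice(A, b, y, p, flip=False):
    """Return the prefix slice given by the first p constraints of the
    hash function (A, b, y).  If flip is set, the p-th bit of the target
    y is negated first, so the slice selects the sibling cell at the
    same prefix length."""
    y = list(y)                 # avoid mutating the caller's target
    if flip:
        y[p - 1] ^= 1           # negate the p-th bit (1-indexed)
    return A[:p], b[:p], y[:p]
```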

EnumerateNextSol

SymbolicDNFApproxMC enumerates the solutions in a cell in the order of a Gray code sequence, which yields better complexity: consecutive assignments in the sequence differ in a single bit, so each successive solution can be generated in \( \mathcal {O}(n) \) time. This is achieved by invoking the procedure enumREX (Algorithm 1 in [13]).
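For intuition, the standard reflected Gray code over k bits can be generated as below. This generic sketch only illustrates the one-bit-per-step property that enumREX exploits; it is not the enumREX procedure itself.

```python
def gray_code_order(k):
    """Yield all k-bit assignments (LSB first) in reflected Gray code
    order; consecutive assignments differ in exactly one bit."""
    for i in range(2 ** k):
        g = i ^ (i >> 1)                       # i-th Gray code word
        yield [(g >> j) & 1 for j in range(k)]
```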

ComputeIncrement

Procedure CheckSAT (Algorithm 10, adapted from [13]) can be used to compute the increments to Ycell, as shown in Algorithm 9. The assignment s is divided into a solution x and a cube Fi using the same Interpret function used in line 7 of Algorithm 6 in [13]. CheckSAT samples a cube at random in line 3 and checks whether the assignment x satisfies it in line 5. The returned value follows a geometric distribution [9] and can be used to compute an accurate probabilistic estimate Ycell of the true number of solutions in the cell [13].
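The stochastic cell-counting step can be sketched as follows. This is an illustrative reconstruction, not the paper’s CheckSAT: the representation of a cube as a dict of required literal values and the helper name check_sat are assumptions. Sampling cubes uniformly until one is satisfied by x yields a trial count c that is geometric with success probability cov(x)/m, where cov(x) is the number of cubes x satisfies; hence c/m is an unbiased estimate of 1/cov(x), so summing these increments over the cov(x) occurrences of a solution x counts that solution once in expectation.

```python
import random

def check_sat(x, cubes, rng=None):
    """Sample cubes uniformly at random until one is satisfied by the
    assignment x, and return the number of trials c.  c is geometrically
    distributed with success probability cov(x)/m.  Assumes x (a dict
    mapping variables to values) satisfies at least one cube, which
    holds when x comes from a solution-cube pair in the cell."""
    rng = rng or random.Random()
    m = len(cubes)
    c = 0
    while True:
        c += 1
        cube = cubes[rng.randrange(m)]          # uniform random cube
        if all(x.get(v) == val for v, val in cube.items()):
            return c
```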

Lemma 1

Proof

Ycell is incremented by cx/m in line 5 of BSAT after a call to ComputeIncrement and CheckSAT. Since BSAT returns once Ycell reaches threshold, the sum of cx over all invocations of CheckSAT is \( \mathcal {O}({\mathsf {m}} \cdot {\mathsf {threshold}}) \). Every time cx is incremented, the check in line 5 of CheckSAT is performed, which takes \( \mathcal {O}({\mathsf {n}}) \) time. Moreover, EnumerateNextSol also takes \( \mathcal {O}({\mathsf {n}}) \) time, as enumREX in [13] takes \( \mathcal {O}({\mathsf {n}}) \) time. As a result, the complexity of BSAT is \( \mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {threshold}}) \). \(\ \Box \)