MAY 2017 | VOL. 60 | NO. 5 | COMMUNICATIONS OF THE ACM 99
mentions, and provide a guideline to the developer for improving the KBC system built using DeepDive.

Incremental and efficient execution. Especially given the
above design choices, performance is a major challenge.
In our KBC systems using DeepDive, we may need to perform inference and learning on a large number of highly
correlated random variables. For example, in PaleoDeepDive, we construct factor graphs that contain more than 300
million variables, each representing a potential mention to
extract as final output. Therefore, one of our technical focus
areas has been to speed up probabilistic inference.32,33,35,50,51
A second major technical focus has been the development of
efficient incremental methods for grounding and inference,
given the iterative nature of KBC application development.
In Section 4, we briefly describe these techniques and provide
pointers to readers who are interested in further details.

4. TECHNIQUES

A DeepDive program is a set of rules with weights specified
using the language we described above. During inference,
the values of all weights are assumed to be known, whereas
in learning, one finds the set of weights that maximizes the
probability of the evidence. The execution of a DeepDive
program consists of two phases: (i) grounding and (ii) statistical inference and learning. In this section, we briefly
describe the techniques we developed in each phase to
make DeepDive performant and scalable.

4.1. Grounding

As shown in Figure 8, DeepDive explicitly constructs a factor
graph for inference and learning using a set of SQL queries.
A factor graph is a triple (V, F, ŵ) in which V is a set of nodes
that correspond to Boolean random variables, F is a set of
hyperedges (for f ∈ F, f ⊆ V), and ŵ : F × {0, 1}^V → ℝ is a weight
function. In DeepDive, V and F are explicitly created using a
set of SQL queries, and this process is called grounding.
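To make the grounding step concrete, the sketch below mimics it with SQLite: each tuple returned by a candidate table becomes a Boolean variable in V, and each result of a rule's SQL query becomes a factor (hyperedge) in F. The `married_to` table, the symmetry rule, and all data are invented for illustration and are not DeepDive's internal representation.

```python
import sqlite3

# Toy candidate table; an assumed inference rule
# "MarriedTo(a, b) <=> MarriedTo(b, a)" grounds one factor per symmetric pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE married_to (id INTEGER PRIMARY KEY, p1 TEXT, p2 TEXT)")
conn.executemany("INSERT INTO married_to (p1, p2) VALUES (?, ?)",
                 [("Ann", "Bob"), ("Bob", "Ann"), ("Cal", "Dee")])

# V: one Boolean random variable per candidate tuple.
V = [row[0] for row in conn.execute("SELECT id FROM married_to")]

# F: one factor (a hyperedge over variable ids) per grounding of the rule,
# produced by a SQL self-join -- this join is the essence of grounding.
F = [(a, b) for a, b in conn.execute(
    "SELECT t1.id, t2.id FROM married_to t1, married_to t2 "
    "WHERE t1.p1 = t2.p2 AND t1.p2 = t2.p1 AND t1.id < t2.id")]

print(V)  # [1, 2, 3]
print(F)  # [(1, 2)] -- only the Ann/Bob pair is tied by the symmetry rule
```

Because grounding is just query evaluation, it inherits the scalability of the underlying relational engine, which is one reason DeepDive expresses it in SQL.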

Learning and inference. In the learning and inference
phase, DeepDive generates a factor graph, similar to Markov
Logic, and uses techniques from Tuffy.33 The inference and
learning are done using standard techniques (Gibbs sampling) that we describe below after introducing the formal
semantics.
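Gibbs sampling over Boolean variables can be illustrated in a few lines: repeatedly resample each variable conditioned on the rest, then estimate each marginal as the fraction of samples in which the variable is true. The two-variable graph, its factors, and the weights below are invented for this sketch; DeepDive's sampler is far more sophisticated.

```python
import math
import random

random.seed(0)

# Tiny factor graph: two Boolean variables linked by one "agreement" factor
# with weight w_agree, plus a prior factor on x0 with weight w_prior.
w_agree, w_prior = 1.5, 0.8

def log_potential(x):
    # Sum of weighted factor values for assignment x, as in a factor graph.
    return w_agree * (x[0] == x[1]) + w_prior * x[0]

def gibbs(n_sweeps=20000):
    x = [0, 0]
    counts = [0, 0]
    for _ in range(n_sweeps):
        for i in range(2):  # resample each variable given all the others
            x[i] = 1
            p1 = math.exp(log_potential(x))
            x[i] = 0
            p0 = math.exp(log_potential(x))
            x[i] = 1 if random.random() < p1 / (p0 + p1) else 0
        for i in range(2):
            counts[i] += x[i]
    return [c / n_sweeps for c in counts]  # estimated marginal P(x_i = 1)

marginals = gibbs()
print(marginals)  # both above 0.5: the prior on x0 also pulls x1 up
```

Note how the prior on x0 raises the marginal of x1 through the agreement factor; this coupling is exactly what joint inference buys over a pipeline that scores each variable in isolation.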

Error analysis. DeepDive runs the above three phases
in sequence, and at the end of the learning and inference,
it obtains a marginal probability p for each candidate fact.
To produce the final KB, the user selects facts that DeepDive
predicts are true with probability above some user-selected
threshold, for example, p > 0.95. Typically, the user needs to
inspect errors and repeat the previous steps, a process that
we call error analysis. Error analysis is the process of understanding the most common mistakes (incorrect extractions,
overly specific features, candidate mistakes, etc.) and deciding how to correct them.39 To facilitate error analysis, users
write standard SQL queries.
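A typical error-analysis query surfaces high-confidence predictions that a labeled holdout set says are wrong. The sketch below runs such a query through sqlite3; the `has_spouse` table, its columns, and the probabilities are hypothetical.

```python
import sqlite3

# Hypothetical output table: each candidate fact with its marginal
# probability and a hand-assigned label on a small holdout set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE has_spouse (p1 TEXT, p2 TEXT, prob REAL, label INTEGER)")
conn.executemany("INSERT INTO has_spouse VALUES (?, ?, ?, ?)", [
    ("Ann", "Bob", 0.98, 1),
    ("Cal", "Dee", 0.97, 0),   # high-confidence false positive: inspect first
    ("Eve", "Fay", 0.40, 1),   # false negative, below the p > 0.95 threshold
])

# Standard SQL error analysis: predictions above the acceptance threshold
# that the holdout labels contradict, worst offenders first.
errors = conn.execute(
    "SELECT p1, p2, prob FROM has_spouse "
    "WHERE prob > 0.95 AND label = 0 ORDER BY prob DESC").fetchall()
print(errors)  # [('Cal', 'Dee', 0.97)]
```

Inspecting the features and candidate mentions behind each returned row then suggests which rule or UDF to revise in the next iteration.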

3.3. Discussion of design choices

We have found that the following key aspects of the DeepDive
approach enable noncomputer scientists to
build sophisticated KBC systems: (1) There is no reference
in a DeepDive program to the underlying machine learning algorithms. Thus, DeepDive programs are declarative
in a strong sense. Probabilistic semantics provide a way to
debug the system independent of the algorithm it uses. (2)
DeepDive allows users to write feature extraction code (UDFs)
in familiar languages (Python, SQL, and Scala). (3) By using
and producing relational databases, DeepDive fits into the
familiar SQL stack, which allows standard tools to inspect
and visualize the data. (4) The user constructs an end-to-end
system and then refines the quality of the system in a pay-as-you-go way.28 In contrast, traditional pipeline-based ETL
scripts may lead to a user’s time and effort being overspent
on a specific extraction or integration step—without the
ability to evaluate how important each step is for the quality
of the end result. Anecdotally, pay-as-you-go leads to more
informed decisions about how to improve quality.

The above design choices necessitated overcoming several
technical challenges, two of which we briefly highlight below.

Joint statistical inference. In many systems, successive stages are simply pipelined together, propagating errors from one stage to the next and complicating iterative development efforts. This can also have noticeable performance effects when the information from different sources is all noisy and potentially needs to be considered together, that is, jointly, to make a correct extraction. To join extractions with different confidence levels together, one needs a principled framework. The DeepDive approach to this challenge is based on a Bayesian probabilistic approach.h DeepDive treats all these information sources as one joint probabilistic inference problem, with all predictions modeled as random variables within a factor graph model. This probabilistic framework ensures that all facts produced by DeepDive are associated with a marginal probability. These marginal probabilities are meaningful in DeepDive; that is, they represent the empirical accuracy that one should expect for the extracted

h There is a justification for probabilistic reasoning, as Cox's theorem asserts (roughly) that if one uses numbers as degrees of belief, then one must either use probabilistic reasoning or risk contradictions in one's reasoning system; that is, a probabilistic framework is the only sound system for reasoning in this manner. We refer the reader to Jaynes et al.21
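The claim that marginal probabilities track empirical accuracy can be checked mechanically: bucket predictions by probability and compare each bucket's mean predicted probability to its accuracy on labeled data. The predictions and labels below are fabricated purely to illustrate the computation.

```python
# Each pair is (predicted marginal probability, holdout label); values invented.
preds = [(0.95, 1), (0.92, 1), (0.88, 1), (0.91, 0),
         (0.55, 1), (0.52, 0), (0.48, 0), (0.45, 0)]

def calibration(preds, edges=(0.0, 0.5, 0.75, 1.0)):
    # For each probability bucket, report (mean predicted prob, empirical accuracy);
    # a well-calibrated system has these two numbers roughly equal per bucket.
    out = []
    for lo, hi in zip(edges, edges[1:]):
        bucket = [(p, y) for p, y in preds
                  if lo <= p < hi or (hi == 1.0 and p == 1.0)]
        if bucket:
            mean_p = sum(p for p, _ in bucket) / len(bucket)
            acc = sum(y for _, y in bucket) / len(bucket)
            out.append((round(mean_p, 2), round(acc, 2)))
    return out

print(calibration(preds))  # high-probability bucket should also be high-accuracy
```

In the fabricated data the top bucket averages about 0.92 predicted probability against 0.75 accuracy, the kind of gap that error analysis would then investigate.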

Figure 8. Schematic illustration of grounding. Each tuple
corresponds to a Boolean random variable and node in the factor
graph. We create one factor for every set of groundings of an
inference rule.