LogicBlox 4.0

Executive Summary

LogicBlox 4.0 unveils the foundation for the LogicBlox Smart
Database technology. The Smart Database delivers raw performance,
a simplified programming model and reduced database administration
overhead, together with full ACID-compliance.

Innovations in the core query evaluation algorithm and the
concurrency model enable LogicBlox 4.0 to out-perform
LogicBlox 3.x across a wide spectrum of query workloads
(transactional, analytical, as well as graph). For certain
query workloads, LogicBlox 4.0 out-competes industry leaders.

Innovations in query optimization and continued improvement in
the programming model reduce the need for manual tuning of
queries as well as the database. Preliminary benchmarks show
that for natural queries -- those that have not been
hand-optimized by a human expert -- LogicBlox 4.0 delivers
query evaluation times that are competitive with the
evaluation times of hand-optimized queries on LogicBlox 3.x.

Beyond full ACID-compliance, LogicBlox 4.0 delivers full
serializability as the default isolation level, further
relieving programmers from having to reconcile anomalies that
may arise from weaker isolation levels.

LogicBlox 4.0 supports applications built using the BloxWeb
service-oriented architecture. Some modification of logic may be
necessary. Please refer to the migration guide for details.

What's New

The more appropriate question for this release is, "What's not new?"
The LogicBlox 4.0 runtime was reinvented from the ground up to
improve raw query evaluation as well as concurrent read/write
transaction performance. We show benchmarks that demonstrate
the improvements in query evaluation and the handling of
concurrent transactions.

LogicBlox 4.0 continues to evolve the language, as well. The
number of parameters a programmer needs to tune in order to meet
an application's performance requirements has been dramatically
reduced. We outline these simplifications. We also use
benchmarks to demonstrate the effectiveness of automatic query
optimization and tuning over manual efforts in LogicBlox 4.0.

On Performance

On Query Evaluation

One of the core innovations in LogicBlox 4.0 is the join
algorithm. We demonstrate its current performance
characteristics using three sets of benchmarks: TPC-H, some
realistic queries extracted from an existing application that
have proven to be problematic for LogicBlox 3.10, and the
4-clique query which demonstrates the applicability of
LogicBlox 4.0 on graph algorithms.

Figure 1 compares the performance of LogicBlox 4.0 on TPC-H
against LogicBlox 3.10 and LogicBlox 4.0-parallel. (Note that
parallel evaluation is not included in LogicBlox 4.0 but is
expected in a subsequent release.) TPC-H is the industry-standard
benchmark suite for OLAP databases; it includes 22 complex
queries over large volumes of data. These queries are
representative of the queries seen in typical LogicBlox
applications. Figure 1 shows that LogicBlox 4.0 takes only half
the time that LogicBlox 3.10 takes to complete the entire TPC-H
suite. The median query time is 24.6 seconds on LogicBlox 4.0,
compared with 115.1 seconds on LogicBlox 3.10.

Figure 1. TPC-H Scale 10

Figure 2 compares LogicBlox 4.0 with LogicBlox 3.10 on two
queries extracted from an existing project. These queries were
found to cause application performance issues on LogicBlox 3.10.
The figure shows that LogicBlox 4.0 improves upon LogicBlox 3.10
performance by an order of magnitude. It further shows that
running the same query a second time (columns 2 and 4) yields
additional gains, because the indexes created during the query's
first run are reused.

Figure 2. Sample problematic queries on 3.10

Finally, we demonstrate the performance of LogicBlox 4.0 on
a graph query, 4-clique. A clique is a set of vertices in a
graph in which every vertex is connected to every other vertex
in the set. Graph queries such as 4-clique are characterized by
their use of self-joins. Specialized databases have been
implemented specifically to support graph queries. As shown in
Figure 3, LogicBlox 4.0 (red line) outperforms PostgreSQL,
Amazon Redshift, a commercial in-memory column store, and graph
databases. This benchmark illustrates the unique promise of the
LogicBlox database to unify the processing of not only
transactional and analytical workloads, but also graph workloads.

Figure 3. 4-Clique
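
To make the role of self-joins concrete, the following is a minimal Python sketch of the 4-clique query as repeated joins of a single edge relation with itself. It is an illustration only and not the join algorithm used by LogicBlox 4.0; the function name four_cliques and the sample edge list are hypothetical.

```python
def four_cliques(edges):
    """Return every 4-clique of an undirected graph as a sorted vertex tuple."""
    # Store each undirected edge once as low -> high, so that each clique is
    # reported exactly once, with its vertices in ascending order.
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        low, high = (u, v) if u < v else (v, u)
        adj.setdefault(low, set()).add(high)

    cliques = []
    for a, a_nbrs in adj.items():
        for b in a_nbrs:                          # edge(a, b)
            ab = a_nbrs & adj.get(b, set())       # edge(a, c) and edge(b, c)
            for c in ab:
                abc = ab & adj.get(c, set())      # edge(a, d), edge(b, d), edge(c, d)
                for d in abc:
                    cliques.append((a, b, c, d))
    return sorted(cliques)


# Hypothetical sample graph: a complete graph on 1..4 plus one extra edge.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(four_cliques(edges))  # [(1, 2, 3, 4)]
```

The query touches the same edge relation six times (once per pair of clique vertices), which is why self-join performance dominates this kind of workload.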

On Concurrency

LogicBlox 4.0 implements a novel, lock-free concurrency
model. This model is key in enabling high transaction
throughput for mixed read and write, long- and short-running
concurrent transactions. LogicBlox 4.0 currently supports fully
concurrent read transactions, while serializing the
writes. In future releases, we will be implementing
concurrent writes, as well.
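
The general idea of fully concurrent readers combined with serialized writers can be sketched with immutable, versioned snapshots. The sketch below is a Python illustration under our own assumptions: the class VersionedStore and its methods are hypothetical, and the writer mutex is used only to model "serializing the writes". It does not describe the lock-free mechanism actually implemented in LogicBlox 4.0.

```python
import threading
from types import MappingProxyType


class VersionedStore:
    """Toy store: readers see immutable snapshots, writers run one at a time."""

    def __init__(self, initial=None):
        # The current version is a read-only view over a private dict.
        self._current = MappingProxyType(dict(initial or {}))
        # Assumption: a mutex stands in for whatever serializes writers.
        self._write_lock = threading.Lock()

    def snapshot(self):
        # Readers never take the lock; they just grab the current version.
        return self._current

    def write(self, updates):
        with self._write_lock:
            # Copy-on-write: build a new version, then publish it by
            # replacing the reference. Existing snapshots are unaffected.
            new_version = dict(self._current)
            new_version.update(updates)
            self._current = MappingProxyType(new_version)


store = VersionedStore({"x": 1})
before = store.snapshot()      # a long-running reader starts here
store.write({"x": 2})          # a writer commits in the meantime
print(before["x"], store.snapshot()["x"])  # 1 2
```

A reader that started before the write keeps seeing its original version, so long-running reads do not block, and are not disturbed by, later writes.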

Figure 4 shows the scalability of LogicBlox 4.0 using a
micro-benchmark that measures the number of trivial read/write
transactions per second on a small database as the number of
threads available to the runtime increases. The blue and yellow
lines show that for concurrent access with all readers, or with
one writer alongside readers, LogicBlox 4.0 achieves nearly
perfect linear speedup as more threads are allocated to it. The
green line shows the impact of performing ad hoc queries without
pre-compilation; its performance indicates the cost of compiling
queries. The ad hoc query scenario is less likely for users of
deployed applications and more likely to affect application
support personnel, who may use ad hoc queries to inspect the
state of an application.

Figure 4. Perfect scaling of read/write concurrent transactions

We also include three benchmarks showing transaction
throughput at various levels of connectivity:

Figure 5 shows the read-only transaction throughput of
LogicBlox 4.0 when transactions are executed directly
against the lowest-level C++ API into the workspace.
This benchmark is a good indicator of what the runtime
is capable of, independent of the overhead imposed by
higher-level layers such as the database server
(ConnectBlox) layer and the service container (BloxWeb)
layer.

Figure 6 shows the throughput of a simple read-only
transaction when executed through the database server
layer. Observe that throughput still scales nearly
perfectly with the number of threads. However, compared
to Figure 5, there is a reduction of around 25% in
transaction throughput in this scenario, which can be
attributed to the overhead of the database server,
communication over TCP sockets, and the additional
resources needed to run separate client applications.

Figure 7 shows the throughput of a slightly more involved
read-only transaction, which takes an input parameter from
the client and produces output data. Again, scaling with
respect to the number of threads remains near-perfect, but
the additional overhead of marshalling input and output
data decreases performance.

Both Figure 6 and Figure 7 further demonstrate that
LogicBlox 4.0 achieves orders of magnitude improvement in
transaction throughput over LogicBlox 3.10.

On Simplifying the Programming Model and Database Administration

LogicBlox 4.0 includes changes to the programming language
that reduce the number of choices a programmer has to make
to produce a correct, performant program.

Simplified numeric types

LogicBlox 4.0 supports one simple decimal type, a
fixed-point decimal that replaces the floating-point
decimal types decimal[64] and decimal[128]. Fixed-point
decimals represent the results of additions and
subtractions exactly and allow certain aggregations to be
optimized. The decimal type is recommended over the
floating-point types, i.e. float[32] and float[64].
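
The difference in exactness can be seen in a small Python sketch that models a fixed-point decimal as an integer count of fractional units. The scale of five fractional digits and the helper to_fixed are assumptions made for this illustration; they do not reflect the precision or the internal representation of the LogicBlox decimal type.

```python
from decimal import Decimal

# Assumption for illustration: five fractional digits.
SCALE = 10 ** 5


def to_fixed(text):
    """Parse a decimal literal into an integer number of 1/SCALE units."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{text} needs more than five fractional digits")
    return int(scaled)


# Adding 0.1 ten times: exact with fixed-point units, inexact with floats.
fixed_total = sum(to_fixed("0.1") for _ in range(10))
float_total = sum(0.1 for _ in range(10))

print(fixed_total == to_fixed("1.0"))  # True
print(float_total == 1.0)              # False
```

Because fixed-point addition is plain integer addition, sums do not depend on the order in which values are combined, which is one reason such a type lends itself to optimized aggregation.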

Furthermore, LogicBlox 4.0 implements aggressive data
compression. Thus, it is no longer necessary to declare
integer types of various bit widths in an attempt to save
storage space. We encourage the use of
int[64].

Simplified predicate properties

LogicBlox 4.0 no longer requires the declaration of certain
predicate properties regarding data storage and locking.
The following predicate properties are no longer necessary,
and will be removed in subsequent 4.x releases:

entity capacities

storage models

locking policies

scalable types

supplemental indexes

On automatic query optimization

We conclude with a benchmark illustrating the effectiveness
of automated query optimization provided by LogicBlox 4.0. The
first two columns of Figure 8 ("query1-first" and
"query1-second") show the running times of a query on both
3.10 and 4.0, the first and the second time it is run. The
last two columns ("query2-first" and "query2-second") show
the running times of logically the same query, but hand-tuned
for 3.10.

The chart shows that, without a programmer hand-tuning the
query, LogicBlox 4.0 achieves a 4x performance gain over
LogicBlox 3.10. After hand-tuning, however, LogicBlox 3.10
evaluates the query faster than the non-parallel LogicBlox 4.0
(but still slower than the parallel version, expected in
subsequent 4.x releases).

The key takeaway from the benchmark is that LogicBlox 4.0 is
capable of providing very good performance without incurring
the cost of programmers hand-tuning their queries. In
addition to allowing programmers to express logic more
naturally, system-generated optimizations can adapt to, and
be re-optimized for, the characteristics of the data as they
change in the database. We encourage programmers to evaluate
the performance of their natural queries on LogicBlox 4.0
without applying any manual optimization.

Figure 8. Effect of Tuning

Known Limitations and Discontinued Features

LogicBlox 4.0 supports applications built on the service-oriented
framework only. Blade-based applications are not currently
supported.

Additionally, the following features are not supported, but are planned for subsequent releases:

Delimited File Services can only be used to import/export
simple files. Features such as error reporting or optional
columns are not yet supported.

Mathematical programming and machine learning extensions.

Default value predicates.

Ordered entities.

Replace and remove block.

choice and ambig
predicate-to-predicate mappings.

String concatenation aggregations.

argmin and argmax.
To get the argument, an additional join can be used.

There is limited support for user-level logging. More
detailed logging levels will be supported in subsequent
releases.

Strings stored in predicates can be at most
255 characters long. Post-4.0 versions will
remove this limit.

Predicates can have arity at most 64.
Post-4.0 versions will remove this limit.

A few primitive predicates are missing. These include:
boolean:hash, int64:hash,
floatXX:hash, datetime:hash,
float64:isFinite, and
floatXX:round2.

We do not detect whether transactions that are marked
as read-only are indeed read-only. Changes made by
transactions marked as read-only are usually not
committed to the database; occasionally, however, we do
commit such transactions (so that indices created in the
transaction can be reused later). Until we verify that
transactions marked as read-only do not change the
database, it is the responsibility of the user to ensure
this.

For functional EDB predicates, retractions are performed
based on key match only, rather than on key-value match. For
example, -a["foo"]="bar" removes
a["foo"]="quz"; i.e., it behaves like
-a["foo"]=_. This will be corrected in a future
release.
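
The retraction behavior described in the last item above can be modeled in a few lines of Python, treating a functional predicate as a dictionary from key to value. Both helper functions are hypothetical names introduced for this illustration and are not part of any LogicBlox API.

```python
def retract_by_key(pred, key, value):
    """Behavior described above for 4.0: the value is ignored."""
    pred.pop(key, None)


def retract_by_key_value(pred, key, value):
    """Key-value matching: retract only if the stored value matches."""
    if key in pred and pred[key] == value:
        del pred[key]


a = {"foo": "quz"}
retract_by_key(a, "foo", "bar")        # models -a["foo"]="bar" on 4.0
print(a)                               # {}

b = {"foo": "quz"}
retract_by_key_value(b, "foo", "bar")  # key-value match: nothing to retract
print(b)                               # {'foo': 'quz'}
```

Until this is corrected, logic that relies on a retraction being a no-op when the value differs may need an explicit check on the current value first.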

The following features are removed, and will not be supported in subsequent releases:

Floating-point decimal primitive types
(decimal[64] and decimal[128]) are
replaced by a single fixed-precision decimal type.

Primitive type color is not supported.

Meta-predicates such as system:Predicate
are not supported. Future versions of meta-predicates
will likely differ substantially from how meta-predicates
are handled in 3.x.

MoReBlox. While continuing to be supported in the LogicBlox 3.x
series, support for generic programming in future releases is
undergoing a redesign process and will be different enough
that existing MoReBlox programs will require a rewrite.

MochaBlox and WrappedControls. While these continue to be
supported in the LogicBlox 3.x series, we encourage application
programmers to use the user interface framework of their
choice, and communicate with the workspace over the service
interface.

Installation and Upgrade information

Installing LogicBlox 4.0 is as simple as following
the steps outlined below: