Ada programming

Mittwoch, 17. September 2014

Ada Direct I/O B-Tree vs SQLite benchmark

1. Abstract

SQLite is actively used in Ada community for storing
persistent data. The advantage of SQLite over other RDBMS is that the database is kept in a
single file. The database management system is supplied as plain C code which
can be statically linked to the application, which makes it ideal for small embedded applications
and other applications where a standalone database server looks an overkill. There
are drawbacks as well. SQLite is written in C, which feels uncomfortable to use
in an Ada application with its focus on safety and maintainability. Worse is
that SQLite has
a SQL interface with its high relational to host language impedance mismatch. Possible advantages of a relational database are minimal or non-existent for the
use case when SQLite comes in question, while the costs of coding, design and
performance needed to translate SQL statements and binding parameters are high.

In this post I present a pure Ada alternative to SQLite, based on the standard
Ada library package Ada.Direct_IO with an implementation of a transaction layer
on top of it and an implementation of B-tree resident in the file. I also
provide some benchmark tests to compare its performance against SQLite. The
software is a part of the
Simple Components for Ada library available under GM
GPL for many platforms. It is kept backward compatible to Ada 95, and can be
used with Ada 2005 or 2012 as well.

2. The approach

The package Ada.Direct_IO provides system-independent random file access
interface. It is a generic package instantiated with the type of the file block.
Blocks can be read and written in any order indexed by block number.

The package Persistent.Blocking_Files
adds a memory cache to Ada.Direct_IO file. When a block is requested it is read
into the cache. Sequential reading and writing occurs on the cache until the
block is there. Cached blocks are marked as updated when written and stored back
into the file when the block is removed from the cache or else the file is
closed.

The package
Persistent.Blocking_Files.Transactional
extends Persistent.Blocking_Files
by adding virtual to physical block mapping while keeping the interface. When a
virtual block is updated its physical counterpart is not changed. An unused
physical block is allocated and the virtual block is remapped to it. This new
physical block is kept overwritten until closing the transaction. The
implementation maintains the list of free blocks and the list of disposed blocks
which become free upon committing the transaction. Since original blocks are not
changed until translation commit, an application or system crash would be
equivalent to rollback.

Within the file blocks the package Persistent.Memory_Pools
allocates blocks of varying size similarly to dynamic memory pools. The blocks
can be allocated and freed like in standard pools. The size of a block is
limited by the size of the file block. The package provides input and output
streams which allow writing objects larger than the maximal block size. The
streams handle chains of blocks transparently to the application. Additionally
the package maintain a mutex object that allows concurrent access to the pool.

The package Persistent.Memory_Pools.Streams.Generic_External_B_Tree
provides an implementation of B-trees with the buckets (nodes) allocated in the
virtual blocks and leaves allocated in the pool. A
B-tree is a key to value map, which can effectively searched by the key. It also
can be searched for a less-or-equal and greater-or-equal key. The tree leaves can
be traversed in the key order. B-trees buckets have fixed size to fit the file
blocks.

3. The benchmark

The benchmarks compares performance of the B-tree implementation with an SQLite
table with two columns. The SQLite3 from the
amalgamation version 3080200 was used for the benchmark. The
amalgamation was statically linked to the benchmark application. The SQLite
benchmark ran in the full-mutex mode and used the following table:

5. Conclusion

Ada's Direct_IO offers a viable alternative to SQLite for Ada applications. The
performance tests show sufficiently better performance of mutable operations.
Direct use of B-tree operations and the persistent memory pool based on Ada
stream attributes is easier for handling complex Ada types of persistent objects
than breaking them into the limited set of types supported by the SQL version of
SQLite.

Freitag, 28. Dezember 2012

Type
introspection is a language feature which allows to determine properties of a
type at run-time. Ada is a statically typed language so no introspection is
needed, most of the time. When it comes to class-wide types their instances are
not so static.

One might
think that Ada does not support type introspection, which is almost true, but
not quite true. In fact, Ada (starting with Ada 95) has features which can be
used to introspect types. This is stream-oriented attributes (RM 13.13.2). The
language generates default implementations of T'Read and T'Write based on the
type contents. Of course, the compiler knows the type components in order to be
able to generate these implementations. Can this knowledge be used for run-time
introspection? Yes it can.

"...
For type extensions, the Write or Read attribute for the parent type is called,
followed by the Write or Read attribute of each component of the extension
part, in canonical order. ..."

This gives
us an idea of calling T'Write on the container type noticing what its
implementation would call for the components.

First we
declare the abstract base type of the container:

type Container is abstract tagged limited
null record;

procedure Introspect

(Stream : access Root_Stream_Type'Class;

Item: Container

)is null;

for Container'Write use Introspect;

Then the
abstract base type of the members:

type Member is abstract tagged limited null
record;

procedure Introspect

(Stream : access Root_Stream_Type'Class;

Item: Member

);

for Member'Write use Introspect;

And finally
a fake stream type which will collect the information of the container type
contents:

package List_Of_Tags is

new
Ada.Containers.Vectors (Positive, Tag);

use List_Of_Tags;

type Scriber is new
Ada.Streams.Root_Stream_Type with record

List : Vector;

end record;

procedure Read

( Stream : in out Scriber;

Item: out Stream_Element_Array;

Last: out Stream_Element_Offset

)is null;

procedure Write

(Stream : in out Scriber;

Item: Stream_Element_Array

)is null;

The read
and write implementations do nothing. The stream contains a vector of type
tags, written from the implementation of member's write:

procedure Introspect

(Stream : access Root_Stream_Type'Class;

Item:
Member

)is

begin

Scriber'Class (Stream.all).List.Append

( Member'Class (Item)'Tag

);

end Introspect;

This is
basically all. Putting the above in a package, e.g. Introspection, it could be
used as follows:

with
Ada.Tags;use Ada.Tags;

with
Ada.Text_IO;use Ada.Text_IO;

with
Introspection;use Introspection;

procedure
Test is

type A is new Member with null record;

type B is new Member with null record;

type C is new Container with record

X : A;

Y : B;

end record;

Object: C;

Contents : aliased Scriber;

begin

C'Write (Contents'Access, Object);

for Index in
Contents.List.First_Index

.. Contents.List.Last_Index

loop

Put (Integer'Image (Index));

Put ("");

Put (Expanded_Name
(Contents.List.Element (Index)));

New_Line;

end loop;

end Test;

This will
print:

1TEST.A

2TEST.B

Final
notes:

The derived types of Container and Member should not override stream
attributes. This will make the compiler to generate them.

The method works
recursively, that is you can have container types put into containers all
introspected. The trick works as follows. From T'Write of the nested container
you can introspect it using another stream object and use the information obtained.

Be careful introspecting from Initialize. When Initialize is called, the derived type's Initialize is not yet completed, i.e. the type is not fully constructed. This is also the reason why T'Write is better to use than T'Read, even if T'Read does not actually update the object.

Scottish Martin's comments do absolutely make some good points, and I had no quibble with them. He ends his comment, though, with one of my pet peeves: 'A professional team can develop quality software whatever the chosen implementation language and toolset.' (And that just set me off--though Martin's just in the wrong place at the wrong time. :-)

That statement is analogous to the 'Any competent programmer can write good code in any language' trope.

The advocated language could be Ada, Lisp, Haskell, or any of many others that face an uphill struggle for acceptance. The advocacy is dismissed with the claim that programming language choice just doesn't make much difference, and after all, a competent programmer can write quality software in any language.

While this claim about the ability to create good code may be true, it's irrelevant, and is usually thrown in the face of a developer who is advocating the use of a programming language that differs from the corporate herd selection, in order to shut them up, which it too often does. The claim, though, begs the question of how much it costs, in time and money, to develop that quality software using a chosen language and toolset. And whether a different choice could lead to quality software being developed faster and more cheaply, thereby encouraging the creation of even more quality software.

I've argued about this before. Programming language choice does matter, programming toolsets do matter. Programming language and development tools are where the bits hit the hardware, and if you want quality work from a developer, you need to use quality tools.

Seriously, do Indy and Formula One racing mechanics get their tools at WalMart and Harbor Freight? Because 'a professional mechanic ...'

Montag, 12. Mai 2008

Ada is renown for its concurrency support. Parallel programming is a difficultissue in all aspects. It is difficult to learn, to design, to program, to validate, butfor all, it is most difficult to reuse.

Yesterday I finished the version 3.0 of the Simple Components for Ada, where I triedto summarize my experience and ideas in this area. The library among other thingscontains some basic gears for dealing with concurrency.

The section 9 is devoted to implementation of some lock-free datastructures. These become more popular with new multi-core architectures.Though Ada was not designed to provide lock-free primitives on the low-level,for that obvious reason that this would be non-portable. It still has necessarytools. Here I mean the pragma Atomic, which allows many interesting things to do.

The section 10 contains implementations of locking synchronization primitives.Protected objects introduced in Ada 95 is an excellent mechanism of a greatpower. Especially interesting is to explore the requeue statement. Events,pulse events, arrays of events, mutexes, arrays of mutexes let be implementedas protected objects. Using the requeue statement one can do a lot of things,which appear impossible at the first glance. For example, this sectionpresents a programming pattern for using entry parameters in the barriers of.It also discusses how race condition and deadlocks can be avoided when usingprotected objects.

Two classical problems are considered on examples: the checkpointsynchronization problem, and the dining philosophers one.

It is free as only it can be, the license is GMGPL I hope you will enjoy it.

On Solaris I only hat to upgrade parts of Gtk+. But in MinGW I have nothing to build on and need to start from scratch. And with the very first package I run into an complete blocker which puts a interesting new angle to the "The Henn and Egg problem" problem I described a few days ago:

I googled the internet high an low without any hint whatsoever on how to break the recursive dependency. As it is it seems that it is impossible to bootstrap Gtk+ at all.

Now, one might argue that this is a developer problem and developers should be used to grieve. But that is not the way it works. Such flaky design will show in the final product as well. I for once have disabled the "GNOME stable" download channel for openSUSE long ago and with no desire for reactivation. It left me quite a few time with an unusable system. The problem was – guess what – faulty dependencies.

Montag, 21. Januar 2008

Over time I have created GNAT compiler for a variety for of platforms and one thing I noticed is the constant encounter of Hen and Egg situation. — At least when you leave the save heaven of Linux. Below some examples:

Texinfo and libiconv

A classic one: The compiler needs Texinfo to create it's online help and you need an working compiler to create Texinfo. Ahh, and Texinfo needs libiconv and libiconv needs a working C++ compiler.

But these problems is easily solved since online help is only optional extra.

Buinutils

Similar to Texinfo only a little trickier as a working assembler and linker are an absolute must. And don't expect every platform to come with a sufficiently up-to-date assembler or linker — Solaris springs to my mind here.

C++

C++ is an integral part of GCC compiler and needed by libiconv. But that is not the problem. The problem is that the C++ build expect the C header file to exist at prefix/include. Of course they only exist there after make install.

Ada

For us Ada users the worse one: Ada is self hosted. Being self hosted is of course very cool indeed: About 95% of the Ada compiler and runtime are written in Ada. I do pity the poor GNU-Fortran maintainer where most of the compiler and runtime is written on C — Yuck.

Of course that coolness comes for a price: You need a working Ada compiler to build the Ada compiler.

Resume

Why am I telling you all that? Now, if you are planning to build a newer compiler / tool chain for any platform apart from an up to date Linux you should – from the very onset – expect to build the compiler more then once. In fact: plan for 3 to 4 successful iterations (and an uncounted amount of unsuccessful iterations). Start off with just C and Ada, add other languages later. Don't worry if for example Texinfo is reported to old, Create a newer version later and try again. And remember that libiconv needs an up to date C++ compiler.