AT&T Researchers — Inventing the Science Behind the Service

Daytona is AT&T’s database management system for warehousing immense amounts of data while answering queries within seconds, even against tables containing more than a trillion records. It supports standard database functionality such as SQL, a data dictionary, transactions, locking, logging, recovery, and views. In addition, it offers indefinite size scalability through its compression technology and fully general horizontal partitioning; great speed scalability through its unique SPMD parallelization capabilities, special in-memory data structures, and optional use of shared memory; and its own powerful 4GL query language, Cymbal, which includes SQL. It does all this in a streamlined, well-engineered way that sets Daytona apart from other database systems in database capacity, speed, query language expressiveness, and ease of use.

Among its distinguishing characteristics is Daytona’s unique and simple architecture. First, there are no database server processes. While most database systems rely on server-based processes for scheduling, file access, locking, caching, networking and other tasks, Daytona employs the operating system alone for these tasks. This avoids the inefficiencies inherent in having both server processes and the operating system redundantly trying to do the same kinds of things at the same time on the same hardware. Thus, Daytona’s architecture enables it to be more compact and more efficient than other database systems.

Data storage is also simplified: Daytona stores its data as UNIX ASCII flat files in standard file systems. Consequently, there’s no administrative overhead of creating special raw disk partitions to hold the data. Furthermore, as simple files, the data remains accessible in a way not possible with systems that store data in a proprietary representation. In particular, when the data is not compressed, standard UNIX tools can operate on it directly. Storing data as flat files also makes tables more compact, since records are stored one right after the other.

In Daytona, data is stored as UNIX flat files, with each line corresponding to a table row and each field separated from others by a simple character (the default is a |). Even comments are supported, in this case by using the % character. Daytona files are viewable (when uncompressed) by vi and other UNIX editors.
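
The layout described above can be read with a few lines of ordinary code. The following Python sketch is illustrative only and is not part of Daytona: the function name `read_flat_file`, the sample table, and the assumption that a comment is a whole line beginning with % are all hypothetical, and any escaping rules the real format may have are ignored.

```python
import io

def read_flat_file(lines, delimiter="|", comment_char="%"):
    """Yield each data row as a list of field strings."""
    for raw in lines:
        line = raw.rstrip("\n")
        # Assumption: a comment is an entire line starting with comment_char.
        if not line or line.startswith(comment_char):
            continue
        # One line is one table row; fields are split on the delimiter.
        yield line.split(delimiter)

# Hypothetical sample data, made up for this illustration.
sample = io.StringIO(
    "% hypothetical account table: id|name|state\n"
    "1001|Alice|NJ\n"
    "1002|Bob|NY\n"
)
rows = list(read_flat_file(sample))
print(rows)
```

Because each row is just a delimited line, the same data could equally be inspected with standard UNIX text tools.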

Queries themselves are fast because they are compiled directly to C code and then to machine executables, which run faster than interpreted queries. (Most database servers first translate queries into an intermediate representation language.) Once created, the executables can be invoked directly by name.

By using the Single Program Multiple Data (SPMD) parallelization paradigm (not typically used by others), Daytona achieves great speed scalability by ensuring that multiple CPU cores can be employed to produce the answers for a single query. Daytona’s use of this paradigm consists of compiling a single program which, by design, creates k clone child processes, each of which solves 1/kth of the problem and reports back its results to the parent for integration.
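
The parent-and-clones pattern described above can be sketched in general terms. The Python example below is not Daytona's generated code; it is a minimal, UNIX-only illustration of the SPMD idea, assuming a hypothetical function `spmd_sum` in which each of k forked children sums its own share of a list and sends the partial result back to the parent over a pipe.

```python
import os
import pickle

def spmd_sum(values, k):
    """Fork k clone children; child number `rank` sums every k-th element."""
    children = []
    for rank in range(k):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: runs the same program, but on its own 1/k of the data.
            try:
                os.close(read_fd)
                partial = sum(values[rank::k])
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(partial, out)
            finally:
                os._exit(0)  # never fall through into the parent's code path
        # Parent: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    # Parent integrates the partial results reported by the children.
    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as src:
            total += pickle.load(src)
        os.waitpid(pid, 0)
    return total

result = spmd_sum(list(range(1, 101)), k=4)
print(result)
```

The strided slice `values[rank::k]` is just one simple way to hand each clone a disjoint share of the work; a partitioned table would more naturally be divided by partition.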

For creating sophisticated queries, Daytona has its own powerful high-level 4GL query language, Cymbal®, which includes SQL as a subset. Queries can be written in Cymbal, SQL, or even a combination of the two, giving greater flexibility to write queries specialized for a particular task. Queries can be performed on the data even as it’s being loaded into Daytona. As a 4GL, Cymbal contains a number of constructs that allow it to also be used as a programming language for additional power.

Cymbal supports both declarative and procedural styles, even within the same program. A sample program that tests whether 17 is prime in both styles prints:

Yes, 17 is a prime.
Yes, 17 is a prime.

The definition of Is_A_Prime is declarative, and the definition of Is_A_Prime_Too is procedural.

Cymbal offers a very-high-level, one-of-a-kind way to store tuple-to-tuple associative arrays in UNIX shared memory, while optionally using all of the other capabilities of Cymbal. Multiple processes can concurrently read and write these associative arrays. As an example, consider continually maintaining associative array caches in UNIX shared memory that contain user account data that are then used to join with user activity records streaming by. Processes can also synchronize their access to these arrays in such a way as to pass data (or other messages) as might otherwise be done using pipes (or message queues). By working in shared memory, the user gains the speed that would otherwise be lost due to disk I/O.
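
To make the account-cache example concrete, here is a small Python sketch of a tuple-to-tuple associative array kept in shared memory and used to join against a stream of activity records. It is an illustration under stated assumptions, not Cymbal's mechanism: the class `SharedTupleMap`, the fixed slot layout, the toy hash function, and the sample records are all hypothetical, and the locking a real concurrent reader and writer would need is omitted.

```python
import struct
from multiprocessing import shared_memory

# One slot: used flag, key (region, account), value (plan, balance).
SLOT = struct.Struct("=B2i2i")
SLOTS = 64

class SharedTupleMap:
    """Tiny open-addressing hash table stored in a shared memory block."""

    def __init__(self, name=None):
        if name is None:
            # Create a new block (zero-filled, so every slot starts unused).
            self.shm = shared_memory.SharedMemory(create=True, size=SLOT.size * SLOTS)
        else:
            # Attach to an existing block by name, as another process would.
            self.shm = shared_memory.SharedMemory(name=name)

    def _probe(self, key):
        # Toy hash that gives the same answer in every process.
        start = (key[0] * 31 + key[1]) % SLOTS
        for step in range(SLOTS):
            index = (start + step) % SLOTS
            used, k0, k1, v0, v1 = SLOT.unpack_from(self.shm.buf, index * SLOT.size)
            if not used or (k0, k1) == key:
                return index, bool(used), (v0, v1)
        raise MemoryError("table full")

    def put(self, key, value):
        index, _, _ = self._probe(key)
        SLOT.pack_into(self.shm.buf, index * SLOT.size, 1, *key, *value)

    def get(self, key):
        _, used, value = self._probe(key)
        return value if used else None

writer = SharedTupleMap()
writer.put((1, 1001), (2, 500))                # cache one account record
reader = SharedTupleMap(name=writer.shm.name)  # second handle on the same memory

# Join streaming activity records against the cached account data.
activity = [((1, 1001), "login"), ((1, 9999), "login")]
joined = [(key, event, reader.get(key)) for key, event in activity]
print(joined)

reader.shm.close()
writer.shm.close()
writer.shm.unlink()
```

Here the reader attaches within the same process for brevity; a separate process would attach in the same way using the block's name, and lookups never touch the disk.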

To support access by third-party tools such as Business Objects, Daytona offers a JDBC interface by means of its pdq network shell daemon. Daytona also has interfaces for Perl and Python.

Daytona is easy to use. The architecture is easy to understand and easy to administer by anyone with basic knowledge of UNIX. In fact, it can be installed and ready to go in less than 10 minutes. No special expertise in a proprietary system is needed.

Too many cooks . . . Whereas most database systems employ server processes running on top of the operating system, Daytona interfaces directly with the operating system without needing to create database server processes. This avoids the redundancy of the first case, where two large programs (the OS and the database servers) perform many of the same tasks at about the same time using the same resources.