The Datumtron API

An In-Memory Graph Database API based on the Datum Universe Model

The advantages of using the Datumtron API are performance gains, ease of querying complex databases, a native understanding of time, the ability to represent code as data, and a platform for data mining and machine learning. For a full tutorial, please read the white paper "Datumtron In-Memory Graph Database API".

Performance

Because the data is a graph of nodes, we can find a node, hold a reference to it, and then use it to find other nodes. Finding an instance katum by its object value takes approximately constant time, regardless of the number of katums in the graph.

For queries that use equality, for example "Get Employees where employee city = Tokyo", a typical relational database scans all employees and selects the ones where the city column has the value ‘Tokyo’. This has a time complexity of O(n), where n is the number of employees. In the Datum Universe, we find the ‘Tokyo’ katum from the City katum in O(1). The instance set of Tokyo holds the Tokyo employees.
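
The difference between the two lookups can be sketched in plain Python. This is not the Datumtron API: the employees list and the by_city index are hypothetical stand-ins for a table scan and for the instance sets hanging off each city katum.

```python
from collections import defaultdict

# Hypothetical sample data; names and cities are illustrative only.
employees = [
    {"name": "Aiko", "city": "Tokyo"},
    {"name": "Ben", "city": "Berlin"},
    {"name": "Chie", "city": "Tokyo"},
]

def scan_by_city(rows, city):
    """Relational-style scan: inspects every row, O(n)."""
    return [row["name"] for row in rows if row["city"] == city]

# Graph-style layout: each distinct city value keeps its own instance set,
# so the lookup is one hash probe (O(1) on average).
by_city = defaultdict(set)
for row in employees:
    by_city[row["city"]].add(row["name"])

def lookup_by_city(index, city):
    """Return the instance set for a city without touching other rows."""
    return index.get(city, set())
```

Both functions return the same employees; only the amount of work needed to find them differs.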

For queries that apply a condition to a single column, for example "Get Employees where salary > 100,000", a typical relational database scans all employees and applies the condition to each one. Again, this has a time complexity of O(n), where n is the number of employees. In the Datum Universe, we scan the salary instances for the ones > 100,000. This is achieved in O(m), where m is the number of unique salary instances, which is at most the number of employees. Each salary instance has the set of employees who get that salary.
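
A minimal Python sketch of this idea follows. It is not the Datumtron API; the salaries data, the by_salary index, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: four employees, three distinct salaries.
salaries = {"Aiko": 120_000, "Ben": 90_000, "Chie": 120_000, "Dev": 150_000}

# One entry per distinct salary value, each holding its set of employees.
by_salary = defaultdict(set)
for name, salary in salaries.items():
    by_salary[salary].add(name)

def employees_earning_more_than(index, threshold):
    """Test the condition once per distinct salary (m checks), not per employee (n)."""
    result = set()
    for salary, names in index.items():
        if salary > threshold:
            result |= names
    return result
```

In this sketch the condition is evaluated three times rather than four; collecting the matching employees still costs time proportional to the size of the result.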

For queries that involve multiple tables, a relational database joins these tables and then conducts the query. A modern relational query server creates a temporary data structure to improve the performance of the join, for example by using hash join or merge join algorithms. In the Datum Universe, there is no need for this whole process; the graph represents data in an already “hashed” form.
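
As a rough illustration of why no join step is needed, the following Python sketch follows pre-built references from one set of nodes to the next. The three dictionaries and their contents are hypothetical and only stand in for graph edges; this is not how the Datumtron API is invoked.

```python
# Hypothetical data: every value already points at its related nodes,
# so nothing has to be matched up at query time.
customers_by_city = {"München": {"c1", "c2"}, "Tokyo": {"c3"}}
orders_by_customer = {"c1": {"o1"}, "c2": {"o2", "o3"}, "c3": {"o4"}}
products_by_order = {
    "o1": {"beer"},
    "o2": {"pretzel"},
    "o3": {"beer"},
    "o4": {"tea"},
}

def products_ordered_in(city):
    """Walk city -> customers -> orders -> products by following references."""
    products = set()
    for customer in customers_by_city.get(city, ()):
        for order in orders_by_customer.get(customer, ()):
            products |= products_by_order.get(order, set())
    return products
```

Each hop is a hash lookup, so the work is proportional to the nodes actually visited rather than to the size of the underlying "tables".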

The performance gains result not only from in-memory processing but also from the Datum Universe graph representation itself, the Datumtron API implementation, and its use of hashing.

Ease of use

The Datumtron API's use of a small set of operators allows writing statements that are clear and concise. This is especially true when multiple “tables” are involved. For example, products ordered by customers who live in München are

Contrast this with the equivalent SQL query statement, which contains 4 joins.

Temporal

The Datumtron API has a built-in understanding of time. The ‘now’ operator lets you change data with the understanding that the change reflects the passing of time, not a correction or permanent modification. The ‘now’ operator creates a new time instance, and the ‘at’ operator retrieves time instances by their time index. Time instances are regular katums with a ‘time’ object attached. As with any other instance, you can find a time instance by its attached time object. Using these simple operators, we can build more complex temporal facilities to analyze change over specific periods of time and to recognize trends.
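
The behavior described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the TimedNode class and its now, at, and find methods are hypothetical names, and treating the "time index" as a position in the history is our assumption, not something the API is confirmed to do.

```python
from datetime import datetime, timezone

class TimedNode:
    """Hypothetical stand-in for a katum whose value changes over time."""

    def __init__(self):
        self._instances = []  # time instances in creation order
        self._by_time = {}    # attached time object -> time instance

    def now(self, value, when=None):
        """Record a change as a new time instance instead of overwriting."""
        when = when or datetime.now(timezone.utc)
        instance = {"time": when, "value": value}
        self._instances.append(instance)
        self._by_time[when] = instance
        return instance

    def at(self, index):
        """Retrieve a time instance by its position in the history (assumed meaning)."""
        return self._instances[index]

    def find(self, when):
        """Find a time instance by its attached time object via a hash lookup."""
        return self._by_time.get(when)
```

Because earlier instances are never overwritten, the full history stays available for comparing values across periods.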

Code as Data

Storing and executing code in Datumtron is seamlessly integrated into the API. Code is represented by objects of type ‘function’ or of any class implementing the ‘IFunction’ interface. These objects are attached to katums like any other object type. The ‘of’ operator recognizes these two types and executes the attached function. These katums have attributes and instances like any other katums. On the one hand, this gives executable code access to the data in the graph; on the other hand, it allows selecting code to execute based on its attributes.
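
The following Python sketch loosely mimics this arrangement. The Node class, the of function, and the "kind" attribute are hypothetical illustrations; the real ‘of’ operator and the ‘IFunction’ interface may behave differently.

```python
class Node:
    """Hypothetical stand-in for a katum: an attached object plus attributes."""

    def __init__(self, name, obj=None, **attributes):
        self.name = name
        self.obj = obj
        self.attributes = attributes

def of(node, *args):
    """Run the attached object if it is code; otherwise return it as data."""
    if callable(node.obj):
        return node.obj(*args)
    return node.obj

# Data and code live side by side as ordinary nodes.
price = Node("price", 100)
apply_discount = Node("apply_discount", lambda amount: amount - 10, kind="pricing")
nodes = [price, apply_discount]

# Select code by its attributes, then execute it against data in the graph.
pricing_rules = [n for n in nodes if n.attributes.get("kind") == "pricing"]
discounted = of(pricing_rules[0], of(price))
```

Here the function is found through its attributes exactly as a data node would be, and it receives its input from another node.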

Intelligence

This is the distinctive advantage of the Datum Universe representation and the Datumtron API. Representing data as a structure of one fundamental element (datum) with one fundamental relationship (is) enables us to create more complex but generic inference logic and data structures.

The simplicity of the building blocks of the Datum Universe lends itself to building more complex data structures. We have seen how a relational database can be represented and how this simplifies querying the database and improves its time complexity. Tables provide a good visual or mental picture of large amounts of similar data. Other forms of visualization, for example graphs and trees, provide mental pictures of network-oriented or hierarchical data. The Datum Universe can be the underlying representation of data, while tables, networks, and trees can be the human mental model or the higher-level visualization of data.

Inheritance is inferring new information about datums by looking at their parents' attributes, that is, by looking ‘up’ in the graph. By looking ‘down’ at instance attributes, we can generalize and theorize attributes of the parents. If all instances of apple in the Datum Universe are red, then we can theorize that an apple is red. These generalizations are qualified by ‘support’ and ‘probability’ factors. The Datumtron API has additional operators to ‘induce’ and manipulate datums in order to classify the graph. Combined, these features provide the basics for data mining, classification, and machine learning.
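
The apple example can be sketched in Python as follows. The induce function here is hypothetical and is not the Datumtron ‘induce’ operator; in particular, the definitions of support (number of instances carrying the value) and probability (support divided by the number of instances examined) are assumptions made for this sketch.

```python
from collections import Counter

def induce(instances, attribute):
    """Generalize an attribute from instances up to their parent.

    Returns (value, support, probability) for the most common value,
    or None if no instance carries the attribute.
    """
    values = [inst[attribute] for inst in instances if attribute in inst]
    if not values:
        return None
    value, support = Counter(values).most_common(1)[0]
    return value, support, support / len(instances)

# Hypothetical instances of "apple".
apples = [
    {"color": "red"},
    {"color": "red"},
    {"color": "red"},
    {"color": "green"},
]
```

With three red apples out of four, the sketch theorizes that an apple is red with a support of 3 and a probability of 0.75; if every instance were red, the probability would be 1.0.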