The Datum Universe Model

What?

The Datum Universe model is a knowledge representation scheme. It specifies how to represent data and how to make inferences.

Why?

To identify the fundamental elements of knowledge and the fundamental elements of intelligence. The goal is to provide a unified, minimalistic framework that has the building blocks for representing any kind of knowledge construct.

How?

The Datum Universe represents knowledge using two fundamental concepts:

The datum, which is an abstract element defined entirely by its relations to other datums.

The "is" relation, which is a directed link between two datums and is the only type of relation allowed among datums.
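As a rough illustration of these two concepts, the sketch below stores datums as plain strings and "is" links as directed edges. The class name `DatumUniverse` and its methods are hypothetical names chosen for this example; they are not taken from the Datumtron API.

```python
from collections import defaultdict


class DatumUniverse:
    """Hypothetical minimal store: datums are strings, "is" links are directed edges."""

    def __init__(self):
        # Maps each datum to the set of datums it directly "is".
        self.parents = defaultdict(set)

    def add_is(self, a, b):
        """Record the single allowed relation: a is b."""
        self.parents[a].add(b)
        self.parents[b]  # accessing the key registers b as a datum as well

    def datums(self):
        """Return every datum that appears on either side of an "is" link."""
        return set(self.parents)


universe = DatumUniverse()
universe.add_is("red", "color")
universe.add_is("apple", "red")
```

Note that a datum carries no payload of its own here; everything known about "apple" is the set of links that mention it, which matches the idea that a datum is defined entirely by its relations.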

Intelligence emerges from two fundamental properties of the Datum Universe:

Transitivity of the "is" relation (if a is b and b is c, then a is c), which allows for inheritance and generalization. See the side notes for examples.

Induction, which is the capability of the Datum Universe to create new datums autonomously. The induced datums reduce the number of connections and provide a built-in classification process. See the side notes for an example.

You can understand the datum universe model as a Graph, as a Poset, or by comparison to relational Tables and EAV / RDF models. You can also read the white papers listed.

Advantages

Existing data models rely on higher-level, more complex, and more diverse building blocks to represent real-life knowledge. For example, relational databases have the concepts of table, row, column, field, primary key, and foreign key. Graph-based knowledge representations have nodes and an unlimited number of relationship types. These concepts are "hard-coded": we cannot modify or extend them, nor reason about their properties, within the same framework.

The Datum Universe approach is to build these complex concepts as "soft-coded" constructs out of the minimum "hard-coded" elements. This approach makes it possible to:

Encode intelligence into the framework by providing inheritance, generalization/prediction, and a classification process. These are the building blocks for data mining, machine learning, and natural language understanding applications.

Study knowledge bases using formal tools like Partial Orders and Graph Theory. The Datum Universe is essentially a Directed Acyclic Graph (DAG) in a Transitive Reduction state. This gives us a deeper insight into the knowledge base content.
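To make the transitive-reduction idea concrete, here is a small sketch that removes every direct "is" link already implied by a longer path. It assumes the links form a DAG; the function names and the sample data are illustrative only and are not part of the model's specification.

```python
def reachable(edges, start):
    """All datums reachable from start by following one or more "is" links."""
    seen, stack = set(), list(edges.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


def transitive_reduction(edges):
    """Drop each direct link a -> c that is implied by a path through another target."""
    reduced = {}
    for a, targets in edges.items():
        keep = set(targets)
        for b in targets:
            # Anything reachable from a direct target b is redundant as a direct link.
            keep -= reachable(edges, b)
        reduced[a] = keep
    return reduced


edges = {
    "apple1": {"apple", "red"},  # "apple1 is red" is implied by the other links
    "apple": {"red"},
    "red": {"color"},
    "color": set(),
}
reduced = transitive_reduction(edges)
```

In the reduced graph, "apple1" keeps only its link to "apple"; the fact "apple1 is red" is still recoverable through transitivity.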

Implement the model in various ways to achieve different performance/memory trade-offs, from extremely fast (O(1)) to extremely memory-compact.

Provide a simple operator-based query language.

Implement the model entirely in hardware, for example in a manner similar to IBM's SyNAPSE chip.

Applications

We can highlight three major application areas for the Datum Universe:

Traditional in-memory database systems. The model provides flexibility in the actual representation of data in memory. Different representations may target different memory/performance profiles. The fundamental nature of the model makes it easy to represent temporal data, as well as executable code, as datums. This also leads to the simplicity and power of the model's query operators. See the Datumtron Graph Database API white paper.

Data mining systems. Use the induction process to have data mining / machine learning built into the database. As data changes, new patterns can be detected and predictions can be made from the updated patterns. Contrast this with running external data mining algorithms on snapshots of the database. For an example, see the Predict tool.

Intelligent knowledge agents. Using natural language parsers, we can acquire knowledge from existing text sources and build a datum universe. Since the hard-coded knowledge is minimal, there is no built-in limit on the depth of understanding that an intelligent agent can achieve.

A brief introduction to the Datum Universe Model, which is the theory behind the Datumtron API.

Datum Universe representation of "The color of apple is red"

color is thing

red is color

apple is red

Notice that:

apple, red, color, and thing are all datums.

"is" is the only relation allowed and it is a general case of the "IS-A" relation.

The fact "color of apple is red" is concluded from the two relations: "red is color" and "apple is red".
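The way this fact is concluded can be sketched as a two-step lookup: find every datum v such that "apple is v" and "v is color". The helper names below are hypothetical and only illustrate the reasoning; they are not the query operators of the Datumtron API.

```python
# The three "is" links from the example above, stored as (subject, object) pairs.
is_links = [("color", "thing"), ("red", "color"), ("apple", "red")]


def direct_parents(datum):
    """Datums that the given datum directly "is"."""
    return {b for a, b in is_links if a == datum}


def attribute_of(datum, attribute):
    """Values v such that "datum is v" and "v is attribute"."""
    return {v for v in direct_parents(datum) if attribute in direct_parents(v)}


color_of_apple = attribute_of("apple", "color")  # {'red'}
```

No dedicated "color" field or column is needed; the attribute name is just another datum sitting one "is" link above the value.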

A full relational database (Northwind) is converted to the datum universe in the API Tutorial.

Inheritance

is the ability to deduce attributes of a datum based on attributes of its "is" relations. For example, if we have "apple is red", and "apple1 is apple", then we can conclude that apple1 is red.
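A minimal sketch of inheritance, assuming links are stored as (subject, object) pairs: a datum inherits everything reachable by following "is" links upward. The function name `ancestors` is an illustrative choice, not an API name.

```python
def ancestors(is_links, datum):
    """Everything the datum "is", directly or through transitivity."""
    parents = {}
    for a, b in is_links:
        parents.setdefault(a, set()).add(b)
    seen, stack = set(), list(parents.get(datum, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


is_links = [("apple", "red"), ("apple1", "apple")]
apple1_is_red = "red" in ancestors(is_links, "apple1")  # True
```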

Generalization

is the opposite of inheritance; deducing properties of a datum using the attributes of its instances. For example, if all of the instances of apple that we have seen are sweet, i.e., "apple1 is sweet", "apple2 is sweet", etc., then we can generalize that all apples are sweet and conclude that "apple is sweet".
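One simple way to sketch generalization is to intersect the attributes of all known instances: whatever every instance shares becomes a candidate attribute of the parent. This is an assumption about how the rule could be applied, not the documented algorithm, and the resulting conclusion remains provisional, since a later counterexample (an apple that is not sweet) would invalidate it.

```python
def generalize(is_links, category):
    """Attributes shared by every direct instance of the category."""
    instances = {a for a, b in is_links if b == category}
    if not instances:
        return set()
    # For each instance, collect what it "is" apart from the category itself.
    per_instance = [
        {b for a, b in is_links if a == inst and b != category}
        for inst in sorted(instances)
    ]
    return set.intersection(*per_instance)


is_links = [
    ("apple1", "apple"), ("apple1", "sweet"),
    ("apple2", "apple"), ("apple2", "sweet"), ("apple2", "large"),
]
shared = generalize(is_links, "apple")  # {'sweet'}
```

Here "large" is not generalized because only one instance has it, while "sweet" holds for all observed apples and so supports the conclusion "apple is sweet".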

Induction

is the creation of a new datum to reduce the number of relations. For example, suppose we have 10 instances of apple that are all sweet and fresh, so that we have many repeating pairs like "apple1 is sweet", "apple1 is fresh", "apple2 is sweet", "apple2 is fresh", etc. A new datum X may be induced as follows: "X is sweet", "X is fresh" and "apple1 is X", "apple2 is X", etc. This reduces the number of relations from 2*10 to 2+10.

Figure: The induction of a new datum
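The arithmetic in this example can be checked with a short sketch. It assumes the simplest possible induction rule: datums that share an identical set of two or more parents are regrouped under one induced datum. The function `induce` and the generated names ("X1", "X2", ...) are hypothetical; the actual induction process may detect partial overlaps as well.

```python
from collections import defaultdict


def induce(is_links, name="X"):
    """Regroup datums with identical parent sets under a newly induced datum."""
    parents = defaultdict(set)
    for a, b in is_links:
        parents[a].add(b)

    # Group datums by the exact set of parents they share.
    groups = defaultdict(list)
    for datum, shared in parents.items():
        groups[frozenset(shared)].append(datum)

    result = set()
    counter = 0
    for shared, members in groups.items():
        if len(shared) > 1 and len(members) > 1:
            counter += 1
            induced = f"{name}{counter}"
            result |= {(induced, p) for p in shared}   # "X is sweet", "X is fresh"
            result |= {(m, induced) for m in members}  # "apple1 is X", ...
        else:
            result |= {(m, p) for m in members for p in shared}
    return result


links = {(f"apple{i}", prop) for i in range(1, 11) for prop in ("sweet", "fresh")}
reduced = induce(links)
counts = (len(links), len(reduced))  # (20, 12)
```

With m instances sharing p parents, the link count drops from m*p to m+p, so the saving grows with both the number of instances and the number of shared attributes.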