Importing and Exporting Data with TDX

Thus far we’ve looked at using delta updates to insert data. This is a great way to get started and to insert small amounts of data, e.g. via the REPL. However, when larger amounts of data have to be imported into, or exported from, a LogicBlox workspace, you’re better off using LogicBlox’s TDX (Tabular Data eXchange) services. TDX makes it easy, with a single LogiQL definition, to create high-performance web services that support both data import and export in a comma-separated value (CSV) format. These imports and exports are primarily useful for data integration: for instance, to import data from some other database or legacy system and move it into LogicBlox, or vice versa.

To add TDX services to our project, we need to do three things:

Extend our .project file to add a dependency on the lb_web library and the new tdx module that we’ll create.

Create the .logic files that implement the TDX services. We’ll put these in the tdx folder of our project (same as the module we added to the .project file).

Extend our config.py file to add lb_web as a dependency and to easily load our new services through make.

Extending our project file

LogicBlox has support for reusable libraries that can be “imported” by listing them in your project file. A few of those libraries are shipped with LogicBlox and for TDX we’ll be using the lb_web library, which we can use by adding the following line to our .project file:

lb_web, library

In addition to this, we’ll also create a new LogicBlox module that we’ll call tdx (with .logic files stored in tdx/*.logic):

tdx, module

Our new project file now looks as follows:

application, projectname
lb_web, library
core, module
tdx, module

Defining TDX services

Next, we’ll move on to defining the actual services. Let’s start with a service to define ice creams and set their prices and costs. A service definition consists of three parts:

The file definition: this defines the separator and columns in your CSV data and the types of those fields.

The file binding: this maps the columns defined in your file definition to predicates in your workspace.

The service configuration: this maps the service to an actual URL.

This should read quite naturally: we’re defining a service with the symbolic name tdx/icecream. It defines a “file” using | as a delimiter. The columns are named “ICECREAM”, “COST” and “PRICE”; the first has a textual value, and the second and third a numeric value. If you’ve never come across hierarchical syntax before (the { ... } syntax), it is primarily used as syntactic sugar for the following:

In this code we’re creating a file binding for the file definition tdx/icecream. The file_binding_entity_creation attribute defines what should happen for entities that don’t already exist in the workspace. Setting this to accumulate will automatically create these entities, and setting it to none will make the import fail for entries that reference non-existing entities. We’d like entities to be created automatically, so we picked accumulate.
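To make the difference between the two settings concrete, here is a minimal Python sketch that models it on a toy workspace. It is not how TDX is implemented: the function import_costs, the set of existing entities and the "Fruit Sundae" row are all hypothetical, and we assume that with none only the offending rows are refused.

```python
def import_costs(rows, entities, entity_creation):
    """Apply (name, cost) rows to a toy workspace.

    entities: names of the ice creams that already exist.
    Returns (costs, rejected): the imported costs and the refused rows.
    """
    if entity_creation not in ("accumulate", "none"):
        raise ValueError("unknown entity creation mode: %r" % entity_creation)
    entities = set(entities)  # work on a copy, leave the caller's set untouched
    costs, rejected = {}, []
    for name, cost in rows:
        if name not in entities:
            if entity_creation == "none":
                # "none": rows that reference a missing entity are refused
                rejected.append((name, cost))
                continue
            # "accumulate": the missing entity is created on the fly
            entities.add(name)
        costs[name] = cost
    return costs, rejected


rows = [("Popsicle Lemon", 25), ("Fruit Sundae", 120)]  # hypothetical data
existing = {"Popsicle Lemon"}

print(import_costs(rows, existing, "accumulate"))
# ({'Popsicle Lemon': 25, 'Fruit Sundae': 120}, [])
print(import_costs(rows, existing, "none"))
# ({'Popsicle Lemon': 25}, [('Fruit Sundae', 120)])
```

With accumulate both rows end up in the workspace; with none only the row for the ice cream that already existed is imported.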

The predicate_binding_by_name entries define which of the file column names should be used as arguments to which predicates. For instance, in our example CSV file we have this line:

Popsicle Lemon|25|50

Based on our first predicate_binding_by_name mapping, Popsicle Lemon will be used as the first argument to the core:icecream:cost predicate, and 25 as its second. Since cost is a functional predicate, this means that Popsicle Lemon will be its key and 25 its value. Effectively this means the following code will run:

+core:icecream:cost["Popsicle Lemon"] = 25.

The mapping for the price works exactly the same, but uses PRICE as the second argument instead.
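If it helps to see the column-to-predicate mapping spelled out, the following Python sketch turns a delimited file into the delta updates it roughly corresponds to. It only illustrates the idea and is not what TDX does internally: the function delta_updates and the bindings dictionary are hypothetical, and the predicate name core:icecream:price is an assumption based on the name of the cost predicate.

```python
import csv
import io


def delta_updates(data, bindings, numeric_columns, delimiter="|"):
    """Return the delta-update statements a delimited file roughly stands for.

    bindings maps a predicate name to the columns used as its arguments:
    all but the last are keys, the last one is the value.
    """
    reader = csv.DictReader(io.StringIO(data), delimiter=delimiter)
    statements = []
    for row in reader:
        for predicate, columns in bindings.items():
            # Numeric cells are written bare, textual cells are quoted.
            cells = [
                row[c] if c in numeric_columns else '"%s"' % row[c]
                for c in columns
            ]
            *keys, value = cells
            statements.append(
                "+%s[%s] = %s." % (predicate, ", ".join(keys), value)
            )
    return statements


# Assumed bindings: the price predicate name is a guess, see above.
icecream_bindings = {
    "core:icecream:cost": ["ICECREAM", "COST"],
    "core:icecream:price": ["ICECREAM", "PRICE"],
}

data = "ICECREAM|COST|PRICE\nPopsicle Lemon|25|50\n"
for statement in delta_updates(data, icecream_bindings, {"COST", "PRICE"}):
    print(statement)
# +core:icecream:cost["Popsicle Lemon"] = 25.
# +core:icecream:price["Popsicle Lemon"] = 50.
```

Each row of the file yields one statement per binding, which is why a single three-column file can fill two predicates at once.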

And then the last bit of the puzzle: mapping our service to an actual URL:

That’s it. Again we define a three-column file, but this time with “ICECREAM”, “WEEK” and “SALES” columns. These are then mapped to a single predicate week_sales, where the first two columns are used as the key and the last as the value. That is:

ICECREAM|WEEK|SALES
Popsicle Lemon|1|100

Would result in something like this:

+core:sales:week_sales["Popsicle Lemon", 1] = 100.

And just like the previous service, this one can now be used for both import and export.
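Because the same definition drives both directions, an export followed by an import should give back the same facts. The Python sketch below mimics that round trip for week_sales on a plain dictionary. It is only a model of the file format, not of the TDX service itself: both function names are hypothetical, and we assume that weeks and sales are integers.

```python
import csv
import io

HEADER = ["ICECREAM", "WEEK", "SALES"]


def export_week_sales(week_sales, delimiter="|"):
    """Render {(icecream, week): sales} as delimited text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(HEADER)
    for (icecream, week), sales in sorted(week_sales.items()):
        writer.writerow([icecream, week, sales])
    return out.getvalue()


def import_week_sales(text, delimiter="|"):
    """Parse delimited text back into {(icecream, week): sales}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {
        (row["ICECREAM"], int(row["WEEK"])): int(row["SALES"])
        for row in reader
    }


facts = {("Popsicle Lemon", 1): 100}
text = export_week_sales(facts)
print(text, end="")
# ICECREAM|WEEK|SALES
# Popsicle Lemon|1|100
print(import_week_sales(text) == facts)
# True
```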

Extending config.py

We have to make a few minor changes to our config.py file used for building our project:

We have to add lb_web as a dependency to the project and our application library.

For convenience we’ll add a make start-service target that compiles your project, loads it into a workspace and starts the TDX services.

Adding the lb_web dependency is as easy as tweaking our depends_on and lb_library calls: