Documentation

15.2. Planning your Export Strategy

The important point when planning to export data, is
deciding:

What data to export

When to export the data

Where to export data to

It is possible to export all of the data in a VoltDB database. You
would do this by creating export-only replicas of all tables in the schema
and writing to the export-only table whenever you insert into the normal
table. However, this means the same number of transactions and volume of
data that is being processed by VoltDB will be exported through the
connector. There is a strong likelihood, given a high transaction volume,
that the target database will not be able to keep up with the load VoltDB
is handling. As a consequence you will usually want to be more selective
about what data is exported when.

If you have an existing target database, the question of what data
to export is likely decided for you (that is, you need to export the data
matching the target's schema). If you are defining both your VoltDB
database and your target at the same time, you will need to think about
what information is needed "downstream" and create the appropriate
export-only tables within VoltDB.

The second consideration is when to export the
data. For tables that are not updated frequently, inserting the data to a
complementary export-only table whenever data is inserted into the real
table is the easiest and most practical approach. For tables that are
updated frequently (hundreds or thousands of times a second) you should
consider writing a copy of the data to an export-only table at an
appropriate milestone.

Using the flight reservation system as an example, one aspect of the
workflow not addressed by the application described in Chapter 6, Designing VoltDB Client Applications is the need to archive information about the
flights after takeoff. Changes to reservations (additions and
cancellations) are important in real time. However, once the flight takes
off, all that needs to be recorded (for billing purposes, say) is what
reservations were active at the time.

In other words, the archiving database needs information about the
customers, the flights, and the final reservations. According to the
workload in Table 4.1, “Example Application Workload”, the customer and flight
tables change infrequently. So data can be inserted into the export-only
tables at the same time as the "live" flight and reservation tables. (It
is a good idea to give the export-only copy of the table a meaningful name
so its purpose is clear. In this example we identify the export-only
tables with the export_ prefix or, in the case of the reservation table
which is not an exact copy, the _final suffix.)

The reservation table, on the other hand, is updated frequently. So
rather than export all changes to a reservation to the export-only
reservation table in real-time, a separate stored procedure is invoked
when a flight takes off. This procedure copies the final reservation data
to the export-only table and deletes the associated flight and reservation
records from the VoltDB database. Figure 15.2, “Flight Schema with Export Table” shows
the modified database schema with the added export-only tables,
EXPORT_FLIGHT, EXPORT_CUSTOMER, and RESERVATION_FINAL.

Figure 15.2. Flight Schema with Export Table

This design adds a transaction to the VoltDB application, which is
executed approximately once a second (when a flight takes off). However,
it reduces the number of reservation transactions being exported from 1200
a second to less than 200 a second. These are the sorts of trade offs you
need to consider when adding export functionality to your
application.

The third decision is where to export the data to. As described in
Section 15.4, “Configuring Export in the Deployment File”, you can export the data through multiple
different protocols: files, HTTP. JDBC, etc. Your choice of protocol will
depend on the ultimate target destination for your exported data.

You can also export to multiple destinations at once. When you
declare an export table, you assign it to a specific
stream. If you want different tables to be exported
to different destinations, you can declare the tables to belong to
different streams. Then in the deployment file you can configure each
stream to be exported to a different destination.