Task 1

Implement the TPC-C "New Order" transaction and storage for all relations as a standalone C++ program.

Write a load procedure to populate the database with the data provided in the *.tbl files.
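A minimal sketch of such a load procedure, assuming the *.tbl files are '|'-separated text lines (one tuple per line, as produced by the TPC-C data generator); splitLine and loadTable are illustrative names, and the insert callback stands in for the per-relation tuple constructors:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Split one '|'-separated line from a .tbl file into its fields.
std::vector<std::string> splitLine(const std::string& line, char sep = '|') {
    std::vector<std::string> fields;
    std::stringstream in(line);
    std::string field;
    while (std::getline(in, field, sep))
        fields.push_back(field);
    return fields;
}

// Generic load loop: read a .tbl file and hand each tuple's fields to a
// per-relation callback that parses and inserts them.
template <typename InsertFn>
void loadTable(const std::string& filename, InsertFn insert) {
    std::ifstream in(filename);
    std::string line;
    while (std::getline(in, line))
        if (!line.empty())
            insert(splitLine(line));
}
```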

Translate the "New Order" (neworder.script) transaction to C++ using the following mapping from SQL to C data types:

integer: int32_t

char, varchar: length plus a character array (do not use pointers or std::string)

timestamp: uint64_t (seconds since 01.01.1970)

numeric(x,y): int64_t (x is the total number of digits, y is the number of digits after the decimal separator; for example, with numeric(10,2) the value 12.3 is represented as 1230)
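To illustrate the mapping, here is a sketch of what a tuple type might look like; the Varchar template and the Warehouse attribute subset are illustrative, not the full TPC-C schema:

```cpp
#include <cstdint>
#include <cstring>

// Fixed-length string per the mapping above: length plus a character array,
// no pointers and no std::string.
template <unsigned MaxLen>
struct Varchar {
    uint32_t len = 0;
    char data[MaxLen];
    void assign(const char* s, uint32_t n) { len = n; std::memcpy(data, s, n); }
};

// Illustrative tuple for a subset of the TPC-C warehouse relation.
struct Warehouse {
    int32_t w_id;        // integer
    Varchar<10> w_name;  // varchar(10)
    int64_t w_tax;       // numeric(4,4): 0.1900 is stored as 1900
    int64_t w_ytd;       // numeric(12,2): 300000.00 is stored as 30000000
};
```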

Test your implementation by translating newOrderRandom (from oltp.script) to C++ and executing it 1 million times; how many newOrder transactions per second can your implementation execute?

Before and after executing the transactions, print the number of tuples in the order, neworder, and orderline tables.
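The benchmark driver can be sketched as follows; newOrderRandom here is a placeholder for the transaction translated from oltp.script:

```cpp
#include <chrono>
#include <cstdint>

// Placeholder for the translated newOrderRandom transaction from oltp.script.
static int64_t txCount = 0;
static void newOrderRandom() { ++txCount; }

// Run n transactions and return the measured throughput in transactions/s.
static double runBenchmark(int n) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        newOrderRandom();
    auto end = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(end - start).count();
    return n / seconds;
}
```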

You can find the initial data files and a script implementing the newOrder transaction here. Send your solution to Viktor Leis by 21 Oct 2015, 2pm.

Task 2

Write a simple parser for SQL schema files. It should handle the TPC-C schema file. You can assume that all attributes are declared NOT NULL, and you must support additional non-primary (non-unique) key indexes. This code may be helpful, but it does not support all needed features.

Write a generator that generates C++ code from the schema. It must be possible to store, insert, delete, access attributes, and find (using the primary key) tuples.

(Manually) re-implement the newOrder transaction using the generated code.

Run the query 10 times in sequence. For each run, print the result of the query (should be 1367872001.25 initially) and measure the execution time.

Execute 1 million transactions (the newOrder and delivery mix). Using the fork system call, run the query concurrently. Whenever a query is finished, create a new snapshot using fork, so that exactly one snapshot and query is active at any given time. This example for using fork may be helpful.
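The snapshotting loop can be sketched as follows; runQuery and processTransaction are placeholders for the analytical query and the newOrder/delivery mix, and exactly one forked child at a time evaluates the query on its copy-on-write snapshot:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

static int txDone = 0;
static void runQuery() { /* evaluate the query on the snapshot */ }
static void processTransaction() { ++txDone; /* one newOrder/delivery tx */ }

// Parent keeps processing transactions; whenever the query child finishes,
// the next loop iteration forks a fresh snapshot.
static void runWithSnapshots(int totalTx) {
    pid_t child = -1;
    for (int done = 0; done < totalTx; ++done) {
        if (child == -1) {
            child = fork();
            if (child == 0) {       // child: query on the snapshot, then exit
                runQuery();
                std::exit(0);
            }
        }
        processTransaction();
        // Non-blocking check whether the query child has finished.
        if (child > 0 && waitpid(child, nullptr, WNOHANG) == child)
            child = -1;
    }
    if (child > 0)
        waitpid(child, nullptr, 0);  // wait for the last query
}
```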

Measure how many transactions per second you can process using this snapshotting model, how long the queries take on average, and how long the fork takes.

Task 5

The where clause is optional and can contain one or more selections (connected by "and"). You only need to support selections of the form "attribute = attribute" and "attribute = constant".

You can assume that each relation is referenced only once, and that all attribute names are unique.
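A minimal in-memory representation of such a where clause might look as follows (the struct and member names are illustrative):

```cpp
#include <string>
#include <vector>

// Parse result for the where clause described above: a conjunction of
// equality selections, each either attribute = attribute or
// attribute = constant.
struct Selection {
    std::string leftAttr;
    bool rightIsConstant;           // true: right side is a constant literal
    std::string rightAttrOrConst;   // attribute name or constant text
};

struct WhereClause {
    std::vector<Selection> conjuncts; // connected by "and"; may be empty
};
```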

Implement semantic analysis, which takes the parsed SQL statement and
the schema as input and computes an algebra tree that can be used as
input to your code generator. Report errors when non-existing
attributes or tables are referenced. Also, you should report an error
if a table has no join condition (i.e., a cross product would be
necessary). Build left-deep join trees based on the order of the
relations in the from clause.

Task 6

Create a new query engine, similar to the one from Task 5, that generates parallel code for multi-core CPUs using the Intel TBB library.
Use the tbb::parallel_for construct to parallelize the table scans and tbb::concurrent_unordered_map for the join hash tables.

Compare the performance of the old, single-threaded engine with the new engine restricted to a single thread (use tbb::task_scheduler_init init(1); to restrict the number of threads used by TBB).

Compare the performance of your new engine using a single thread versus multiple threads.