Introduction

The self-managed version of GraphDB is a hosted database in the cloud, providing all the power of a scalable triple store as a pay-by-the-hour service through Amazon Web Services. GraphDB (Free or Standard Edition) can be purchased as an AMI running on EC2 instances from 1-core / 2 GB RAM to 8-core / 61 GB RAM.

Our customers often tell us that they want to develop and test in the cloud before bringing projects in-house. Now you can do that without first buying GraphDB licenses or provisioning hardware. GraphDB in the Cloud is well suited to limited-time projects or low-volume experiments in a production-quality setting, with no upfront investment in hardware.

All GraphDB instances are designed to store data on user-supplied Amazon EBS volumes (network attached storage), so that your data is persisted and safe even if the instance is not running. GraphDB in the Cloud is accessible via standard RESTful APIs and SPARQL endpoints.

Amazon Web Services

The following Amazon Web Services concepts are relevant to running GraphDB on the AWS cloud:

AWS Marketplace is an online marketplace where customers can use "1-Click deployment" to instantly launch pre-configured software and services on the AWS cloud infrastructure and pay only for what they use, by the hour.

The GraphDB software is available as a product on the AWS Marketplace.

AMI (Amazon Machine Image) is a virtual server image which can be instantly launched on the AWS cloud.

GraphDB provides such an AMI, and customers can provision it on virtual instances running on AWS.

EC2 (Elastic Compute Cloud) is the computing infrastructure where AMIs are launched as virtual instances. Security groups configure the firewalls controlling the network traffic to a running virtual EC2 instance. Key pairs are used to encrypt and decrypt login information and must be used for accessing a running EC2 instance.

The GraphDB AMI will be provisioned as an EC2 virtual instance, and a security group will be used to restrict network access to the instance, based on the user preferences.

The user will use the private key of the key pair to log into the running EC2 virtual instance with GraphDB.

The EBS volume is created via and managed by the user's own AWS account. The user is responsible for data volume maintenance tasks such as volume expansion, snapshots, and backup & restore.

On-demand EC2 instances are charged by the hour with no long-term commitments or upfront payments, while reserved EC2 instances provide a cheaper alternative to on-demand instances for longer-term use. Note that GraphDB SHOULD NOT be deployed on spot instances, since they can be terminated abruptly, which can lead to database file corruption.

Pricing Details

GraphDB in the AWS cloud is available in various server configurations:

| database type | AWS instance type | virtual cores | RAM (GB) | GraphDB Cloud price ($/hour) | GraphDB Free price ($/hour) | EC2 cost ($/hour) | data volume estimate (triples) |
|---|---|---|---|---|---|---|---|
| XS | T2-S | 1 | 2 | - | free | 0.02 | 50 million |
| S | T2-M | 2 | 4 | - | free | 0.03 - 0.05 (reserved/on-demand) | 200 million |
| M | M4-L / T2-L | 2 | 8 | 0.35 | free | 0.10 - 0.14 (reserved/on-demand) | 500 million |
| L | R3-L | 2 | 15 | 0.40 | free | 0.11 - 0.18 (reserved/on-demand) | 1 billion |
| XL | R3-XL | 4 | 30 | 0.75 | free | 0.22 - 0.35 (reserved/on-demand) | 2 billion |
| 2XL | R3-2XL | 8 | 61 | 1.40 | free | 0.44 - 0.70 (reserved/on-demand) | 4 billion |

The EC2 cost depends on the type of instance being used: on-demand instances are optimal only for short-term and occasional use, while reserved instances are optimal for longer-term and more frequent use.
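As a rough illustration of how the two price components combine, the following sketch adds the GraphDB Cloud price and the EC2 cost from the table above and multiplies by the number of hours. It assumes that the total hourly charge is simply the sum of the two columns and that the lower and upper EC2 figures are the reserved and on-demand rates respectively. EBS storage charges are not included, and the function name `estimate_cost` is purely illustrative.

```python
from decimal import Decimal

# Hourly prices in USD, copied from the pricing table above.
# Each entry: (GraphDB Cloud price, EC2 reserved cost, EC2 on-demand cost).
PRICES = {
    "M": (Decimal("0.35"), Decimal("0.10"), Decimal("0.14")),
    "L": (Decimal("0.40"), Decimal("0.11"), Decimal("0.18")),
    "XL": (Decimal("0.75"), Decimal("0.22"), Decimal("0.35")),
    "2XL": (Decimal("1.40"), Decimal("0.44"), Decimal("0.70")),
}


def estimate_cost(size: str, hours: int, reserved: bool = False) -> Decimal:
    """Return the estimated cost in USD of running one instance for `hours`.

    Assumption: software price and EC2 cost are simply added together.
    """
    software, ec2_reserved, ec2_on_demand = PRICES[size]
    ec2 = ec2_reserved if reserved else ec2_on_demand
    return (software + ec2) * hours


if __name__ == "__main__":
    # A two-week experiment on an on-demand M instance, running around the clock.
    print(estimate_cost("M", hours=14 * 24))
```

Decimal arithmetic is used here only to avoid binary floating-point rounding in the printed amounts.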

Note that GraphDB in the AWS cloud SHOULD NOT be deployed on spot instances, since they can be abruptly terminated and this may lead to data corruption.

The purchase preview screen offers two options for launching the product: 1-Click Launch and Manual Launch. The following sections walk through manually launching the product via the EC2 Console, describing the various configuration options and their default values.

EC2 Instance Configuration & Startup

Add storage. The GraphDB AMI is bundled with a pair of EBS volumes: one for the application and one for data storage. The data volume initially contains no data and can be reused beyond the life cycle of the product usage. Several important parameters can be adjusted at this step:

Volume size - the default allocation is 4 GiB (sufficient for approximately 15 million triples); adjust the size to the estimated needs before the volume is created.

Volume type - affects the I/O performance (SSD vs. magnetic drives).

Delete on Termination SHOULD NOT be selected; otherwise the data will be lost when the machine is terminated.

Device name (/dev/sdf) SHOULD NOT be changed.

If a data volume already exists from previous use of the system, remove the second volume configuration row and attach the old volume manually (as /dev/sdf) once the instance is running.

Create a security group (or reuse an existing one). Two ports have to be opened: 22 (SSH) for EC2 instance management and 8080 (HTTP) for accessing the GraphDB service (Workbench UI & RESTful APIs).

Create a key pair (or reuse an existing one).

Review and Launch
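The storage settings from the "Add storage" step can be sketched in code. The example below scales the "4 GiB for approximately 15 million triples" figure linearly, which is an assumption rather than a documented sizing rule, and builds a dictionary in the shape used for block device mappings by the EC2 RunInstances API (for example via boto3's `run_instances`). Nothing here calls AWS; the function names and the default volume type `gp2` are illustrative choices.

```python
import math

DEFAULT_VOLUME_GIB = 4                 # default data volume size from the text
DEFAULT_CAPACITY_TRIPLES = 15_000_000  # approximate capacity of the default volume


def estimate_volume_gib(triples: int) -> int:
    """Estimate the data volume size by linear scaling (an assumption)."""
    gib = triples * DEFAULT_VOLUME_GIB / DEFAULT_CAPACITY_TRIPLES
    # Never go below the default allocation.
    return max(DEFAULT_VOLUME_GIB, math.ceil(gib))


def data_volume_mapping(triples: int, volume_type: str = "gp2") -> dict:
    """Describe the data volume as an EC2 block device mapping entry."""
    return {
        "DeviceName": "/dev/sdf",  # the device name must not be changed
        "Ebs": {
            "VolumeSize": estimate_volume_gib(triples),
            "VolumeType": volume_type,     # e.g. SSD ("gp2") or magnetic ("standard")
            "DeleteOnTermination": False,  # keep the data after termination
        },
    }


if __name__ == "__main__":
    print(data_volume_mapping(500_000_000))
```

Real storage needs depend on the data and the enabled indices, so the linear estimate should be treated as a starting point only.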

GraphDB Startup

Log into the instance via SSH as user ec2-user, using the private key for the EC2 instance.

Run the script responsible for mounting the EBS data volume:

The script verifies that the EBS data volume is properly attached and creates a mount point for it. If the EBS volume is not attached yet, the script notifies the user and performs several delayed retries, giving the user time to attach the volume via the AWS Management Console. If that time is not sufficient, rerun the script.

On successful execution, the script confirms that the volume is mounted and prints the mount point location: /data_mount/data.

Run the GraphDB service:

The script will verify that the data volume is available (if not it terminates with a reminder message) and will start the service:

Workbench Configuration

Open the GraphDB Workbench UI in your web browser under http://<instance-public-url>:8080

If you are running GraphDB version 6.6.5 or older, the service URL is http://<instance-public-url>:8080/graphdb

If you are running GraphDB version 6.6.5 or older for the first time, you have to set up the data location manually via Admin > Locations and Repositories > Attach Location.

In newer GraphDB versions this property is preset.

If the data volume attached was used previously, the old repositories will be detected and listed under Admin > Locations and Repositories.
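The version-dependent service URL described above can be captured in a small helper. This is a minimal sketch that assumes version strings are dot-separated integers such as "6.6.5"; the function names and the example host are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '6.6.5' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def workbench_url(host: str, version: str, port: int = 8080) -> str:
    """Return the Workbench URL for the given GraphDB version."""
    base = f"http://{host}:{port}"
    # Versions up to and including 6.6.5 serve the Workbench under /graphdb.
    if parse_version(version) <= (6, 6, 5):
        return base + "/graphdb"
    return base


if __name__ == "__main__":
    print(workbench_url("ec2-host.example.com", "6.6.5"))
    print(workbench_url("ec2-host.example.com", "7.0.0"))
```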

Verifying the Configuration & Startup

Testing the service. Back in the SSH console, test the configuration of the GraphDB instance by executing:

It will perform various automated tests, such as creating a repository, loading some data, querying the data, and deleting the repository. The results of each test are printed in the console.

GraphDB Performance Tuning

This section provides guidance on the recommended configuration for your GraphDB server.

The following parameters control the amount of memory assigned to each of the different caches:

Configuration settings per instance type (M, L, XL, 2XL):

| setting | parameter name (unit) | description | M | L | XL | 2XL |
|---|---|---|---|---|---|---|
| data volume estimate (triples) | - | - | 500 million | 1 billion | 2 billion | 4 billion |
| AWS instance type / RAM | - | - | M4-L / 8 GB | R3-L / 15 GB | R3-XL / 30 GB | R3-2XL / 61 GB |
| Entity index size | entity-index-size | Defines the number of entity hash table index entries; the bigger the size, the fewer the collisions in the hash table and the faster the entity retrieval. The entity hash table does not rehash, so its index size is constant throughout the life of the repository. | 75000000 | 150000000 | 300000000 | 600000000 |
| Total cache memory | cache-memory (bytes) | The amount of memory to be distributed among the different caches | 3414m | 6394m | 12924m | 26551m |
| Tuple index memory | tuple-index-memory (bytes) | Memory used for PSO and POS caches | 2561m | 4796m | 9693m | 19913m |
| Enable predicate indices | enablePredicateList | Enables or disables mappings from an entity (subject or object) to its predicates; switching this on can drastically speed up queries that use wildcard predicate patterns. | yes | yes | yes | yes |
| Predicate index memory | predicate-memory (bytes) | Specifies the amount of memory to be used for the predicate lists cache | 853m | 1598m | 3231m | 6638m |
| Use context index | enable-context-index | If set to 'true', GraphDB will build and use the context index/indices | yes | yes | yes | yes |

All of these performance related settings can be configured from the GraphDB Workbench at repository creation time:
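The recommended memory values can be cross-checked with a little arithmetic. In the table above, the tuple index memory and the predicate index memory add up to the total cache memory for every instance type, with roughly three quarters going to the tuple index. The sketch below verifies this; it is an observation drawn from the listed numbers, not a documented GraphDB rule, and the dictionary layout is illustrative.

```python
# Recommended values in megabytes, copied from the table above.
SETTINGS = {
    "M": {"cache-memory": 3414, "tuple-index-memory": 2561, "predicate-memory": 853},
    "L": {"cache-memory": 6394, "tuple-index-memory": 4796, "predicate-memory": 1598},
    "XL": {"cache-memory": 12924, "tuple-index-memory": 9693, "predicate-memory": 3231},
    "2XL": {"cache-memory": 26551, "tuple-index-memory": 19913, "predicate-memory": 6638},
}


def is_consistent(values: dict) -> bool:
    """Check that the two index caches add up to the total cache memory."""
    parts = values["tuple-index-memory"] + values["predicate-memory"]
    return parts == values["cache-memory"]


def tuple_share(values: dict) -> float:
    """Return the fraction of the cache memory given to the tuple index."""
    return values["tuple-index-memory"] / values["cache-memory"]


if __name__ == "__main__":
    for size, values in SETTINGS.items():
        print(size, is_consistent(values), round(tuple_share(values), 2))
```

A check like this can help when adapting the values to an instance size that is not listed, since it keeps the individual caches within the total.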

GraphDB Shutdown & Restart

The termination of the GraphDB service should be done only via the provided shell script:

This will perform a graceful shutdown of the service, persisting any in-memory data to the EBS volume. This operation might take some time, so make sure there is no active java process before restarting the service or terminating the EC2 instance.

The GraphDB service can be started again at any time (only possible if the EC2 instance was stopped rather than terminated) with these steps:

Mount the external EBS volume with the data:

Start the GraphDB service:

Stopping the EC2 Instance

Note that the GraphDB service has to be gracefully shut down first, as explained in the previous step.

The EC2 resources can be completely or partially released depending on the use case requirements:

Stopping the instance - this operation stops the instance and preserves its file system state. You can use the EC2 Management Console to perform this task. This scenario is appropriate when the service is not needed for a certain period but will be restarted later. In this case the EBS volume remains attached.

Terminating the instance - complete termination of the service. This terminates the EC2 machine and its file system. Only the EBS data volume remains intact, and it is automatically detached.

Managing Repositories & Querying Data

| resource | method | parameters | details |
|---|---|---|---|
| /repositories/<REPOSITORY> | GET | infer (optional) - specifies whether inferred statements should be included in the query evaluation; inferred statements are included by default ("true") | This resource represents a SPARQL query endpoint for the repository |
| /repositories/<REPOSITORY> | POST | same as GET | Same as GET. POST can be used in cases where the length of the (URL-encoded) query exceeds practicable limits of proxies, servers, etc. When a POST request is used, the query parameters should be sent to the server as application/x-www-form-urlencoded data. |
| /repositories/<REPOSITORY> | DELETE | - | Deletes a repository and its data from the database |
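The choice between GET and POST for the query endpoint can be sketched as follows. The example builds, but does not send, a request using only the standard library. The `query` parameter name and the `Accept` media type follow the SPARQL 1.1 Protocol and are assumptions as far as this page is concerned. The length threshold, host name, and function name are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

MAX_GET_URL_LENGTH = 2000  # hypothetical limit; proxies and servers differ


def build_query_request(base_url: str, repository: str, query: str,
                        infer: bool = True) -> Request:
    """Build a SPARQL query request, switching to POST for long queries."""
    endpoint = f"{base_url}/repositories/{repository}"
    params = urlencode({"query": query, "infer": "true" if infer else "false"})
    headers = {"Accept": "application/sparql-results+json"}

    get_url = f"{endpoint}?{params}"
    if len(get_url) <= MAX_GET_URL_LENGTH:
        return Request(get_url, headers=headers, method="GET")

    # Long queries travel in the request body as form-encoded data.
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    return Request(endpoint, data=params.encode("ascii"),
                   headers=headers, method="POST")


if __name__ == "__main__":
    request = build_query_request("http://ec2-host.example.com:8080", "test",
                                  "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
    print(request.get_method(), request.full_url)
```

The returned object can be passed to `urllib.request.urlopen` once the instance is reachable.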

Create, Read, Upload & Delete Data

| resource | method | parameters | details |
|---|---|---|---|
| /repositories/<REPOSITORY>/statements | GET | subj (optional) - restricts the GET operation to statements with the specified resource as subject; pred (optional) - restricts the GET operation to statements with the specified URI as predicate; obj (optional) - restricts the GET operation to statements with the specified value as object; context (optional) - if specified, restricts the operation to one or more specific contexts in the repository; infer (optional) - specifies whether inferred statements should be included in the result of GET requests (included by default; any value other than "true", ignoring case, restricts the request to explicit statements only) | Fetches specific (or all) statements from the repository |
| /repositories/<REPOSITORY>/statements | POST | baseURI (optional) - specifies the base URI against which any relative URIs found in uploaded data are resolved; update (optional) - specifies the SPARQL 1.1 Update string to be executed (expected to be a syntactically valid SPARQL 1.1 Update string) | Performs updates on the data in the repository. The data supplied with this request is expected to contain either an RDF document, a SPARQL 1.1 Update string, or a special purpose transaction document. If an RDF document is supplied, the statements found in it will be added to the repository. If a SPARQL 1.1 Update string is supplied, the update operation will be parsed and executed. If a transaction document is supplied, the updates specified in it will be executed. |
| /repositories/<REPOSITORY>/statements | PUT | baseURI (optional) - specifies the base URI against which any relative URIs found in uploaded data are resolved | Updates data in the repository, replacing any existing data with the supplied data. The data supplied with this request is expected to contain an RDF document (RDF/XML, N-Triples, Turtle, N3, RDF/JSON, ...) |
| /repositories/<REPOSITORY>/statements | DELETE | subj (optional) - restricts the DELETE operation to statements with the specified resource as subject; pred (optional) - restricts the DELETE operation to statements with the specified URI as predicate; obj (optional) - restricts the DELETE operation to statements with the specified value as object; context (optional) - if specified, restricts the operation to one or more specific contexts in the repository | Deletes statements from the repository |
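As an illustration of the statements resource, the sketch below builds request objects for fetching, uploading, and deleting statements without sending them. It assumes that resource values are passed in N-Triples form (URIs in angle brackets) and that Turtle data is sent with the `text/turtle` content type. Neither detail is stated on this page, and all function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def statements_url(base_url, repository, subj=None, pred=None, obj=None,
                   context=None, infer=None) -> str:
    """Build the statements URL with only the filters that were supplied."""
    params = {}
    for name, value in (("subj", subj), ("pred", pred),
                        ("obj", obj), ("context", context)):
        if value is not None:
            params[name] = value
    if infer is not None:
        params["infer"] = "true" if infer else "false"
    url = f"{base_url}/repositories/{repository}/statements"
    return f"{url}?{urlencode(params)}" if params else url


def upload_request(base_url, repository, turtle: str, replace=False) -> Request:
    """POST adds the data; PUT replaces all existing data in the repository."""
    return Request(statements_url(base_url, repository),
                   data=turtle.encode("utf-8"),
                   headers={"Content-Type": "text/turtle"},  # assumed media type
                   method="PUT" if replace else "POST")


def delete_request(base_url, repository, **filters) -> Request:
    """Delete the statements matching the given subj/pred/obj/context filters."""
    return Request(statements_url(base_url, repository, **filters),
                   method="DELETE")


if __name__ == "__main__":
    print(statements_url("http://ec2-host.example.com:8080", "test",
                         subj="<http://example.org/a>", infer=False))
```

Note that a DELETE request without any filters addresses every statement in the repository, so a caller may want to guard against that case.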

Working with Named Graphs

| resource | method | parameters | details |
|---|---|---|---|
| /repositories/<REPOSITORY>/rdf-graphs | GET | - | Gets information on the named graphs in the repository |
| /repositories/<REPOSITORY>/rdf-graphs/<GRAPH> | GET | - | Fetches statements in the named graph from the repository |
| /repositories/<REPOSITORY>/rdf-graphs/<GRAPH> | PUT | - | Updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats |
| /repositories/<REPOSITORY>/rdf-graphs/<GRAPH> | POST | - | Updates data in the named graph in the repository, adding the supplied data to any existing data in the named graph. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats |
| /repositories/<REPOSITORY>/rdf-graphs/<GRAPH> | DELETE | - | Deletes all data in the named graph in the repository |
| /repositories/<REPOSITORY>/rdf-graphs/service | GET | graph (optional) - specifies the URI of the named graph to be accessed; default (optional) - specifies that the default graph is to be accessed (this parameter is expected to be present but have no value). NOTE: each request needs to specify precisely one of the above parameters. | Fetches statements in the named graph from the repository |
| /repositories/<REPOSITORY>/rdf-graphs/service | PUT | graph (optional) - specifies the URI of the named graph to be accessed; default (optional) - specifies that the default graph is to be accessed (this parameter is expected to be present but have no value). NOTE: each request needs to specify precisely one of the above parameters. | Updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats |
| /repositories/<REPOSITORY>/rdf-graphs/service | POST | graph (optional) - specifies the URI of the named graph to be accessed; default (optional) - specifies that the default graph is to be accessed (this parameter is expected to be present but have no value). NOTE: each request needs to specify precisely one of the above parameters. | Updates data in the named graph in the repository, adding the supplied data to any existing data in the named graph. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats |
| /repositories/<REPOSITORY>/rdf-graphs/service | DELETE | graph (optional) - specifies the URI of the named graph to be accessed; default (optional) - specifies that the default graph is to be accessed (this parameter is expected to be present but have no value). NOTE: each request needs to specify precisely one of the above parameters. | Deletes all data in the named graph in the repository |
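The rule that each request to the rdf-graphs/service resource must carry precisely one of `graph` or `default` can be enforced when the URL is built. The helper below is a minimal sketch using only the standard library; its name and the example URIs are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def graph_service_url(base_url: str, repository: str,
                      graph: Optional[str] = None,
                      default: bool = False) -> str:
    """Build a rdf-graphs/service URL for a named graph or the default graph."""
    # Reject both "neither given" and "both given".
    if (graph is not None) == default:
        raise ValueError("specify exactly one of 'graph' or 'default'")

    service = f"{base_url}/repositories/{repository}/rdf-graphs/service"
    if default:
        # The parameter is present but carries no value.
        return f"{service}?default"
    return f"{service}?{urlencode({'graph': graph})}"


if __name__ == "__main__":
    print(graph_service_url("http://ec2-host.example.com:8080", "test",
                            graph="http://example.org/graph1"))
    print(graph_service_url("http://ec2-host.example.com:8080", "test",
                            default=True))
```

The same URL can then be used with GET, PUT, POST, or DELETE, depending on the operation listed in the table above.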

Working with Namespaces and Contexts

| resource | method | parameters | details |
|---|---|---|---|
| /repositories/<REPOSITORY>/contexts | GET | - | Gets a list of resources that are used as context identifiers |
| /repositories/<REPOSITORY>/size | GET | context (optional) - if specified, restricts the operation to one or more specific contexts in the repository | Gets the number of triples in a repository |
| /repositories/<REPOSITORY>/namespaces | GET | - | Gets a list of namespace declarations that have been defined for the repository |
| /repositories/<REPOSITORY>/namespaces | DELETE | - | Removes all namespace declarations from the repository |
| /repositories/<REPOSITORY>/namespaces/<PREFIX> | GET | - | Gets the namespace that has been defined for a particular prefix |
| /repositories/<REPOSITORY>/namespaces/<PREFIX> | PUT | - | Defines or updates a namespace declaration, mapping the prefix to the namespace that is supplied in plain text in the request body |
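Defining a namespace amounts to a PUT request whose body is the namespace itself in plain text. The sketch below builds such a request without sending it. The `text/plain` content type is an assumption based on the "plain text" wording above, and the function name is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def define_namespace_request(base_url: str, repository: str,
                             prefix: str, namespace: str) -> Request:
    """Build a PUT request mapping `prefix` to `namespace`."""
    url = (f"{base_url}/repositories/{repository}"
           f"/namespaces/{quote(prefix, safe='')}")
    return Request(url,
                   data=namespace.encode("utf-8"),  # namespace as plain text
                   headers={"Content-Type": "text/plain"},
                   method="PUT")


if __name__ == "__main__":
    request = define_namespace_request("http://ec2-host.example.com:8080",
                                       "test", "ex", "http://example.org/")
    print(request.get_method(), request.full_url, request.data)
```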

Backup & Restore

Backing up the data is a simple process of taking a snapshot of the EBS data volume. The snapshot can then be used to restore the application data state, to replicate the data, or to migrate it to another data center.

The proper order of steps for a data backup is:

stop the GraphDB service to ensure all in-memory data is persisted properly on the file system

stop the AWS instance to ensure the file system is in consistent state

take a snapshot of the EBS data volume

restart the AWS instance and the GraphDB service.
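The AWS side of the backup steps above could be automated along these lines. This is a sketch only: it assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), that the GraphDB service has already been shut down gracefully on the instance, and that the caller knows the instance and volume IDs. The function name is hypothetical, and restarting the GraphDB service itself still has to be done on the instance afterwards.

```python
def backup_data_volume(ec2, instance_id: str, volume_id: str,
                       description: str = "GraphDB data backup") -> str:
    """Stop the instance, snapshot the data volume, and restart the instance.

    Assumes the GraphDB service was stopped beforehand, so that all
    in-memory data is already persisted on the volume.
    """
    # Stop the instance so that the file system is in a consistent state.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # The snapshot reflects the volume at the time this call is made.
    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description=description)

    # Restart the instance; the snapshot completes in the background.
    ec2.start_instances(InstanceIds=[instance_id])
    return snapshot["SnapshotId"]
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before it is pointed at a real account.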

Data restore steps (on a running AWS instance):

stop the GraphDB service if it is running

detach the old EBS data volume (if any)

create a new EBS volume from the backup data snapshot

attach the new volume on /dev/sdf device

run the attach_data_vol.sh script and then the GraphDB service

Data restore steps (on a new AWS instance):

in the Launch instance wizard,

remove the default blank data volume

add the backup data snapshot as a source for the data volume

follow the rest of the start-up and configuration procedure described above

Upgrading the GraphDB Product

This section describes the procedure for upgrading the GraphDB product whenever a newer version is available on the AWS Marketplace. An older version of GraphDB will still remain functional, but updating to the latest one is always recommended due to the improvements in performance and stability.

The upgrade process should follow these steps:

starting a new EC2 instance with the latest version of the GraphDB product via the AWS Marketplace. We'll refer to this instance as EC2-NEW

stopping the GraphDB service/process on the old instance (we'll refer to it as EC2-OLD)

detaching the EBS data volume from EC2-OLD

attaching the EBS data volume to EC2-NEW

starting the GraphDB service/process on EC2-NEW

terminating the EC2-OLD instance which is no longer needed
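The volume transfer at the heart of the upgrade (detaching the data volume from EC2-OLD and attaching it to EC2-NEW) could be scripted roughly as follows. The sketch assumes `ec2` is a boto3 EC2 client, that the GraphDB service on EC2-OLD has already been stopped and the volume unmounted, and that the function name is a hypothetical one chosen for illustration.

```python
def move_data_volume(ec2, volume_id: str, old_instance_id: str,
                     new_instance_id: str, device: str = "/dev/sdf") -> None:
    """Detach the data volume from EC2-OLD and attach it to EC2-NEW."""
    # Detach from the old instance and wait until the volume is free.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=old_instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach to the new instance under the device name GraphDB expects.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=new_instance_id,
                      Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```

Starting the GraphDB service on EC2-NEW and terminating EC2-OLD remain separate steps, so that the old instance is only removed once the new one has been verified.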

The following sections provide detailed instructions & screenshots for performing the upgrade procedure:

Log into the EC2-OLD instance

Stop the GraphDB service/process
Use the graphdb.sh script to stop the service

Unmount the EBS data volume in order to transfer it later to the EC2-NEW instance

From the AWS Management Console detach the EBS data volume from the EC2-OLD instance. To identify the correct volume, in the Attachment Information column search for value: <old-instance-id>:/dev/sdf

Launch the EC2-NEW instance with the latest version of the GraphDB product