Encrypting Data and Database Connections

A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 5.x documentation.

Encrypting Data and Database Connections

Best practices for implementing encryption and managing keys.

Encryption can be used to protect data in a Greenplum Database system in the following
ways:

Connections between clients and the master database can be encrypted with SSL. This is
enabled by setting the ssl server configuration parameter to
on and editing the pg_hba.conf file. See "Encrypting
Client/Server Connections" in the Greenplum Database Administrator Guide for
information about enabling SSL in Greenplum Database.

Network communications between hosts in the Greenplum Database cluster can be encrypted
using IPsec. An authenticated, encrypted VPN is established between every pair of hosts in
the cluster. Check your operating system documentation for IPsec support, or consider a
third-party solution such as that provided by Zettaset.

The pgcrypto module of encryption/decryption functions protects data
at rest in the database. Encryption at the column level protects sensitive information,
such as passwords, Social Security numbers, or credit card numbers. See Encrypting Data in Tables using PGP for an example.

Best Practices

Encryption ensures that data can be seen only by users who have the key required to
decrypt the data.

Encrypting and decrypting data has a performance cost; only encrypt data that requires
encryption.

Do performance testing before implementing any encryption solution in a production
system.

Server certificates in a production Greenplum Database system should be signed by a
certificate authority (CA) so that clients can authenticate the server. The CA may be
local if all clients are local to the organization.

Client connections to Greenplum Database should use SSL encryption whenever the
connection goes through an insecure link.

A symmetric encryption scheme, where the same key is used to both encrypt and decrypt,
has better performance than an asymmetric scheme and should be used when the key can be
shared safely.

Use functions from the pgcrypto module to encrypt data on disk. The data is encrypted
and decrypted in the database process, so it is important to secure the client connection
with SSL to avoid transmitting unencrypted data.

Key Management

Whether you are using symmetric (single private key) or asymmetric (public and private key)
cryptography, it is important to store the master or private key securely. There are many
options for storing encryption keys, for example, on a file system, key vault, encrypted
USB, trusted platform module (TPM), or hardware security module (HSM).

Encrypting Data at Rest with pgcrypto

The pgcrypto module for Greenplum Database provides functions for encrypting data at rest
in the database. Administrators can encrypt columns with sensitive information, such as
social security numbers or credit card numbers, to provide an extra layer of protection.
Database data stored in encrypted form cannot be read by users who do not have the
encryption key, and the data cannot be read directly from disk.

pgcrypto is installed by default when you install Greenplum Database. You must
explicitly enable pgcrypto in each database in which you want to use the module.

pgcrypto allows PGP encryption using symmetric and asymmetric encryption. Symmetric
encryption encrypts and decrypts data using the same key and is faster than asymmetric
encryption. It is the preferred method in an environment where exchanging secret keys is not
an issue. With asymmetric encryption, a public key is used to encrypt data and a private key
is used to decrypt data. This is slower then symmetric encryption and it requires a stronger
key.

Using pgcrypto always comes at the cost of performance and maintainability. It is important
to use encryption only with the data that requires it. Also, keep in mind that you cannot
search encrypted data by indexing the data.

Before you implement in-database encryption, consider the following PGP limitations.

No support for signing. That also means that it is not checked whether the encryption
sub-key belongs to the master key.

No support for encryption key as master key. This practice is generally discouraged,
so this limitation should not be a problem.

No support for several subkeys. This may seem like a problem, as this is common
practice. On the other hand, you should not use your regular GPG/PGP keys with pgcrypto,
but create new ones, as the usage scenario is rather different.

Greenplum Database is compiled with zlib by default; this allows PGP encryption functions
to compress data before encrypting. When compiled with OpenSSL, more algorithms will be
available.

Because pgcrypto functions run inside the database server, the data and passwords move
between pgcrypto and the client application in clear-text. For optimal security, you should
connect locally or use SSL connections and you should trust both the system and database
administrators.

pgcrypto configures itself according to the findings of the main PostgreSQL configure
script.

When compiled with zlib, pgcrypto encryption functions are able to
compress data before encrypting.

Pgcrypto has various levels of encryption ranging from basic to advanced built-in
functions. The following table shows the supported encryption algorithms.

Creating PGP Keys

To use PGP asymmetric encryption in Greenplum Database, you must first create public and
private keys and install them.

This section assumes you are installing Greenplum Database on a Linux machine with the Gnu
Privacy Guard (gpg) command line tool. Pivotal recommends using the latest
version of GPG to create keys. Download and install Gnu Privacy Guard (GPG) for your
operating system from https://www.gnupg.org/download/. On the GnuPG website you will find installers for popular Linux
distributions and links for Windows and Mac OS X installers.

As root, execute the following command and choose option 1 from the
menu:

# gpg --gen-key
gpg (GnuPG) 2.0.14; Copyright (C) 2009 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/secring.gpg' created
gpg: keyring `/root/.gnupg/pubring.gpg' created
Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
Your selection? 1

Respond to the prompts and follow the instructions, as shown in this
example:

RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) Press enter to accept default key size
Requested keysize is 2048 bits
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0) 365
Key expires at Wed 13 Jan 2016 10:35:39 AM PST
Is this correct? (y/N) y
GnuPG needs to construct a user ID to identify your key.
Real name: John Doe
Email address: jdoe@email.com
Comment:
You selected this USER-ID:
"John Doe <jdoe@email.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.
(For this demo the passphrase is blank.)
can't connect to `/root/.gnupg/S.gpg-agent': No such file or directory
You don't want a passphrase - this is probably a *bad* idea!
I will do it anyway. You can change your passphrase at any time,
using this program with the option "--edit-key".
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key 2027CC30 marked as ultimately trusted
public and secret key created and signed.
gpg: checking the trustdbgpg:
3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: next trustdb check due at 2016-01-13
pub 2048R/2027CC30 2015-01-13 [expires: 2016-01-13]
Key fingerprint = 7EDA 6AD0 F5E0 400F 4D45 3259 077D 725E 2027 CC30
uid John Doe <jdoe@email.com>
sub 2048R/4FD2EFBB 2015-01-13 [expires: 2016-01-13]

Note: Different keys may have the same ID. This is rare, but is a normal event. The client
application should try to decrypt with each one to see which fits — like handling
ANYKEY. See pgp_key_id() in the pgcrypto documentation.

If
you created a key with passphrase, you may have to enter it here. However for the
purpose of this example, the passphrase is blank.

Encrypting gpfdist Connections

The gpfdists protocol is a secure version of the gpfdist
protocol that securely identifies the file server and the Greenplum Database and encrypts
the communications between them. Using gpfdists protects against
eavesdropping and man-in-the-middle attacks.

The gpfdists protocol implements client/server SSL security with the
following notable features:

Client certificates are required.

Multilingual certificates are not supported.

A Certificate Revocation List (CRL) is not supported.

The TLSv1 protocol is used with the TLS_RSA_WITH_AES_128_CBC_SHA
encryption algorithm. These SSL parameters cannot be changed.

SSL renegotiation is supported.

The SSL ignore host mismatch parameter is set to false.

Private keys containing a passphrase are not supported for the gpfdist
file server (server.key) or for the Greenplum Database (client.key).

It is the user's responsibility to issue certificates that are appropriate for the
operating system in use. Generally, converting certificates to the required format is
supported, for example using the SSL Converter at https://www.sslshopper.com/ssl-converter.html.

A gpfdist server started with the --ssl option can only
communicate with the gpfdists protocol. A gpfdist server
started without the --ssl option can only communicate with the
gpfdist protocol. For more detail about gpfdist refer to
the Greenplum Database Administrator Guide.

There are two ways to enable the gpfdists protocol:

Run gpfdist with the --ssl option and then use the
gpfdists protocol in the LOCATION clause of a
CREATE EXTERNAL TABLE statement.

Use a YAML control file with the SSL option set to true and run
gpload. Running gpload starts the
gpfdist server with the --ssl option and then uses
the gpfdists protocol.

When using gpfdists, the following client certificates must be located in the
$PGDATA/gpfdists directory on each segment:

The client certificate file, client.crt

The client private key file, client.key

The trusted certificate authorities, root.crt

Important: Do not protect the private key with a passphrase. The server does not
prompt for a passphrase for the private key, and loading data fails with an error if one is
required.

When using gpload with SSL you specify the location of the server
certificates in the YAML control file. When using gpfdist with SSL, you
specify the location of the server certificates with the --ssl option.

The following example shows how to securely load data into an external table. The example
creates a readable external table named ext_expenses from all files with
the txt extension, using the gpfdists protocol. The files
are formatted with a pipe (|) as the column delimiter and an empty space as
null.