Database Technologies

Wednesday, December 10, 2014

In the course of my work at EnterpriseDB, migrating Oracle databases to EnterpriseDB's Postgres Plus Advanced Server is a common task. However, now and then we encounter unique situations. While working on a migration project recently, we encountered a new use case for RLS (Row level Security).

The customer had a centralized database where it stored a huge number of transactions. These transactions are performed by different business units located in different parts of the world. There are certain types of transactions that should not be visible even if they are being queried by the same company. That is where RLS comes in. With RSL, specific transactions, or kinds of transactions, that can remain visible are mapped back to an attribute in the table.

The customer needed the application to authenticate users and set the context for which records in the database become visible for a specific session.
In its deployment of Oracle, the customer had used the functions/procedure in the Oracle package DBMS_SESSION. In the application, the customer used DBMS_SESSION.SET_CONTEXT to set the context. And for the Row Level Security, the customer was using the DBMS_SESSION.SYS_CONTEXT to implement security around the transactions.

Postgres Plus Advanced Server has a DBMS_SESSION package that is compatible with Oracle. However, it does not currently offer users the capability of setting the user defined context and implementing RLS based on those context. Given others may experience similar situations , as our customer, I wanted to provide the procedures and functions that users could deploy.

Monday, December 1, 2014

PLV8 is a programming language that lets users write stored procedures and triggers in JavaScript and store them in their Postgres database. This allows application programmers to write a lot of their server-side programming in the same language they use to build their web client applications. Fewer languages to learn usually means fewer mistakes and faster time to completion. The extensive language support is one of many reasons why Postgres’ use across the world is increasing lately. The recent addition of document data support with JSON and JSONB data types in PostgreSQL, and in Postgres Plus Advanced Server from EnterpriseDB, is the main reason for the increasing interest in the PL/V8 language extension.

Below are the steps you need to compile PLV8 with Postgres Plus Advanced Server 9.3/9.4.

To get started, here are the prerequisites:
1. A supported version of PostgreSQL or Postgres Plus Advanced Server, such as versions 9.1 and higher.
2. V8 version 3.14.5
3. g++ version 4.5.1

If you want to know more about V8, you can visit the following wiki page:
http://en.wikipedia.org/wiki/V8_(JavaScript_engine)

It’s important to note that when compiling PLV8 with Postgres Plus Advanced Server 9.3 or the upcoming 9.4, you will get the following two types of error messages:

Tuesday, November 18, 2014

EnterpriseDB recently launched a new tool for backup and recovery – named simply EDB Backup and Recovery Tool, or BART. This tool makes the DBA’s life easier by simplifying the tasks for managing their Postgres physical backup and recovery tasks, whether they are PostgreSQL or Postgres Plus Advanced Server deployments.

BART has the following advantages over custom scripts for managing backups:

1. It’s stable and it uses the tool pg_basebackup to take a physical backup. This tool has been well defined and is well-supported by the PostgreSQL community.

2. It catalogs all of the backups users are taking, which is important in terms of: i. Listing the type of backups used ii. Listing the status of those backups with server information.

3. BART also provides functionality to restore backups, with all required archived WAL files. So automation around this tool will make DBAs’ lives easier for restore and recovery.

4. BART provides an option to validate your backup by using checksum. This is useful for confirming you took a valid backup and it is not corrupted at disk level.

5. BART provides an option to define your retention policy around the backups you are keeping.

Given all of the above advantages, I decided to give this new tool a try and share some tips. To get started, you need the following prerequisites:

1. BART currently requires a Linux 64 bit platform, CentOS 6.x or RHEL 6.x2. Need to have password-less, direct SSH access to the target machine where you want to restore backups as well as the database servers you want backed up3. Install the Postgres Plus Advanced Server or PostgreSQL binaries for pg_basebackup

Yum or rpm

To install this tool, you have two options that I will explore below:

1. Yum command
2. Rpm command.

Using the yum command:

To perform a yum command installation, BART users can ask EDB for credentials to the EnterpriseDB yum repository and configure the their local yum repository as follows:

After creating the yum repo, the user can execute the following command to install BART:

yum install edb-bart

If the user doesn't want to install the EDB Backup and Recovery Tool using the yum command, then the user can download a free standing rpm using the link below from EDB’s website:

http://www.enterprisedb.com/downloads/postgres-postgresql-downloads

and then enter the rpm install command as follows:

rpm -ivh edb-bart-1.0.1-1.rhel6.x86_64.rpm

After installing BART using the above commands, the user can see the binaries in the directory:/usr/edb-bart-1.0/bin and a sample BART configuration file in /usr/edb-bart-1.0/etc

That’s a very easy installation.

For more information on configuring BART Host and Database Host, the following are some documents that will help:1. pg_basebackup configuration for PostgreSQL:http://www.postgresql.org/docs/current/static/app-pgbasebackup.html

2. For direct password less ssh configuration user can refer following link http://www.enterprisedb.com/docs/en/1.0/bart/EDB_Backup_and_Recovery_Tool_Guide-17.htm#P1008_76316

After the installation of the BART binaries, the user also has to create a BART configuration file.

Content under the [BART] tag are called global configuration settings. Under this tag are the following:

1. bart-host: the IP address of the host on which BART is installed. The value for this parameter must be specified in the form: bart_user@bart_host_address, where bart_user is the operating system user account on the BART host that is used to run BART and owns the BART backup catalog directory. bart_host_address is the IP address of the BART host.

2. backup_path: specifies the file system parent directory where all BART database server base backups and archived WAL files are stored. This parameter is required.

3. pg_basebackup_path: specifies the path to the pg_basebackup program of the Postgres database server installed on the BART host.

4. log file: specifies the path to the BART log file. This parameter is optional. If no path to a log file is specified after logfile =, or if the parameter is commented out, BART does not create a log file.

The remaining part of configuration file is self-explanatory. The TAG: [PG]/[PPAS94] part is content for servers which the user wants to back up.

Pg_basebackup Settings

After performing the above configuration on the Backup Server, the user has to do set following settings on the servers that they want to back up. Below are the settings for enabling backup using pg_basebackup.

The user has to set a few parameters in PostgreSQL postgresql.conf file, which he wants to backup:

1. wal_level parameter to archive or hot_standby.
2. archive_mode=on
3. archive_command setting.
4. max_wal_senders to 1 or more than one, since pg_basebackup uses the replication protocol to copy data directory.

For more information on each setting please refer to the following:1. wal_level:http://www.postgresql.org/docs/9.4/static/runtime-config-wal.html

Monday, June 30, 2014

PostgreSQL 9.3 has two key software updates making switchover/switchback easier in High Availability configurations.

First, let’s address the software patches and their descriptions:
1. First patch was committed by Fujii Masao.

Patch commit# 985bd7d49726c9f178558491d31a570d47340459

With this patch, the walsender process tries to send all outstanding WAL records to the standby in replication when the user shuts down the master.

This means:
a. All WAL records are synced between two servers after the clean shutdown of the master
b. After promoting the standby to new master, the user can restart the stopped master as new standby without a fresh backup from new master.

2. Second patch was committed by Heikki Linnakangas in PostgreSQL 9.3.

Patch commit# abfd192b1b5ba5216ac4b1f31dcd553106304b19

Before PostgreSQL version 9.3, streaming replication used to stop replicating if the timeline on the primary didn’t match the standby. This generally happens when the user promotes one of the standbys to become the new master. Promoting a standby always results in an increment of timeline ID and after that increment, other standbys will refuse to continue replicating.

With this patch in PostgreSQL 9.3, the standby asks the primary for any timeline history files that are missing from the standby when it connects – if the standby recovery.conf file has the following setting:
recovery_target_timeline='latest'

The missing files are sent using a new replication command TIMELINE_HISTORY, and stored in the standby's pg_xlog directory. Using the timeline history files, the standby can follow the latest timeline present in the primary, just as it can follow new timelines appearing in an archive WAL directory.

Because of above patches, if the user performs the following sequence of steps then switchover/switchback can be easily achieved:
To switchover from Master to Standby, use following steps:
On Master:
1. Use any one of following stop options for clean shutdown.

Wednesday, May 21, 2014

The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:
1. How best to monitor Streaming Replication?
2. What is the best way to do that?
3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?
4. How should I calculate replication lag-time, in seconds, minutes, etc.?

In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:
1. Disaster recovery
2. Streaming Replication is for High Availability
3. Load balancing, when using Streaming Replication with Hot Standby

PostgreSQL has some building blocks for replication monitoring, and the following are some important functions and views which can be use for monitoring the replication:

1. pg_stat_replication view on master/primary server.
This view helps in monitoring the standby on Master. It gives you the following details:

pid: Process id of walsender process
usesysid: OID of user which is used for Streaming replication.
usename: Name of user which is used for Streaming replication
application_name: Application name connected to master
client_addr: Address of standby/streaming replication
client_hostname: Hostname of standby.
client_port: TCP port number on which standby communicating with WAL sender
backend_start: Start time when SR connected to Master.
state: Current WAL sender state i.e streaming
sent_location: Last transaction location sent to standby.
write_location: Last transaction written on disk at standby
flush_location: Last transaction flush on disk at standby.
replay_location: Last transaction flush on disk at standby.
sync_priority: Priority of standby server being chosen as synchronous standby
sync_state: Sync State of standby (is it async or synchronous).

Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.

So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”
If you have Hot Standby with Streaming Replication, the following are the points you should monitor:1. Check if your Hot Standby is in recovery mode or not:
For this you can use pg_is_in_recovery() function.

2.Check whether Streaming Replication is working or not.
And easy way of doing this is checking the pg_stat_replication view on Master/Primary. This view gives information only on master if Streaming Replication is working.

3. Check If Streaming Replication is not working and Hot standby is recovering from archived WAL file.
For this, either the DBA can use the PostgreSQL Log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:

pg_last_xlog_replay_location();
pg_last_xact_replay_timestamp();

4. Check how far off is the Standby from Master.
There are two ways to monitor lag for Standby. i. Lags in Bytes: For calculating lags in bytes, users can use the pg_stat_replication view on the master with the function pg_xlog_location_diff function. Below is an example:

Thursday, May 15, 2014

PostgreSQL 9.3 has lot of new improvement like the addition of new operators for JSON data type in postgreSQL, that prompted me to explore its features for NoSQL capabilities.

MongoDB is one of NoSQL solutions that have gotten a great deal of attention in the NoSQL market. So, this time I thought to do some benchmarking with the NoSQL capability of JSON in MongoDB and the JSON datatype in PostgreSQL 9.3

For this benchmark, I have used the same machine with no optimization in installation of PostgreSQL and MongoDB (since I wanted to see how things work, out of box with default installation). And I used the sample data from MongoDB's site, around which I had developed the functions which can generate random data using the same sample for Mongo and for PostgreSQL.

Wednesday, November 20, 2013

Postgres Plus Advanced Server 9.3, released on Tuesday for general availability, contains all of the innovation that was part of PostgreSQL 9.3, which was released in September.
These features include the following:

For Postgres Plus Advanced Server, we integrated into the core PostgreSQL additional performance improvements, new packages and Object oriented features so that our database can address a wider range of enterprise use cases.

Partitioning enhancements to boost performance for INSERTS/UPDATES/SELECT was a major development for Postgres Plus. Below are graphs illustrating the performance increases of Postgres Plus Advanced Server 9.3 compared to the 9.2 version.

Figure: TPS SELECT

Figure TPS UPDATE

The release features some important functions that can make a developer’s life easier, such as:

1. DBMS_RANDOM package. This packages helps users to create Random numbers, Random strings and Random dates. PostgreSQL supports random functions, which enable users to have their own function on top of random for random string and random date.

With this package, users can easily use the built-in functions for those two purposes. Below are some examples:
DBMS_RANDOM package can be used to easily generate random strings and dates, and users don’t need to make their own wrapper function for these two activities. Below is one simple example of using it.

2. DBMS_LOCK.sleep: Similar to pg_sleep function, the DBMS_LOCK.sleep package is meant for Oracle users/developers who are more familiar with Oracle packages.

3. DBMS_CRYPTO: DMBMS_CRYPTO is new in Postgres Plus Advanced Server 9.3 and it provides the interface to encrypt and decrypt stored data, and can be used in conjunction with PL/SQL programs for running network communications. It provides support for several industry-standard encryption and hashing algorithms.

4. DBMS_SCHEDULER: With Postgres Plus 9.2, we got DBMS_JOB package, and now Postgres Plus 9.3 has come out with DBMS_SCHEDULER, which has more control options for jobs and better visibility of scheduled jobs.
5. UTL_ENCODE: This is another important package for keeping sensitive information in the database. Using this package, users can easily encode and decode their data and keep it in database. This package can be utilized in new applications, and gives flexibility around important encode/decode functions for developers and users where data is very important. This is particularly useful for users working applications that were written for Oracle but migrated onto Postgres Plus Advanced Server. .

6. UTL_HTTP: Advanced Server 9.3 has an UTL_HTTP package. This package provides functions for HTTP callouts from SQL/SPL. Developers/users can use this package and associated functions for accessing data on the Internet over HTTP.

7. UTL_URL: This is one of additions we developed for Postgres Plus 9.3. This package has two functions, escape and unescape mechanisms for URL characters. The escape function helps to escape a URL before the URL can be used to fetch data from a website. The unescape function can unescape any escaped character used in the URL, before fetching the data from a website. These two packages with UTL_HTTP allow users to direct the fetching of data from a website without having to write complex code in the application for handling specific data from a website.

8. New in EDB*loader. In Postgres Plus 9.3 EDB*loader has more control options for bulk loading.
a. ROWS parameter: prior to version 9.3, EDB*Loader processed entire data files as a single transaction. With parameter, users can control the processing of large amounts of data after which COMMIT needs to be executed. More control for processing/Loading data files.
b. Error Codes: EDB*loader now supports some additional exit/error codes, which will help users/developers include proper exit code checking while using a higher speed bulk data loader with parallel processing. Exit codes are shown below:

0: Success
1: Failure
2: Warning
3: Fatal

c. Streaming output files to client: Prior to version 9.3, users had to check the logfile of EDB*loader on the server side. Files: logfile, Bad file and discard file used to be created on the server side, and for troubleshooting users had to log in on the server to verify these files. Now, Postgres Plus 9.3 enables these files to be created at the client site. Users are no longer required to log into the server to verify/check these files..
d. New GUC for empty string. EDB*loader now has one GUC which users can utilize to control the default behavior of an empty_string in their datafile. edbldr.empty string has the following valid values:
1. null: empty field is treated as a null if the raw field contains no characters or a pair of delimiters with nothing in between.
2. empty_string: empty field is treated as a empty_string, if the raw field contains no characters or a pair of delimiters with nothing in between.
3. pgsql: empty field is treated as a null if the raw field contains no characters, but as an empty string if it contains a pair of delimiters with nothing in between.
Default is pgsql.

9. New REGEXP functions. Postgres Plus 9.3 nowhas three new REGEXP functions for developers.
a. REGEXP_COUNT: This searches a string for a regular expression, and returns a count of the times that the regular expression occurs. Here’s a simple example:

10. New exception codes for UTL_FILE package: The UTL_FILE package provides the capability to read from, and write to files on the operating system’s file system. To provide error control capabilities to developers/users, Postgres Plus 9.3 has added some new exception codes, which users can utilize in their PL/SPL code for better exception handling. One simple example is given below: