Honza's IT blog about Linux and open source software, where you can enjoy posts about databases, programming, scripting, Fedora and other geeky stuff.

Tuesday, February 23, 2016

Notes from Prague PostgreSQL Developer Day

What a surprise that I did not hear about containers at all during the
whole conference, a really special thing today; database experts are
probably quite reluctant to adopt such cool hyped technologies. Anyway, it
was a very interesting one-day conference about PostgreSQL, so I am sharing my
notes, since they might be interesting for others as a starting point for
further study. Most of the slides are here:
http://p2d2.cz/rocnik-2016/prednasky

Main news in PostgreSQL 9.5

The first presentation, by Pavel Stehule (a last-minute replacement for Tomas Vondra),
was about the news in PostgreSQL 9.5, and even without slides it was a very
interesting summary with enough detail at the same time. PostgreSQL 9.5
turned out to be wanted by quite a lot of attendees.
Most of the main PostgreSQL upstream developers work on it as a full-time job,
which means it is professional development, even though it is still true open
source and always will be. On the other hand, as the project gets more and more
mature, a lot of features stay in development for as long as three years.
As for version 9.5, many users are interested in UPSERT; performance is not
that different from previous versions. UPSERT is kind of like MERGE from
ANSI SQL, but MERGE was not implemented 100%, so it is called
differently.
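In 9.5, UPSERT takes the form of the ON CONFLICT clause of INSERT; a minimal sketch (the counters table and its columns are made up for illustration):

```sql
-- Insert a row, or bump the counter when the key already exists
INSERT INTO counters (name, hits)
VALUES ('home', 1)
ON CONFLICT (name)
DO UPDATE SET hits = counters.hits + EXCLUDED.hits;
```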
Grouping sets allow creating a kind of multidimensional view of the data.
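A small sketch of the 9.5 syntax (the sales table is an assumption for illustration):

```sql
-- One query computes subtotals by region, by product, and a grand total
SELECT region, product, sum(amount)
FROM sales
GROUP BY GROUPING SETS ((region), (product), ());
```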
Locks were reworked a lot, and arrays in PL/pgSQL mean that for some complicated
scenarios performance may improve a lot. Usually, sorting under the C locale is
much quicker than under a non-C locale; locale-aware sorts are simply slow, but
in 9.5 it is much better even for locale-aware sorting.
Bitmap indexes
for columns with a lot of duplicates do not exist like in Oracle. Something new was
created though, originally called minmax, now the BRIN index, which is much
smaller than other indexes and is primarily useful for append-only data.
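Creating one is a one-liner; a sketch assuming an append-only events table with a timestamp column:

```sql
-- BRIN stores only small per-block-range summaries, so it stays tiny
CREATE INDEX events_created_brin ON events USING brin (created_at);
```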
The pg_rewind tool avoids re-cloning the whole master: during fail-over we can turn an old master, even one whose data has diverged a bit, into a slave.
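The invocation looks roughly like this, run on the stopped old master (the data directory path and the connection string are made-up examples):

```
pg_rewind --target-pgdata=/var/lib/pgsql/data \
          --source-server='host=new-master port=5432 user=postgres dbname=postgres'
```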

Two
things where PostgreSQL is behind commercial DBMSs are utilizing multiple
processors and partitioning. Today the HDD is not the bottleneck anymore;
with RAM disks and many CPUs it is the speed of one CPU. In 9.6 there is
already a couple of commits for parallel queries, which is so great that
we might even call 9.6 a 10.0. There are also plans to have logical
or BDR replication, and some more patches for making the planner's guesses
better using correlations, especially useful for users who do not understand
query execution and cannot use EXPLAIN.
Another feature is RLS (row-level security), especially interesting for the banking sector, for example.
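A minimal sketch of 9.5 row-level security (the accounts table and its owner column are assumptions):

```sql
-- Each database user sees only the rows they own
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
CREATE POLICY own_rows ON accounts
    USING (owner = current_user);
```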

IO in Postgres - Architecture, Tuning, Problems

Andres Freund
from CitusData talked about tuning, giving an overview of the memory
architecture and explaining why there is shared memory and private per-process
memory. One cache is handled by PostgreSQL; another cache is kept by the
operating system when reading files. Clock sweep invalidates part of the buffer
so we know which part of the buffer we can replace; that might be expensive in
some cases.
The reason why WAL is efficient is that it is a sequential
write. A checkpoint then writes a bunch of data at some point, and that can be
expensive. We can extend the checkpoint timeout; for large amounts of data
written, checkpoint_segments should be tuned up. To reduce start-up time we can
do a checkpoint before shutdown.
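The knobs mentioned above live in postgresql.conf; a sketch with illustrative values, not recommendations (note that in 9.5 checkpoint_segments was replaced by max_wal_size):

```
checkpoint_timeout = 15min          # checkpoint less often
checkpoint_completion_target = 0.9  # spread checkpoint writes over the interval
max_wal_size = 4GB                  # 9.5+ successor of checkpoint_segments
```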

Checkpoints explain why there is
sometimes more and sometimes less load in benchmark graphs. Andres also
showed how tuning the buffer size and checkpoint values influences
the graphs.
It is interesting that turning off caching in the OS may do a better job, although we need to count with the consequences.
Tuning
buffers is basically about understanding the workload; no advice can be
good enough if it is given without that. Dropping an index and re-indexing can
be better with lower shared_buffers settings.
Batch runs may use different settings. Slides are available at anarazel.de/talks/pgdevday-prague-2016-02-18/io.pdf

SQL Tabs

Sasha from Shards.io was speaking about his cool project SQL Tabs.

Sometimes
the psql console is not enough, even though people love it. So he started a new SQL
client with the requirements to be non-blocking, black-and-white themed,
with connection switching...

react.js, node.js, and other
technologies were used to create a web-based system that allows browsing,
datatype exploration, a context advisor, function listings, query history,
... It uses libpq in the backend, so the output is as usual. It can also
generate some simple graphs by adding some semantics in SQL comments,
selecting the proper type of chart, and writing comments in markdown. It also
allows combining more queries into a simple report with text,
charts, and tables.
It understands some datatypes, so time series are nicely formatted as one would expect.
More info at www.sqltabs.com.

Table spaces on their own clusters avoid influencing other databases when the character of one app changes rapidly

Monitoring is important but sometimes troublesome

Use check_postgres.pl and autovacuum, configured well for the workload

Default privileges help create fine-grained permissions

Logging can supplement auditing

Backup and recovery with barman or pgbackrest; pacemaker from Red Hat for recovery; streaming replication with delay

Testing data recovery scenarios

For monitoring they use telegraf + influx + grafana

Visualization helps to understand where the problem is

Configure autovacuum so that it runs often; then it does little bits of work each time, so it is quick
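A sketch of postgresql.conf values that make autovacuum run more often (illustrative numbers, not recommendations for any particular workload):

```
autovacuum_naptime = 15s                 # wake up more often
autovacuum_vacuum_scale_factor = 0.05    # vacuum after ~5% of a table changes
autovacuum_analyze_scale_factor = 0.02   # refresh statistics sooner
```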

Open-source clustering solutions for PostgreSQL

Petr Jelínek from 2ndQuadrant talked about general scaling concepts (vertical, horizontal), focusing on OLTP, OLAP, ...
For
vertical scaling and OLTP we have a couple of features already in 9.5
(sampling, BRIN indexes, sort optimizations), and in 9.6 there will be
a first implementation of parallel query execution.
Hot standby was shortly introduced: the integrated solution for scaling, but it solves only read scaling, since slaves are read-only.
PL/Proxy: now almost legacy, developed at Skype, no maintenance; a map-reduce implementation in PostgreSQL.
Greenplum
by Pivotal (a fork of PostgreSQL), stable, open source since autumn 2015
(Apache license), is a kind of MPP (massively parallel processing) database and
diverges quite a lot from vanilla PostgreSQL. It has its own tooling and tables
optimized for inserts.
CitusData is going to release CitusDB very soon
as an extension only; now it is a fork of PostgreSQL plus extensions. It
will be open source soon as well; for now only pg_shard is open-sourced. It
has logical servers, so it duplicates data in distributed tables.
Postgres-XL,
coming from Postgres-XC and StormDB, is now open source (MPL), taken over by the
community after the company crashed. It will support 9.5 soon. It is an MPP
database that supports both OLTP and OLAP, but it is more complex to administer
and install.
BDR from 2ndQuadrant is also a fork of PostgreSQL
(9.4), but it has the goal of being integrated back one day. It has already
pushed a couple of features into vanilla PostgreSQL (background workers, event
triggers, ...). It is an asynchronous multi-master solution (all data on
all servers) that uses optimistic conflict detection and resolution (in
contrast to MVCC in vanilla PostgreSQL). So it makes the
consistency eventual, reached after conflict resolution (after
commit). It primarily focuses on latency optimization (avoiding writes
across the whole universe).
There were questions about eventual consistency and
sequences; the answer was that nobody can expect a normal application to
work on a multi-master setup the same way as before.

Monitoring PostgreSQL in practice

Pavel
Stehule talked about administration and monitoring, starting with the
interesting thought that moving from "It may work" to "I know it works"
is quite a small step. Architecture is always the key: client-server is
for one server, a cluster scales... A wrong choice in the beginning will
affect everything.
Think about the database already during user-interface design,
so that the database can return only a limited set of rows.
Important values are the parameters of RAM and IO: while we fit in RAM, all is
fine; once we do not, performance degradation does not come linearly, it
degrades by jumps.
Configuration of the database and optimization of queries do not help when the architecture and app design are wrong.
Sometimes
users add indexes without thinking. Sometimes users deploy replication
and a cluster when it is not necessary; more RAM may help much better, and
aggregated data in the DB is also not a bad idea.
work_mem, shared_buffers, effective_cache_size and max_connections are important: work_mem * max_connections * 2 + shared_buffers + filesystem cache + OS < RAM
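As a sanity check, the rule of thumb above can be sketched in a few lines of Python (the numbers are made-up examples, not recommendations):

```python
def fits_in_ram(work_mem_mb, max_connections, shared_buffers_mb,
                fs_cache_mb, os_mb, ram_mb):
    """Rule of thumb from the talk: wm * mc * 2 + sb + fs + os < ram."""
    needed = (work_mem_mb * max_connections * 2
              + shared_buffers_mb + fs_cache_mb + os_mb)
    return needed < ram_mb

# Example: 4 MB work_mem, 100 connections, 2 GB shared_buffers,
# 4 GB left for the filesystem cache, 1 GB for the OS, 16 GB of RAM
print(fits_in_ram(4, 100, 2048, 4096, 1024, 16384))  # True
```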
Database servers must never swap.
shared_buffers set too high (over 20 GB) could make finding a free buffer too slow, but it always depends on the cache workload characteristics.
effective_cache_size says how big a portion of the index size is actively used.
Strong negative feedback concerns the 80/20 rule, that 80% of the workload is
generated by 20% of the queries: sometimes even modules that are not used
eat 50% of the workload.
People cannot work with memory today; it is
better to read the data once, work with it, and write to the DB from time to
time, not to connect to the DB for every piece.
Monitoring is important to look for regressions.
Good queries should be verified, to make sure their performance is not an accident.
A disk under real load has a much different characteristic than one on a machine
without load. pg_bench is the other extreme: it does not simulate any real
workload.
Some overview is in the pg_stat tables: many rollbacks are wrong,
many temp files are wrong. A change in the transaction count is suspicious,
for example a firewall killing connections by mistake.
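The counters mentioned above can be pulled from pg_stat_database; a sketch (the thresholds to worry about are workload-specific):

```sql
-- Quick per-database health overview
SELECT datname,
       xact_commit,
       xact_rollback,               -- many rollbacks are a warning sign
       temp_files,                  -- many temp files suggest work_mem is too low
       pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
ORDER BY xact_rollback DESC;
```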
Kill very long running queries before they kill the server.
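One way to do that from SQL, using pg_stat_activity (the 5-minute limit is an example value):

```sql
-- Terminate queries that have been running for more than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
  AND pid <> pg_backend_pid();
```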
Especially at the beginning of an app's life it is important to monitor queries, because a database is a living organism.
Queries around 50 ms cannot be made much faster; if we need more, we need a cache and should not use the DB at all.
Monitoring
regressions in CPU utilization using profiling tools works well on
Linux; the fix is usually to distribute the workload to more processors.
Autovacuum monitoring is also important.
And finally, benchmarking is needed; it should be done soon enough and ideally in the production environment.
Finding
out whether something is effective is hard and takes a long time. Finding
out whether something behaves the same as yesterday is simple and quick.
Graphs are a quick way to get from hoping to knowing.
Too wide tables are usually wrong; in OLTP we should respect normal forms.
How large is a typical OLTP database today? Usually tens of GB. Today people often store bitmaps in the DB, which makes it larger.
RAM > 1/10 of the DB size

pg_paxos: Table replication through distributed consensus

pg_paxos
was introduced by Marco Slot, and we learned it is a framework for
distributed problems. Servers sometimes need to agree on something; that
is what the Paxos algorithm is about.
It has two phases: nodes agree on participation, then a proposal asks for acceptance, and a majority must agree.
If anything fails, we start over.
The Paxos state machine (multi-Paxos) helps to replicate a log on multiple servers.
pg_paxos
is an extension with low throughput and high latency, so it is no alternative to
streaming or logical replication, and not a magic distributed
Postgres either, but it is a useful building block for distributed systems. It
is somewhat experimental, written in PL/pgSQL. It is available on GitHub in the
citusdata organization.
The demo showed how to use three Paxos servers in the cloud to
acquire a lock; then communication can begin normally and we know only
one server works with the data.

About Me

Currently working for Red Hat, I'm trying to spread the open-source idea and improve things for developers using Fedora, while databases became my primary interest recently. Beyond IT world I love sports and travelling.