Honza's IT blog about Linux and open source software, where you can enjoy posts about databases, programming, scripting, Fedora and another geeky stuff.

Tuesday, November 24, 2015

Thoughts from PGConf.eu, PostgreSQL Europe Conference 2015, Day 3

This is a continuation and the last piece of notes from PGConf.eu in Vienna this year. First day notes available here and second day notes available here.

I started the 3rd day with KaiGai Kohei, who talked about pb_strom in his presentation called GPGPU Accelerates PostgreSQL - Unlock the power of multi-thousand cores. Plugin pg_strom is a combination of PostgreSQL and gpgpu, which is a project in early state that aims to online native GPU code generation (nvrtc run-time compilier, cached, so compilation is done only once). It uses 9.5 feature of custom scan/join interface, that allows to overwrite part of the execution plan. Benchmarking showed approx. 5x speedup for queries that included 100M rows; join was more important than aggregate. Another benchmark showed some anomalies where pg_strom was slower for some queries.
https://github.com/pg-strom/devel

Ruben Gaspar Aparicio from Cern provided some experiences from his work as DBaaS. Except DB related information, some Cern statistics were also unbelieveble -- 6000+ magnets in The large hadron collider, which makes it the largest machine in the world. It produces 0.5PB data in one second during proton collision, totally 100PB stored now, CPU with 250K cores, 2M jobs per day. 8000 physicists require real-time access to all that LHC data from 160 computer centres in 35 countries.

Except 36 PostgreSQL 9.2 and 9.4 they use 226 MySQL and 8 Oracle in DB on demand (DBoD), then RAC with ~100 Oracle DBs. They support SSO in system, overview of instances DB providers offer different things, own DBs on demand is how the physicists work. The DBaaS supports upgrades, monitoring, backup and HA. However, there is no DBA support nor app support nor Vendor support. Everybody gets own database and must solve issues on their own. System supports back-up, configuration changes, PiTR (point in time recovery). Statistics of queries can be intuitively seen by uesrs that are no DB experts. Logs overview helps to fix problems by users themselves.
I was smiling about Ruben's quote that "Deleting is manual, because there are people who don't realize that when they delete something it is no more there."

New machines are installed by puppet and they run on physical servers + OpenStack as infrastructure. They already plan to use containers for running the DBs in Cern. They use Coverity and apps are usually written in Perl 5.20. Travis-CI is then used for running unit-testing.

Talk The Elephants In The Room: Limitations of the PostgreSQL Core Technology by Robert Haas direct I/O was then about dirty kernel page buffers when writing to disc, where PostgreSQL was not able to do almost anything. A need for full featured logincal replication solution was mentioned as something we really need (not only features).
Database size is bigger and bigger -- horizontal scalability seems to be the answer, i.e. the focus is in better sharding.
Other areas to consider: better parallel queries, changes to storage format, direct I/O, built-in connection pooler.

Simon Riggs closed the event with a keynote where he mentioned pgconf.eu 2015 to be the worldwide biggest event ever.
Zero-downtime upgrade is something PostgreSQL upstream wants to achieve when upgrading from 9.4 to 9.5 -- using cross-version replication. BDR (bi-directional replicatoin) is heading into postgres, which may allow multi-master replication in 9.7.
Simon mentioned Stonebraker's Vision who said that there will be either data warehouse database or transactional databases, but nothing will do both. PostgreSQL is proof that Stonebraker was wrong, because PostgreSQL is both. It was also mentioned that patched-PG projects like PgSQL-XL, that focus on scalability and that will be based on 9.5 at 1Q2016 may be several times faster than 9.6.

Dave Page, Magnus Hagander had the final word that belonged to 2nd Quadrant, which is becoming sustainable open shource development, customer funded company that provides Highly Available Support and RemoteDBA.

I already wrote it was my first PGConf.eu, but since the three days were full of very interesting topics and hallway was full of very interesting fans of PostgreSQL, I'm sure it was not the last one.