Marko Sutic's Database Blog

Galera Cluster is a virtually synchronous multi-master replication plug-in. When using Galera Cluster application can write to any node and transactions are then applied to all serves via row-based replication events.

This is built-in Mysql row-based replication which supports replication with differing table definitions between Master and Slave.
So, when using row-based repplication source and target table do not have to be identical. A table on master can have more or fewer columns or use different data types.

But there are limitations you must watch over depending on MySQL version you are running.
- The database and table names must be the same on both Master and Slave
- Columns must be in the same order before any additional column
- Each extra column must have default value
- ...

Data inconsistency is the worst problem that could happen in synchronous cluster configuration.
It could happen without any notice, but sooner or later it will stop reapplication process and failing node will be excluded from the cluster.

What’s happening is a side effect of a multi-master cluster with several appliers.
The cluster needs to control when a DDL statement ends in relation to other
transactions, in order to deterministically detect conflicts and schedule
parallel appliers. Effectively, the DDL statement must execute in isolation.
Galera Cluster has a 65K window of tolerance for transactions applied in parallel,
but the cluster must wait when ALTER commands take too long.

Basically if you run DDL statement on slightly bigger table you will probably freeze your cluster for a duration of the DDL statement. And you cannot cancel this operation.
Built-in method to perform model changes is using "Rolling Schema Upgrade" method.

A "fast" DDL may as well be done via TOI. This is a tentative list of such:
* CREATE/DROP/RENAME DATABASE/TABLE
* ALTER to change DEFAULT
* ALTER to change definition of ENUM or SET (see caveats in manual)
* Certain PARTITION ALTERs that are fast.
* DROP INDEX (other than PRIMARY KEY)
* ADD INDEX?
* Other ALTERs on 'small' tables.
* With 5.6 and especially 5.7 having a lot of ALTER ALGORITHM=INPLACE cases,
check which ALTERs should be done which way.
Otherwise, use RSU. Do the following separately for each node:
1. SET GLOBAL wsrep_OSU_method='RSU'; -- This also takes the node out of the cluster.
2. ALTER TABLE ...
3. SET GLOBAL wsrep_OSU_method='TOI'; -- Puts back in, leading to resync
(hopefully a quick IST, not a slow SST)

Even though I globally enabled RSU method on Node1 with statement noted in documentation Node1 is not desynced from the cluster and DDL statement is executed on other nodes.
If this was production database I would have big problems during ALTER command.

Now Node1 is desynced from the cluster and I am running ALTER statement locally without impact on the rest of the cluster.

I don’t know if this behaviour is expected for this version, or it is bug.

When running Galera Cluster in production it is VERY important to take some time and perform tests especially if you need to run ALTER statement on big tables.
Expect the unexpected even when running statements from documentation.

MySQL slow query log is great for identifying slow queries that are good candidates for optimisation. Slow query logging is disabled by default, but it is activated by DBA's or developers on most environments.

You can use slow query log to record all the traffic but be careful with this action as logging all traffic could be very I/O intensive and could have negative impact on general performance. It is recommended to record all traffic only for specific time periods.

This is why slow query logging is controlled with long_query_time parameter to log only slow queries.
But there is another parameter to think about - log_queries_not_using_indexes.
By default log_queries_not_using_indexes is disabled. If you have this parameter turned on you will log queries that don’t use an index, or that perform a full index scan where the index doesn't limit the number of rows - regardless of time taken.

If you have long_query_time configured to reasonable time, and still notice that queries are intensively logged in slow query log file, then you probably have enabled log_queries_not_using_indexes.

Enabling this parameter you’re practically saying that full scans are "evil" and should be considered for optimisation. But full scan doesn’t always mean that query is slow. In some situations query optimizer chooses full table scan as better option than index or you are probably querying very small table.

For instance, on several occasions I've noticed slow query logs flooded with queries like this:

Imagine you have MariaDB Galera cluster with nodes running in different data centers. Data centers are not connected via secured VPN tunnel.
As database security is very important you must ensure that traffic between nodes is fully secured.

Galera Cluster supports encrypted connections between nodes using SSL protocol and in this post I want to show how to encrypt all cluster communication using SSL encryption.

Galera Cluster is excellent virtually synchronous multi-master database cluster. It has many benefits which you can check on GaleraCluster.
But beside benefits it has some limitations and one of them is handling large transactions.

Large replication data sets could degrade performance of whole cluster causing cluster freezing, increased memory consumption, crashing nodes, etc. To avoid this issues it is recommended to split large transactions into smaller chunks.

In this post I want to show you how to safely delete large amounts of data on Galera Cluster. You can perform this task using several tools or writing custom procedures to split large transaction into chunks. In this example I will use pt-archiver tool from Percona.

Imagine you have received task to perform data cleanup in devices table for several schemas.
It looks like very simple task - delete rows from devices table where device_cookie is 0.

delete from devices where device_cookie = 0

But, although statement looks simple it could potentially freeze whole cluster so before executing delete statement count how many rows you need to delete.

In my case I have to delete few millions of rows which is too much for one transaction so I need to split transaction into smaller chunks.

Almost every big database has it's hot data which is used often, and cold data which is rarely touched. From version 9i I have used transportable tablespace feature to exclude cold (archive) data from database and keep it on cheap storage or tapes.

If someone needs to query some of archive tables it was very easy to plug in tablespace for a few days and after archive data is not needed anymore tablespace could be easily dropped. So I was plugging the same tablespaces more than once.

But when I tried the same process on 12c database I was unpleasantly surprised that Oracle changed behaviour and I could not reattach tablespace.

Oracle Development declared it as "Expected Behavior"
Starting from 12.1, during the TTS import operation, the tablespaces (datafile headers) are put into read-write mode intermittently in order to fix up TSTZ table columns and clean up unused segments in the datafiles.
This functionality was implemented on many customer's request basis. And, hence, this cannot be reversed. Note that, it intermittently only changes the status to "read-write" and the final status will still be "read-only" only.

Oracle database In-Memory option was introduced in 12.1.0.2 patchset. It is great feature to improve performance of analytic queries. For mixed workload OLTP environments In-Memory option could improve performance of analytic queries without significant negative affect on quick OLTP queries or DML operations.

So you have decided that In-Memory option could be great for you and now you want to implement this option for your critical production database.

But in your code you have many SQL hints hard-coded, SQL Profiles implemented or SQL Plan baselines created to solve problems with unstable query performance. What will happen with execution plans if you populate In-Memory column store with critical tables in the database.

Example:
Version : Oracle 12.1.0.2

For test I will use query with fixed plan using both SQL profile and SQL plan baseline.

Note section says that SQL plan baseline is used for this statement, but different than before.
I have "TABLE ACCESS INMEMORY FULL" operation and plan has changed automatically.

In Oracle 12cR1 Adaptive SQL Plan Management is enabled by default. Oracle calculated more efficient plan using in-memory column store and automatically accepted new SQL execution plan for this query. As new SQL plan is added and accepted Oracle was able to change execution plan.

In case we have hint forcing full scan, query will read data from in-memory column store as "TABLE ACCESS INMEMORY FULL" and "TABLE ACCESS FULL" are the same full table scan operations for the optimizer.

Conclusion
If your production application is heavily dependent on SQL profiles and SQL hints it would be hard to use full potential of in-memory column store option in a short time.
With SQL plan baselines it is slightly easier because you could use Adaptive SQL Plan Management to alter plans.

But you must dedicate some time for proper testing, because changing plans and dropping indexes blindly could cause many performance problems.

Few days ago we had concurrency problem with "cursor: pin S wait on X" wait event. This wait event is mostly associated with parsing in some form.

After quick diagnosis I’ve found problematic query. It was fairly complex query which was executed very often with average 0.20 seconds of execution time. As this query was using bind variables, Oracle reused existing plan and problems with "cursor: pin S wait on X" wait events weren’t appearing.

But when hard parse occurred we experienced problems with specified mutex waits. Query execution with hard parsing jumped from 0.20 seconds to over 2,1 seconds.

One session would hold mutex pin in exclusive mode while other sessions were waiting to get a mutex pin in share mode - waiting with "Cursor: pin S wait on X" wait event.
Rewriting query would solve this issue but we needed some solution quickly.

I have decided to perform few tests using SQL plan baselines and SQL profiles and measure effect on hard parse. Tested query is intentionally excluded from the post.

This is big improvement.
Notice elapsed time for parse - from 2.09 secs to 1.21 secs.
Check query statistics - almost three times less recursive calls.

But why?
This is my explanation and I might be wrong so please leave comment below if this is the case.

When we’re using SQL baselines for plan management first step is always generating execution plans from optimizer. Cost based optimizer produces several plans and then compares plans with plans in the SQL plan baseline. Many different plans will be probed as a part of optimizer calculations. SQL plan baseline has no effect on number of calculations.

With SQL profiles we will feed optimizer with estimations and hints before calculation starts. Future plan will be influenced by the SQL profile. Basically we will point optimizer "in the right direction" and optimizer will not perform the same amount of calculations like before. As a result we have less recursive calls and less time spent on hard parsing.

After "fixing" plan with SQL profile, I’ve tried to reproduce mutex concurrency intentionally forcing hard parse but now Oracle managed to perform hard parse without affecting many sessions. Maybe I’ve solved problem temporarily and bought some time for developers to rewrite problematic query.

We have several databases where automatic capturing of sql plan baselines is enabled for a few schemas.

Execution of some queries deeply depend on variables where is not always the best to reuse same execution plan for all executions. For those queries I want to avoid using literals and inefficient execution plans. Also, I want to use SQL plan baselines as I have automatic capturing enabled.

Question is, can I make Adaptive Cursor Sharing to work with SQL Plan Baselines without changing query?
Activate bind awareness for every execution to avoid inefficient execution plans?

I want to avoid even one inefficient execution or wait for ACS kick in automatically, because this one lousy execution could be potentially big problem.

We have note that SQL patch is used and we have bind awareness enabled. For every query execution, during hard parse, Oracle will peak variable and calculate efficient execution plan accordingly. At least, I would expect this.

Few days ago I’ve received complaint that simple count on 2 million rows table is running forever.

This was the statement:

select count(1)
from CLIENT k
where k.expires is null;

I've used fake names for table name and columns.
Database version: 11.2.0.4.0

Indeed, query was running longer than I would expect. Oracle was using FULL SCAN of the table with "db file sequential read" wait events. This was little odd to me as I would expect "direct path reads" or "db file scattered reads".

Hm… why am I having so many db file sequential reads with direct path reads happening also?
This is a table with lots of columns so I might have problems with chained or migrated rows.
Oracle is probably using individual block reads to fetch pieces of each row.

As I had table with more than 255 columns I would expect intra-block chaining, but this shouldn't cause sequential reads. Only if row doesn’t fit in the block I would have regular row chaining.
I’m probably having problems with row migrations.

Chained row is a row that is too large to fit into a block and if this is the root cause of the problem there isn't much I can do to improve performance. If it’s too big to fit into a block, it would be too big after rebuilding table also.

Migration of an row occurs when row is updated in a block and amount of free space in the block is not adequate to store all the row’s data. Row is migrated to another physical block.
This usually happens when you have PCTFREE set to low.

What is important for migrated rows - you can improve performance reorganizing table/partition or simply deleting/inserting chained rows.

Notice "table fetch continued row" statistic. Tanel wrote that this counter is usually increasing when rows are accessed with index access paths.
In my case I have full scan that is increasing the value. This count is number of chained pieces Oracle had to go through in order to find the individual pieces of the rows.
I won’t go any further in detail - just check Tanel’s blog post.

Reorganizing table solved my problem. Finally full scans on the table were running much faster.

There is interesting support note "Doc ID 238519.1" which states that trailing NULLs do not take space in the rowpiece: initially row fits in one rowpiece.
If column beyond 255 is then populated, then all the NULL columns between last populated and this new column now takes up space.
Row has to be split into two rowpieces and the new rowpiece is migrated to a new block - row becomes chained.

In our table we have trailing NULL columns so this probably caused such migration.