MySQL Partitioning – can save you or kill you

I have wanted for a while to write about using MySQL partitioning for performance optimization, and I just got a relevant customer case to illustrate it. First you need to understand how partitions work internally. At a low level, partitions are separate tables. This means that when you do a lookup by the partitioning key you only touch one (or some) of the partitions, while lookups by other keys need to be performed in every partition and hence can be a lot slower. The gain for updates typically comes from having a smaller BTREE on the active partition(s), which allows it to fit in memory much better. Having potentially fewer levels in the BTREE is not that significant an issue.
So let's look at an example:


CREATE TABLE `tbl` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `uu` varchar(255) DEFAULT NULL,
  `data` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `uu` (`uu`)
) ENGINE=InnoDB

The access pattern to this table is lookups by “uu”, which holds UUID values, plus a number of deletes by “id” and a bunch of inserts. The deletes are mainly clustered around the most recent id values.
The table (and its index) is much larger than the buffer pool.

The first problem was replication lag, which was mainly due to modifying the uu index. This is because UUID() spreads value prefixes very well, effectively giving almost uniform access to the whole BTREE. To solve this problem, partitioning was a good choice – PARTITION BY HASH (id div 10000000) PARTITIONS 32 – this splits the data into 32 partitions, placing sequential ranges of 10M id values in the same partition – very handy if access is concentrated on values which have been added to the table recently.
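In DDL terms, applied to the table above, the clause might look like this (a sketch; MySQL hashes the result of the DIV expression, so each 10M-id range lands in a single partition):

```sql
-- Hash on id DIV 10000000 so each sequential 10M-id range maps to one partition
ALTER TABLE `tbl`
  PARTITION BY HASH (id DIV 10000000)
  PARTITIONS 32;
```

Note that HASH partitioning requires the partitioning expression to use only columns present in every unique key, which holds here since `id` is the primary key.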

Using this trick, replication could be sped up about 10 times, as the couple of partitions which were actively used could fit in the buffer pool completely, so replication became CPU bound (single thread) instead of IO bound.

You could celebrate, but hey… you need to check the impact on the master too. The master in its turn was getting a lot of lookups by the uu value, which is not part of the partitioning key, and hence we’re looking at 32 logical lookups, one per partition. True, only one of the partitions would contain the value, but many of them will require physical IO, going down to the leaf level to verify the value does not exist there, which reduced performance for random selects by UUID from 400 to 20 per second (from a single thread).

Decreasing the number of partitions made replication less efficient, but the number of selects the table could deliver increased, and there seems to be a reasonable middle number which would allow replication to perform better than it does now, while selects are still delivered at the rate the system needs.

What is the takeaway? When you’re creating partitions, think clearly about what you’re trying to achieve. Partitioning is not some magic feature which just makes everything a lot faster. I’ve seen some people apply partitioning to basically all of their tables without much thought, and believe me, the results were not pretty.


Author

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.


Comments (54)

In a case like this, I usually leave the master unpartitioned and partition only the slaves. You get the performance you want on the master and the slaves, the tradeoff is that slave cloning or rebuilding is a more time-consuming task. Is this a good practice? Why did you decide to lower the number of partitions on the master and the slaves too?

Partitioning and sharding are technologies with a different focus. Partitioning is a database-side technology which in some cases allows query execution to be more efficient. It is transparent to the application, which is great. Sharding is an application-level technology where you change how the application does queries to aggregate data from different tables and servers. If you use sharding, in most cases you do not use partitioning, but sometimes it is still helpful – you may shard on “user” and then use partitioning by date/time to separate new data from old.

You touch on a very interesting topic. In MySQL, partitioning is kind of lame because it was not really designed from scratch but rather added on top, being simply a more advanced form of “merge table” with multiple tables underneath. This means MySQL can only deal with local indexes (each partition has its own index) – there is no support for global indexes; you also can’t partition indexes differently than you partition data (which can make sense in some cases). To have these, however, you need an interface to indexes as separate objects, which is not the case inside MySQL – indexes are something which belongs to the table and are managed as part of the table by the storage engine.

There is some good news though: for some workloads you can get an effect similar to global indexes by implementing them as a separate index-organized table.

Well, look at the table in the post. It is partitioned by PRIMARY KEY. Let’s say you are also planning to do some lookups by timestamp and you do not want that timestamp lookup to hit all partitions; then you can create the table timestamp_index with primary key (ts, id) and the same columns, and when you need to do the lookup by timestamp you look up in that table and join it to the original table.
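A sketch of such a lookup table, assuming the original table also carries a `ts` timestamp column (the column name is an assumption for illustration):

```sql
-- Narrow "global index" table, clustered by (ts, id) via the InnoDB primary key
CREATE TABLE timestamp_index (
  ts TIMESTAMP NOT NULL,
  id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (ts, id)
) ENGINE=InnoDB;

-- Look up by time range in the narrow table, then join back by id,
-- so only the matching partitions of the big table are touched
SELECT t.*
FROM timestamp_index ti
JOIN tbl t ON t.id = ti.id
WHERE ti.ts BETWEEN '2011-01-01' AND '2011-01-02';
```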

A UUID field should be
BINARY(36)
for the following reasons…
* Not VAR because the length is constant
* Not (255) because some (which?) queries will unnecessarily use 255 in temp tables.
* BINARY, not CHAR because utf8 (you are Internationalized, aren’t you?!) takes 3 bytes. Alternatively, you could use CHAR(36) CHARACTER SET ASCII, if you really need case folding on the hex characters.

You’re right, of course, for UUIDs in general. In this case the data type is designed to store a “unique name”, and this given installation happens to store a UUID there. In general, BINARY(36) is good if you do not mix case in your UUIDs, or CHAR(36) latin1 if you do. If you can store it unhexed, BINARY(16) would be best.
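Storing the value unhexed as BINARY(16), as suggested, might look like this (a sketch; the table name is illustrative, and it assumes the application always supplies the canonical 36-character UUID form):

```sql
CREATE TABLE uuid_demo (
  uu BINARY(16) NOT NULL,  -- 16 bytes instead of 36+ for the text form
  PRIMARY KEY (uu)
) ENGINE=InnoDB;

-- Strip the dashes and unhex on the way in...
INSERT INTO uuid_demo (uu)
VALUES (UNHEX(REPLACE('550e8400-e29b-41d4-a716-446655440000', '-', '')));

-- ...and hex on the way out to recover a readable value
SELECT LOWER(HEX(uu)) FROM uuid_demo;
```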

Hi, I’m currently looking into an issue not too dissimilar to this, but I haven’t found specific answers to my question and am hoping this thread is a good place to put it.

I have a database with 1 table which is MyISAM and has 30 partitions. The table has 3 columns, all varbinary, of lengths (8, 20, 65500); the first 2, (A, B), make up the primary key and are used for the partitioning key as KEY(A, B). The table contains 60M rows, expected to reach 100M+; the key buffer is 8G and 56% used.

You mention in this article that reducing the number of partitions improves select speed (replication is of no concern in this case), but would you expect the second key in the partitioning key to increase performance? I believe that is why it was added, as opposed to just ensuring uniqueness.

I have been a software developer and database administrator/developer for decades, focusing mostly on VB and SQL Server. I have worked with many companies, both large and small, but my experience has been mostly with low transaction volumes.

I have a client that has a need for a low data volume (transaction size less than 2k) yet very high transaction volume (300k per second; although if broken down by location this could be as low as 10k transaction per second) application and they want to go cheap.

By “going cheap” I mean they want to start by going with a hosting service using commodity hardware and MySQL with the client side PHP.

My client has a great business concept (one of those “wish I would have thought of that!” things) and plan to start small but once “live” they will grow exponentially very quickly.

I know SQL Server. I need to learn MySQL, both administration and development. What I need from you is some direction on what training I should take. Please be detailed about the training you think I need. Any advice would be appreciated.

Hi, I have a 10 GB MyISAM table which is growing rapidly. Can anyone assist me with the below:
I am trying to implement partitioning by range (based on the primary key). I tried some benchmarking, but the results were more or less the same.
I am unable to work out whether this is because SELECTs on the primary key are already very fast, so partitioning won’t be of much help, OR I have missed something while benchmarking.
I will be really thankful if somebody can suggest whether to go with partitioning or not.
My table has a high number of selects compared to inserts/updates, and replication is in place and working fine.

Hi Peter,
I have a couple of tables on a high-transaction site (4000 queries per second) that are in excess of 50M rows. I am trying to put a suitable partitioning scheme together.
The idea was/is to partition on date (TO_DAYS(date_field)), so we can access the latest data quickly and easily archive off earlier records by copying and dropping a partition. Unfortunately the tables have a PK of char, which doesn’t fall in line with the date syntax. Adding the date field to the primary key would negate the usefulness of the current PK column.

Although with 5.5 we can use multiple columns , I don’t think that helps .

Any ideas greatly appreciated. I saw your idea of a lookup table, which could work, but would require more space and changes in a lot of code.

Hi Peter & everyone,
I’m working on a multi-tenant SaaS project using MySQL as the DBMS, but it’s really hard to decide on the database structure. At first, I thought a shared database with multiple schemas (or views in MySQL) would be a great solution, because it requires minimal handling in code without commingling tenants’ data.
For those who might be interested, this article explains the approach with an example: http://blog.empowercampaigns.com/post/1044240481/multi-tenant-data-and-mysql

However, after reading Peter’s article on the performance issues of MySQL views (http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker), I ran a little test to compare query performance between multi-schema (by querying a view that shows only the data of one tenant) and single schema (by using the same base table from which the view is generated, but with WHERE tenant_id = XXX added to retrieve tenant data), and my finding is that single-schema is approximately 2x faster than multi-schema. Is it because MySQL has to generate the view every time before processing a query?

But still, I prefer not to handle tenant data separation in code if possible, although some say that with careful design, this is still pretty safe. So now I’m wondering, how does MySQL partitioning differ from view? And is MySQL partition pruning a decent implementation of multi-schema approach?

Firstly, ‘Partitioning’ and ‘Views’ are completely different terms, and not related. A ‘View’ is simply a stored piece of SQL that produces the same columns from a table or group of tables. ‘Partitioning’ is a way in which the DB (MySQL in this case) splits its actual data into separate tables which are treated, from the SQL point of view, as a single table.

It would be relatively simple to partition your tables on ‘tenant’, so that MySQL would only ever have to look at a given partition. This would make queries quicker, as it wouldn’t have to trawl through all the other tenants’ records. You would still need to store the ‘tenant’ id in your code, or as a MySQL variable.
If you went this route, you could still use views to simplify your code, and ensure that the tenant id is used by the view and stored for each session.

The difference in time between querying through a view and querying directly is that the view will have its own execution plan, which then executes the same single plan that you ran directly (if you see what I mean). Basically a view generates two queries: one for itself, and one for the query it contains.

I am having a similar problem wherein I have around 1 billion rows in a MyISAM table, with a primary key consisting of 3 columns (ObjInstance (10000 types), objectClass (100 types) and timestamp of addition). All 3 columns are also indexed, as these (mostly separately or in combinations of 2) are used for querying.

Inserts are @ 800 inserts per second. Deletes are for records older than 60 days.
We tried both:
a) partitions on objectClass –> deletes were really slow, to the extent of 50 deletes per sec in batch (NOT good if it does not match the insert rate)
b) MERGE tables based on dates (sharding the main table) –> queries and inserts became really slow
I understand that this might be because all the tables below the merge table are queried. Also, there might be locking happening when long-running queries and inserts occur at the same time.

I know MySQL should have no problem handling this amount of data, but I am not sure what I am doing wrong.
So, I am wondering

@Ashish — Based on what you said, this is the best thing to do…
ObjInstance SMALLINT UNSIGNED
objectClass TINYINT UNSIGNED
Normalize, if necessary, to shrink the data sizes.

RANGE partition on dates — 1 PARTITION per day. Then do a nightly DROP PARTITION and REORGANIZE PARTITION to add a new day. This will replace the DELETEs with “instantaneous” PARTITION operations.
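The nightly rotation described above might be sketched like this (partition names, dates, and the catch-all `pfuture` partition are illustrative assumptions):

```sql
-- Drop the oldest day: effectively an instantaneous DELETE of that date range
ALTER TABLE ObjectInstances_T DROP PARTITION p20110101;

-- Split the catch-all "future" partition to create tomorrow's slot
ALTER TABLE ObjectInstances_T
  REORGANIZE PARTITION pfuture INTO (
    PARTITION p20110302 VALUES LESS THAN (TO_DAYS('2011-03-03')),
    PARTITION pfuture   VALUES LESS THAN MAXVALUE
  );
```

Reorganizing only the empty future partition keeps the operation cheap, since no existing rows have to be copied.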

Multiple indexes + 800 rows/sec == you are lucky to make it work. How much RAM? RAID? SSD? What is key_buffer_size?

Let’s see the actual CREATE TABLE, SHOW TABLE STATUS, and the SELECTs.

> (mostly seperately or in a combination of 2)
Note that INDEX(a) and INDEX(a,b) are redundant; the former is virtually useless.
Note that the optimizer will rarely use two indexes in a single SELECT. Even if it would, usually a “compound” index would be better.

I agree with you about partitions. You may need something bigger in the way of ints, like medium or bigint, but essentially you are right.

800 rows per second is not uncommon, especially if you bulk feed them on a prepared cursor . I have achieved 2 to 3 times that figure over a 6 month period on a development system. Our live system is regularly processing 4000+ queries per second over 4 servers.

I had tried partitioning based on ObjectClass but dropped the idea, as deletes were awfully slow. I am testing the below-mentioned schema now. I am still in the testing phase and am evaluating InnoDB as an alternative, because deletes are awfully slow and my partitioned table with 800 million rows crashed.

The 3 main SELECTs are (actual fields replaced with “*”):

SELECT o1.*
FROM ObjectInstances_T o1,
     (SELECT ObjInstance, MAX(UTCTime) AS UTCTime
      FROM ObjectInstances_T
      WHERE ObjectClass IN (…)
      GROUP BY ObjInstance) o2
WHERE o1.ObjInstance = o2.ObjInstance AND o1.UTCTime = o2.UTCTime;

SELECT o1.*
FROM ObjectInstances_T o1,
     (SELECT ObjInstance, MAX(UTCTime) AS UTCTime
      FROM ObjectInstances_T
      WHERE ObjectClass IN (…) AND ObjInstance IN (…)
      GROUP BY ObjInstance) o2
WHERE o1.ObjInstance = o2.ObjInstance AND o1.UTCTime = o2.UTCTime;

SELECT *
FROM ObjectInstances_T
WHERE ObjectClass IN (…) AND ObjInstance IN (…)
  AND UTCTime BETWEEN '2010-06-10 03:13:27' AND '2014-06-12 08:13:27'
ORDER BY UTCTime DESC
LIMIT 50000;

If you switch to InnoDB, keep in mind the issues raised in http://mysql.rjweb.org/doc.php/myisam2innodb , starting with
“Disk space for InnoDB is likely to be 2-3 times as much as for MyISAM.” (I highlight this because of your “billion” rows.)


I don’t see any obvious optimizations to do. If you go with PARTITIONs, and partition on UTCTime, “partition pruning” will be a slight benefit with “and UTCTime between…”.
I doubt if the first two SELECTs would benefit from _any_ form of PARTITIONing (or MERGE).

Your first two SELECTs may benefit from designing “summary table(s)”, populating them as the data arrives, then SELECTing against them.

The 3rd SELECT may work better in newer versions of MySQL; it seems to do a better job of IN in compound indexes. What version are you running? What does the EXPLAIN say?

If those are the _only_ SELECTs you need, then DROP all your secondary keys; they are possibly useless. This will greatly speed up INSERTs.

> PRIMARY KEY (ObjectClass, ObjInstance, UTCTime),
Including a time in a PK is usually a mistake, unless the application _guarantees_ that it will not generate two rows in the same second!

CHARACTER SET? I ask because you have a VARCHAR. Often people can get away with latin1, which has less overhead than utf8.

Let me guess… SHOW TABLE STATUS says 50GB of data + 30GB of index (MyISAM) or 120GB + 100GB ? (The large PK will adversely impact InnoDB secondary keys.) The numbers will be roughly unchanged with MERGE or PARTITION.

@Peter …
Sure, 4K INSERTs/sec is possible in some applications. XtraDB shines over MyISAM and InnoDB. TokuDB may shine even with lots of indexes. INDEXes can be a killer of INSERT rates on huge tables. His table is much larger than his RAM, so we need to “count the disk hits”.

Dissecting his case…
Inserting row(s) into the data — appending to the .MYD file: essentially 0 disk hits per row inserted
Inserting into 4 indexes — still cannot tell how “random” they are: 0-4 disk hits.
“RAID 1” — Let’s say 200 disk hits/second.
Conclusion: The indexes have hot spots, and are not very “random” if he is getting 800 INSERTs/sec.
Dropping the secondary indexes — can insert faster.

If he switches to InnoDB, it will depend, again, on the randomness of the data.

With either ENGINE, there is a potential benefit of PARTITION by day, even if the keys are random…
IF all the data and indexes for one partition fits in RAM (key_buffer / buffer_pool), and IF data is inserted in (roughly) chronological order — THEN there will be a lot of activity in the cache, but no need to flush cache until the next day gets started. And, then, the flush will (mostly) be a background task. (XtraDB has a significant fix for smoothing out this operation; InnoDB periodically “stalls”.)

In either engine, even without dropping the unused indexes, a 1-day partition will easily fit in his RAM.

So, I stick with my suggestion of partitioning by day. But now it is a “win” for two reasons. (1) DELETE via DROP PARTITION, and (2) no I/O thrashing during the inserts. He may well be able to sustain your 4K/sec insert rate.

Have created a test table with the partitions as described. Also modified the indexes –> 1 index (ObjectClass, Timestamp); PK –> (ObjectClass, ObjectInstance, Timestamp) – unique.
Using the InnoDB engine, as the client is very finicky about his records: can’t lose them due to a crash. Though there are nightly backups of the disk, it is still better to use InnoDB (though disk requirements increase 2.5 times, but that’s a separate issue).

Shall fill records into the same and then test..

I missed mentioning a few main points…
1. Had to use multiple indexes: when I tested with the above-mentioned index and PK, MySQL did not select any index (checked using EXPLAIN) when my queries contained only one of the columns (strange…??)
2. We are using MySQL NDB Cluster version Distrib 5.1.56-ndb-7.1.19 (to maintain consistency across the other projects in the same group). This base MySQL version (5.1.56) supports partitions, but does not support partition pruning.
3. The records in production (as you rightly mentioned) should be in chronological order of timestamp.
4. The underlying disk partition is DRBD, with replication to a second standby server for failover. (This is not yet active in my setup, but I think it might impact performance.)
5. (Did not mention this before, as this is not the right forum for the point.) Tried using concurrent_insert=2, and inserts fired from the same machine still got blocked behind long-running queries fired before them from the same machine. (Do not understand the issue… This was another reason to switch to InnoDB.)

Lastly, it was decided today by senior architects that the servers shall also run the application server along with the database, so RAM will be doubled and processor cores increased (new server).

Another point worth mentioning: this is my first ever encounter with a database, so please ignore any stupidity and guide me…

@ashish A quick one on indexes. Don’t always trust EXPLAIN; it may not always give you the right answer. In this case it did. The optimiser will rarely, if ever, choose a multi-column index. However, if you tell it to, then it will:

With the throughput on your servers, I find it strange they have decided to put the application on there too. That will only increase bandwidth and disk IO. I assume there will not be many users involved.

INDEX(a,b) will probably not be used for a SELECT that does not mention a.

I have seen many cases where INDEX(a,b) is picked by the optimizer. Sometimes it will use only a from the index; the only clue to this is Key_len in the EXPLAIN.

EXPLAIN _usually_ reflects the query plan for the query, but it _may_ not.

> This base mysql version (5.1.56) supports partitions, but does not support partition pruning.
Eh? Is that a NDB deficiency? Pruning “always” works with InnoDB or MyISAM. (I quote “always” because there are cases where it is not as smart as you would like, but there are usually workarounds.)

> This is my first ever encounter with a database, so, request you please ignore any stupidity and guide
I pegged you as being relatively knowledgeable.

USE/FORCE INDEX — Do these only as a last resort. As the data changes, and the query plan needs to change, these can bite you.

DB and App on same machine — This _may_ not be a problem, especially if, say, the db is disk-intensive and the app is CPU-intensive. Do not allow swapping; that will kill MySQL performance.

“50 users” — a useless metric without knowing what they are doing. 1000 light users might not stress the machine, or a single user might bring it to its knees.

Is NDB involved or not? You have mentioned MyISAM and InnoDB; they have nothing to do with NDB.

@Rick
So,
I have to modify all my queries to have at least (time, class) or (time, instance, class) in all of them.
I shall keep your suggestions about EXPLAIN and FORCE INDEX use in mind.

The MySQL 5.1 manual, section 18.4, says –>
This section discusses an optimization known as partition pruning, which was implemented for partitioned tables in MySQL 5.1.6.
So, to test, I tried selects with the first WHERE clause using the partitioning column and without it. The results were the same.

Coming to the application (I cannot describe its exact nature for various reasons, which I am sure all of you will appreciate):
The application in itself is much more complex than the picture I might have given.
It uses
— 2 application servers, both running on 2 nodes (1 redundant). So, 4 app servers.
— 2 Databases (Separate mysqld instances)
a) Clustered NDB, on the two nodes mentioned above
b) InnoDB/MyISAM on the replicated DRBD partition on these two nodes. (This is what we have been discussing – I do not have this in my test setup though.)

The first application server does the job of only providing some static data from a bunch of saved files on disk.
This does not interact with the DB at all.

The second app server has to interact with both DBs, storing records @ 800 rps. Deletion from the NDB database has to happen very frequently. Deletion from MyISAM/InnoDB happens every day for records older than 60 days.

50 users would simultaneously access the database, with the 3 queries that I mentioned before, almost all the time. So I guess there would be a fair amount of load on the database (1 query per user per minute – worst case).

Ashish,
When you say the query results ‘were the same’, are you convinced you weren’t reading from the cache on the second query? I suspect you were.

However, judging from your description of the application and database, and the fact you have been told to run this on one machine, I have to say that I think it is time to revisit the whole architecture and setup of what you are doing.
Personally, with more than 30 years of experience, I can see no logical reason for using 3 different flavours of MySQL in the same application. I also suspect that your deletion methodology should be altered.
Sorry if this sounds harsh, but if this is really your first move into database usage at this level, I would strongly suggest you get an experienced DBA on board to give you a hand.

Just wanted to ask you: I have a MySQL table with about 1.200.000 rows which I want to partition, but I don’t know the best way. My table schema is: rowId (int PK), content (longtext), content_URL (text). I’m searching the table by content_URL, so as far as I understand I should set content_URL as a “key”, but I don’t understand what to partition by, because this is a text column. I know how to partition by an int (it’s simple), but by a text field? I want to speed up my queries…

Let’s see SHOW CREATE TABLE (without partitions), and the various queries (SELECTs, DELETEs) that you are afraid won’t perform well. If your only search is by content_URL, PARTITIONing will buy you nothing.

What Rick means is that without an index on url (the query you quoted was:
SELECT word FROM tags WHERE url = ‘sample-url’)
the DB will start at row 0 (zero) and read, row by row, down the table till it finds a match. If the match is the last row in the table, it will read all 1.2 million rows. This is called a ‘full table scan’… and is absolutely something you should never ever do, unless your table has 10 rows.

If you do something like:
CREATE INDEX xurl_indx ON tags (url(50));
this will create an index on the first 50 characters of your url, which would then be used in the query and would make life a lot easier for your DB.

I had a similar situation wherein lookups on the master countered the partitioning benefit for writes. I took an alternative approach – the partitioning on the master was different from the slave. The slave partitioning was designed to satisfy the read queries, for which partition pruning kicked in. However, on the master, without partitioning, the rate of inserts fell as the size of the table increased. So by making sure that the size of a single partition stayed below the InnoDB buffer pool size, we got a constant throughput for inserts on the master. Yes, we invited maintenance overhead, as the partition structures were different and we had to be careful with upgrades and patches. Is this a good approach? Any suggestions?

Michael — Are you sure the Query cache did not kick in? Try SELECT SQL_NO_CACHE (etc).

Raghu — I’m surprised you don’t have trouble on the Slaves. Remember that every INSERT on the Master must be also done on the Slave(s). Are you using SBR or RBR? Would you share your two CREATE TABLEs so we can get a better feel for what you are saying.

In theory you will see a significant increase in your select speed, as you will be querying a smaller dataset. You should have faster inserts as well, as you will be adding into a smaller dataset/index. Partitions are effectively a ‘chunked’ table, which requires the DB to find out which ‘chunk’ to work on, then insert or read your data.

The only area you need to be careful of is if you are issuing queries against this table but NOT using the id column. This could cause a longer-running query, as the engine needs to read through all the ‘chunks’.

John, Rick… I feel a discussion coming on 🙂.
I actually gave the short answer. Rick has a point, but there are generally other considerations to take into account. He also didn’t answer the ‘insert’ part.

1. With a standard numeric id set as the primary key, reads will probably be quicker, or give the same response, on a normal table as against a partitioned table.
2. Inserts should be quicker on a partitioned table, as they will be adding to a smaller index and a smaller ‘chunk’ of table. Depending on how often you write to the table, you may not see the difference.

However….

1. If you have uneven reads across the DB (most hitting one section… latest writes etc.), it will probably be quicker in the long term to read from a partitioned table, as, depending on the size of the partitions, the whole partition can be cached; as opposed to a single table, where probably only recently read records will be cached, resulting in more disc reads.

2. If you need to delete records from this table, those will be quicker, as they will come out of the partition, not the whole table.

3. INSERT ON DUPLICATE KEY (I don’t do REPLACE… ever)… if across partitions it will be slower; if within the same partition it will be faster.

4. If you run queries on any other column, these could easily be quicker on a partitioned table, as MySQL can fire off one thread per partition (at least I have seen this happen in a MyISAM environment on similar-sized tables).

What I am really saying is that it’s not only about the selects and inserts; it’s also about everything else you do with the table. Removing a partition is easier and quicker than deleting a few hundred thousand rows, for example, if you need to delete.

However, thank you Rick for pointing out my error in the original post…. it was misleading, but wasn’t meant to be…

You pointed out an important use case for PARTITIONing (“uneven reads”, Re: #1); let me rephrase the use case —
* The entire table is too big to be cached, but
* One partition is small enough to be cached, and
* Your accesses are to that one partition.

In one instance, I take advantage of that — I have 10 years of data, but usually queries hit the last week or two. I have only 20 partitions; they are PARTITION BY RANGE (to_days(…)). However, the ranges are uneven — older partitions cover larger timespans. The last few partitions are one week each. So, even if the user’s ad hoc query leads to scanning the entire last two partitions, performance is not bad. (The last 2-3 partitions tend to stay in the buffer_pool.)
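Declared in SQL, uneven ranges like those might look as follows (dates, partition names, and columns are illustrative assumptions):

```sql
-- Older partitions cover larger timespans; recent partitions are one week each,
-- so the last few partitions stay small enough to remain in the buffer pool
CREATE TABLE history (
  dt DATE NOT NULL,
  payload VARCHAR(255),
  KEY (dt)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(dt)) (
  PARTITION p_old    VALUES LESS THAN (TO_DAYS('2010-01-01')),  -- years of data
  PARTITION p_2010   VALUES LESS THAN (TO_DAYS('2011-01-01')),  -- one year
  PARTITION p_w01    VALUES LESS THAN (TO_DAYS('2011-01-08')),  -- one week
  PARTITION p_w02    VALUES LESS THAN (TO_DAYS('2011-01-15')),  -- one week
  PARTITION p_future VALUES LESS THAN MAXVALUE
);
```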

If I recall correctly, any INSERT/DELETE/etc always opens all partitions, even before thinking about pruning. Seems like this is being fixed in 5.6; not sure about Percona or MariaDB.

(Re #4) There are _no_ parallel queries (within a single connection) anywhere in any ‘free’ variant of MySQL. PARTITIONs are scanned one at a time. (Please provide specific references if you find otherwise.)

I would argue that single-row queries (SELECT, INSERT, DELETE, UPDATE) are similar in speed between PARTITIONed or non-PARTITIONed. When PARTITIONed, first it has to find and open the required partition, then it drills down a BTree that might be one level shallower than the non-partition equivalent.

Even single-row writes are likely to be slower in PARTITIONed tables unless you include something to facilitate pruning.

A billion rows in a BTree is only (about) 5 levels deep. If you split that table into 100 partitions, the BTree in each partition would be about 4 levels. A “point query” in a billion-row (non-partitioned) table will, at worst, hit the disk only 5 times. (Usually 4 of the 5 are cached from previous queries.)
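The depth figures can be sanity-checked with a quick calculation, assuming an illustrative fanout of about 100 keys per BTree page (real InnoDB fanout depends on key and page size):

```sql
-- Approximate BTree depth = ceil(log_fanout(rows)), with an assumed fanout of 100:
-- full table: ceil(log(1e9)/log(100)) = 5 levels
-- one of 100 partitions: ceil(log(1e7)/log(100)) = 4 levels
SELECT CEIL(LOG(1000000000) / LOG(100)) AS full_table_depth,
       CEIL(LOG(10000000)   / LOG(100)) AS per_partition_depth;
```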

(Re #2) I have rambled on about things as a preface to rebutting “delete records from this table, those will be quicker, as they will come out of the partition, not the whole table”:
* If the DELETE is a single row based on a UNIQUE/PRIMARY key and it can prune, the delete involves (1) find partition (if partitioned), (2) drill down the BTree, (3) remove one record. PARTITIONing makes very little difference.
* If it cannot prune, then partitioned would be slower, since it would attempt the DELETE in all partitions.
* A ‘range’ delete would mostly follow above two points.
* Secondary indexes are also BTrees, so they need updating (although delayed). Yes, their BTrees may be one level shallower when partitioned, hence slightly faster.
Conclusion: Not much difference.
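The pruned vs. non-pruned cases above can be illustrated with a hypothetical table `t` partitioned by RANGE on `TO_DAYS(dt)`:

```sql
-- Can prune: the WHERE clause pins down the partition,
-- so only one partition's BTree is searched
DELETE FROM t WHERE id = 12345 AND dt = '2016-07-01';

-- Cannot prune: the partitioning column is absent,
-- so every partition's BTree must be searched for the row
DELETE FROM t WHERE id = 12345;
```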

Yes, DROPping an ‘old’ partition is very efficient, much more efficient than DELETEing the equivalent rows. This is, in my experience, the main use case for PARTITIONing. I go into more details (and code) here: http://mysql.rjweb.org/doc.php/partitionmaint
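For a hypothetical RANGE-partitioned table `t`, the contrast looks like this:

```sql
-- Dropping a whole partition is essentially a file drop,
-- not row-by-row work:
ALTER TABLE t DROP PARTITION p_2012;

-- versus the equivalent delete, which walks and logs every row:
-- DELETE FROM t WHERE dt < '2013-01-01';
```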

The third use case for PARTITIONing involves 2D-like indexing. I have not found a 4th use case.

When you use partitions, are the columns in the partitioning expression, if using the HASH|KEY method, stored redundantly in each partition? It seems to me that since the values of the key would be uniform over the partition, this would be unnecessary, and these column values could be reconstructed during queries based on the partition they come from. Not so for RANGE partitions, etc., of course.

Thanks Rick, I knew they were implemented as basically separate tables, I thought perhaps it still may be optimized out. I am not sure why it would *not* waste much space, I have some tables that have a 24 byte row length, with a 12 byte PK. I could save about 6 bytes or 25%…nothing to sneeze at! 🙂

I have a requirement to maintain daily transaction information for our application in tables for logging purposes, and we also have tools for analyzing the logs based on those tables.

The problem is, we have one table, let us say ‘x-table’, whose one-day size is 2GB. We are merging the data every day using ‘INSERT INTO x-total-table SELECT * FROM yesterday.x-table’. In this way, x-total-table becomes very large and the merge also takes a long time.

Hence, I’m thinking about a solution to get rid of these two problems: 1) data size, 2) merging time. For that, I would like to create day-wise tables and, based on those, create a MERGE table.

I’m Percona’s community manager. Thanks very much for your question. Our discussion forums are a more appropriate place for questions like the one you have. The great thing about the forums is that you can get advice from the community in addition to just Percona experts. https://www.percona.com/forums/

I have this table below, which is partitioned, and I need to extend the number of partitions beyond 2016-07-26. Let’s say I want to create partitions until 2020-01-01. In order to achieve this, must I delete the old set of partitions and recreate a new set? I don’t really understand how this stuff works, and I’m afraid of messing things up somehow. When these partitions were created, was the original table event.Event unaffected? I mean, do the partitions and the original table coexist, or was the original table altered and the data spread across these partitions?
Can someone please help me to do this ?

There needs to be a mechanism for time-series partitioning: a small write partition which is merged into a larger archive partition, after which the archive is moved forward and a new partition (daily, hourly, whatever you need for write speed) is created.

Imagine starting by performing the following partitioning.
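An initial layout along these lines (table and partition names are illustrative, not the commenter's actual schema):

```sql
-- Illustrative starting layout: a monthly archive, a small partition per
-- recent day for fast inserts, and a catch-all future partition
CREATE TABLE timeseriesData (
  dt      DATETIME NOT NULL,
  payload BLOB,
  PRIMARY KEY (dt)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(dt)) (
  PARTITION p201609   VALUES LESS THAN (TO_DAYS('2016-10-01')), -- September archive
  PARTITION p20161001 VALUES LESS THAN (TO_DAYS('2016-10-02')), -- Oct 1 (daily)
  PARTITION p20161002 VALUES LESS THAN (TO_DAYS('2016-10-03')), -- Oct 2 (daily)
  PARTITION pFuture   VALUES LESS THAN MAXVALUE
);
```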

Now, each day after the first of October, you’d like to have a small partition to insert into each day (our experience is that inserts are roughly an order of magnitude faster when the innodb_buffer_pool acts as a write-through cache).

Given any particular day moving forward, you would have a partition for each major interval you wish to archive on (a month, here), plus a partition for this month holding everything from yesterday back to the beginning of the month, today’s partition, and the future partition.

The exchanges above, which provide the ability to move yesterday’s data into the archive for the month, hide the month’s archive data from queries. Realistically, we just need to be able to copy the records from yesterday into the month’s archive, exchange the two partitions to bring the archive forward to yesterday, and remove the small partition for yesterday’s data.

All of this should be possible without locks and without blocking writes, as the affected partitions are not going to be written to (only the future partition is written to until we create today’s partition).

Reorganizing partitions is painful in write-heavy applications, and time-series data in particular has nice windows of "write" vs. "read only" access. If a partition could be marked as "read only", the reorganization could happen without worrying about mutable content corrupting the data traversal and movement.

The copy of data that violates the key on the partition (copying yesterday’s data into the archive partition) is only an issue insofar as it is not visible in the archive partition until the partition’s key range is changed. The exchange of a partition and a table is painful because key validation blocks write access to the partition while the comparison of keys is being done. The table should be locked, but not the partition that will be swapped in. If it is deleted before the swap occurs, then fail at the time of the actual swap.

Somehow, this kind of "small write partition" with "large archive partitions" needs to be readily accounted for in MySQL partitioning.

An additional detail: if I am partitioning on a to_days(date) key, the swap of a partition and a table requires a full table scan to evaluate whether the table fits into the partition. There is ample opportunity for optimization here. Each of these partitioning functions needs an internal counterpart that the partition swap can use to confirm the key set fits. In the case of to_days, all we need is the minimum and maximum date values from the primary key to verify that the to_days(date) function returns values that fit within the partition’s key range. Currently, it takes a very long time for the partition to swap with the table; given the time, it appears a full table scan is happening.

In the above implementation, the trailing archive table grows larger and larger, which means that over time the delay in restoring write access to the table the partition belongs to (the timeseriesData table) will create a huge backlog of data. The applications using the data will then have a delayed view of reality, which is a huge problem for many applications built on time-series data.
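MySQL 5.7.5 and later address part of this complaint directly: EXCHANGE PARTITION accepts a WITHOUT VALIDATION clause that skips the row-by-row range check. A sketch, reusing the illustrative timeseriesData layout and a hypothetical staging table:

```sql
-- Swap a staging table with a partition. WITHOUT VALIDATION (MySQL 5.7.5+)
-- skips the per-row check that every row fits the partition's range;
-- the caller takes responsibility for the rows actually being in range.
ALTER TABLE timeseriesData
  EXCHANGE PARTITION p20161001 WITH TABLE staging_20161001
  WITHOUT VALIDATION;
```

On versions before 5.7.5 the validation scan is unavoidable, which matches the long swap times described above.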