30 September, 2018

Unlike Range or List Partitioning, where you define the rule that determines which Partition a row is inserted into (based on the value in the Partition Key Column(s)), Hash Partitioning relies on Oracle applying a hashing algorithm to "randomly" distribute incoming rows across the available Partitions of the table.
This is useful when you want to break up a table into smaller physical segments (maybe even into separate Tablespaces on different disks) without any logical grouping of the data. In Date based Range Partitioning, data is grouped into different Partitions -- i.e. physical segments on disk -- based on the Date value (e.g. by Month or Year). In List Partitioning, data is grouped based on the value in the Partition Key Column.

In this definition of the table, I have "randomly" distributed incoming rows across 4 Partitions in 4 different Tablespaces. Given the incoming "data_item_number" values (either machine-generated or from a sequence), each of the 4 Partitions would be approximately equally loaded.
(In contrast, in Date based Range Partitioning of, say, a SALES table, you might have fewer rows in older Partitions and an increasing number of rows in new Partitions as your business and Sales Volume grow over time !).
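The CREATE TABLE statement itself is not reproduced above, but a definition along the lines described might look like this (the table, column and tablespace names are my own assumptions, not from the original post):

```sql
-- Hypothetical sketch of a Hash Partitioned table spread across
-- 4 Partitions in 4 different Tablespaces.
CREATE TABLE data_items
( data_item_number  NUMBER,
  data_item_name    VARCHAR2(32)
)
PARTITION BY HASH (data_item_number)
( PARTITION p_1 TABLESPACE hash_tbs_1,
  PARTITION p_2 TABLESPACE hash_tbs_2,
  PARTITION p_3 TABLESPACE hash_tbs_3,
  PARTITION p_4 TABLESPACE hash_tbs_4
);
```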

Unlike Range Partitioning, Hash Partitioning will not perform well for a "range based query" (e.g. a range of sales dates or a range of data item numbers). It is suitable for "equality" or "in-list" predicates. If you do need a range based query, you would need a Global Index.
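Since a plain (non-partitioned) index on a partitioned table is a Global index by default, such an index could be sketched as follows (the index and table names are assumptions):

```sql
-- A non-partitioned index on a partitioned table is a Global index.
-- It can serve range predicates such as
--   WHERE data_item_number BETWEEN 1000 AND 2000
-- which the Hash Partitioning scheme itself cannot prune efficiently.
CREATE INDEX data_items_din_ndx ON data_items (data_item_number);
```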

Note that it is advisable to use a power of 2 for the number of Hash Partitions -- Oracle's hashing algorithm distributes rows evenly only when the Partition count is a power of 2; with any other count, some Partitions will receive roughly twice as many rows as others.

Note that I have inserted the 10,000 rows from a single session. In the real world, you would have multiple sessions concurrently inserting rows into the table.
Based on the Hashing algorithm that Oracle uses (note : this is internal to Oracle and we cannot substitute a custom algorithm), Oracle has more or less evenly distributed the incoming rows across the 4 Partitions.
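One way to verify the spread yourself is to count rows Partition by Partition using the PARTITION extension clause (table and Partition names are assumptions carried over from my earlier sketch):

```sql
-- Count the rows in each Hash Partition directly.
-- With ~10,000 rows inserted, each count should be in the
-- region of 2,500 if the distribution is even.
SELECT 'P_1' AS partition_name, COUNT(*) AS row_count
  FROM data_items PARTITION (p_1)
UNION ALL
SELECT 'P_2', COUNT(*) FROM data_items PARTITION (p_2)
UNION ALL
SELECT 'P_3', COUNT(*) FROM data_items PARTITION (p_3)
UNION ALL
SELECT 'P_4', COUNT(*) FROM data_items PARTITION (p_4);
```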

(The first two queries returned rows with values greater than 8000 simply because I didn't specify a range of values as a filter and those rows came from the first few blocks that Oracle read from the buffer cache).
Note how the DATA_ITEM_NUMBER values indicate "near-random" distribution of rows across the Partitions. It is likely that if I had created multiple sessions concurrently running inserts into the table, distribution of the rows would have been even more "random".

Note how the P_MISCELL Partition can host multiple values for the REQUEST_STATUS column.
The last Partition is specified as a DEFAULT Partition (note that DEFAULT is a keyword, not a value like the others) to hold rows whose REQUEST_STATUS values are not mapped to any of the other Partitions. With List Partitioning, you should always have a DEFAULT Partition (it can have any name, e.g. P_UNKNOWN) so that unmapped rows can be captured.
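A List Partitioned definition consistent with this description might be sketched as follows (the table name, columns and status values other than those mentioned above are my own assumptions):

```sql
-- List Partitioning on REQUEST_STATUS.
-- P_MISCELL hosts multiple values; P_UNKNOWN uses the DEFAULT
-- keyword to capture any unmapped status values.
CREATE TABLE requests
( request_id      NUMBER,
  request_status  VARCHAR2(16)
)
PARTITION BY LIST (request_status)
( PARTITION p_submitted VALUES ('SUBMITTED'),
  PARTITION p_running   VALUES ('RUNNING'),
  PARTITION p_completed VALUES ('COMPLETED'),
  PARTITION p_miscell   VALUES ('ON_HOLD', 'CANCELLED'),
  PARTITION p_unknown   VALUES (DEFAULT)
);
```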

If you go back to my previous post on Row Movement, you should realise the danger of capturing changing values (e.g. from "SUBMITTED" to "RUNNING" to "COMPLETED") in different Partitions. What is the impact of updating a Request from the "SUBMITTED" status to the "RUNNING" status and then to the "COMPLETED" status ? It is not simply an update of the REQUEST_STATUS column alone but a physical reinsertion of the entire row (with the consequent update to all indexes) at each change of status.

After the INSERT, I realise that the year in the SALE_DATE is wrong -- it is 2019 instead of 2018. I need to update the row to set the year to 2018.
(Since the SALES_DATA table is partitioned to have a separate Partition for each year, this row has gone into the P_2019 Partition).

The ALTER TABLE ... ENABLE ROW MOVEMENT is a DDL command (it needs to be issued only once to allow any number of subsequent updates to the table's rows) that allows a row to move from one Partition to another Partition. In this case, the row moved from P_2019 to P_2018.
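The sequence can be sketched as below (the WHERE predicate is a hypothetical illustration; SALES_DATA and SALE_DATE are from the post):

```sql
-- Without ENABLE ROW MOVEMENT, the UPDATE below would fail with
-- ORA-14402: updating partition key column would cause a partition change
ALTER TABLE sales_data ENABLE ROW MOVEMENT;

-- Correct the year from 2019 back to 2018; the row physically
-- moves from Partition P_2019 to Partition P_2018.
UPDATE sales_data
   SET sale_date = ADD_MONTHS(sale_date, -12)
 WHERE invoice_id = 1001;   -- hypothetical identifier

COMMIT;
```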

Moving rows from one Partition to another Partition is expensive. Each row moved in such a manner results in
(a) marking the row as deleted in the original Partition
(b) physically inserting the *entire* row (irrespective of the length of the row) into the new Partition -- not just the SALE_DATE value but every column has to be written into a block in the new Partition
(c) updating *every* index (Global or Local) on the Table

Edit 14-Sep-18 : Also see two earlier posts : "Enable Row Movement" and "Enable Row Movement with MSSM".
That is why it is not a good design to have frequently updated Partition Keys resulting in a row moving from one Partition to another. You may have to reconsider the Partitioning definition or data and transaction flow in the application.

(Do you know where else ENABLE ROW MOVEMENT is required ? There are other cases, not related to Partitioning, where you may have to ENABLE ROW MOVEMENT for a table. By default when you CREATE a Table, ROW MOVEMENT is not enabled unless you explicitly enable it).

08 September, 2018

Oracle 12c has introduced a new feature called "Partial Index" whereby only selected Partitions of a Table are indexed. This is useful, for example, where you have a large historical table and you know that older Partitions are infrequently accessed and no longer need to be indexed. For such tables, you can afford to "lose" the index for these older Partitions.
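For reference, the 12c syntax switches INDEXING off for the chosen Partitions and creates the index as PARTIAL (the table, index and partition names below are assumptions):

```sql
-- 12c Partial Index sketch: exclude the old Partition from indexing,
-- then create a Local index that honours the INDEXING attribute.
ALTER TABLE sales_data MODIFY PARTITION p_2016 INDEXING OFF;

CREATE INDEX sales_data_ndx ON sales_data (sale_date)
  LOCAL INDEXING PARTIAL;
```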

How would you do this in 11.2 ?

Let me go back to the SALES_DATA table with data from 2016 to 2018 populated. This is the status of the index partition segments :

You will notice that although the P_2016 Partition in the Table has data, the corresponding Index Partition no longer has a segment -- no space is allocated to it (although the logical definition of the index exists). This is possible with the "deferred_segment_creation" parameter set to TRUE in 11g.
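The likely mechanism for the P_2016 Index Partition is marking it UNUSABLE -- from 11.2 onwards, an UNUSABLE index (or index partition) segment is dropped and no longer occupies space, while the logical definition remains. A sketch (index and partition names are assumptions):

```sql
-- Mark the Local Index Partition for the old year UNUSABLE;
-- in 11.2 its segment is dropped and the space is released,
-- but the index partition still exists in the dictionary.
ALTER INDEX sales_data_ndx MODIFY PARTITION p_2016 UNUSABLE;
```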

In fact, you will notice that although the table has Partitions for 2019 and 2020 and MAXVALUE, corresponding Index Partition Segments do not exist (because no data has been inserted into those Table Partitions yet) !