How to Delete Data From a DB2 Table and Release Disk Space

First, a disclaimer. This process is only going to work for tables in AST tablespaces. It also will only work for tables that are not MDC, and do not have a clustering index. It will work for most WebSphere Commerce 7 tables. This is intended as a simple instruction set – assuming you want to delete a bunch of data from a table, and then release some space back to the filesystem.

Delete Your Data

The first step is obviously to delete the data. Depending on the amount of data you’re talking about, and what tables, this could be difficult and time-consuming and use a bunch of log files. In databases with a lot of RI (like WebSphere Commerce databases), some deletes take a very large amount of time (like deletes from MEMBER or USERS in a WebSphere Commerce database). If it’s an option to delete everything, consider using the TRUNCATE TABLE command.

If I simply have to delete a large amount of data, I use a trick with MOD to split up the deltes. It probably also slows them down, but if you have a unique or evenly distributed numeric field, it works well. First, I figure out how many rows I want to delete, then divide that number by the size of each delete – I usually go with 10,000 or even less if there’s a lot of RI on the table. Say this comes up with 4 – then I execute delete commands that look like this:

delete from wscomusr.staglog where stgprocessed!=0 and stgfilter is null and stgstmp < current timestamp - 14 days and mod(stgrfnbr, 4) = 0;
delete from wscomusr.staglog where stgprocessed!=0 and stgfilter is null and stgstmp < current timestamp - 14 days and mod(stgrfnbr, 4) = 1;
delete from wscomusr.staglog where stgprocessed!=0 and stgfilter is null and stgstmp < current timestamp - 14 days and mod(stgrfnbr, 4) = 2;
delete from wscomusr.staglog where stgprocessed!=0 and stgfilter is null and stgstmp < current timestamp - 14 days and mod(stgrfnbr, 4) = 3;

Of course, this assumes you leave autocommit on. My data pruning script automatically generates the right number of statements. But you can also fairly easily use Excel and the concat function to generate the statements if you have a large number to generate. I'm currently running a loop with 14,670 groups on one of my servers.
You can delete from a select like this:

delete from (select * from wscomsur.staglog where stgprocessed=1 and stgstmp < current timestamp - 14 days fetch first 1000 rows only);
delete from (select * from wscomsur.staglog where stgprocessed=1 and stgstmp < current timestamp - 14 days fetch first 1000 rows only);

My data pruning script automatically generates the right number of these, but as the helpful commentor notes below (thanks, Ian!), you can also loop through until you get SQLSTATE 02000. This method is an order of magnitude faster than the old method I offered, and still breaks up data to prevent filling up your active transaction logs. I used this method for the same delete I was running for about 7 million rows in the STAGLOG table, and it completed in less than an hour what had been taking days - so it's at least an order of magnitude faster. I'm altering my data pruning scripts to use it.

Reorg the Table

After your delete is complete, assuming you're using AST tablespaces (DMS are similar), you will not see any change in the amount of space taken up in your filesystem. In order to see the difference, you must first reorg the table. You can do this online or offline. The advantage of offline is that it takes less time. The disadvantage is that you need up to 3 times the size of your table to reorg it, which may or may not be reasonable depending on your situation. The advantage of online reorgs is that you don't need a large amount of free space to reorg it. However, online reorgs can use a vast amount of transaction log space - with the table I'm currently working on, I have to compress my archived transaction logs every few minutes while an online reorg is running to avoid filling up my logging filesystems. The syntax for an online reorg looks like this:

In the second row of output, you're looking at the numbers towards the end (scroll to see them above) - in this case, 531704 and 531704. The output above is after the reorg completed, so it says "Done". When it's in progress, you'll see "Started", and the first of those two numbers will be lower than the other. In the case of a delete, the reorg will complete before the first of those numbers reaches the second. In this particular reorg, that second number started above 600,000. The numbers represent the number of pages processed and total number of pages to be processed.

Now, remember that an online reorg is not 100% online unless you also specify the "notruncate" keyword on the reorg. However, in this case, the whole point of the reorg is to tructate the unused space, so we must be able to tolerate the short lock drain and exclusive lock at the end of the online reorg.

Also, remember that you should do a runstats both before AND after the reorg. My favorite runstats syntax is:

db2 runstats on table STAGLOG with distribution and detailed indexes all

Reduce the Tablespace

After the post-reorg runstats completes, you'll want to execute the command to reduce the tablespace size. Assuming you're using AST tablespaces, you'll want to do:

Obviously, you have to know which tablespace a table is in for this to work. The TBSPACE column of SYSCAT.TABLES can help you find this, if you don't already know.

Lather, Rinse, Repeat

In my case, I have a bunch of deletes I'm doing for a single multi-million record table. Unfortuneately, I can only run deletes between 11am and about midnight. I can only delete a million records a day. So every morning I kick off the deletes, and every evening, I stop them and run a reorg and runstats. My other option would be to arrange an outage for the table (which is possible on this table), and export only the data I want to keep, then import/replace it back in. This particular situation is right on the cusp of where that's worth it. I can complete the deletes in a week, so I'm going with the daily restarts of the process.

Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science.
Ember lives in Denver and work from home for XTIVIA, leading a team of Db2 DBAs.

I guess I’m in the habit of the before runstats – for normal regular maintenance, it would go runstats, reorgchk, reorg only tables needing it, runstats(on reorged tables), rbind, but I guess if you take the reorgchk out of the mix (as in this scenario), then you might not need the before runstats either.

I have refactor’d DBClean. I use session.InStorageTables to store the Primary Keys of the selected ‘eXpired data’. Spin around the session table of PK’s PrepareStmt & commit in a LUW. Nice & fast, central logging, limited concurrency issues.