Background

Have you ever seen: "SQL0670N The row length of the table exceeded a limit of "<length>" bytes. (Table space "<tablespace-name>".)"?

When creating or altering a table, DB2 imposes limits on the total worst-case size of a row. This size depends on the page size of the tablespace in which the table resides:

4005 bytes in a table space with a 4K page size

8101 bytes in a table space with an 8K page size

16293 bytes in a table space with a 16K page size

32677 bytes in a table space with a 32K page size

But when you create a table, how do you know how big its signature will be? How much space do you have left in the page to add additional columns without having to migrate to a bigger page size?

How many rows will fit into a page, at a minimum?

To answer these questions there is a substantial table in the CREATE TABLE documentation of the Information Center. So it's either "Good luck!" to you in wading through it with a notepad, sharp pencil, and an eraser at hand, or you can just use the handy routine I have scribbled up below.

Algorithm

In the end the algorithm isn't all that complex:

For most data types, such as numbers, strings, and datetimes, the SYSCAT.COLUMNS.LENGTH column actually contains the physical length in bytes within the row.

DECIMAL is the odd man out because here the LENGTH column contains the PRECISION. Since the data type is implemented using BCD formatting, that works out to a byte for every two digits. We need to round up for an odd number of digits, of course, and we also need to store a sign. In the end that leaves us with (precision + 2) / 2 bytes, using integer division.

Long data types, which are stored in LOB space, use LOB descriptors, so their actual length is irrelevant for the row size. Their footprint within the row is stored in the INLINE_LENGTH column for base tables. For views, however, the value needs to be derived from the LENGTH.

Variable-length types such as VARCHAR or LOBs are not stored at a fixed position within a row. Therefore two bytes for a row offset are required, as well as two more bytes for their actual length in the row.

If the type is nullable, then an extra byte is needed to store the NULL indicator.

With value compression, variable-length data types take two bytes less and fixed-length types take two bytes more. In addition, the fact that the row is compressed costs two more bytes per row. Sounds confusing? Keep in mind that we are computing the defined row size here. This is the worst-case length; the goal of value compression is to reduce the average, actual row length.
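For the curious, the per-column rules above can be sketched in plain Python. This is a simplified model under the stated rules, not DB2's full table of data types; the function name and type list are my own:

```python
def column_bytes(length, typename, nullable=False, value_compression=False):
    """Worst-case byte footprint of one column within a row.

    length: the SYSCAT.COLUMNS.LENGTH value (for DECIMAL, the precision).
    """
    varying = typename in ("VARCHAR", "VARGRAPHIC", "BLOB", "CLOB", "DBCLOB")
    if typename == "DECIMAL":
        # BCD: two digits per byte plus a sign nibble -> (precision + 2) / 2,
        # with integer division taking care of the rounding.
        nbytes = (length + 2) // 2
    else:
        # For most types LENGTH already is the physical byte count.
        nbytes = length
    if varying:
        nbytes += 4   # two bytes row offset + two bytes actual length
    if nullable:
        nbytes += 1   # NULL indicator
    if value_compression:
        # Variable-length types save two bytes, fixed-length types cost
        # two more (plus two more per row, which is not counted here).
        nbytes += -2 if varying else 2
    return nbytes
```

Summing `column_bytes()` over all columns of a table then approximates the defined row size that is checked against the page-size limits above.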

What about SQLCODE -670?

The example above displayed the row size for existing tables. But when you get an SQLCODE -670 "table too wide" you never actually had a table. How do you find out by how much the row has "overshot"? The easiest way is to create a view with that same signature and then measure its row size:

Many organizations are struggling to comply with data security and compliance mandates, while also reducing costs. With the new row-permission and column-mask features, IBM DB2 10 takes security and ease of use to the next level.

Join IBM Information Management Chief Security Architect Walid Rjaibi for a deep-dive technical discussion of the new Row and Column Access Control (RCAC) features in DB2 10 for Linux, UNIX and Windows and InfoSphere Warehouse 10. Walid will also teach you proven methods to implement data security in the most cost-effective way.

On a personal note, I will not be the host of this particular webcast because I'll be on my way to the airport in Bangalore at that time, heading home from a three-week India trip. Rav Ahuja will make a guest appearance as returning host to fill in.

Intro

Anyone who has ever attended a talk of mine on SQL compatibility has heard my claim that "to support Oracle applications in DB2 we had to add a completely new date-arithmetic library of functions, since no two functions operating on dates appeared to be the same in Oracle and DB2."
Well, it appears PostgreSQL has its own set of functions yet again.
So, sooner or later a request for the DATE_PART() function in DB2 was bound to pop up.
Not having PostgreSQL available to test for an exact match, here is what I came up with:

DATE_PART(<measure>,<timestamp>)

CREATE OR REPLACE FUNCTION date_part(measure VARCHAR(20), datetime TIMESTAMP)
RETURNS BIGINT
SPECIFIC date_part_timestamp
DETERMINISTIC NO EXTERNAL ACTION CONTAINS SQL
RETURN CASE UPPER(measure)
         WHEN 'CENTURY'      THEN (EXTRACT(YEAR FROM datetime) + 99) / 100
         WHEN 'DAY'          THEN EXTRACT(DAY FROM datetime)
         WHEN 'DECADE'       THEN EXTRACT(YEAR FROM datetime) / 10
         WHEN 'DOW'          THEN DAYOFWEEK(datetime) - 1
         WHEN 'DOY'          THEN DAYOFYEAR(datetime)
         WHEN 'EPOCH'        THEN (DAYS(datetime) - DAYS('1970-01-01')) * 24 * 60 * 60
                                  + EXTRACT(HOUR FROM datetime) * 60 * 60
                                  + EXTRACT(MINUTE FROM datetime) * 60
                                  + EXTRACT(SECOND FROM datetime)
         WHEN 'HOUR'         THEN EXTRACT(HOUR FROM datetime)
         WHEN 'MICROSECONDS' THEN EXTRACT(SECOND FROM datetime) * 1000000
         WHEN 'MILLENIUM'    THEN (EXTRACT(YEAR FROM datetime) + 999) / 1000
         WHEN 'MILLISECONDS' THEN EXTRACT(SECOND FROM datetime) * 1000
         WHEN 'MINUTE'       THEN EXTRACT(MINUTE FROM datetime)
         WHEN 'MONTH'        THEN EXTRACT(MONTH FROM datetime)
         WHEN 'QUARTER'      THEN QUARTER(datetime)
         WHEN 'SECOND'       THEN EXTRACT(SECOND FROM datetime)
         WHEN 'WEEK'         THEN WEEK_ISO(datetime)
         WHEN 'YEAR'         THEN EXTRACT(YEAR FROM datetime)
         ELSE RAISE_ERROR('78000', 'Unknown measure')
       END;
/
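As a cross-check, a few of the measures can be modeled in plain Python. This mirrors the integer arithmetic of the UDF above and is not a verified match for PostgreSQL:

```python
from datetime import datetime

def date_part(measure, dt):
    """Python model of a few DATE_PART measures (integer semantics)."""
    m = measure.upper()
    if m == "CENTURY":
        # Same rounding as (EXTRACT(YEAR ...) + 99) / 100 in the UDF
        return (dt.year + 99) // 100
    if m == "DECADE":
        return dt.year // 10
    if m == "DOY":
        return dt.timetuple().tm_yday
    if m == "EPOCH":
        return int((dt - datetime(1970, 1, 1)).total_seconds())
    if m == "QUARTER":
        return (dt.month - 1) // 3 + 1
    raise ValueError("Unknown measure")
```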

Conclusion

I think what I provided here is a reasonably close approximation to PostgreSQL's DATE_PART() function.
There is one difference which I am aware of:
SECOND does not return tenths of seconds.
This could be corrected easily enough by switching the RETURNS type from BIGINT to DECIMAL(20, 1). A BIGINT() cast would then need to be added for all measures other than SECOND, which is the only one that may produce fractions. However, I decided against it in favor of processing speed.
If you find any wrong results beyond this one, please let me know and I'll try to fix them.

The new PureData expert integrated systems are optimized for delivering data applications with simplicity and speed.

Join the architect of the transactional and operational warehousing PureData products for a look at these new offerings, which feature DB2 for Linux as a core technology. We will explain which type of workload goes on which system, how to leverage patterns, factory-optimized scalability and much more to speed deployment, ease administration and reduce development efforts.

Background

When our team added compatibility for DB2 with Oracle applications we learned that no two functions were compatible between Oracle and DB2 as far as date-time arithmetic was concerned.
We ended up adding an entirely new library to achieve compatibility.
It appears the difference between DB2 and MS SQL Server is equally profound in this area given that I was recently asked to provide a match for the DATEADD() function.

DATEADD()

This T-SQL function is a bit of an odd one.
It takes three arguments:

A "measure". The measure can be anything from a year to a nanosecond, including weeks, weekdays and a variety of shorthands and synonyms for these.

A "delta". The delta is a 32-bit integer value which denotes how many units of the measure are to be added or subtracted.

A "date". This can be either a timestamp or a date to which the delta measures are to be added.

If the "date" is a datetime (TIMESTAMP(0)), then rounding can occur insofar as split seconds are rounded up or down to the next full second. That part is fine.

The problem arises from the fact that the "measure" argument is not a string. It is one of many keywords. So instead of DATEADD('HOUR', 3, CURRENT DATE) you must specify DATEADD(HOUR, 3, CURRENT DATE).

While it's easy enough to provide a UDF in DB2 which matches DATEADD's functionality, handling the keywords requires a small amount of trickery. It is this trickery which I find justifies a blog post.

The DB2 solution

For simplicity I will concentrate on a generic DATEADD() function for TIMESTAMP(9). If you want to mimic datetime (TIMESTAMP(0)), I do recommend a DISTINCT type. That will allow for overloading of DATEADD() and thus a proper implementation of the rounding semantics.

CREATE OR REPLACE FUNCTION MSSQL.DATEADD(datepart VARCHAR(20),
number INTEGER,
date TIMESTAMP(9))
RETURNS TIMESTAMP(9)
NO EXTERNAL ACTION DETERMINISTIC CONTAINS SQL
RETURN CASE datepart
         WHEN 'year'        THEN date + number YEARS
         WHEN 'quarter'     THEN date + (number * 3) MONTHS
         WHEN 'month'       THEN date + number MONTHS
         WHEN 'week'        THEN date + (number * 7) DAYS
         WHEN 'day'         THEN date + number DAYS
         WHEN 'hour'        THEN date + number HOURS
         WHEN 'minute'      THEN date + number MINUTES
         WHEN 'second'      THEN date + number SECONDS
         WHEN 'millisecond' THEN date + (number * 1000) MICROSECONDS
         WHEN 'microsecond' THEN date + number MICROSECONDS
         WHEN 'nanosecond'  THEN date + BIGINT(number / 1000.0) MICROSECONDS
         ELSE RAISE_ERROR('78000', 'Unknown datepart: ' || datepart)
       END;
VALUES MSSQL.DATEADD('year', 2, CURRENT TIMESTAMP);

1
-----------------------------
2014-10-14-19.31.45.328000000

1 record(s) selected.
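For illustration, the sub-month dateparts map naturally onto Python's timedelta. This is a sketch only: year, quarter, month and nanosecond are left out, since they need calendar arithmetic or sub-microsecond resolution that timedelta does not provide:

```python
from datetime import datetime, timedelta

def dateadd(datepart, number, dt):
    """Python model of the sub-month DATEADD measures."""
    unit = {
        "week": timedelta(weeks=1),
        "day": timedelta(days=1),
        "hour": timedelta(hours=1),
        "minute": timedelta(minutes=1),
        "second": timedelta(seconds=1),
        "millisecond": timedelta(milliseconds=1),
        "microsecond": timedelta(microseconds=1),
    }.get(datepart)
    if unit is None:
        raise ValueError("Unknown datepart: " + datepart)
    return dt + number * unit
```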

We have the right functionality, but how do we get rid of the single quotes? If the function is invoked from an application, adding the quotes can get quite ugly. Luckily DB2 introduced "global variables" in DB2 9.5 (I think it was 9.5 - so long ago). So what we can do is define global constants which act as keywords to our function:

Conclusion

It is often fairly easy to extend DB2 to match functions from another SQL dialect. In this case the use of keywords in T-SQL can be compensated for easily through the use of global constants. What remains to be said, for completeness, is that these constants can of course be used anywhere in the database where variables are allowed.

They obey the rules for scoping. So beware of using column names which match these keywords in conjunction with the DATEADD() function:

Background

Roberto from Italy sent me the following question: Is there something like the STATS_MODE() function in DB2? He says STATS_MODE() is supposed to return the most frequently occurring value from a multi-set of values.

A bit of rifling through the internet, steering clear of zones forbidden to me, reveals that STATS_MODE() is an aggregate function similar to SUM() or AVG(). The function returns the most frequent value within a group.

If there are two equally frequent values one of them will be picked with no specific rule.

DB2 does not have a function by that name (it's a non-intuitive name to me, I must say). Neither does DB2 have a function under another name that does the same thing.

But how hard can it be to solve the same problem using common SQL structures? Let's take a look.

Scenario

I couldn't resist the subject line, so my scenario will, for once, not use employees.

Imagine a bunch of kids and suburbs. We are social workers (or social marketing experts staring at Facebook...) trying to find the key influencers in each neighborhood.

Things are clearing up a bit. Looking at the result, we know whom we want: Mona, Charley, Quentin, Anton and Olga.
How can we filter them out? We need to get the first row of this result set for each suburb.
For such tasks I always like the ROW_NUMBER() OVER() function.
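The "most frequent value per group" idea itself is easy to sketch in plain Python (hypothetical sample data; as noted above, ties are broken arbitrarily):

```python
from collections import Counter

def stats_mode_per_group(rows):
    """rows: iterable of (group, value); return the mode per group."""
    counts = {}
    for group, value in rows:
        counts.setdefault(group, Counter())[value] += 1
    # most_common(1) picks an arbitrary winner among equally frequent values
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}
```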

So STATS_MODE() seems like a handy function if you do this sort of thing a lot.
Now, is the solution we have above a workaround, an emulation perhaps, which is slower than the real deal would be?
For that we should look at the explain and see what DB2 does with this query.

Well, I can't say for certain that this is the best possible plan. Of course having the right index could eliminate SORT(7). But there may be a smart way to avoid SORT(4). All we want is to preserve the row with the maximum "CNT" per partition, and the partitions are already sorted.

Join Chris Eaton and me for a discussion of DB2's compression capabilities.

DB2 has been supporting compression for several releases, and we have continuously improved the story and stayed ahead of the competition. Chris will compare DB2 10 for LUW to Oracle Database and other industry databases, explaining the features and advantages that help to maximize valuable storage resources.

Motivation

Yesterday one of my coworkers from India asked for my help in migrating the following SQL statement from Sybase to DB2.

SELECT DISTINCT c1, c2 FROM t ORDER BY c3

When this statement is run in DB2 (assuming the appropriate definition of "T") the following error is being returned:

SQL0214N An expression in the ORDER BY clause in the following position, or starting with "C3" in the "ORDER BY" clause is not valid. Reason code = "2".

Why does DB2 raise this error?

The reason is simple: It's not self evident what the answer should be!
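A tiny Python sketch (with made-up rows standing in for table T) shows why there is no self-evident answer:

```python
# Made-up rows for table T: (c1, c2, c3). Two rows share (c1, c2)
# but differ in c3, so after DISTINCT collapses them there is no
# single c3 value left to drive the ORDER BY.
rows = [(1, "a", 9), (1, "a", 1), (2, "b", 5)]

distinct = sorted({(c1, c2) for c1, c2, _ in rows})

# The collapsed (1, 'a') row could be ordered by c3 = 9 or by c3 = 1:
c3s = {c3 for c1, c2, c3 in rows if (c1, c2) == (1, "a")}
print(distinct, c3s)
```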

To look into this issue a bit closer and come up with a possible interpretation and implementation of the request, let's look at an example.

Guestbook

My pals and I like to go out for lunch to a Chinese mall next to IBM. Most of the time we go to the same vendor, so you might say we are regulars. The ladies taking the orders know us, and if one of us doesn't join we get a jovial: "Where is your friend today?"

As a business it pays to keep track of your regular guests. If they don't come for a while, the reasons could be innocuous (vacation, sickness, travel, etc.) or they could be relevant to you as the owner. Maybe the food isn't as good anymore, or the menu has gotten boring. So a reasonable query may be:

Well, we now know that Jeremy visited a long time ago, but he actually keeps coming back.
Gerald, on the other hand came only once and never returned.
What happens when we add a DISTINCT to eliminate the duplicates?

SELECT DISTINCT name, firstname FROM guests ORDER BY visit;
SQL0214N An expression in the ORDER BY clause in the following position, or starting with "VISIT" in the "ORDER BY" clause is not valid. Reason code = "2".

Right... we knew this would happen. To get this query to compile we need to separate the ORDER BY from the DISTINCT.

The ORDER BY got lost in the outer query. This is to be expected and according to the rules of SQL.
The output is actually sorted by name which is a side effect of the DISTINCT processing.
We could flip the order around and first do DISTINCT, but then we would have to include the "VISIT" column which would only cause us to eliminate
duplicate same-day visits.

So, are we stuck? Not at all! There are two ways to attack this problem. One without and one with OLAP:

Going old school

The classic approach requires us to do a join. We will look up the most recent visit per customer and eliminate all other rows of that same customer:

That did the job, but it requires a nested-loop join, although an index on (name, firstname, visit DESC) might do wonders to speed up this query.

OLAP to the rescue

Another approach to solve the problem uses some very basic OLAP function.
Essentially we will number each person's visits from the newest to the oldest, starting with "1", and then eliminate everything but the first entry per person.
If that sounds familiar, you are right: I have used this technique in this blog before, for de-duping. In the end this is precisely what we are doing here.
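The number-and-keep-the-first idea can be modeled in plain Python (hypothetical sample rows reusing the first names from above; keeping only the newest visit per person is what the ROW_NUMBER() = 1 filter achieves):

```python
def latest_visit_per_guest(visits):
    """visits: iterable of (name, firstname, visit date as ISO string).

    Keep only the newest visit per (name, firstname) -- the row that
    ROW_NUMBER() OVER(PARTITION BY name, firstname
                      ORDER BY visit DESC) would number 1.
    """
    latest = {}
    for name, firstname, visit in visits:
        key = (name, firstname)
        # ISO date strings compare correctly as plain strings
        if key not in latest or visit > latest[key]:
            latest[key] = visit
    return latest

visits = [
    ("Miller", "Jeremy", "2012-01-05"),
    ("Miller", "Jeremy", "2012-09-20"),
    ("Smith", "Gerald", "2011-03-14"),
]
print(latest_visit_per_guest(visits))
```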

So how does this relate to DISTINCT and ORDER BY again?

Note how in the previous examples I used MAX(visit) and ORDER BY visit DESC?
That was appropriate for the business problem. We picked the most recent version of the (name, firstname) tuple for the distinct processing.
But we could have taken the oldest (MIN(visit) and ORDER BY visit ASC, respectively). In fact, a smart DBMS could process these semantics much faster, since the same ordering used for ROW_NUMBER() can be used to produce the result set. We could also legally have taken the "first" or "last" row touched.
The SQL standard does not tell which semantics would be right, so it's safer to return an error than what might be perceived as a wrong result.

In this episode of the DB2 Tech Talk Series Cliff Leung and Holly Hayes will discuss how you can use Optim Query Workload Tuner and Optim Performance Manager to easily get to the bottom of common database problems.

For those who have not looked at OPM and OQWT in a while, prepare to be amazed.

We hope to see you at this very informative Tech Talk on Thursday August 30th at 12:30PM - 2PM ET!