Placement of the Varchar Columns

Jorg Lueke

Placement of the Varchar Columns

December 23, 2004 10:50 AM

List Members,

I am getting confused the more I look into the recommended
placement of
Varchar colunms fro DB2 for z/OS V7. In the admin guide, the
recommendation
for read access of all types is to put varchars at the end. This
is
undisputed. When we talk about tables that are updated frequently
the
recommendations change based on what I read.

One company put out Top 20 performance myths and in it one of the
myths is
that Varchars should always be placed at the end of the row. They
argue
that frequently updated columns should be placed after Varchar
columns
because varyling length rows are logged from the first updated byte
to the
end of the row. Thus why log a whole varchar? However the admin
guide
tells a slightly different story:

"For varying-length rows, data is logged from the first changed
byte to the
end of the last updated column. (A varying-length row contains one
or more
varying-length columns.)"

Based on this statement as long as the Varchar is not between
updated
columns it won't get logged.

Question #1: For varying length rows is the update logged from the
first
changed byte to the end of teh row or from teh first changed byte
to the
last upadted column.

Interestingly, the admin guide then goes on to say:

"Keep all frequently updated
columns near the end of the row.
However, if only fixed-length
columns will be updated, keep
those columns close to each other
at the beginning of the row."

Question #2: What is the point of this statement? If I am updating
only
fixed length columns I can keep the varchar at the end. So, why not
simply
say Varchars should always be at the end of the row? Based on the
statemtn
above it's just saying keep updated varchars after non-updated
varchar, or
am I missing something here...

Rob --- Sr. Database Administrator --- CFS Crane

By placing columns that get updated together as a set, you can
limit what has to be logged. V7 introduced the ability to not have
to log to the end of the row for varchar tables assuming conditions
are meet (TS not using compression, table does not have edit proc,
the update of variable column does not change length).

The attached PowerPoint slide might help. The notes below are some
items we try to follow. There are many things that prevent the
ideal situation from occurring in the real world, and the
guidelines below reflect the world I get to work in so may not
pertain to your environment.

1) Instances where column placement does not effect what is logged
for an update, since the entire row is logged in these
scenarios.
a. Tablespace has compression turned on.
b. Table has an editproc.
c. Update of variable column changes length (increase or
decrease).
2) Column placement to minimize the impact to the business should
be a goal.
a. Business needs may change assumptions about what we consider to
be "static" varchar columns. As the business grows it is not safe
to assume what is a static varchar today will be a static varchar
tomorrow.
b. Column groupings or "sets" can change with application releases
and are difficult to flush out.
c. For existing structures ease of maintenance along with shorter
change outages favor adding columns to end of existing tables
rather than unloading, dropping, recreating and loading the new
structure to achieve the column placement that might help log
performance.
3) New Table Creates
a. Update groupings not yet established.
b. Presenting a consistent methodology to the programmers we
support is desirable.
c. VARCHAR will be placed at end of the table, unless those columns
are part of the primary key, in which case they will be placed in
primary key order at the beginning of the table.
d. Tablespace compression should be favored. Going forward we
should show why "not" to use compression, instead of why to use
compression.
i. Exception for image data.
ii. Exception for data that does not compress well(DSN1COMP
figures).
iii. Compression caused a performance problem.
1. Some update intensive applications could possibly see decreased
performance with compression in rare cases.
2. Having more rows on a page typically increases performance.
3. Commit logic needs to be present since each page should have
more rows and lack of committing could increase lock contention in
batch jobs. Those batch jobs should be sorting their input files in
clustering order and committing properly.

I am getting confused the more I look into the recommended
placement of
Varchar colunms fro DB2 for z/OS V7. In the admin guide, the
recommendation
for read access of all types is to put varchars at the end. This
is
undisputed. When we talk about tables that are updated frequently
the
recommendations change based on what I read.

One company put out Top 20 performance myths and in it one of the
myths is
that Varchars should always be placed at the end of the row. They
argue
that frequently updated columns should be placed after Varchar
columns
because varyling length rows are logged from the first updated byte
to the
end of the row. Thus why log a whole varchar? However the admin
guide
tells a slightly different story:

"For varying-length rows, data is logged from the first changed
byte to the
end of the last updated column. (A varying-length row contains one
or more
varying-length columns.)"

Based on this statement as long as the Varchar is not between
updated
columns it won't get logged.

Question #1: For varying length rows is the update logged from the
first
changed byte to the end of teh row or from teh first changed byte
to the
last upadted column.

Interestingly, the admin guide then goes on to say:

"Keep all frequently updated
columns near the end of the row.
However, if only fixed-length
columns will be updated, keep
those columns close to each other
at the beginning of the row."

Question #2: What is the point of this statement? If I am updating
only
fixed length columns I can keep the varchar at the end. So, why not
simply
say Varchars should always be at the end of the row? Based on the
statemtn
above it's just saying keep updated varchars after non-updated
varchar, or
am I missing something here...

*******************************************************
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.
*******************************************************

Jorg Lueke

Thanks for teh powerpoint. I see that part of the differences comes
from
DB2 V6 to DB2 V7. You do note an exception when logging is used. In
your
power point it states that the entire row from the update back is
logged
when there is a change in row length, an edit proc, or compression.
On the
latter point, the admin guide merely states:

"v Logging effects
If a data row is compressed, all data that is logged because of SQL
changes to
that data is compressed. Thus, you can expect less logging for
insertions and
deletions; the amount of logging for updates varies. Applications
that are
sensitive to log-related resources can experience some benefit with
compressed
data."

It makes no indication in this spot that the whole row would be
logged. Is
this documented elsewhere?

Thanks for teh powerpoint. I see that part of the differences comes
from
DB2 V6 to DB2 V7. You do note an exception when logging is used. In
your
power point it states that the entire row from the update back is
logged
when there is a change in row length, an edit proc, or compression.
On the
latter point, the admin guide merely states:

"v Logging effects
If a data row is compressed, all data that is logged because of SQL
changes to
that data is compressed. Thus, you can expect less logging for
insertions and
deletions; the amount of logging for updates varies. Applications
that are
sensitive to log-related resources can experience some benefit with
compressed
data."

It makes no indication in this spot that the whole row would be
logged. Is
this documented elsewhere?

Thanks

*******************************************************
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.
*******************************************************

Michael Ebert

I *suspect* that DB2 handles logging in this way (it's how I would
do it,
and I don't really have any doubt):

DB2 compares the "before" and "after" images of the row. It
starts
comparing from the start of both rows until it finds the first
difference.
Then it continues scanning and remembers the last difference. The
bytes
between the first and the last difference are logged ("before" and
"after"
parts as well as start position). Now consider:

If the row changes in length, then there will definitely be a
mismatch once
the shorter row ends. Thus the last changed byte would be the end
of the
row. (This could be avoided by scanning from the front of the row
for the
first mismatch, and from the end backwards for the last mismatch,
but
there's a machine instruction for the forward compare if I remember
right,
while a backward compare would not mesh well with how memory is
accessed -
it would probably carry a heavy performance penalty). Also remember
that a
VARCHAR starts with a length byte or halfword, so if you change the
length,
the first difference will be right at the beginning, even if you
only add a
char at the end.
Compression creates variable-length rows, and any update is likely
to
change the length of the row, resulting in the previous case. Also,
because
compression replaces several bytes with fewer bytes, your
difference would
most likely start earlier in the row, possibly even at the start.
This
might get telescoped into "the whole row gets logged", which I
suspect is
not true. Editproc-encoded rows would be a similar case.

Thus you you would get the rule to place updated columns after
non-updated
columns, generally ordering them from "least likely to be updated"
to "most
likely to be updated" at the very end. VARCHARs complicate the
matter,
because they carry a higher logging cost in case of length
change,
depending on how far it is from the end of the row. Thus it might
be more
efficient to place a less often updated VARCHAR after a more often
updated
CHAR. If you want to tune your update logging down to that level,
you need
figures on how likely each column is to be updated (for VARCHARs,
whether
or not the length changes); also you need to know which columns are
likely
to be updated together - a sort of update cross-correlation matrix.
Then
try all permutations of column ordering to see which one would
result in
the least logging (sounds like an NP-problem to me)....

If you have to distill a few docu guidelines from that simple
algorithm
with its complicated consequences, then you're likely to end up
with the
messy rules you're currently trying to figure out.

Thanks for teh powerpoint. I see that part of the differences comes
from
DB2 V6 to DB2 V7. You do note an exception when logging is used. In
your
power point it states that the entire row from the update back is
logged
when there is a change in row length, an edit proc, or compression.
On the
latter point, the admin guide merely states:

"v Logging effects
If a data row is compressed, all data that is logged because of SQL
changes
to
that data is compressed. Thus, you can expect less logging for
insertions
and
deletions; the amount of logging for updates varies. Applications
that are
sensitive to log-related resources can experience some benefit
with
compressed
data."

It makes no indication in this spot that the whole row would be
logged. Is
this documented elsewhere?

Jorg Lueke

Yes, that is the problem with rules, guidelines, processeses and
procedures.
IBM has distilled this in the following way:

"For updates, the amount of data that is logged for update
depends
on whether the row contains all fixed-length columns or not.
For fixed-length rows, changes are logged only from the
beginning
of the first udpated column to the end of the last updated
column.
For varying-length rows, data is logged from the first changed
byte to the end of the last updated column."

If these two statements are true what is the real difference in
logging? I
see the possibility that a Varchar update would need to log less
data since
it starts at the first changed byte rather than the first chnaged
column.

The one thing it seems one should not do is surround a Varchar
field with
two columns that are often updated as then the whole span must be
logged.
But otherwise, even if the row length changes, only the changed
bytes are
logged in either case. The statement that changes are logged to the
end of
the row seems only to have been applicable in V6 and/or prior
versions.

Jorg Lueke

Basically, if all the columns are fixed length, then DB2 will log
the changes
from the first column changed to the last. If an updated
column has it's length change, then, in effect, it's causing a
"push" to
all the subsequent columns which will cause it to log all the data
from
the first changed byte to the end of the row. Does that make it
a
little clearer?

[login to unmask email]

Beware the difference between a "changed column" and a "changed
byte",
especially in the case of COMPRESS NO tablespaces. In this case, a
"row"
consists of a 6-byte prefix concatenated with the column data. If
you
change the length of a VARCHAR column value, this changes the total
length
of the row . . . which is stored in the row prefix. In this case,
the
"first changed byte" of the row is in the Prefix. Logging is then
from
this first changed byte to the end of the changed column.

One can then conclude that, if the length of the row is going to
change on
SQL Update, to reduce logging one wants the updated columns at
the
beginning of the row.

Yes, that is the problem with rules, guidelines, processeses
and
procedures.
IBM has distilled this in the following way:

"For updates, the amount of data that is logged for update
depends
on whether the row contains all fixed-length columns or not.
For fixed-length rows, changes are logged only from the
beginning
of the first udpated column to the end of the last updated
column.
For varying-length rows, data is logged from the first changed
byte to the end of the last updated column."

If these two statements are true what is the real difference in
logging? I
see the possibility that a Varchar update would need to log less
data
since
it starts at the first changed byte rather than the first chnaged
column.

The one thing it seems one should not do is surround a Varchar
field with
two columns that are often updated as then the whole span must be
logged.
But otherwise, even if the row length changes, only the changed
bytes are
logged in either case. The statement that changes are logged to the
end
of
the row seems only to have been applicable in V6 and/or prior
versions.

Jorg Lueke

>Jorg,
>
>Beware the difference between a "changed column" and a "changed
byte",
>especially in the case of COMPRESS NO tablespaces. In this
case, a "row"
>consists of a 6-byte prefix concatenated with the column data.
If you
>change the length of a VARCHAR column value, this changes the
total length
>of the row . . . which is stored in the row prefix. In this
case, the
>"first changed byte" of the row is in the Prefix. Logging is
then from
>this first changed byte to the end of the changed column.
>
>One can then conclude that, if the length of the row is going
to change on
>SQL Update, to reduce logging one wants the updated columns at
the
>beginning of the row.
>
>Lock Lyon
>Compuware Corp
>
Lock,

If that statement is true along with the previous message about a
changed
row length causing a push and logging to the end of the row then it
would
seem that teh whole row would get logged sicne teh first byte is in
the
prefix and the changed length of the row causes the whole row to
be
logged.

More food for thought...

Jorg
>
>
>Jorg Lueke <[login to unmask email]>
>Sent by: DB2 Data Base Discussion List
<[login to unmask email]>
>12/27/2004 10:01 AM
>Please respond to
>DB2 Database Discussion list at IDUG
<[login to unmask email]>
>
>
>To
>[login to unmask email]
>cc
>
>Subject
>Re: [DB2-L] Placement of the Varchar Columns
>
>
>
>
>
>
>Yes, that is the problem with rules, guidelines, processeses
and
>procedures.
> IBM has distilled this in the following way:
>
>"For updates, the amount of data that is logged for update
depends
>on whether the row contains all fixed-length columns or
not.
>For fixed-length rows, changes are logged only from the
beginning
>of the first udpated column to the end of the last updated
column.
>For varying-length rows, data is logged from the first
changed
>byte to the end of the last updated column."
>
>
>If these two statements are true what is the real difference in
logging? I
>see the possibility that a Varchar update would need to log
less data
>since
>it starts at the first changed byte rather than the first
chnaged column.
>
>The one thing it seems one should not do is surround a Varchar
field with
>two columns that are often updated as then the whole span must
be logged.
>But otherwise, even if the row length changes, only the changed
bytes are
>logged in either case. The statement that changes are logged to
the end
>of
>the row seems only to have been applicable in V6 and/or prior
versions.
>
>
>
>
>---------------------------------------------------------------------------
------
>Welcome to the IDUG DB2-L list. To unsubscribe, go to the
archives and
home page at http://www.idugdb2-l.org/archives/db2-l.html.
From that page
select "Join or Leave the list". The IDUG DB2-L FAQ is athttp://www.idugdb2-l.org. The IDUG
List Admins can be reached at DB2-L-
[login to unmask email] Find out the latest on IDUG conferences
athttp://conferences.idug.org/index.cfm
>

Roger Miller

The prefix was handled quite a while ago, so that is not a problem.
As
you noted, what you have is summary statements. Taking a
summary
statement and then presuming there are no exceptions is likely to
cause
some erroneous conclusions.

Roger Miller

On Mon, 27 Dec 2004 18:18:06 -0500, [login to unmask email]
wrote:

>Jorg,
>
>Beware the difference between a "changed column" and a "changed
byte",
>especially in the case of COMPRESS NO tablespaces. In this
case, a "row"
>consists of a 6-byte prefix concatenated with the column data.
If you
>change the length of a VARCHAR column value, this changes the
total length
>of the row . . . which is stored in the row prefix. In this
case, the
>"first changed byte" of the row is in the Prefix. Logging is
then from
>this first changed byte to the end of the changed column.
>
>One can then conclude that, if the length of the row is going
to change on
>SQL Update, to reduce logging one wants the updated columns at
the
>beginning of the row.
>
>Lock Lyon
>Compuware Corp
>
>
>Jorg Lueke <[login to unmask email]>
>Sent by: DB2 Data Base Discussion List
<[login to unmask email]>
>12/27/2004 10:01 AM
>Please respond to
>DB2 Database Discussion list at IDUG
<[login to unmask email]>
>
>
>To
>[login to unmask email]
>cc
>
>Subject
>Re: [DB2-L] Placement of the Varchar Columns
>
>
>Yes, that is the problem with rules, guidelines, processeses
and
>procedures.
> IBM has distilled this in the following way:
>
>"For updates, the amount of data that is logged for update
depends
>on whether the row contains all fixed-length columns or
not.
>For fixed-length rows, changes are logged only from the
beginning
>of the first udpated column to the end of the last updated
column.
>For varying-length rows, data is logged from the first
changed
>byte to the end of the last updated column."
>
>
>If these two statements are true what is the real difference in
logging? I
>see the possibility that a Varchar update would need to log
less data
>since
>it starts at the first changed byte rather than the first
chnaged column.
>
>The one thing it seems one should not do is surround a Varchar
field with
>two columns that are often updated as then the whole span must
be logged.
>But otherwise, even if the row length changes, only the changed
bytes are
>logged in either case. The statement that changes are logged to
the end
>of
>the row seems only to have been applicable in V6 and/or prior
versions.
>