The Dictionary in the Data Dictionary

September 25, 2007

Todays journey takes on a
trip into Oracles data dictionary, specifically, the dictionary part or aspect
of it. Its interesting and informative to take a peak under the covers and see
how Oracle, (the product, that is), is put together. Aside from trying to
reverse engineer things, looking at the underlying components presents an If
Oracle did it this way, maybe I can too learning paradigm. For example,
normalization is a huge part of database design, so its fair to ask (or look,
in this case) how well the SYS tables are normalized. Much, and practically all
for that matter, of what you query from or against in the data dictionary is
based on views, and as we all know, views are usually based on tables. What is
in those tables and how does it all fit together?

DICTIONARY and DICT

The DICTIONARY view contains
metadata about other data dictionary items. Most people familiar with Oracles
data dictionary will say that the data dictionary is based entirely on views on
tables owned by SYS. Well, yes, but to be more precise (using the DICTIONARY
view in this case), thats not entirely correct. Instead of being a view, to
you as a member of PUBLIC, DICTIONARY from your perspective is a synonym. This
synonym points to the SYS-owned view named DICTIONARY, and by the order in
which Oracle resolves
an object name, a public synonym is evaluated before a table or view. If
you were to query against SYS.DICTIONARY, you would be accessing the view named
DICTIONARY owned by SYS.

Closely related to
DICTIONARY is a public synonym named DICT. This synonym is nothing more than a
shortcut name for DICTIONARY. Can you think of another shortcut name of a view
in the data dictionary? The TAB synonym is a pointer to part of what is in the
DBA/ALL/USER_TABLES views. So DICT or DICTIONARY, either one suffices as both
ultimately are derived from several SYS-owned tables.

The columns of DICTIONARY
are table_name and comments. In Oracle 10g R2, there are 659 records, of which,
interestingly enough, 91 have no comments. What good is a dictionary without
words? Almost all table name-only records are from the DBA/ALL/USER family
(just an observation, no real reason why theyre from this category).

Where do the comments come
from? This is where we start looking at the underlying tables. The source code
behind this view can be found in the catalog.sql script. The top part of the
create view statement is shown below.

remark
remark VIEW "DICTIONARY"
remark Online documentation for data dictionary tables and views.
remark This view exists outside of the family schema.
remark
/* Find the names of public synonyms for views owned by SYS that
have names different from the synonym name. This allows the user
to see the short-hand synonyms we have created.
*/
create or replace view DICTIONARY
(TABLE_NAME, COMMENTS)
as
select o.name, c.comment$
from sys.obj$ o, sys.com$ c
where o.obj# = c.obj#(+)
and c.col# is null
and o.owner# = 0
and o.type# = 4
...continues...

The OBJ$ table (owned by
SYS) is like the uber table of the data dictionary. Every object has an ID
number, an owner number, and a type number (among other attributes). If OBJ#
were a primary key, we would expect to see primary key type of information
about that column. As it turns out, OBJ# is not the primary key per a this
column is defined to the primary key for this table. What does a primary key
have in terms of other constraints? OBJ# does have a unique index and a not
null check constraint, so that must be close enough from Oracles perspective.

The OBJ$ table also has
another interesting index related characteristic, and that is the index named
I_OBJ2. This index uses six columns, which is fairly rare in terms of the
number of columns involved. The vast majority of indexes I see are limited to
one, two or three columns, with the exception of some data warehouse indexes
with five or six columns. All of the objects users create must be registered in
this table, so this is a table that will continue to grow. How and when are
statistics created for the table? That depends on the version of Oracle youre
using. From older to newer, the advice or guidance has ranged from dont do it
to have at it.

The difference between TYPE#
and OWNER# is that the object types are known ahead of time. Oracle only has so
many types, so that information is static, whereas owners (aside from the
internal or fixed type users of Oracle) come and go. With this difference in
mind, wed expect to see a lookup table that relates a textual name versus an
owner number. The USER$ table serves this function.

Is there a lookup table for
types? We see what the types are in object tables (the object_type column), but
internally, at least as far as the label versus ID number, Oracle does not have
this. So where does the type name come from? The DECODE of type as seen in the
view creation statements for the _OBJECTS views is what produces the text
labels we see.

Going back to the question
posed earlier about where the comments come from, we can see that a table name
COM$ is referenced in the create dictionary view statement. The COM$ table
consists of OBJ#, COL# and a COMMENT$ columns. In a higher level
normalized form, wed expect to see a table with OBJ#, COL# and a
comment number or ID, along with a table that then matches the comment number
versus text string or description.

The screenshot below shows
how Oracle (at least this table, but there are others like it) did not go to
third normal form. How many places does a string like A date function used to
compute the NEXT_DATE occur? Another string, Name of the object, appears 85
times. Doesnt that clearly violate what every book on database design and
normalization talks about?

Hacking the Data Dictionary

Perhaps youd be interested
in hacking the data dictionary and adding your own comments into the COM$
table. I wouldnt recommend doing that (who would?), so if you wanted to fill
in the missing comments, it would be better to create another view that used a
support table to supply that information (and add your own). If you were going
to do it (using DUAL as an example), find the object number for DUAL and then
use that value for the insert into the COM$ table.

Note that the DUAL table
appears more than once, so be sure to tag the one owned by SYS (the owner
number for SYS is zero). When inserting into the COM$ table, dont include a COL#. The
statement that creates the DICTIONARY view matches on where the COL# number
is null. The @dict script (mine) just selects from DICTIONARY where the table
name is equal to what was entered.

The views with missing
comments are relatively unimportant, so performing this hack using existing
objects is certainly not essential. As for adding new lines to the DICTIONARY
view via inserts into the COM$ table, as long as youre using existing object
numbers, youre probably safe. Keep track of what youve inserted because if
the data dictionary were to be rebuilt, your work is likely to get
overwritten/deleted.

In Closing

Overall, it looks like the data
dictionary, specifically, the SYS schema, is a lot like schemas you encounter
in the workplace, that is, some objects were created following the rules, and
some were not. Are queries against the data dictionary faster because there
arent lookup tables? Or, is the lack of a lookup table for comments more of a
convenience for the engineers who work on the SYS schema tables? Regardless, at
least now you can reverse engineer how the dictionary was put together.