Featured Database Articles

Leveraging the Information_schema

By Neil Boyle

A while back, I
received a call from a colleague who was testing out a third party database and
client application. She had a simple question - she wanted to know all of the
tables and columns in the database that contained a specific string. My
colleague did not want the names of the tables or columns containing the
string; she needed to know where, in any row of data, the string appeared.
Imagine looking for the word "Smith" anywhere in the PUBS database:

An author, or an employee, might be called "Fred Smith"

A bookshop or a publisher might have the word "Smith" in their name

A book might be titled "History of the Blacksmith"

Anybody's address might be on "Smith Street"

It might seem a
strange query, but I knew why she was asking. A software house had supplied the
database and the client, but no documentation to say how data in the database
mapped onto what the end user saw on screen. My colleague had the client app,
and an ODBC connection into SQL Server so she could view the data, but she
needed this question answered in order to make sense of the results she was
getting while testing.

Now the obvious
question here is "Didn't the supplier provide this info in the documentation?"

Well, no they did
not.

It's a big database,
with a lot of tables, and many columns, and no documentation. The job of
testing the application was not an enviable one, the task of checking every
single character column, in every single user table in the database, to look
for a particular string even less so. Fortunately - it's not necessary.

One thing that a
relational database does, or should do, is describe itself. SQL Server does.

This description,
beginning with SQL Server 7, is called the "Information Schema." Old hands like
me sometimes still use outmoded terms like "system tables."

The Information
Schema is a collection of tables in every database that describes the layout of
the database (even describing the Information Schema itself) and one of these
tables in particular was going to help turn this nasty little job into a very
easy one.

Try running the
command "select * from
information_schema.columns" in
any database, a list of every column in every table in the database will be
returned. Even information_schema.columns themselves will be included. For the
sake of an example, I am going to search for the string "Smith" in the PUBS
database.

I know I am
looking for a character string, so I can eliminate all the assorted columns in
the database that contain binary or numeric information. There is no Text data
type in pubs; it's only necessary to search the CHAR and VARCHAR datatypes,
along with their UNICODE variants NCHAR and NVARCHAR. There is also no need to
search the information_schema itself, as it contains no user data. A
simple query against information_schema.columns will narrow down the search to a specific
set of tables and columns:

Next, write a select statement for each table and column combination, to see
which one contains the word "Smith." Of course, I don't intend to type out all
of those select statements myself: A simple bit of dynamic SQL will do the job
for me:

Wrapping the text
of the "Select" statement inside the previously written SQL statement causes
SQL Server to write the bulk of the SQL. All that remains to be done is cut
& paste the results of the query above into a new Query window and the dynamically generated
SQL will search the database for any "Smith" that's in there.