SQL Server 7 technical review

Thursday Jun 25th 1998 by Sharon Dooley

I spent two days at a technical reviewers briefing on the new version of SQL Server.

First, the official name for the release is now SQL Server 7.0; the code name "Sphinx" is no longer being used. In this document, I will summarize what Microsoft presented and give you my impressions of the new release. Please note that I am writing this document from the presentation slides and my notes and recollections, so it is possible that I have misheard or misunderstood something. None of this was presented under non-disclosure. I will italicize my impressions so that it is clear that they are not Microsoft's information.

This presentation was based on Beta 2. It is possible that there will be changes between now and the actual release of SQL Server 7.0.

Many thanks to Sharon for providing these notes - drop her a note at sharond@compuserve.com and tell her thanks!

25 Jan 98

The overall themes for this new release are

Ease of use

Scalability

Data Warehousing

SQL Server 7 will run on both NT and Windows 95/98. There is a single source code base for the two products. The only differences will be those imposed by the operating system; these include SMP support, asynchronous I/O, and integrated security, which are not available on the Windows 9x platform.

Many of the existing components have new names.

SQL Server 6.x       SQL Server 7.0
SQL Executive        SQL Server Agent
SQL Trace            SQL Profiler
ISQL/W               Query Analyzer
Tasks                Jobs

Some of the features previously described as being part of the release are not included. These are

User defined functions

Language-independent stored procedures (this means that there is no VBA, just Transact-SQL)

Shared nothing clusters

There was a panel discussion at the end of the briefing. During this discussion as well as in a couple of other points in the briefing it was clear that SQL Server will become an alternative to the Jet engine for Access. Microsoft did not explicitly confirm this but did not deny it and several presenters mentioned it. Jet is not going away, however.

A new upsizing wizard for Access will come with the next release of Access, not with SQL Server 7.0.

SQL Server 7.0 has dynamic memory management and dynamic disk space management. In the area of memory management, it cooperates with the operating system and takes or gives back memory based on what is happening at any given time. There is a single buffer pool shared by all the databases on a server.

Databases can be set up to autogrow and autoshrink. The example presented was a Sales History table, which starts out at 128 MB at the beginning of the year. Through the year, it grows to 4 GB automatically as data is added. In January of the following year, the history is purged and the table shrinks back to 128 MB. The freed pages are given back to the operating system.

There is an autoupdate facility for the optimizer statistics. This is not a performance drain as it uses a sampling technique rather than scanning the entire table. Paul Flessner stated that this technique is almost instant and produces good information on selectivity and cardinality.

One presenter noted that 20 configuration options are gone. The user connections option is supposed to be gone by the next beta.

SQL Enterprise Manager

The SQL Enterprise Manager is integrated with the Microsoft Management Console and has a whole new "look and feel". You cannot use it to administer 6.x or 4.2 servers. You have to use the old Enterprise Manager.

The Visual Data Tools, which have been shipping with the Enterprise Edition of VB 5 and Visual Studio, are embedded in Enterprise Manager.

Alerting will allow thresholds as well as the current response to error messages.

Jobs can be multi-step and allow conditional execution.

Jobs can be defined once and downloaded to many servers. Once a job is handed off to a server, it has a life of its own and can be scheduled at different frequencies, etc.

SQL Server Profiler

The old SQL Trace has emerged in a new, much more powerful form. In 6.5, SQL Trace used ODS and picked the queries "off the wire". In 7.0, it is inside the engine. This means that it can see what stored procs are doing, which it couldn't before. It has the ability to play back a series of events. It has a much better filtering capability, though I didn't see the ability to filter on specific SQL statements; it may well still be there. It can organize results by execution time, object, and so forth. It has a nice drill-down interface. You can save information to use as a workload for the Index Tuning Wizard.

Index Tuning Wizard

This takes a workload (set of queries) and figures out what indexes best suit the queries. It interacts with the query optimizer in doing this. It does not deal with concurrent users in this process.

I personally have a great distrust of wizards and am anxious to see if it is as good as I am!

SQL Server Query Analyzer

This is a "new look" ISQL/W which doesn't actually look all that different. It has a graphical SHOWPLAN that is much better than the 6.0 version. There is an "auto pilot" which will suggest indexes for a single query. The editor uses color-coding like other Microsoft products. You can direct the results to a grid, which makes it much easier to read them, and you can shrink column widths so that more data is visible in the results window.

DBCC

The goal is to eliminate the need to run DBCC on a regular basis. There will be new check commands: checkfile and checkconstraint. It will be faster. They are prototyping a repair facility but don't know if it will be in the final release.

DBCC is using a patented single pass algorithm to scan the database.

bcp

Supports both TDS 4.2 (DBLIB/Legacy) and TDS 7.0 clients. Supports full index maintenance, constraints and triggers. The load/unload performance is reported to be twice as fast as 6.5. It does a parallel load into a table that has no indexes.

Data Transformation Services (DTS)

In many ways, this was the coolest thing I saw at the briefing. It was hidden away in the data warehousing presentation but has uses far beyond that. I had been totally unclear about this feature's function from a limited scan of the beta documentation.

The purpose of DTS is to make it easier to import, export, and transform heterogeneous data. It provides for data lineage if desired. It supports transformations between the source and target data sources (which need not be SQL Server).

It has the concept of a package, which is a series of tasks to be performed as a part of a transformation. This provides a simple "workflow".

It has its own engine (an in-process COM server) and can be used independently of SQL Server.

It allows you to write scripts (for each column) in VBScript and JavaScript. Perl is supposed to be added as a scripting language.

The demo I saw quickly moved a database from Oracle 8 and an Excel spreadsheet to SQL Server.

This is clearly a super tool for conversion and massaging of data.

Internals

Microsoft has made a distinction between the Query Processor ("Relational Engine") and the Storage Engine. They are distinct subsystems. The overall design of both is controlled by Goetz Graefe (I am not sure of the spelling). He comes from Tandem. He did not present but did participate in the final panel discussion.

The query processor (QP) has been completely redesigned. Goals are to provide excellent decision support performance, support for very large databases, and to support heterogeneous query and update. The following picture illustrates the ultimate goal. Note that the SQL is pseudo code. Initially there will only be support for SQL Server as a database engine.

Bulk load processing is also a part of the query engine and has greatly improved performance.

There are new query processing techniques. These include

Multi-Index operations

Merge join

Hash join

Hash aggregation

Single index processing is still used and is still an important technique.

Multi-index processing allows several indexes to be used to resolve one query, even on a single table. The example provided was an orders table with the query "SELECT * FROM orders WHERE cust_id = 987 AND order_value >= 10000". With an index on cust_id and an index on order_value, the QP will search the cust_id index for rows that match 987, then search the order_value index for rows that are >= 10000, and intersect the two sets of row ids to get the desired result set for the query. It also uses this technique to create the equivalent of a covering index from existing indexes which do not themselves cover the query, and for an "on the fly" join index.
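The intersection strategy can be modeled in a few lines of Python. This is only an illustrative toy (not SQL Server code); the table contents and row ids are invented, but the column names follow the orders example above.

```python
# Model each nonclustered index as a mapping from key value to row ids;
# the query engine intersects row-id sets instead of scanning the table.
orders = {
    1: {"cust_id": 987, "order_value": 15000},
    2: {"cust_id": 987, "order_value": 500},
    3: {"cust_id": 111, "order_value": 20000},
    4: {"cust_id": 987, "order_value": 12000},
}

# Build the two single-column "indexes".
cust_index = {}
value_index = []
for rid, row in orders.items():
    cust_index.setdefault(row["cust_id"], set()).add(rid)
    value_index.append((row["order_value"], rid))
value_index.sort()

# WHERE cust_id = 987: probe the cust_id index.
rids_by_cust = cust_index.get(987, set())
# WHERE order_value >= 10000: range-scan the order_value index.
rids_by_value = {rid for val, rid in value_index if val >= 10000}

# Intersect the row-id sets, then fetch only the qualifying rows.
result = [orders[rid] for rid in sorted(rids_by_cust & rids_by_value)]
```

Only the final fetch touches the table itself; everything before it works purely on index entries and row ids.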

The merge join in the new release is not the "pseudo merge-join" of 6.5. It is used when inputs are sorted, typically when the user wants the results in the same order as the indexes. Processing strategy is

Get row from outer table

Get next row from inner table with same key

If found, output and loop on inner table

If not found, loop on outer table

In this pseudo code, it looks like the current nested iteration, but it is done in a single pass rather than in multiple passes so it will be faster.
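The steps above can be sketched in Python. Again, this is a toy model of the algorithm as described, not product code; it assumes both inputs are already sorted on the join key and that duplicates may appear on either side.

```python
def merge_join(outer, inner, key):
    """Single-pass equality merge join over two key-sorted inputs."""
    result = []
    i = 0
    for o in outer:                          # get row from outer table
        # advance the inner cursor past keys smaller than the outer key
        while i < len(inner) and key(inner[i]) < key(o):
            i += 1
        j = i
        while j < len(inner) and key(inner[j]) == key(o):
            result.append((o, inner[j]))     # if found, output and loop on inner
            j += 1
        # if not found, fall through and loop on the outer table
    return result
```

Because both cursors only move forward, each input is read once, which is the advantage over repeated nested-iteration passes.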

Hash joins are used when the inputs are not ordered:

No appropriate indexes

No requirement for order of output

This method is highly applicable to ad hoc queries where indexes can't be anticipated. The basic algorithm is

Read the smaller table. Hash the key value and put key and record id into a bucket. Repeat for all the rows in the smaller table.

Read the larger table. Hash the key value and see if it is in the hash. If so, output key and both record ids. Repeat for all the rows in the larger table.
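The two phases above translate directly into a small Python sketch (an illustration of the general technique, not SQL Server's implementation):

```python
def hash_join(small, large, key):
    """Build a hash table on the smaller input, probe with the larger."""
    buckets = {}
    for rid, row in enumerate(small):            # build phase
        buckets.setdefault(key(row), []).append(rid)
    result = []
    for big_rid, row in enumerate(large):        # probe phase
        for small_rid in buckets.get(key(row), []):
            result.append((small_rid, big_rid))  # output both record ids
    return result
```

Neither input needs an index or a sort order, which is exactly why this method suits ad hoc queries.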

Nested iteration is still used, particularly for non-equality joins that don't lend themselves to the other methods.

Hash Aggregation is used for SUM, etc. In this technique, the algorithm is

Read a row from the input table

Hash the key value to a bucket; if the key is already there, update the aggregate. If not, add it to a bucket.

Loop through input

Output key values and computed aggregates

Note: this produces unordered output. This means that you can no longer depend on GROUP BY to produce ordered output without an ORDER BY clause. (I've been trying to tell you all for years that you should, by ANSI standard, use the ORDER BY!) There is a mechanism in place so that when you upgrade a 6.5 database you will get ordered results.
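A minimal Python model of hash aggregation for a GROUP BY with SUM makes the unordered-output point concrete (toy code, invented data):

```python
def hash_sum(rows, key, value):
    """One pass: hash each grouping key to a bucket accumulating the sum."""
    buckets = {}
    for row in rows:                                 # read a row from the input
        k = key(row)
        buckets[k] = buckets.get(k, 0) + value(row)  # compute/update aggregate
    return buckets   # key -> aggregate, in no guaranteed order
```

The result comes back in whatever order the buckets happen to hold, which is why an explicit ORDER BY is needed if sorted groups matter.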

There are changes in the I/O strategies as well. First of all, pages are now 8K and extents are 64KB (see the Storage Architecture section for more details). During table scans, I/Os are also 64KB. Unordered scans can do parallel reads from files. Read-ahead works on heaps as well as the clustered index. The QP drives the read-ahead.

Bulk update processing has been improved. The QP does the index maintenance. On insert, update or delete, changes are sorted in index order and applied in a single pass over each index. Constraint processing is integrated into bulk updating as well. This technique is used in DBCC and bcp. Microsoft reports that a typical "data massaging" application ran 10 times faster.

The query processor understands partitioning solutions. This means that if you define tables to hold monthly data (Jan, Feb, Mar, etc.), combine them with a UNION view, and then request just one month from that view, it will search only that table. It appears that you need to define a check constraint limiting each month table to a particular month for it to be able to do this. SQL Server does not support partitioning by value, that is, spreading a table across several different storage areas based on key values as is done in DB2, for example.
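The pruning behavior can be sketched as follows. This is a Python toy, not SQL Server code; the table names, the (month, amount) rows, and the representation of a check constraint as a single month value are all invented for illustration.

```python
# Each member table of the UNION view carries a "check constraint"
# (here just its month number); a query against the view reads only
# members whose constraint can satisfy the predicate.
partitions = {
    "sales_jan": (1, [(1, 100), (1, 250)]),   # (month, amount) rows
    "sales_feb": (2, [(2, 300)]),
    "sales_mar": (3, [(3, 75), (3, 125)]),
}

def query_month(month):
    scanned = []
    rows = []
    for name, (constraint_month, data) in partitions.items():
        if constraint_month != month:   # constraint rules this member out
            continue
        scanned.append(name)            # only the matching table is read
        rows.extend(data)
    return scanned, rows
```

The point is that the decision to skip a member table is made from the constraint alone, without touching the table's data.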

Query optimization is still cost-based. I/O is still the dominant factor.

There is a lot of support for parallel operations on SMP machines. This is automatic, you dont have to use any configuration or hinting directives. The degree of parallelism will be adjusted down on loaded systems or systems with insufficient memory. For a single query, the optimizer runs on one cpu. However, multiple queries can be optimized simultaneously if there are multiple cpus.

There is support for distributed queries. This support uses OLE DB. It accesses SQL Server directly. With the Jet OLE DB provider it can access Jet files, Excel, FoxPro, and dBASE. With the OLE DB provider for ODBC, it can access Sybase, Informix, DB2, etc. Microsoft has written an OLE DB provider for Oracle which it uses to access Oracle 7 and 8. There is also interaction with Active Directory Services (NT 5) and Index Server. It uses something called a "Linked Server" for these; I am not clear on the difference between a Linked Server and the current Remote Server concept. It is also possible to use something called the OpenRowset function to retrieve data from an OLE DB data source without making it a linked server. [Interaction with Index Server was mentioned a few times at the briefing. I got the impression that there was some integration between Index Server and text data types but this was not clearly stated. Perhaps my total lack of knowledge of Index Server interfered with my comprehension.]

Statistics handling is completely reworked. The optimizer can see that the statistics are getting out of date and will recompile them using the sampling technique I mentioned earlier. It is possible to compute statistics on a non-indexed column. The most common reason for doing this would be for composite key indexes. The statistics gathering is still only on the high-order (left-most) column of a composite key.

There are some significant changes to SQL processing. First of all, there is no longer a separate procedure cache; the usage pattern dictates which pages are used for data and which for procedures. Plans are now shared; only one copy is needed for many users. [I did not get a chance to ask what happens if the plans for two different users should be different based on input parameters or whatever.] There is a scheduler that can grant lots of memory to big queries. Internally the QP uses a PREPARE and EXECUTE strategy and caches SQL statements. This is a big improvement over the current implementation of this feature in ODBC queries, which results in filling up tempdb with temporary stored procs. There is a cache for ad hoc SQL, and there can be some reuse of statements even if there isn't an exact match (parameter values, for example). This is expected to give a performance boost to ADO, RDO, and DAO apps that don't use stored procs.

Storage Architecture

The goal has been to completely separate the relational engine (query processor) from the storage engine and to have the QP communicate strictly through the storage engine's OLE DB layer. This has not been perfectly achieved; there are still some calls to the storage engine directly from the QP.

Disk storage

As previously mentioned, the page size is now 8K. Note that, due to OS file limitations, the I/O is not atomic; a page is written in 512-byte chunks. There is a mechanism for detecting "torn" pages. Pages are managed with a slot array pointing to the records on the page (similar to the way pages containing variable-length records are currently managed).

Devices are gone. Databases live on files. Files can be organized into File Groups. Database objects can be placed on any file group (similar to old segments). It is no longer possible to have the transaction log in the same file as the data. You cannot have multiple databases in the same file. Database files can grow and shrink automatically.

File groups provide the ability to distribute data over different physical disk drives and support a future key based file partitioning.

Some extents will now contain pages from different objects. Mostly this is used for small tables.

The storage of text/image data is different. Instead of one pointer to a linked list, there is a b-tree structure that points to the fragments of text/image data. Small text objects share pages.

Every row has a unique RowID. If there is no clustered index, there is an assigned row id which does not change even if the row moves. If there is a clustering index, the row id is the clustering key. A uniquifier is added if the clustered index is non-unique.

Secondary indexes use the row id or clustering key to point to the rows. If the underlying table is a heap (no clustered index), a secondary index lookup uses the row id to fetch the record directly. If the underlying table has a clustered index, the secondary index stores the clustering key and uses the clustered index to locate the record.
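The two lookup paths can be modeled in Python. Everything here is invented for illustration (the row data, the ("file1", 7) row id, and the dict-based "indexes"); it only shows the shape of the two-step lookup.

```python
# Heap case: the nonclustered index entry holds a row id used directly.
heap = {("file1", 7): {"name": "Smith", "city": "Paris"}}   # rid -> row
heap_name_index = {"Smith": ("file1", 7)}                   # key -> rid

# Clustered case: the entry holds the clustering key; the clustered
# index (modeled as a dict) is then traversed to reach the row.
clustered = {101: {"cust_id": 101, "name": "Jones"}}        # clust key -> row
clust_name_index = {"Jones": 101}                           # key -> clust key

def lookup_heap(name):
    rid = heap_name_index[name]    # step 1: seek the nonclustered index
    return heap[rid]               # step 2: fetch the row by row id

def lookup_clustered(name):
    ck = clust_name_index[name]    # step 1: seek the nonclustered index
    return clustered[ck]           # step 2: seek the clustered index
```

One consequence worth noting: in the clustered case the secondary index never stores a physical location, so rows can move (page splits, for instance) without secondary-index maintenance.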

There are no bit mapped indexes.

The maximum row length is 8,060 bytes.

Memory management

Memory is allocated dynamically. It is not possible to fix portions of memory for specific databases. The SQL Server process space does not change size. Only the buffer pool will grow and shrink.

Buffer management is being done by a clock algorithm rather than by a least recently used algorithm. This is reported to improve performance and increase scalability.

I/O

The scanning thread directs Read Ahead. There are no configuration parameters. The absence of separate read-ahead threads reduces context switching.

Index reads can be issued from intermediate nodes rather than needing to go to the leaf level in all cases.

Concurrency control

There is full row-level locking. It is not necessary to configure the number of locks. The lock escalation configuration options are gone.

Key range locking is used in indexes. This means that there is more concurrent update and also reduces the amount of locks required for page splits.

The lock granularity is determined at run time based on the lowest cost strategy. This is based on input from the Query Processor. Several different lock granularities may be used for a single query, for example, there may be page locks for index scans and row locks for reading from the table.

Logging and recovery

The transaction log is no longer a "table" consisting of a linked list of pages. It does not compete with data pages in the buffer space. Instead, it is a set of logical files containing log rows. It will grow automatically. Truncation is quick.

Logical log files can be backed up as long as they don't contain active pages.

Backup can still be done while users are on-line. Microsoft reports minimal degradation (5%) in system throughput when this is done.

Restore automatically creates a database.

There is the capacity for incremental backups (all pages changed since last full backup). This will speed recovery. Log backups are still supported. [I have not figured out what the tradeoffs are here.] Files and file groups can be backed up so that one can recover from media failure or isolated corruption. This is primarily for very large databases with a tight backup window.

New Transact-SQL and Other Developer Features

Unicode

Unicode is supported throughout SQL Server. There are several new Unicode datatypes:

NCHAR, NVARCHAR up to 4000 characters

NTEXT Unicode blob

Unicode object names permit up to 128 characters.

Datatypes

In addition to the Unicode datatypes, SQL Server 7.0 has

8000 byte CHAR/VARCHAR

8000 byte BINARY/VARBINARY

UNIQUEIDENTIFIER 16 byte GUID

BIT columns can now contain nulls. Fixed-length fields that permit nulls now remain fixed length rather than turning magically into variable length as they did before.

[Note the AS in the CREATE TABLE. Don't know if that is an addition or a mistake.]

DDL and Schema changes

There can be 1024 columns per table and currently 32 tables per query. The latter is expected to increase to at least 128 by the final release.

It is possible to drop columns with the ALTER TABLE statement. It is possible to change the datatype of an unindexed column as long as it is a legal conversion. There are ALTER TRIGGER, ALTER VIEW, and ALTER PROCEDURE statements which allow you to change the definition of these objects without losing the existing permissions as you do when you drop and recreate them.

You can have multiple triggers per update action. The order of execution of these is undefined. If you create a trigger with the same name as an existing trigger, it replaces the trigger. If it has a different name, it is added. Trigger recursion (if a trigger performs an operation on its own table, the trigger fires again) is now supported with a database option [too high a level in my opinion; should be at a table level.]

Security

Groups are gone. Roles are here. Users get placed in roles. The model isn't quite consistent with NT groups. There are predefined roles and you can define your own. It's much more flexible than the old group scheme.

Cursors

Cursors are now scoped and can be declared as local or global. They can be passed as output parameters from a stored procedure. There is a cursor variable in T-SQL which allows an SP to create a local cursor and pass it around.

Schema info

There is support for the ANSI/ISO Information Schema views.

System functions are still supported. New system functions include ObjectProperty, ColumnProperty, TypeProperty, IsMember, and Permissions.

Delayed name resolution

This allows you to, for example, create a table and immediately refer to it in a stored procedure.

DML

There's a new TOP clause for the SELECT statement. It returns the top n or n% rows and can be used in subqueries. Examples:

SELECT TOP 10 emp_name FROM employees ORDER BY emp_salary DESC

UPDATE employees SET salary = salary * 1.10 FROM (SELECT TOP 10 WITH TIES * FROM employees ORDER BY rating) AS e WHERE employees.emp_id = e.emp_id

[There are some subtle changes to the UPDATE statement buried in this example.]

You can do a substring on Text or Image data.

You can get real empty strings (optionally) rather than a single space.

Cursor status functions (no further details)

Datetime addition and subtraction (no further details)

STDEV/STDEVP/VAR/VARP functions (no further details)

CAST (no further details)

"[ ]" identifier quoting (no further details)

OSQL

Like isql but uses ODBC instead of DBLIB.

ADO: routine access to data. Seen as vehicle of choice for VB developers.

OLE DB: lower-level COM-based access to data. Used by provider developers. Performance comparable to ODBC. Adds an OPENROWSET function to be used in the FROM clause. This is supported in views, and most users will create views so that their SELECT statements don't get littered with OPENROWSET.

ODBC: Not going away. "Going to be around for a long time". Has new 7.0 data types. There is a DSN wizard and password encryption. ODBC round-trips are minimized for Internet access.

DBLIB: not improving. No access to 7.0 features.

DMO: The new DMO is not backward compatible with the old DMO. It is a different OLE Object. It accesses the database through ODBC, not through DBLIB as the 6.x version does.

Upgrading

Upgrades must be from 6.x. 4.2 must be upgraded to 6.x first.

Can be done on one machine or onto a different machine.

Replication allows all combinations of 6.0, 6.5 and 7.0 to work together. Servers involved in replication can be upgraded in any order but a separate distribution server must be upgraded first.

The upgrade is managed with a wizard.

Upgrade speeds (using a server-server pipeline, not tape, which is also supported):

1G -- < 1 hour

10 G -- < 4 hours

50 G -- < 12 hours

100G -- < 24 hours

Microsoft reports that upgrading their 80GB SAP server took 14.5 hours. If you use tape, it will take roughly twice as long.

A compatibility mode will be automatically set for 6.x databases. This will

Prevent keyword conflicts

Have an implicit ORDER BY for GROUP BY

Provide 6.x treatment of empty strings

Replicate an obscure bug in the UPDATE statement

Most upgrade problems to date have been:

Applications that made direct access to system tables (17%)

Problems already present in user objects which became manifest when the object was upgraded (6%)

No other problem has appeared in more than 1% of the tests.

The data warehousing efforts use the Microsoft Repository. This will support a variety of information models (UML, CDE, etc.). This is an extension of the repository that comes with Visual Studio; it has been extended to support database schema, transformations, and OLAP.

The OLAP server is code-named "Plato". [In this writeup I am assuming knowledge of multi-dimensional processing] It supports the different types of storage used for multi-dimensional processing of data: ROLAP (data stays in relational data store), MOLAP (data is converted to multi-dimensional data store) and HOLAP (combination of ROLAP/MOLAP).

It has lots of wizards. One of these determines when pre-aggregation of data is going to have a payoff.

Plato security will be integrated with SQL Server in Beta 3. Plato is integrated with Office and BackOffice.

There are some third party clients already for Plato: Arbor, Knosys and Panorama.

OLE DB For OLAP extensions support this kind of processing. There is a new object called dataset which is the result of querying a cube or set of cubes.

There is a subset of Plato called Data Cube Server, which is an in-process desktop multi-dimensional component.
