
Thursday, May 31, 2012

The XML datatype was first introduced in SQL SERVER 2005 as a column type, and it has a lot of benefits and implementations in today's environment. There are some restrictions when defining xml as a data type, but it still offers huge benefits for large objects and complex structures: student information history, patient history in hospital management, logical comparison of properties in a property-based database, and CV details of employees on any CV site. All of the above store huge amounts of data with very few updates; the only thing required is to compare and return.
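As a minimal sketch of xml as a column type (the PatientHistory table and the shape of its XML here are hypothetical, just for illustration):

--- Hypothetical table with an XML column
create table PatientHistory
(PatientId int identity primary key,
History xml)

insert into PatientHistory(History)
values ('<visits><visit date="2012-01-15" doctor="Smith" /></visits>')

--- Read a scalar value back out of the stored XML
select History.value('(/visits/visit/@doctor)[1]', 'varchar(50)') as FirstDoctor
from PatientHistory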

I was assigned a task to optimize a table for its DML and read operations, and this table has an XML column in it. I am not a beginner in xml implementation, but this was my first such optimization task in the last 5 years, so it was quite a good task for me. I researched it and completed the task, improving execution time by around 75%. During this process I learned many new things, and I want to share them from start to end, step by step.

I gathered this information by performing operations in my own environment and also by using the following links.

Tuesday, May 29, 2012

I have gone through some difficult experiences comparing strings many times during database design and querying, and I think we need a different solution to perform the same operation. There are often requirements to create a varchar field as a primary column, like the Account Number in the Chart of Accounts table of accounting software, or a Comments column defined as varchar(max) or varchar(100), etc.

Problem

When we join on a string-based column, or use one in a where clause even to find a single record with a LIKE or equals condition, it increases our query cost. I came across this scenario recently: comparing a string-based column just to check whether it equals a parameter string value took 76% of the total query cost. The SQL SERVER optimizer hinted to create a covering index using the primary key plus this column, which improves query performance by at least 50%, but at the same time I came across hash functions that satisfied my needs for storage and optimization. So I am going to discuss what a hash key is and how we can implement it using SQL SERVER 2008 and above. I have gathered information from many sources, and MSDN is one of them.
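To illustrate, here is a sketch of the kind of covering index the optimizer suggests in such cases. The table and column names are hypothetical; the idea is to key the index on the searched string column and include the other columns the query selects so the query is covered:

--- Hypothetical covering index on the searched string column
CREATE INDEX IX_ChartOfAccounts_AccountNumber
ON ChartOfAccounts (AccountNumber)
INCLUDE (AccountName);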

What is a Hash Function

A hash is the value (int or varbinary) that results from an algorithm (known as a hash function) applied to a given string. You just need to provide your string as input and you get a practically unique hash value as output. If we hash the text of a complete page, then change just one character and hash it again, the function returns a different value. There is a small possibility of two inputs producing the same value; this is known as a hash collision.
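For example, changing a single character produces a completely different hash (HASHBYTES shown here; the string values are just illustrations):

select HASHBYTES('SHA1', 'hash functions')
select HASHBYTES('SHA1', 'Hash functions')   -- one character changed, entirely different hash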

Where can I use a Hash Function?

·Security implementations, such as storing sensitive string data as a one-way hash instead of plain text.

·Reduced network traffic for cross-DB queries, because comparing small hash values requires far less data than comparing full string values.

·Less space is required to store and compare values, because hash functions return int or binary values instead of long strings of characters.

·Better performance due to the small data types, and index seeks if we create an index on the hash column.

·If we create hash columns in our DB design along with the string columns, it is easy to implement joins on these columns; they act like string joins because they hold hash values of the string values (see the sketch after this list).
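A join on hash columns might look like the following minimal sketch. TableA and TableB are hypothetical, each assumed to carry a HashByteCol computed from its string1 column, as in the Hashfunction table defined later in this post:

--- Join two tables on their hash columns instead of the long string columns
select a.id, b.id
from TableA a
join TableB b
on a.HashByteCol = b.HashByteCol
and a.string1 = b.string1   -- recheck the strings to rule out hash collisions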

Be Careful Using Hash Functions

·A hash collision may return more rows than you expect, because two different strings can produce the same hash value. For this reason CheckSum might return more matches than you expected; if you want to use this column to detect changes in your data, consider HashBytes instead (see the example below). When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.

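A quick way to see the difference: under SQL Server's default case-insensitive collation, CHECKSUM treats strings that differ only in case as equal, while HASHBYTES works on the raw bytes. A minimal sketch (assuming a case-insensitive collation):

select CHECKSUM('sql server'), CHECKSUM('SQL SERVER')                   -- same value under a CI collation
select HASHBYTES('MD5', 'sql server'), HASHBYTES('MD5', 'SQL SERVER')   -- different values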

Types of Hash Functions

SQL SERVER provides multiple hash functions such as CHECKSUM, BINARY_CHECKSUM, and HASHBYTES. I am going to put examples against CheckSum and HashBytes to show how hash values work in practice.

Hash Keys in Database Design

You can efficiently design your table to use hash key columns along with regular string columns, though it depends on how your queries use the string column. In the following table design I have created a varchar(1000) column and stored 2 rows in it. When we run a query that compares some value against the String1 column, it is compared as a string and consumes hundreds of bytes of space and resources as well.

===================== Code Starts =======================

--- drop the table if it already exists
if OBJECT_ID('Hashfunction') is not null
    drop table Hashfunction
go

create table Hashfunction
(id int identity primary key,
string1 varchar(1000),
checksumCol as CHECKSUM(string1),
HashByteCol as HASHBYTES('SHA1', string1))

insert into Hashfunction(string1)
values ('Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.'),
('Returns the MD2, MD4, MD5, SHA, or SHA1 hash of its input.')

===================== Code Ends =======================

Look at the CheckSumCol and HashByteCol columns: I have added two computed columns to hold the hash key values, and the parameter passed to each is the String1 column. This means the CheckSum and HashByte columns return hash values of the string values for both rows, as shown in the following picture.

CHECKSUM

Returns the checksum value computed over a row of a table, or over a list of expressions. The output of CHECKSUM is an integer. CHECKSUM is intended for use in building hash indexes. You can create a checksum value for more than one column, like:

select CHECKSUM('this is my test','this is checksum test')

returns -506357463

CHECKSUM returns an error if any column is of noncomparable data type. Noncomparable data types are text, ntext, image, XML, and cursor, and also sql_variant with any one of the preceding types as its base type.

CHECKSUM satisfies the properties of a hash function: CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. If one of the values in the expression list changes, the checksum of the list also generally changes. The order of expressions affects the resultant value of CHECKSUM.
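For example, swapping the order of the same two expressions changes the result:

select CHECKSUM('first', 'second')   -- one value
select CHECKSUM('second', 'first')   -- a different value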

The CHECKSUM value is dependent upon the collation. The same value stored with a different collation will return a different CHECKSUM value.

Hash Indexes

CHECKSUM computes a hash value, called the checksum, over its list of columns or arguments. The hash value is intended for use in building hash indexes. If the arguments to CHECKSUM are columns, and an index is built over the computed CHECKSUM value, the result is a hash index. This can be used for equality searches over the columns.

---- Hash Index on the CheckSum int Column

CREATE INDEX Hashfunction_index ON Hashfunction (CheckSumCol);

The checksum index can be used as a hash index, particularly to improve indexing speed when the column to be indexed is a long character column. The checksum index can be used for equality searches.

--- Comparison 2: as int

select * from Hashfunction where CheckSumCol = CHECKSUM('Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.')

Creating the index on the computed column materializes the checksum column, and any change to the underlying column value is reflected in the computed column. Alternatively, an index could be built directly on the indexed column.

In the table created above you can see the HashBytes computed column in the Hashfunction table. You can use it as follows:

--- Comparison 3: as varbinary

select * from Hashfunction where HashByteCol = HASHBYTES('SHA1','Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.')

Allowed input values are limited to 8000 bytes. If you pass a longer value, only the first 8000 bytes are hashed, like in the following example.
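A sketch of this truncation: REPLICATE on a plain (non-max) varchar input silently truncates its result at 8000 characters, so both statements below hash the same first 8000 bytes and return the same value:

select HASHBYTES('SHA1', REPLICATE('a', 9000))
select HASHBYTES('SHA1', REPLICATE('a', 8000))   -- same hash: the first input was truncated to 8000 characters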

Monday, May 28, 2012

One of my friends had difficulty finding details about the advanced features in SQL SERVER 2008 Express with Advanced Services and wanted to know whether SSRS is supported in it or not.
Look at the detail of the features in SQL SERVER 2008 Express with Advanced Services:

·On-demand report processing for each user who views a report. When a user opens a report, the report is initialized, the query is processed, data is merged into the report layout, and the report is rendered into a presentation format.

·Rendering formats are available for HTML, Acrobat, and Excel.

·Report data sources must be SQL Server relational databases that run locally in SQL Server Express.

Other editions of SQL Server include a larger set of Reporting Services features. The following list describes the features that are documented in SQL Server Books Online, but cannot be used in this edition.

The following table lists the additional SQL Server 2008 database features that are not supported in this version of SQL Server Express. It also lists database features from earlier versions of SQL Server that are not supported.

SQL Server features not supported in SQL Server Express

SQL Server features from earlier versions not supported in SQL Server Express

Friday, May 25, 2012

While implementing database mirroring, a major problem with large databases is handling the size of the database's log file.

Reasons the Log Size Increases

1- Since it is necessary to set the recovery model of the database to "FULL", all transactions and DML operations are written to the log file before the data file. Over time this increases the size of your log file. The log file can grow for multiple reasons; we have discussed one, so let's discuss more.

2- While implementing database mirroring, a failover may occur or the mirror database may become unavailable due to network connections or server downtime. The log file holds all pending transactions until the mirror is available again and the data is committed to the mirror. Due to this pending process, the log can grow to a huge size in very little time.

3- If drive space is too low to hold the database files on the mirror server, the principal server is unable to send and commit data on the mirror. In return it holds all the structural and DML changes in the log until the mirror is available.

Shrinking Log File

When any of the above occurs you need to apply a different solution to overcome the problem. Let's discuss how we can avoid such conditions, especially when database mirroring is in place.

1- Schedule a full backup of your database as per your backup plan.

2- Schedule transaction log backups of your database at intervals between the full backups.

In this way your log file size stays under control, because every time a transaction log backup takes place it removes all committed transactions from the log file and writes them to the backup media. This is how it decreases the log file size between backups. For more detail about backups, have a look at this link.
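A minimal sketch of such a schedule, assuming a hypothetical database named MirrorDB and local backup paths (in practice these run as scheduled jobs):

--- Full backup, e.g. nightly
BACKUP DATABASE MirrorDB TO DISK = 'D:\Backup\MirrorDB_Full.bak'

--- Transaction log backup, e.g. every 30 minutes between full backups
BACKUP LOG MirrorDB TO DISK = 'D:\Backup\MirrorDB_Log.trn'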

Under the above conditions you need to monitor carefully that the shrink job does not run on a regular basis (this increases fragmentation and affects performance), that it ran successfully, and that you have specified that released space must be returned to the OS. You can create a Shrink Log job through Management Studio from this path... but this will shrink your database only when the job executes successfully; otherwise the log size keeps growing.
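If you prefer to shrink the log manually rather than through a job, a minimal sketch looks like this (MirrorDB and its logical log file name are hypothetical; look up the real logical name in sys.database_files):

USE MirrorDB
GO
--- Shrink the log file to a 1024 MB target
DBCC SHRINKFILE (MirrorDB_Log, 1024)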

Monday, May 21, 2012

Sometimes a database enters restoring mode, as in a database mirroring scenario, and you need to change it back to normal mode to restore the latest backup or to delete the mirror DB. Changing a database from Restoring to normal is a critical update, because all your uncommitted data in the log file will be truncated.
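A minimal sketch of bringing such a database out of the Restoring state, assuming a hypothetical database named MirrorDB:

--- Remove the mirroring partnership first if it is still in place
ALTER DATABASE MirrorDB SET PARTNER OFF

--- Bring the database out of the Restoring state; no further log backups can be applied afterwards
RESTORE DATABASE MirrorDB WITH RECOVERY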

Could not write a checkpoint record in database ID 17 because the log is out of space. Contact the database administrator to truncate the log or allocate more space to the database log files.

RESTORE DATABASE successfully processed 0 pages in 12.209 seconds (0.000 MB/sec).

I have found a very good question discussion about how to find the physical size of a database and of its tables using a SQL query. The query below returns the size, used space, and unused space of each database file in MB; the same discussion also shows how to get the number of rows and data size for each table in MB and GB.

==================== Query ===============================

-- Script to analyze table space usage using the
-- output from the sp_spaceused stored procedure
-- Works with SQL 7.0, 2000, 2005, 2008, R2
select [FileSizeMB]    = convert(numeric(10,2), sum(round(a.size / 128., 2))),
       [UsedSpaceMB]   = convert(numeric(10,2), sum(round(fileproperty(a.name, 'SpaceUsed') / 128., 2))),
       [UnusedSpaceMB] = convert(numeric(10,2), sum(round((a.size - fileproperty(a.name, 'SpaceUsed')) / 128., 2))),
       [Type]          = case when a.groupid is null then '' when a.groupid = 0 then 'Log' else 'Data' end,
       [DBFileName]    = isnull(a.name, '*** Total for all files ***')
from sysfiles a
group by a.groupid, a.name with rollup
having a.groupid is null or a.name is not null
order by case when a.groupid is null then 99 when a.groupid = 0 then 0 else 1 end,
         a.groupid,
         case when a.name is null then 99 else 0 end,
         a.name