Yukon-Katmai Discussion

Wednesday, January 18, 2012

I am sure those of you who have been through replication latency issues know that they are not easy to crack. Expertise in replication, especially in latency troubleshooting, cannot really be learned or taught in a class.

I am sharing this case because the latency was not caused by any of the usual suspects:
- There was no blocking.
- CPU, memory and disks were doing fine.
- There was no network issue.
- We shrank the publisher database, which did not help at all.
- The cleanup job was running fine. We ran update stats with full scan on msrepl_transactions and msrepl_commands, which did not help either.

When latency started piling up, we first looked at the Log Reader and Distribution agent history in the distribution database but could not get much information. Because of another issue (out of scope for this post), the log reader history was not showing up and the distributor history showed that almost no data was reaching the subscriber. Normally the throughput of the log reader and distribution agent threads (reader and writer threads) is written to the mslogreader_history and msdistribution_history system tables every 5 minutes, but we were getting false entries there.

We had no choice but to take verbose logs for the distribution and log reader agents. From the logs it was clear that the log reader agent was delivering more than 1000 commands/sec while the distribution agent was delivering less than 100 commands/sec.

We were totally clueless because:
- The Log Reader was running at its usual speed, so there was no issue with the Publisher, nor any issue in pumping data to the Distributor. The Log Reader verbose log confirmed it, so this was ruled out.

- Data from Distributor to Subscriber was slow, but we could not tell whether it was slow in reading from the distribution database or slow in pumping data to the subscriber database. What we could clearly see was that there was almost no activity on the subscriber.

- There were no resource bottlenecks. Something was simply stopping the data from flowing from distributor to subscriber. It was like an hourglass with a lot of sand at the top but a hole so small that very little can get through.

I then went through the distribution verbose logs again and found very frequent entries related to committing transactions. The 5 similar sentences on 5 consecutive lines were not very clear, but I could sense that something related to committing transactions was going on. These entries repeated very frequently in groups of 5-6, and each group of 5-6 entries took around 2 seconds.

This gave us a hint, and we immediately jumped to the Distribution agent profile to see whether someone had modified a setting that might be causing the issue. We made a mistake again, because we just right-clicked Replication >> Distributor Properties >> Agent Profiles >> Distribution Agent. This was looking fine.

However, one of our colleagues opened the profile of the specific distribution agent that was giving us the issue. This was the right way, guys. We needed to do this. We found that -CommitBatchSize (default value 100) and -CommitBatchThreshold (default value 1000) had both been changed to 10.

Due to these values, the transactions were being delivered at a rate of about 1 cmd/transaction, which explains the frequent commit entries in the verbose log.
We then changed both values back to the defaults and recycled the Distribution Agent. That's it, the story ends here.
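
To get a feel for why such small values hurt, here is a rough Python sketch of how often a commit would be issued for a stream of transactions. It assumes a simplified model (my assumption, not the agent's actual code): a commit is issued at a transaction boundary once either the number of transactions reaches CommitBatchSize or the number of commands reaches CommitBatchThreshold. The function name count_commits and the sample workload are made up for illustration.

```python
def count_commits(cmds_per_txn, commit_batch_size, commit_batch_threshold):
    """Count commits under a simplified batching model (an assumption,
    not the Distribution Agent's real implementation)."""
    commits = 0
    txns_in_batch = 0
    cmds_in_batch = 0
    for cmds in cmds_per_txn:
        txns_in_batch += 1
        cmds_in_batch += cmds
        # Commit at a transaction boundary once either limit is reached.
        if (txns_in_batch >= commit_batch_size
                or cmds_in_batch >= commit_batch_threshold):
            commits += 1
            txns_in_batch = 0
            cmds_in_batch = 0
    if txns_in_batch:
        commits += 1  # flush the final partial batch
    return commits


# Hypothetical workload: 10,000 single-command transactions.
workload = [1] * 10_000
default_commits = count_commits(workload, 100, 1000)
changed_commits = count_commits(workload, 10, 10)
print(default_commits, changed_commits)
```

Under this model the changed profile issues ten times as many commits for the same workload, which would be consistent with the frequent commit entries we saw in the verbose log.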

Even though it was a small setting that kept us slogging for hours, the experience of working on a replication latency issue with no actual resource bottleneck (I call it false latency) was amazing.

Saturday, January 14, 2012

The answer is NO, but to come to this conclusion I had to spend some time. One of my colleagues came to me with this question. My instant answer was a clear NO. But then, out of curiosity, I asked him why he was asking. According to him, the nightly index reorg job, which should run for a very long time, finished in just 3 hours. Also, we cannot check the index fragmentation since that takes around 2 hours. Our tables are huge.

I thought of 3 possible reasons:

1) Our job has logic to re-org an index only if it has a certain level of fragmentation, so perhaps no index qualified that day. This was unlikely but could not be ruled out.

2) The restore actually reshuffled the pages and in doing so cleared out some leaf-level fragmentation. I remembered that restores take more time than backups, so I started believing this.

3) Some re-org activity might have been running while the backups were being taken. The backup might have captured well re-orged pages, which might have resulted in a less fragmented database.

I started with point 2 and soon realized that I was not correct. This did not take much time. For point 1, we added logging to the job so that it writes a log we can read when it finishes, but this will take time to generate.

Now I was left with option 3. I had a table with > 99% fragmentation and 24085822 rows. The table size was around 4GB. The DBCC SHOWCONTIG output is shared below:

Friday, January 13, 2012

I am posting after a long gap, and might still not have posted if I had not found out yesterday how smart Replication is.

Brief summary: We have a very large OLTP environment where millions of small queries do inserts and updates (no deletes). The data is replicated to other subscribers, and there is so much of it that most of the time we are firefighting latency. Because of the size of the data we started archiving, which meant deleting data in batches from the OLTP environment and replicating those deletes to the subscribers. This added further to the latency, for obvious reasons.

So to reduce latency, we started thinking of replicating the execution of the stored procedure that deletes the rows in batches. No, I am not saying that replication is smart just because it can replicate SP execution. This feature is quite old now and you are probably all aware of it already.

The issue we thought we might face: We already had all the required tables added as articles in the respective publications. If we now added the stored procedures to the publication, we thought Replication would try to apply the same changes twice. For example, let's say there are 2 articles in the publication. The first is a table (say REPL_TAB) and the second is an SP (say REPL_SP). REPL_SP deletes x rows from REPL_TAB.

Now, if we execute REPL_SP, we thought it would affect the Subscriber table twice: once when the replicated SP deletes the rows, and once more because REPL_TAB is an article and its row deletes would also be replicated. So we thought this might not work. We then thought of creating another publication with this SP as an article, but had the same reservations.

But logically, I thought the Log Reader agent should pick up the command from the T-Log and be smart enough to replicate it once. I mean, if I run EXEC XYZ, which deletes 10 rows in a table ABC, it should replicate only EXEC XYZ and not the individual delete commands, even though table ABC is also an article in the same or, for that matter, a different publication.

I first tested this in SQL Server 2005 SP3 CU2 and got it partially working. If both articles are in the same publication, executing the stored procedure fills msrepl_commands and msrepl_transactions with 1 row each. But if there are 2 publications with one article in each and I execute the SP to delete x rows, it is replicated twice: first 1 command and 1 transaction, and then x commands and 1 transaction. Distribution history confirms the same.

I then tested the same on SQL Server 2008 SP1 and it worked like a charm. Below is the proof of concept for your reference:

Test 1: We have 2 publications on the Adventureworks database. One publishes a table and the other publishes a stored procedure execution (by default, stored procedure execution is not enabled). The stored procedure Del_stats_scan deletes the top 10 rows from table dbo.Stats_State in the Publisher database (Adventureworks).

Data replicated only once. The second Distribution agent did not do anything. The reason you see 2 transactions in the log reader is that both are the same images. There is only ONE Log Reader per database.

Fire these queries to find out what is being replicated :

select * from distribution.dbo.MSrepl_commands

select * from distribution.dbo.MSrepl_transactions

Query 2 :

exec del_stats_scan

Results: Data replicated only once. Only the SP execution was replicated. The other distribution agent did not do anything. The reason you see 2 transactions in the log reader is that both are the same images. There is only ONE Log Reader per database.

Run these queries to find out what has been replicated:

select COUNT(*) 'No. of rows in Repl_cmds' from distribution.dbo.MSrepl_commands

Test 2: We have 1 publication on the Adventureworks database with 2 articles. One article publishes a table and the other publishes the stored procedure execution (by default, stored procedure execution is not enabled).

SQL Server 2008 Replication (Log Reader) is smart enough to replicate the data only once from the transaction log to the Distributor. The Distributor then distributes the command to the subscriber. There is a bug in SQL Server 2005 SP3 CU2 where the second test works fine but the first test does not, and the data is replicated twice when we execute the SP. You will have to find out which CU in 2005 fixed this, or you might want to apply SP4 directly.

Friday, July 22, 2011

This question has been asked many times (either in interviews or just out of curiosity): what does the resource database contain, and why is it so important to SQL Server?

Normally we can't see it and hence can't use it. However, there is a way to use the resource database. But be careful. If you mess anything up, you might end up paying a heavy cost.

Since we are discussing this, there is one more point I would like to touch on. Starting with SQL 2005 we are given catalog views and DMVs to query instead of system tables. However, if you query sys.objects and filter on type = 'S', you will notice a lot of system tables listed in the output. So there are system tables and we can see them. But if you try to query them, you will get an annoying 208 error stating that the object does not exist, which is not correct. So in this post we will see how to query the resource database and, in a similar manner, other system and user databases.

Let us see how we can use the resource database and also query system tables. Start SQL Server with the -m switch (single user mode). There are 2 ways:

1) Through services console (after adding -m do not click on OK but click on start )

2) Through DOS prompt

Once SQL Server has been started in single user mode, we can make only one connection. We will connect to SQL Server using the DAC. From the command line, the DAC option can be used only in the sqlcmd utility and not in OSQL or ISQL. Again, there are two ways to do this. But before attempting a DAC connection, make sure you have enabled the remote admin connections option via sp_configure (you should see a run value of 1):

1) Connecting to SQL Server with DAC (using SQLCMD) using the -A option

2) Connecting via Management Studio
Open Management Studio. It will prompt you to enter the instance name. Just before the instance name add admin:

You might or might not get this error:

If you get this error, click OK (the error window will go away) and then, instead of Connect, click Cancel.
You will see a clean screen like below:

Click on new query

You will again see the same connection popup :

This time click Connect and it will work :). A new query window will open. You will not see the databases in the left-hand pane, but the connection is there and working.

Run the query 'use mssqlsystemresource' and press F5 :). It will work:

Also, if you query the sys.sysdbreg system table (an alternative to sys.sysdatabases), you will see the resource database:

This database is currently in read-only mode (trust me :-) ). If you want to cross-check this, run dbcc shrinkdatabase (mssqlsystemresource) and you will get to know.
You can set it to read_write mode though, by running: alter database mssqlsystemresource set read_write.

Now, the very fact that this database is read-only and that we cannot take a backup of it suggests that it might not contain very important information. If you query the tables, you will see that certain static information is stored, which the engine might use from time to time, something like the way we store values in a temp table or a variable. So, coming back to the original question, the resource database contains a lot of static information which the engine might need from time to time for its internal use.

At the end you might ask, "What's the need to touch the system tables in the database?" The answer is: we normally do not need to do this (especially in the resource DB), but there are other databases (system as well as user) that hold information we can use to resolve some issues (by updating those tables as needed). And to resolve those issues, we need to log in this way.

Hope you have found it interesting. But remember, BE VERY CAREFUL WHEN YOU TRY TO PLAY AROUND WITH SYSTEM TABLES (as I said in the beginning).
Happy Learning !!

Sunday, June 26, 2011

Recently there was a request on the MSDN forums where the poster wanted to archive replicated partitioned tables in the publisher database. I think it would be good to share the solution with everyone here as well.
In this post we will see:
Part 1) Archiving the replicated non-partitioned (normal) tables.
Part 2) Archiving the replicated partitioned tables.

At the end, you will notice one nice-to-know feature of partitioned tables.

Now we need to archive the test table in the publisher database but keep the Subscriber untouched, i.e. the rows in the subscriber should not change. For this example we will delete all the rows of the publisher table.

How should we do it?
If I delete any row on the publisher, the delete will be replicated to the subscriber. One way might be to stop the Log Reader agent and then delete the rows. After this, I can run sp_repldone on the publisher and start the Log Reader agent. Yes, this is perfectly achievable. Here we go ...

We will enable the Log Reader agent now and will see that there are no transactions to be replicated.

Select count(*) on the publisher and subscriber tables will show 0 (zero) and 10000 rows respectively.

After this we will insert 1000 rows in the publisher table (we need to be careful as the tables have a primary key). As a result the subscriber now has 11000 rows and the publisher has 1000 rows.

This approach is going to be costly when there are millions of rows, because delete (or update or insert) is a fully logged activity. There is one more drawback, and perhaps a more critical one. There is one Log Reader agent per database. So if there is more than one publication on the same database and we run sp_repldone, we will hurt the other publications and subscriptions. So we have to be careful.

The other way is to truncate the table (after moving the data to an archive table), but replicated tables cannot be truncated. Why? Simple: truncate does not log individual row deletes, and the Log Reader agent reads the log file to find the transactions marked for replication using sp_replcmds. So to truncate the table, you need to remove the article from the publication. If you want to do that, the steps are:

*******TEST THIS BEFORE IMPLEMENTING IT IN PRODUCTION********
1) Stop the log reader agent and distribution agent
2) Drop the article(s) from the publication
3) Archive the table to another table (this will be a logged activity) by Bulk insert, BCP or the import/export wizard
4) Truncate the table
5) Add the article again
6) In the publication properties, change the properties of all the articles to "Keep existing object unchanged" for the option "Action if name is in use". This is the most important step, so please cross-check it a few times to make sure that "Keep existing object unchanged" is set
7) Generate the snapshot again
8) Start the log reader agent and distribution agent and initiate the new snapshot
*******TEST THIS BEFORE IMPLEMENTING IT IN PRODUCTION********
Trust me, you are done :). But don't you think it's lengthy and a bit risky? Now let's see something new ...

Part 2) Archiving the replicated partitioned tables.
Let us first create 2 new databases, then the partition functions, then the partition schemes, then the partitioned tables, and finally insert data into the tables.

-- The following partition function will partition a table or index into four partitions.
USE test
GO
CREATE PARTITION FUNCTION [PF_test](int) AS RANGE LEFT FOR VALUES (1, 100, 1000)
GO

-- Creating the partition scheme
USE test
GO
IF NOT EXISTS (SELECT * FROM sys.partition_schemes WHERE name = N'PS_test')
CREATE PARTITION SCHEME [PS_test] AS PARTITION [PF_test] TO ([second], [third], [forth], [fifth])
-- Note: if you want to have one filegroup for all the partitions, use instead:
-- CREATE PARTITION SCHEME [PS_test] AS PARTITION [PF_test] ALL TO ([secondary])
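
To make the four partitions concrete, here is a small Python sketch (not T-SQL) of how RANGE LEFT with the boundary values (1,100,1000) would place a value. With RANGE LEFT each boundary value belongs to the partition on its left, so the partitions are: up to 1, 2 to 100, 101 to 1000, and above 1000. The helper name partition_number is made up for illustration.

```python
from bisect import bisect_left

BOUNDARIES = [1, 100, 1000]  # boundary values from PF_test


def partition_number(value, boundaries=BOUNDARIES):
    """Return the 1-based partition for value under RANGE LEFT semantics,
    where each boundary value belongs to the partition on its left."""
    return bisect_left(boundaries, value) + 1


for v in (0, 1, 2, 100, 101, 1000, 1001):
    print(v, "-> partition", partition_number(v))
```

For RANGE RIGHT the boundary value would instead belong to the partition on its right, which corresponds to bisect_right in this sketch.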

Once you are done, create the publication on database TEST and then add the article TEST. Once that's done, the Test_Pub publication is ready to publish. After this we will create the subscription on this publication. Our subscriber database is test_sub. Once the initial snapshot is synchronized you will see the following values:

select OBJECT_ID('test..test')
select OBJECT_ID('test_sub..test')
select * from test.sys.partitions where object_id in (2105058535) order by partition_number
select * from test_sub.sys.partitions where object_id in (133575514) order by partition_number

So as of now everything is going as planned. The data is synchronized in the correct partitions. Now, if we need to archive the publisher table, we could try the same old approach that we used in Part 1. However, we will try something new here. That something new is SWITCHING OF PARTITIONS in the table. I will not explain what it means because you will see it in a few seconds (or you can refer to BOL).

Now, just check the number of rows in the tables test and test_archive in the publisher database test and in the test table in the subscriber database test_sub:

That's the magic :). Did you also notice that we did not create any partitions for the test_archive table? Let's query the sys.partitions catalog view and look at the partitions in the test_archive table.

That's the beauty. You did not have to delete or truncate a single row, nor did you remove the article or stop any agent. Now if you add rows to table TEST, they will be replicated to the subscriber as usual. Let's try this by inserting 1000 rows in table test in the publisher and then checking the subscriber table.
-- inserting new values in test table (in publisher database)
declare @val int
set @val=2000
while (@val > 1000)
begin
insert into test..test values (@val)
set @val=@val-1
end

Suggestions are welcome, as we are here to help each other grow technically. Happy learning!

Saturday, June 18, 2011

The complete error is :
select xactts, subguid from MSrepl7 where pubsrv = ? and pubdb = ? and indagent = 0 and subtype = 0
ORA-00942: table or view does not exist

We were setting up heterogeneous replication between SQL Server and Oracle 9i. This was done successfully. But when we tried to synchronize the articles (actually we were replicating only a view), we got stuck at this error.

It was clear that the error was coming from the subscriber. But we were not replicating the object MSrepl7, and we did not know whether it was a table or a view. Since we were not replicating it, I was sure that it was a table or view that replication itself creates. I saw a KB article which talks about this table for DB2: http://support.microsoft.com/KB/313332 .

Later I found from other Oracle subscribers that MSrepl7 is nothing but a replica of the msreplication_subscriptions table found on SQL Server subscribers. This table is looked up and matched against msrepl_transactions. The columns compared are transaction_timestamp in MSrepl7 and xact_seqno in the msrepl_transactions table.

Moving forward, we wanted to find out why this table did not exist on the subscriber. I suspected that it should be created during subscription setup, or when we reinitialize the subscription, or while synchronizing.

To see whether it was really being created, I enabled tracing on the Distribution agent, since it is this job that was failing: -Output C:\Temp\OUTPUTFILE.txt -Outputverboselevel 2.

It looks like when we start synchronizing, the required objects cannot be created on the Oracle side because of a permission issue. Sadly, SQL Server does not throw any error saying that the object could not be created due to permissions (perhaps there is no try/catch for this error).

So I requested the Oracle DBA in charge to give the appropriate permissions to the login that executes the distribution agent job, and to execute the Distributor job again.

Had I been asked this question a few years ago, I would have said "I would search for a solution".

Most of us do this, i.e. we first try to find the solution. Sometimes we succeed, but most of the time we do not. After many unsuccessful attempts I realized that the road to the solution goes through another step first, and that is finding the problem. I will not go deep into that here.

A couple of months back one of my colleagues came to me with a problem: "There is a job that fails every Monday". This job takes some values from somewhere and inserts them into SQL Server tables. The error was: Msg 241, Level 16, State 1, Line 2 Conversion failed when converting datetime from character string.

Earlier, my colleague had explained to the client that this was not a SQL issue and suggested that the poor client touch base with the application team. But the client was no fool. He said that there was some problem in SQL Server and that he did not want to go to DEV without proof. He is not a techie, though.

What should I do, Google it or BING it :-) ? We did not do that.
You can see this message in sys.messages. [select * from sys.messages where message_id=241]

We decided to reproduce the issue, and within 15 minutes we proved that the format in which the date is entered at the application level must be incorrect, so the datetime datatype does not recognize it.

Here is the error again, more clearly:
Msg 241, Level 16, State 1, Line 2
Conversion failed when converting datetime from character string.

So the reason could be:
The application adds single quotes to every entry. For example, NULL will be converted to 'NULL' and 2010/07/28 will be converted to '2010/07/28'. In this case 'NULL' will give error 241, but the date will be absolutely correct and will not throw an error when inserted into the table (inside SQL Server).
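
The same effect can be sketched outside SQL Server. The Python snippet below is only an analogy (it is not how SQL Server converts strings, and the function names are made up): a hypothetical application turns every entry into a string, so a missing value becomes the text NULL, which then fails date conversion while a real date string converts fine.

```python
from datetime import datetime


def quote_everything(value):
    """Mimic the hypothetical application: every entry becomes a string,
    so a missing value turns into the literal text 'NULL'."""
    return "NULL" if value is None else str(value)


def try_parse_date(text):
    """Return a datetime for yyyy/mm/dd text, or None if conversion fails."""
    try:
        return datetime.strptime(text, "%Y/%m/%d")
    except ValueError:
        return None


for raw in (None, "2010/07/28"):
    text = quote_everything(raw)
    print(repr(text), "->", try_parse_date(text))
```

The first entry fails to convert because it is the string NULL rather than a real null, which is the kind of input that would produce error 241 on the SQL Server side.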

Conclusion:
Always try to look for the reason behind the error/issue first rather than jumping for solutions here and there. It might take time, but you will learn more.

Around 4 months back we faced a latency issue in replication. We used tracer tokens and found that the distribution agent was lagging behind. Before I go further, let me explain that the distribution agent has 2 threads. The reader thread reads the rows from the MSrepl_transactions table in the distribution database (this happens in parallel with the log reader agent pumping rows into the msrepl_transactions table), and the writer thread applies those commands to the subscriber.

To find out where we were getting delayed, we configured verbose logging with level 2 in the distribution agent job (http://support.microsoft.com/kb/312292). In the output we saw a long time gap between the sp_MSget_repl_commands call and the next command. This command reads the distribution database and populates the rows into in-memory tables. These rows are then read by the writer thread and applied to the subscriber database.

It was clear that the reader thread was taking much longer to retrieve the rows than the writer thread took to write those commands to the subscriber database.

The next step was to check the msrepl_transactions and msrepl_commands tables. We checked the number of rows in those tables and found millions of rows that had already been replicated but were still showing up there. This was strange. We checked the cleanup job and found that it was running fine.

So what's the issue? Why was the job not removing the replicated rows?
Upon digging deeper, we found that the DBA had selected the "Create a snapshot immediately and keep the snapshot available to initialize subscriptions" option when he configured replication. As a result, Replication is supposed to keep all the transactions cached in the distribution database for the entire retention period. You will also see all the snapshot files in the snapshot folder on the distributor.

Why it happens:
Every new subscriber that is added within the subscription retention period needs the initial snapshot, and then the log reader data accumulated in the distribution database is applied on top of that snapshot. But the snapshot gets old too, as the database image changes over time, so normally a fresh snapshot would be needed each time. With this option set, the same old snapshot is applied first, followed by all the remaining log reader entries from the distribution database. That is why all the old transactions are kept until the subscription expires.

Why is it configured?
This option is configured in environments where subscriptions need to be created quite often, and also where the snapshot grows considerably in size over time.

How is it configured?
Command line: via sp_addpublication (the @immediate_sync parameter)

Drawback:
Due to this option the size of Msrepl_transactions and Msrepl_commands grows well beyond normal, which slows down synchronization and clogs up the system.

Introduction
We all know that data retrieval is fast if the data pages are found in RAM. If the data pages are not in RAM, they are fetched into RAM from disk. This causes a physical IO. The page remains in RAM until it is kicked out again.
But processes and threads do not access physical memory (RAM) directly. Instead, RAM is accessed indirectly through virtual memory, or virtual address space (VAS), pointers. On an x86 operating system the number of such pointers in virtual memory that can point to physical memory is 4,294,967,296 (2^32). This is equal to 4 GB. Out of this 4GB of VAS pointers, 2GB worth of pointers are located in the kernel address space and the remaining 2GB in the user address space. It is this 2GB of user address space that processes and threads use for themselves and map to RAM. The other 2GB of kernel address space is also mapped to RAM, for the OS routines and APIs. So, normally on a 32-bit Windows OS, SQL Server will use 2GB RAM (1.66GB Buffer Pool region and 384MB Mem2Leave region).
But what's the need for virtual memory when we have physical memory, which is the real memory? Let me compare this to a smart bank. The bank started with $5000. A customer deposited $1000, and the bank will return $1100 after a year. After 1 month, another customer deposited $2000 for 1 year, and the bank will return $2200 after a year. So the bank now has $8000 for around 2 years. Then someone took a loan of $3000 for 1 year, and the bank will get $4000 back after a year. In between, if the earlier 2 clients want to withdraw their deposits early, they can pay the penalty, and the bank has enough money from the initial investment to pay them back. In reality, banks and moneylenders keep circulating money which they might not even have.

I hope you have some idea now. The OS also works like this. It assures every process 4GB of memory. Right now there are 119 processes running on my laptop. If we go by this, the OS is promising 476 GB of memory to the processes. But I have only 2GB RAM on this laptop. That's where virtual memory comes into the picture. This 476 GB is actually virtual address space and nothing else; it does not physically exist. The OS memory manager maps this virtual memory to physical memory (RAM). During this process the page file on disk is also used if a thread needs more physical memory than is available.
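
The arithmetic behind these numbers is easy to check. The short Python sketch below just recomputes the figures quoted above (the variable names are mine):

```python
GIB = 1024 ** 3

vas_per_process = 2 ** 32          # addressable bytes with 32-bit pointers
processes = 119                    # process count quoted in the post
physical_ram = 2 * GIB             # RAM on the laptop mentioned above

promised = processes * vas_per_process
print(vas_per_process // GIB, "GB of virtual address space per process")
print(promised // GIB, "GB promised in total")
print(promised // physical_ram, "times the installed RAM")
```

So the OS is handing out far more address space than the RAM it actually has, which is exactly the bank situation described above.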

Let's skip AWE, /3GB, /USERVA and PAE for now, as they would divert us from the topic, which is how virtual addresses get translated to physical addresses in RAM.

Address translation is the process of translating a virtual memory address to a physical memory address.

Every virtual address has 3 components:

The Page Directory Index: For each process the OS memory manager creates a page directory, which it uses to map all the page tables for the process. The address of this directory is stored in a structure called the KProcess Block (Kernel Process Block). To keep this subject less complex I will not explain what the KProcess Block is. The CPU keeps track of the page directory via a register called CR3, or Control Register 3, which is loaded from the value kept in the KProcess Block. So the CPU's MMU knows where the page directory is located with the help of this register (MMU: http://en.wikipedia.org/wiki/Memory_management_unit ). The first 10 bits of the virtual address hold the Page Directory Index value (there are a lot of page directory entries). This tells Windows which page table to use to locate the physical memory associated with the address.

The Page Table Index: The second 10 bits of a 32-bit virtual address provide an index into this table and indicate which page table entry (PTE) contains the address of the page in physical memory to which the virtual address is mapped.

The Byte Index: The last 12 bits of a 32-bit virtual address contain the byte offset on the physical memory page to which the virtual address refers. The system page size determines the number of bits required to store the offset. Since the system page size on x86 processors is 4K, 12 bits are required to store a page offset (4,096 = 2^12).

Let's summarize it now:
1. The CPU's Memory Management Unit locates the page directory for the process using the special register mentioned above.
2. The page directory index (from the first 10 bits of the virtual address) is used to locate the page directory entry (PDE) that identifies the page table needed to map the virtual address to a physical one.
3. The page table index (from the second 10 bits of the virtual address) is used to locate the PTE that maps the physical location of the virtual memory page referenced by the address.
4. The PTE is used to locate the physical page. If the virtual page is mapped to a page that is already in physical memory, the PTE will contain the page frame number (PFN) of the page in physical memory that contains the data in question. If the page is not in physical memory, the MMU raises a page fault, and the Windows page fault-handling code attempts to locate the page in the system paging file. If the page can be located, it is loaded into physical memory, and the PTE is updated to reflect its location. If it cannot be located and the translation is a user mode translation, an access violation occurs because the virtual address references an invalid physical address. If the page cannot be located and the translation is occurring in kernel mode, a bug check (also called a blue screen) occurs.
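
The steps above can be sketched in a few lines of Python. This is a toy model only: the page directory is a nested dictionary that I made up for illustration, and a missing entry simply raises an error where the real MMU would raise a page fault for Windows to handle.

```python
PAGE_SIZE = 4096  # 4K pages, so 12 offset bits


def split_virtual_address(va):
    """Split a 32-bit virtual address into (PDE index, PTE index, byte offset)."""
    pde_index = (va >> 22) & 0x3FF   # first 10 bits
    pte_index = (va >> 12) & 0x3FF   # second 10 bits
    offset = va & 0xFFF              # last 12 bits
    return pde_index, pte_index, offset


def translate(va, page_directory):
    """Walk a toy page directory: {pde_index: {pte_index: page_frame_number}}."""
    pde_index, pte_index, offset = split_virtual_address(va)
    try:
        pfn = page_directory[pde_index][pte_index]
    except KeyError:
        # Stands in for the MMU raising a page fault.
        raise LookupError("page fault at 0x%08X" % va) from None
    return pfn * PAGE_SIZE + offset


toy_directory = {0x48: {0x345: 0x1A2B}}   # made-up mapping for illustration
print(split_virtual_address(0x12345678))
print(hex(translate(0x12345678, toy_directory)))
```

Note that the byte offset passes through unchanged; only the upper 20 bits are looked up, first in the page directory and then in the page table.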

How address translation happens with PAE in place:
Everything is the same as above except that:
1) There is a new table above the PDEs and PTEs: the Page Directory Pointer Table.
2) The PTEs and PDEs are 64 bits wide, compared to 32 bits wide when PAE is not enabled.
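
A quick way to see what the wider entries change: a 4K table holds half as many 64-bit entries as 32-bit ones, so each index shrinks from 10 bits to 9, and the 2 bits left over select an entry in the Page Directory Pointer Table. The 2/9/9/12 split used below is the standard x86 PAE layout as I understand it and is not spelled out in the post; the function names are made up for illustration.

```python
PAGE_SIZE = 4096


def entries_per_table(entry_bytes):
    """Number of entries that fit in one 4K page-sized table."""
    return PAGE_SIZE // entry_bytes


def split_pae_address(va):
    """Split a 32-bit virtual address the way x86 PAE does (assumed layout):
    2 bits PDPT index, 9 bits PDE index, 9 bits PTE index, 12 bits offset."""
    pdpt_index = (va >> 30) & 0x3
    pde_index = (va >> 21) & 0x1FF
    pte_index = (va >> 12) & 0x1FF
    offset = va & 0xFFF
    return pdpt_index, pde_index, pte_index, offset


print(entries_per_table(4), entries_per_table(8))
print(split_pae_address(0x12345678))
```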

Hi Guys ,
A couple of weeks back we had a setup issue where the SP3 setup was failing on Notification Services and Client Components. I am sharing my experience with you because it took us a lot of time to figure out that a few registry keys were missing. This explanation has 3 parts:

First Part: Solution of the issue.
Second Part: Reproducing this issue and finding a solution in a better way, so that the same process can be followed for other similar setup issues as well.
Third Part: 2 bugs (unfortunately, I could not reproduce them on my machine, but repeatedly reproduced them on the client's machine. I will still try on my machine and file them later).

First Part: Issue and its solution
Issue
As per Security Bulletin MS09-004, we raised a change to patch the DEV server (and later the Prod servers) to SQL Server 2005 SP3. Since the setup was failing on the DEV server, we could not initiate it on Prod until the issue on DEV was resolved. Till then Prod was under potential threat of SQL injection. The issue was that we were not able to upgrade SQL Server 2005 RTM (32 bit) to SQL Server 2005 SP3 (32 bit) on Windows Server 2008 (64 bit).

Troubleshooting done
-> The setup was failing only for Notification Services and Client Tools. All other components were successfully upgraded to SP3.
-> We found that both Notification Services and Client Tools were failing because of the same error:

-> So the issue was with registering the Notification Services DLL file, which was common to both NS and Client Components. As you can see, RegSvcs was being called from inside the function Do_sqlAssemblyRegSvcs.

-> Since COM+ Catalog was also showing up in the error, I ran the SQL Server RTM setup and found that the SCC (System Configuration Check) was failing on the COM+ Catalog requirement.
-> So I checked Component Services (DCOMCNFG) and found no issues there. Everything was working fine. We also noticed that there were no errors related to COM components in the Application logs, except the error related to MSDTC. The 32 bit COM components were also running fine. This was strange, but it made me believe that this error might be misleading.

-> We checked the SQL Server errorlogs and found the same entry related to MSDTC, but in a slightly different form:
QueryInterface failed for "ITransactionDispenser": 0x8004d027(XACT_E_UNABLE_TO_READ_DTC_CONFIG)

-> Now the picture was clear: MSDTC had issues for sure. The COM+ Catalog warning is also related to the MSDTC issue (I had resolved the same issue a few years back, but that was on Windows Server 2003: http://ms-abhay.blogspot.com/2009/10/msdtc-was-unable-to-read-its.html )

-> The first error code is the same, 0x8004d027, but the second error code, which appears before 0x8004d027, is 0x00000002. It is telling us that some registry key is missing (there might be more than one). Error 2 means "The system cannot find the file specified".
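
If you want to pick such codes apart yourself, here is a small sketch that splits an HRESULT into its standard fields (failure bit, facility, code). The helper name and the tiny lookup table are made up for illustration; the table holds only the two codes discussed in this post.

```python
# Only the two codes mentioned in this post; not a complete table.
KNOWN_CODES = {
    0x8004D027: "XACT_E_UNABLE_TO_READ_DTC_CONFIG",
    0x00000002: "ERROR_FILE_NOT_FOUND (The system cannot find the file specified)",
}


def decode_hresult(value):
    """Split a 32-bit HRESULT into failure bit, facility and code."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # top bit set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility field
        "code": value & 0xFFFF,             # low 16 bits: the error code
        "name": KNOWN_CODES.get(value, "unknown"),
    }


print(decode_hresult(0x8004D027))
print(decode_hresult(0x00000002))
```

For 0x8004d027 this gives a failure with facility 4 and code 0xD027, while 0x00000002 is a plain Win32 error 2 with no failure bit set.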

-> Since we were not sure which key was missing, we decided to uninstall and reinstall MSDTC. This would automatically recreate all the missing registry keys.

-> But there was a further twist. On Windows Server 2008 we cannot simply uninstall MSDTC by using the command "msdtc -uninstall". We have to remove the MSDTC server role >> reboot the server >> re-add MSDTC in the server roles.

-> We tried that, but even after removing MSDTC from the server roles, it was still showing as running in the services console. We tried a few times without success.

-> In between we also tried to repair .NET Framework 3.5 SP1, which did not help.

-> I also went through this article, which talks about making sure that all the MSDTC related keys are present in the registry hive:
http://msdn.microsoft.com/en-us/library/dd300421(v=WS.10).aspx (This was the 1st step towards the solution, although it did not work.)

-> However, the keys mentioned in this article were present in both WOW mode and normal mode in the registry.

-> Then on my machine I searched the registry for "OLETransactionManagers". The first key hit was: HKEY_CLASSES_ROOT\OLETransactionManagers

-> Since SQL Server was running in WOW mode on the client's server, we looked for the HKEY_CLASSES_ROOT\Wow6432Node\OLETransactionManagers key and found it at the first attempt.

-> However, on my laptop this key had some values, while the same key in the client's registry had none.

-> So the 64 bit MSDTC registry keys were there, but the 32 bit (WOW mode) registry keys related to MSDTC were missing.

-> After recreating the missing 32 bit entries, we restarted MSDTC and SQL Server. We checked the SQL Server errorlogs and found that the MSDTC related error was no longer showing, and the SQL Server SCC check was no longer showing the error related to COM+ Catalog either.

-> This gave us some hope that the setup would now run successfully, and it did (",).

Second Part: Reproducing the error
I reproduced this on my machine by removing all the keys from HKEY_CLASSES_ROOT\OLETransactionManagers

When I ran the setup it gave me the same warnings on SCC :

I will still go ahead with the setup and select only the client components to install :

The setup will encounter this error:

Let’s check the Application logs:

At this stage, let me first introduce a tool that can show you which registry key is missing or whether there are any permission related issues. I have been using this tool for a long time now. The tool is Procmon from Sysinternals: http://technet.microsoft.com/en-us/sysinternals/bb896645 . We will use it to find the missing registry keys. However, it captures a lot of information about all the processes, so we need to PAUSE it as soon as we launch it.

Now we will find the process IDs of MSIEXEC.exe from Task Manager, since as per the Application logs it's the msiexec command that is failing:

I will now filter it on those 3 PIDs

Once it’s done, un-pause the capture (the magnifying glass icon) and click on the Retry option on the setup window. Quickly after that, PAUSE Procmon again by clicking on the magnifying glass icon. Notice the output in Procmon. You will see a lot of different keys there. We now need to filter out the unnecessary entries:

Right click on SUCCESS and select:

Right click on RegOpenKey and select:

You are now left with only 6 keys to look at, and all of them are RegQueryValue operations (you can follow the same steps for other missing registry key issues):

Now ,notice the error copied above from application log:
*******

Now, double click on each of the 6 keys (starting from the bottom-most) and select the PROCESS option. You will notice the same command line string as shown in the application logs:

So the first key that is missing is HKCR\OLETransactionManagers\DefaultTM. The reason it’s not showing all the missing keys is that it fails on the first key itself. If you create the first key and then click on the Retry button on the setup, Procmon will show you the next missing key.

But what should its value be? We can check on other servers (preferably the same server version). Create the missing keys and click on the Retry button. The setup will be successful (in our case it was SP3 and not the initial setup, but the resolution is the same).
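
When comparing against another server, it can be quicker to diff the key names than to eyeball them. A minimal sketch, assuming you have already copied the subkey names under OLETransactionManagers from a healthy server and from the failing one into two lists. Only DefaultTM comes from this post; the other names below are placeholders, not real MSDTC keys.

```python
def missing_entries(healthy, broken):
    """Return the names present on the healthy server but absent on the broken one."""
    return sorted(set(healthy) - set(broken))


# Hypothetical name lists; only DefaultTM is taken from the post.
healthy_server = ["DefaultTM", "PlaceholderKeyA", "PlaceholderKeyB"]
failing_server = ["PlaceholderKeyB"]

for name in missing_entries(healthy_server, failing_server):
    print("missing:", name)
```

This only tells you which names are absent. The values still have to be checked on the healthy server before you create anything.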

Third part : The Bugs
1) When the SP3 setup fails, it should ideally roll everything back to normal, so NS and the client tools should keep working. But on the client's server we saw that Management Studio stopped working and threw an error (I don't have the screenshot now. Will try and reproduce it). The only solution is to uninstall the client tools and reinstall them. During the setup you will again get this error. Click on Ignore and the setup will complete.

2) On my laptop, I saw that it also corrupted other components of SQL Server 2008, like Books Online. Again, I do not have proof of that now but will reproduce it. The solution is to uninstall the client tools and reinstall the SQL Server 2008 tools.

Today, I was asked by someone how SQL Server behaves in a non-preemptive way while still running on an OS which is preemptive. Even though I explained this theoretically, I was asked whether there is a way to see it in practice. It wasn't difficult, and it really proved what I had explained. I also felt that adding practicals to theory has a much deeper impact ...

NOTE: I am doing all the testing on SQL Server 2008 RTM EVAL, as what I am trying to explain can be seen in 2008 onwards. On 2005 you will not see what is written here, so I would request you to use SQL Server 2008 or above.

Windows Scheduling (Preemptive) :
Starting from Windows NT 3.1 (and on through 2000, XP, 2003 etc.), Windows scheduling has been priority driven, i.e. preemptive scheduling. So every thread has a priority associated with it. Based on this priority the threads get a time slice (quantum) to run on the CPU. So, even if a thread of lower priority is running and all of a sudden another thread of higher priority comes up, the low priority thread will be preempted (interrupted) and the higher priority thread will be scheduled to run on the CPU. However, the scheduler is smart enough: it will keep the preempted thread at the top of the waiting threads by adjusting its priority (let's not get too deep into this at this point).

Prior to this, OS scheduling was non-preemptive, i.e. cooperative scheduling. Remember the days when Windows 98 used to hang and we used to reboot the machine quite often to get rid of it. Cooperative scheduling is good if all the threads leave the CPU after a fixed interval of time and give a chance to other threads (including kernel mode threads, which are more important and should get a chance to run whenever required). But that normally does not happen: some nasty application threads don't yield and hence block other threads.

SQL Server Scheduling (non-preemptive) :
SQL Server has its own scheduling mechanism and does not follow OS scheduling (which looks strange, as it runs on a preemptive OS). It's called UMS (User Mode Scheduler) in 2000 and SOS (SQL OS) in 2005 and above. BUT:
1) Why does SQL Server not hang the way Windows 98 used to?
Answer: SQL Server does not hang because its threads yield voluntarily. In case a thread does not yield within 60 seconds, SQL Server, unlike a faulty application whose threads simply never yield, raises a non-yielding scheduler error and writes a mini dump with the stack information of all the threads in it.

2) How does SQL Server manage to schedule in a non-preemptive way?
Answer: Windows will not schedule a thread that is waiting on an object with an infinite timeout; it simply ignores it. SQL Server (actually UMS) takes advantage of this and cleverly puts all the threads it does not want scheduled to sleep indefinitely by calling the WaitForSingleObject function with an infinite timeout. When SQL Server wants a thread to run, it simply signals the thread and it comes out of the sleep. It's Windows which then schedules the thread. It's important to know that UMS schedules only ONE SQL Server thread per CPU. However, there is an exception to this. There are moments when, to complete a task, the thread leaves the SQL Server scheduler and goes into preemptive mode scheduling, for example when using xp_cmdshell to open notepad, running an extended stored procedure that deals with the filesystem (like reading a file), or running a linked server query. In that situation, you will see more than one thread on a single CPU in runnable status. That is because one thread is scheduled via UMS\SOS and the other one directly via OS scheduling.
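
To make the idea of voluntary yielding a bit more concrete, here is a toy cooperative scheduler built on Python generators. It is only an illustration of the concept and has nothing to do with how UMS\SOS is actually implemented. Each worker does one small step and then yields, and the scheduler runs exactly one worker at a time from a runnable queue. All names are made up.

```python
from collections import deque


def worker(name, steps, log):
    """A task that does one unit of work and then yields voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # give the scheduler a chance to run someone else


def run_cooperative(tasks):
    """Run tasks one at a time; a task keeps the 'CPU' until it yields."""
    runnable = deque(tasks)
    while runnable:
        task = runnable.popleft()
        try:
            next(task)             # the task runs until its next yield
        except StopIteration:
            continue               # the task finished, drop it
        runnable.append(task)      # back to the end of the runnable queue


log = []
run_cooperative([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

Notice that nothing in run_cooperative can take the CPU away from a worker. If a worker looped forever without reaching its yield, every other worker would starve, which is exactly the non-yielding situation described above.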

Let me show you a demo. My laptop has only one dual core processor (it's SQL Server 2008 RTM). Let's first run a simple query and find the runnable and sleeping threads:
Select Status ,* from sysprocesses where status not in ('background')

You will notice that all the SPIDs show the status sleeping and there is only one SPID showing the status runnable. Its waittype will be PREEMPTIVE_OS_WAITFORSINGLEOBJECT. Notice that only the runnable SPID has a KPID associated with it. This KPID is nothing but the worker thread associated with the SPID. You can run it a few times but the output will not change except for the KPID, which means that one thread is yielding to another after the context switch. The reason we see the runnable state and not running is that by the time we get the query output, the thread has gone back to the runnable state. You might also see other runnable or suspended SPIDs, but that is because they are running in preemptive mode.

Now let's open another query window and execute the same command there:
Select Status ,* from sysprocesses where status not in ('background')
This time it's SPID 53 (the current SPID on my machine) which is showing the runnable state, while SPID 52 (the previous SPID) is now sleeping.

Let's do one more experiment. Open a new query window (SPID 51 in my case) and run select @@servicename around 1000 times. Come back to the SPID 53 window and check whether the runnable state is showing for SPID 53 or SPID 51. You will notice that SPID 51 is doing its task, but why is it showing as sleeping while its CPU value is still increasing? The reason is that when we run the query via SPID 53, during (and only during) that time 51 shows as sleeping because 53 needs to run. So the thread related to 51 yields voluntarily. When this query finishes, SPID 51 picks up again, but we can't see that since we have only one processor :) ...

Anyway, let me show you a small test where the SQL Server thread goes preemptive. We have 2 query windows: one with the sysprocesses query (SPID 53) and one calling xp_readerrorlog 100 times (SPID 51). I further modified my sysprocesses query to filter out sleeping SPIDs:

Select Status ,* from sysprocesses where status not in ('background','sleeping')
Let's run the query in SPID 51 and then the one in SPID 53. Notice that you have 2 runnable SPIDs now. That's because SPID 51 is scheduled by the OS and not by SQL OS\UMS.

Wednesday, April 27, 2011

Hope this article will help you in some or the other way one day :) .....

Introduction:
This small article is applicable for finding the optimal number of CPUs for long running, CPU intensive queries/workloads that don’t frequently wait for other resources. It is not applicable if your queries/workload often wait for resources (like I/Os, locks, latches etc.) without consuming CPU in a stretch. It can also provide information on uneven CPU load across NUMA nodes and uneven CPU load within the same NUMA node (load_factor effect).
It is recommended to analyze Windows Performance Monitor counters for monitoring CPU pressure. Processor utilization greater than 75% to 80% indicates CPU pressure. Using Windows Performance Monitor should be the 1st step; the procedure suggested in this article should be considered an additional step.
Further, it is very important to find ways to optimize the queries/workload by tuning the database schema before attempting to add additional CPUs.

Description:
A customer asks you: I am running a resource consuming SQL job and it takes x amount of time. How can I reduce the time so the SQL job completes sooner? Can I add more CPUs? If yes, how many?
When you see CPU pressure, there are 2 options: you can either upgrade to faster CPUs or add additional CPUs [assuming that the queries are well tuned and normalized]. Upgrading to a faster CPU will always help. Adding additional CPUs may not always help the SQL job run faster unless that SQL job can take advantage of additional CPUs [read Max Degree of Parallelism from BOL]. If the customer already has the fastest CPUs available in the market, then they have to wait for the next release of faster CPUs. One more choice would be to add additional CPUs and see if it helps; the below procedure will help you identify if this is the case.
This method calculates total user waits for CPU during the SQL workload and suggests additional CPUs if necessary. If CPU usage is at 100%, but no one waited for CPU during the workload, then adding additional CPUs will not help; this is the basis of this calculation.
Current recommendations that are available on this topic calculate the ‘signal wait time’ to ‘wait time’ ratio to indicate CPU pressure – but this cannot help one easily identify the number of additional CPUs necessary.

Procedure:
When concurrent users apply simultaneous CPU intensive workload, there could be CPU pressure. We can conclude that there is CPU pressure when, at any given moment during this time period, one or more user tasks waited for the CPU resource.
In this case one can run the below query to find out how many CPUs on average will help to scale the workload better. It might be more informative to collect the below information at short time intervals (many samples) rather than just once, to understand at which point in the workload the CPU pressure was highest. A single sample gives the average number of additional CPUs necessary for the entire workload duration.
1. Reset Wait Stats
dbcc sqlperf('sys.dm_os_wait_stats', clear)
2. Apply workload (you can find sample workload query at the end of this article, you need to execute the sample workload query simultaneously in many sessions to simulate concurrent user tasks).
3. Run the below query to find Additional CPUs Necessary – it is important to run the query right after the workload completes to get reliable information.

Example:
When you have 2 CPUs and you run the sample workload with just 1 or 2 concurrent sessions – you will see no recommendation for additional CPUs – unless there is unbalanced user task distribution across CPUs. On the other hand, if you run the workload with 4 concurrent sessions – you will notice the query suggests you add 2 additional CPUs. If you run with 6 concurrent sessions – you will notice the query suggests you add 4 additional CPUs.
If each workload runs in parallel (MAXDOP not 1), then you will also see a suggestion for additional CPUs, but you need to be careful in this case. For example, with 2 CPUs, when you run the workload (in parallel, MAXDOP 0/2) with 2 concurrent sessions, you will notice the suggestion to add 2 additional CPUs – this just indicates the workload is more scalable with more CPUs – parallel query execution, as you can imagine, can consume as many CPUs as you have and can consume more!!
The results are not reliable when other applications are running on the system. Also, the results might be incorrect on a hyper-threading enabled system.

Explanation:
When there are more user tasks concurrently needing CPU than available CPUs, the excess user tasks will wait for CPU (there are exceptions when the workload is not evenly distributed across CPUs). In this case each user task uses its quantum, then goes into a wait state, waiting for CPU with wait_type SOS_SCHEDULER_YIELD, until all other runnable user tasks have used their quantum. (sys.dm_exec_requests doesn’t show this wait type, probably by design to avoid showing user tasks in a wait state when they are waiting for CPU, but sys.dm_os_wait_stats does include these waits.) If one measures how many tasks went into this wait state and for how long while the workload was applied – it is possible to calculate the CPUs necessary to scale the workload better.
runnable_tasks_count from sys.dm_os_schedulers is also an indication of CPU pressure, but it is just a probe – one cannot reasonably conclude from it the number of CPUs necessary for a given workload.
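
The arithmetic behind this idea can be sketched in Python. Please treat it as an assumed simplification and not as the query used in this article: it divides the total time user tasks spent waiting for CPU by the elapsed time of the workload, and it uses an idealized model (every session is purely CPU bound and the load is evenly distributed) to produce the waiting time. Both function names are hypothetical.

```python
def additional_cpus(total_cpu_wait_ms, workload_elapsed_ms):
    """Average number of tasks that were waiting for CPU during the workload."""
    if workload_elapsed_ms <= 0:
        raise ValueError("workload elapsed time must be positive")
    return total_cpu_wait_ms / workload_elapsed_ms


def idealized_workload(cpus, sessions, cpu_ms_per_session):
    """Return (total CPU wait, elapsed time) for purely CPU bound sessions.

    Assumption: perfect, even sharing of the CPUs and no other waits.
    """
    if sessions <= cpus:
        return 0.0, float(cpu_ms_per_session)
    elapsed = sessions * cpu_ms_per_session / cpus
    waiting_tasks = sessions - cpus          # tasks without a CPU at any moment
    return waiting_tasks * elapsed, elapsed


for sessions in (2, 4, 6):
    wait_ms, elapsed_ms = idealized_workload(2, sessions, 60000)
    print(sessions, "sessions ->", additional_cpus(wait_ms, elapsed_ms), "additional CPUs")
```

Under these assumptions the sketch agrees with the example section: on 2 CPUs, 2 sessions need no additional CPU, 4 sessions suggest 2 and 6 sessions suggest 4. On a real server the waiting time would have to come from the wait statistics collected during the workload, and the exceptions described in this article still apply.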

Exception:
There is an exception (for OLTP-like workloads) where a user task doesn’t consume all of its quantum in a stretch (it goes into some other wait state before the quantum expires, waiting for I/Os, locks, latches etc.), but continues to run in a loop using CPU without using its full quantum (you know what quantum is ... right :D). The method mentioned here cannot calculate the necessary additional CPUs in this case. The most common example is short transactions that use part of their quantum, start WRITELOG waits and continue in a loop – inserts using implicit transactions in a loop are a typical example.