
Wednesday, January 18, 2017

Imagine a situation where you have been working on a query for a couple of hours and suddenly Management Studio crashes, or your PC restarts, and you forgot to hit “CTRL + S” in between. It is the worst nightmare of any developer, and it is always extremely difficult to re-create the same query again.

In the past I used to go to this folder to find my unsaved SQL scripts: C:\Users\YourUsername\AppData\Local\Temp

But somehow this was never reliable. There were a few occasions when I didn’t find the query I was looking for. A few days back I came across the table-valued system function sys.dm_exec_sql_text, which can save our day from the nightmare of not saving the scripts. More information about this function can be found in Books Online here: sys.dm_exec_sql_text

Now let’s jump straight to the query that can help us recover unsaved SQL query scripts from Management Studio.
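The recovery query itself did not survive in this copy of the post; based on the notes below, it would combine sys.dm_exec_query_stats with sys.dm_exec_sql_text along these lines (the LIKE filter text is a placeholder to replace with a fragment you remember from your lost script):

```sql
-- Sketch: query texts recently executed and still cached by SQL Server.
-- Requires VIEW SERVER STATE permission.
SELECT
     t.[text]              AS Query_Text
    ,s.last_execution_time AS Last_Execution_Time
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) t
WHERE t.[text] LIKE '%SOMETHING%'  -- placeholder: a fragment of your lost query
ORDER BY s.last_execution_time DESC;
```

This only recovers what is still in the plan cache, so run it as soon as possible after the crash.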

Few notes:
1. You will need VIEW SERVER STATE permission to execute this query.
2. Since there could be many queries running on the server, you may want to use a WHERE condition like TEXT LIKE '%SOMETHING%'.
3. You can recover queries only as long as SQL Server keeps them in memory.
4. DM_EXEC_QUERY_STATS has a few more useful columns that can give you some extra information related to your query. You may like to add them to this query.

Tuesday, November 17, 2015

In SQL Server 2012 Microsoft made a lot of enhancements in SSIS, especially by introducing the “Project Deployment Model”. After this change many things got better in SSIS. But while there are so many good things about it, one thing started to irritate me, and that is the error reporting in the job history. With this change we stopped getting the exact error messages in the job history. Instead, when we open the job history to check the reason for a failure we see this message: To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report.
To get the exact job failure reason we need to go to the Integration Services Catalog, right-click on the project name >> Reports >> Standard Reports >> All Executions. These reports are very nice and detailed, but going to these reports every time and trying to find the error is very irritating. Sometimes too much information is not that good; I have seen people make mistakes in reading these execution reports. Most of the time we are interested in the error messages only and not in the other execution events. The worst part is that we can’t even copy the error message to quickly Google it for solutions. So I thought of doing something about it. I’m not going to open the execution reports every time just to see the error.

A quick SQL query to get the exact error would be nice. And as I expected, all the package execution logs and error messages are stored in the SSISDB internal tables. I created the below SQL script to query the SSISDB to get the exact error messages:
--------------------------------------------------------------------------------------------------------
DECLARE @DATE DATE = GETDATE() - 7 -- This is to restrict the data to the last 7 days, used in the ON condition

SELECT   O.Operation_Id -- Not much of use
        ,E.Folder_Name AS Project_Name
        ,E.Project_name AS SSIS_Project_Name
        ,EM.Package_Name
        ,CONVERT(DATETIME, O.start_time) AS Start_Time
        ,CONVERT(DATETIME, O.end_time) AS End_Time
        ,OM.message AS [Error_Message]
        ,EM.Event_Name
        ,EM.Message_Source_Name AS Component_Name
        ,EM.Subcomponent_Name AS Sub_Component_Name
        ,E.Environment_Name
        ,CASE E.Use32BitRunTime WHEN 1 THEN 'Yes' ELSE 'No' END AS Use32BitRunTime
        ,EM.Package_Path
        ,E.Executed_as_name AS Executed_By
-- Joins below assume the standard SSISDB catalog views matching the aliases used above
FROM SSISDB.catalog.operations O
INNER JOIN SSISDB.catalog.executions E
        ON O.operation_id = E.execution_id
       AND O.start_time >= @DATE
INNER JOIN SSISDB.catalog.operation_messages OM
        ON O.operation_id = OM.operation_id
INNER JOIN SSISDB.catalog.event_messages EM
        ON OM.operation_message_id = EM.event_message_id
WHERE OM.Message_Type = 120 -- 120 means Error
AND EM.event_name = 'OnError'
-- This is something I'm not sure about right now, but SSIS.Pipeline was just adding duplicates so I'm removing it.
AND ISNULL(EM.subcomponent_name, '') <> 'SSIS.Pipeline'
ORDER BY EM.operation_id DESC
--------------------------------------------------------------------------------------------------------
This was a simple script to get the errors from SSISDB instead of going to the Integration Services Catalog execution reports. Though this query is still a work in progress, there could be scope for performance improvement or for adding more information. I’m still learning the tables in SSISDB. I’ll keep posting updates, if any.

Mangal Pardeshi

Saturday, November 14, 2015

Today I’m going to show you how to write a PIVOT query when the column names are dynamic. A dynamic pivot is necessary when you don’t know the column names, or the column names can change over time or for any other reason. I’m not going to discuss the pros and cons of dynamic SQL or even of PIVOT itself (in my opinion, pivoting of data should be handled at the reporting level). So let’s see how to write a dynamic pivot query with a very simplified example.

First let’s create some sample data:
--------------------------------------------------------------------------------------------------------
-- create table
CREATE TABLE Orders
(
    OrderID INT,
    ProductName VARCHAR(50),
    Quantity INT,
    OrderDate DATE
);

-- insert some sample data
INSERT INTO Orders VALUES
(1, 'Pen', 100, GETDATE()-1),
(2, 'Pen', 200, GETDATE()-2),
(3, 'Pen', 300, GETDATE()-3),
(4, 'Pencil', 500, GETDATE()-1),
(5, 'Pencil', 600, GETDATE()-2),
(6, 'Pencil', 400, GETDATE()-3),
(7, 'Eraser', 30, GETDATE()-1),
(8, 'Eraser', 20, GETDATE()-2),
(9, 'Eraser', 10, GETDATE()-3),
(10, 'Pen', 100, GETDATE()-4),
(11, 'Pencil', 500, GETDATE()-4),
(12, 'Eraser', 30, GETDATE()-4);

-- verify the data
SELECT * FROM Orders
--------------------------------------------------------------------------------------------------------
This is how our data looks.

Requirement: Now I want to pivot this data on the column OrderDate. I want to show the last 3 OrderDates as columns, with the respective quantity of each product under each date. Here is the expected output, though you need to remember that you will get different column names (dates), since my sample data itself is dynamic.

Now first understand the challenges. Our requirement is to show the last 3 days of OrderDate. Today I’m writing this post, that’s why the last days are 11 Nov – 13 Nov, but tomorrow I’ll want my query to show the dates 12 Nov – 14 Nov. It will continue to change. Changing these dates in our query every time can be difficult/painful/irritating. And that’s where dynamic SQL comes into the picture. The main challenge is to get these ever-changing column names and pass them into the PIVOT block without hard-coding, and also into the SELECT clause.

Let’s do one thing at a time. First we will try to get the last 3 OrderDates and concatenate them in the desired order and format. For that I’m going to use FOR XML PATH. Note that there could be other methods to concatenate the column names into a single variable. Here is my code to get the column names dynamically into a variable:
--------------------------------------------------------------------------------------------------------
DECLARE @ColumnNames VARCHAR(1000)

SELECT @ColumnNames = STUFF((
    SELECT DISTINCT TOP 100 ', ' + QUOTENAME(CONVERT(VARCHAR(10), OrderDate, 120))
    FROM Orders
    WHERE OrderDate > DATEADD(DD, -4, GETDATE())
    ORDER BY ', ' + QUOTENAME(CONVERT(VARCHAR(10), OrderDate, 120)) DESC
    FOR XML PATH('')
), 1, 2, '')

-- Just to check how we are getting the column names
PRINT @ColumnNames
--------------------------------------------------------------------------------------------------------
A few notes on the above query:
1. DISTINCT is needed because each OrderDate appears multiple times in the table.
2. TOP 100 is needed because I want to order the dates in descending order; since our query is actually a sub-query, the TOP hack is needed to allow the ORDER BY. I put the number 100 randomly; I could have simply used TOP 3 because I want only the top 3 dates. When you are not sure about the number of columns you can either put a relatively large number like 100 or use TOP 100 PERCENT.
3. In the WHERE clause I’m restricting the dates to the last 3 dates from GETDATE(), i.e. from “today”.
4. I’m also converting OrderDate to VARCHAR and using style 120 to get the dates in the desired format. In my case I want the dates as yyyy-mm-dd, that’s why style 120.
This is how we get the column names from the above query: [2015-11-13], [2015-11-12], [2015-11-11]
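To complete the picture, the @ColumnNames variable built above can then be spliced into a PIVOT statement via dynamic SQL. This is a sketch of my own (not from the original post), continuing from the variable above:

```sql
DECLARE @SQL NVARCHAR(MAX)

-- Build the pivot statement with the dynamic column list used in both
-- the outer SELECT list and the PIVOT IN () block, then execute it.
SET @SQL = N'
SELECT ProductName, ' + @ColumnNames + '
FROM
(
    SELECT ProductName, Quantity,
           CONVERT(VARCHAR(10), OrderDate, 120) AS OrderDate
    FROM Orders
    WHERE OrderDate > DATEADD(DD, -4, GETDATE())
) AS SourceData
PIVOT
(
    SUM(Quantity) FOR OrderDate IN (' + @ColumnNames + ')
) AS PivotResult'

EXEC sp_executesql @SQL
```

Because the column list is injected in two places, the same QUOTENAME-wrapped dates serve both as output columns and as the pivot values.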

Mangal Pardeshi

Friday, November 6, 2015

Recently I was working on creating a custom dashboard to monitor the health of all the database servers. One thing I wanted to add to the dashboard was the “Available Disk Space” on the data drives of the SQL Server instances. I thought I was not going to get this information from any system table or DMV and would need to get the disk space information via some Windows commands or something. Then I came across the dynamic management function sys.dm_os_volume_stats, which provided the exact information I was looking for.

sys.dm_os_volume_stats:

Returns information about the operating system volume (directory) on which the specified databases and files are stored in SQL Server. Use this dynamic management function to check the attributes of the physical disk drive or return available free space information about the directory.

Here is the script that gives you useful information about the total and available size of the attached data volumes on a SQL Server instance.
--------------------------------------------------------------------------------------------------------
SELECT DISTINCT
     d.logical_volume_name AS LogicalName
    ,d.volume_mount_point AS DriveName
    ,CONVERT(NUMERIC(32,2), d.available_bytes * 1.00 / (1048576.0 * 1024)) AS FreeSpaceInGB
    ,CONVERT(NUMERIC(32,2), d.total_bytes * 1.00 / (1048576.0 * 1024)) AS TotalSizeInGB
FROM sys.master_files f
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id) d
ORDER BY FreeSpaceInGB
--------------------------------------------------------------------------------------------------------

This script is very useful for monitoring the data volumes, tracking the growth of the data, detecting low disk space on the data drives of SQL Server, and so on. I have implemented a low disk space alert on my servers using this script.
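As an illustration of the alerting idea, here is a sketch of my own; the 20 GB threshold, mail profile name, and recipient address are all hypothetical placeholders, and sp_send_dbmail requires Database Mail to be configured:

```sql
-- Alert when any attached volume drops below a free-space threshold.
DECLARE @ThresholdGB NUMERIC(32,2) = 20;  -- hypothetical threshold

IF EXISTS (
    SELECT 1
    FROM sys.master_files f
    CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id) d
    WHERE CONVERT(NUMERIC(32,2), d.available_bytes * 1.00 / (1048576.0 * 1024)) < @ThresholdGB
)
BEGIN
    -- Profile and recipient are placeholders; Database Mail must be set up.
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = 'DBA_Profile'
        ,@recipients   = 'dba@example.com'
        ,@subject      = 'Low disk space on SQL Server data volume'
        ,@body         = 'At least one data volume is below the free-space threshold.';
END
```

Scheduled as a SQL Agent job, a check like this turns the script into a simple low-disk-space alert.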

Thanks Mangal Pardeshi

Sunday, November 1, 2015

Finding the row count of a table is probably the most frequently performed task by SQL developers/programmers. So the obvious thing we all look for is: what is the fastest way to find the row count of a table? Of course there are a few ways; I’ll discuss them in this post. However, if you ask me that question, my immediate response without blinking an eye will be to use the built-in stored procedure SP_SPACEUSED. It is very easy to use, with a simple syntax:
--------------------------------------------------------------------------------------------------------
sp_spaceused your_table_name
--------------------------------------------------------------------------------------------------------
and within a second you will get the row count of the given table, no matter how big the table is. To know more about SP_SPACEUSED please see: sp_spaceused (Transact-SQL)

Remarks:

Now there are certain things you need to keep in mind while using sp_spaceused:
1. sp_spaceused relies on the DMV sys.dm_db_partition_stats to get the row count. So there is a chance that you will get an approximate row count if the usage counters are not up to date. To get the exact count you can correct the counters using DBCC UPDATEUSAGE. However, this can take a long time if the table is really huge.
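For example, to correct the counters for a single table before calling sp_spaceused (the database and table names here are placeholders), DBCC UPDATEUSAGE can be pointed at just that table:

```sql
-- Correct page and row counts for one table, recounting rows as well.
DBCC UPDATEUSAGE ('YourDatabase', 'dbo.your_table_name') WITH COUNT_ROWS;

-- sp_spaceused should now report an accurate row count.
EXEC sp_spaceused 'dbo.your_table_name';
```

The WITH COUNT_ROWS option forces an actual row recount, which is what makes this slow on very large tables.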

2. If the table is in the default dbo schema then you don’t need to pass the schema name in the syntax. However, if the table is created under some different or custom schema then you need to pass the schema name as well. But if you try to execute sp_spaceused schema_name.your_table_name you will get the error: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near '.'.

This becomes even more difficult if the schema name has your domain name in it, e.g. your table name is domain\user_name.table_name and you try to use sp_spaceused domain\user_name.table_name; you will get the error: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near '\'.

In that case you just need to pass your table name within single quotes:
--------------------------------------------------------------------------------------------------------
sp_spaceused 'schema_name.your_table_name'
GO
-- or
sp_spaceused 'domain\schema_name.your_table_name'
--------------------------------------------------------------------------------------------------------

Now coming back to some positives: this is my favorite method despite the fact that sp_spaceused can sometimes give me an approximate row count. The first reason I like sp_spaceused is that it is extremely fast, the syntax is simple, and I don’t need to remember DMV names. Another reason is that I can use it even while DML operations are being performed on the table. For example, say you are inserting a large number of rows into a table, the query is still executing, and you want to know how many rows have been inserted so far. sp_spaceused can give you the row count of a table even in the middle of an INSERT query. Very, very useful for tracking the progress of a big INSERT query.

Now, having said that the row count we get from sp_spaceused or the DMV sys.dm_db_partition_stats is not always accurate, personally I have never observed any difference. Even if there is a difference, I guess it is negligible.

Now quickly discuss the couple of other methods to find the row count of a table.

Using DMV sys.dm_db_partition_stats:

This is another very quick method to get the row count of a table.
--------------------------------------------------------------------------------------------------------
SELECT SUM(row_count)
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('table_name')
AND index_id < 2;
--------------------------------------------------------------------------------------------------------
The interesting thing is that even sp_spaceused relies on this same DMV to get the row count. But of course writing and remembering sp_spaceused table_name is much easier than this, right? You can also get the row count using sysindexes, but I’m skipping that.

Using COUNT function:

Now this is the most reliable and well-known, but also slow, method to find the row count of a table, especially if the table size is in the millions or billions. Actually I don’t need to talk about the COUNT function because everyone knows about it, but I’m mentioning it in this post because of a myth. There is a misconception going around that if you write COUNT(1) instead of COUNT(*) you will get a faster result. That is a total myth. Many experts have tested it and written about it, and you can find many articles online comparing the performance of COUNT(1) and COUNT(*) and concluding that there is absolutely no performance difference between them. Here is one such article by Beaulin Twinkle: SQL Server: count(*) or count(1) or count('') Which is better?

Thanks Mangal Pardeshi

Monday, October 26, 2015

Recently somebody asked this question on the Stack Overflow forums: “How many tables can we create in a SQL Server database?” And I thought I should write about it on my blog. Though for normal SQL programmers this shouldn’t matter, mostly some people ask this question in interviews, and I don’t know why.

Anyways, since we are talking about it let me answer this question. Actually in SQL Server, Microsoft has not defined any limit on number of tables but there is a limit on the number of Objects in a database.

According to the Maximum Capacity Specifications for SQL Server, tables per database are limited by the number of objects in the database. Database objects include tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints. The sum of the number of all objects in a database cannot exceed 2,147,483,647.

So if there are NO other objects created in a database, you can create up to 2,147,483,647 tables in a SQL Server database.

And if I’m right, this limit has remained constant from SQL Server 2005 through SQL Server 2014.

Thanks Mangal Pardeshi

Saturday, October 24, 2015

Back in 2009 I wrote this post about How to Calculate Running Totals in SQL Server. Back then (before SQL Server 2012) this was one of the “difficult” queries to write. We used to do it by writing a co-related sub-query, or sometimes a Cursor was even used for calculating running totals.

Then Microsoft came up with the much-awaited enhancement of the OVER() clause in SQL Server 2012. Now OVER() is fully extended to aggregate functions like SUM, COUNT and AVG. In simple words, we can now write ORDER BY in the OVER() clause for aggregate functions as well. This feature has made running total calculations extremely easy and efficient.

On the same lines as my previous post, the different scenarios I’ll be covering in this post are:

This is how our sample data looks:
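The script that builds the #SALES sample table is not shown above; a minimal stand-in of my own, consistent with the columns the queries below use (OrderId, Quantity, ShipDate), might be:

```sql
-- Hypothetical sample data; the original post's #SALES script is not shown here.
CREATE TABLE #SALES
(
    OrderId  INT,
    Quantity INT,
    ShipDate DATETIME
);

INSERT INTO #SALES VALUES
(1, 10, '2015-11-10 09:00'),
(1, 20, '2015-11-10 14:00'),
(1, 15, '2015-11-11 10:00'),
(2, 30, '2015-11-10 11:00'),
(2, 25, '2015-11-12 16:00'),
(3, 40, '2015-11-12 08:00');
```

Multiple rows per OrderId and per day make it possible to see all three partitioning cases below.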

Case I: Running total for all records. This is the simplest case, where I’ll show you how to calculate the running totals for all the records in the order of ShipDate. If you remember the old ways, we needed to write a weird-looking co-related sub-query to achieve that. But now this can be easily achieved by simply writing SUM(Quantity) OVER(ORDER BY ShipDate). Note: I’m assuming you know the basics of the OVER() clause.

--------------------------------------------------------------------------------------------------------
SELECT *
    ,SUM(Quantity) OVER(ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY ShipDate
--------------------------------------------------------------------------------------------------------
Here is the result returned by the above query:

Case II: Running total for each group of records. Now instead of calculating the running total across all records, we will calculate the running total for each OrderId. To do that we just need to add PARTITION BY OrderId in the OVER() clause.

--------------------------------------------------------------------------------------------------------
SELECT *
    ,SUM(Quantity) OVER(PARTITION BY OrderId ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY OrderId, ShipDate
--------------------------------------------------------------------------------------------------------
Output:

Case III: Running total for each date. Now we will see how to calculate the running total for each day. For that, of course, we need to put ShipDate in the PARTITION BY, but we also need to remove the time portion from the date. We can do that by simply converting ShipDate to the DATE type. See the below query:

--------------------------------------------------------------------------------------------------------
SELECT *
    ,SUM(Quantity) OVER(PARTITION BY CONVERT(DATE, ShipDate) ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY ShipDate
--------------------------------------------------------------------------------------------------------
Output:

I hope you like this. Do give me your feedback.

Thanks Mangal Pardeshi

Wednesday, July 22, 2009

Many times you may have heard this: “Use UNION ALL over UNION whenever possible.” The question arises: why? To answer in one statement: UNION ALL performs faster compared to UNION. Then again the question arises: why does UNION ALL perform faster? And also: why “whenever possible”, why not always?

Let me answer the 2nd question 1st. Though both UNION and UNION ALL combine the results of two or more queries into a single result set, there is a fundamental difference between the two: UNION returns only a DISTINCT result set, while UNION ALL returns all available rows, including duplicates.
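The script creating the two sample tables did not survive in this copy; a stand-in of my own, consistent with the observations below (6 rows in total, “Ram” with id 2 in both tables, “Sham” with id 5 as the second row of A), might be:

```sql
-- Hypothetical re-creation of the post's sample tables; only "Ram" (id 2,
-- present in both tables) and "Sham" (id 5, second row of A) come from the post.
CREATE TABLE A (id INT, Names VARCHAR(50));
CREATE TABLE B (id INT, Names VARCHAR(50));

INSERT INTO A VALUES (1, 'Mangal'), (5, 'Sham'), (2, 'Ram');
INSERT INTO B VALUES (2, 'Ram'), (3, 'Shiv'), (4, 'Krishna');
```

With this data, UNION returns 5 distinct rows and UNION ALL returns all 6, matching the observations that follow.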

Note that id=2, name=Ram is present in both tables. That will help us in understanding the difference between UNION and UNION ALL. Now let’s execute the following 2 queries, the 1st with UNION and the 2nd with UNION ALL:

-- with UNION
SELECT id, Names FROM A
UNION
SELECT id, Names FROM B
GO

-- with UNION ALL
SELECT id, Names FROM A
UNION ALL
SELECT id, Names FROM B
GO

The result:

Observations:
1. The 1st query with UNION returns 5 rows, and the UNION ALL query returns 6 rows.
2. The row for id=2 (Ram) appears twice in the UNION ALL result set.
3. The result set for UNION is sorted on the id column. For UNION ALL, all the rows of table A appear 1st, followed by the rows of table B (no sort).

As you can see, UNION eliminates any duplicate rows from the final result set, while UNION ALL returns all available rows, including duplicates. That is the cause of UNION being slow: for each row, the UNION operator checks whether the entire row exists in the previous rows or not. And to make this validation, UNION by default 1st sorts the result set on the 1st available column. In our example, UNION has sorted the result set on the id column even though I haven’t specified any ORDER BY clause. Notice that the name “Sham” (which is in table A) appears last in the UNION result because it has the highest id, 5, while it appears in the 2nd row of the UNION ALL result. A look at the query execution plan can help you visualize it better:

As you can see, the cost of the UNION query is 73% compared to 27% for UNION ALL. The major reason is the “Distinct Sort” that the UNION operator performs to sort and eliminate the duplicate rows, while UNION ALL doesn’t bother about sorting or duplicates at all. And that is why UNION is slow compared to UNION ALL.

So again going back to the question: why not use UNION ALL always? And one more question to add: when should you use which one?

- You should use UNION when you don’t want duplicates in your final result set, and you are not sure (or maybe you are sure) that duplicate records exist in the different queries involved in the UNION.

- You should use UNION ALL when: 1. You are not bothered about duplicate rows in the result. 2. You are sure there are no duplicates in the different queries involved in the UNION, e.g. when you are combining sales-order results from 2 or more different years, with each query returning results for an individual year with a unique id for each row, or combining results for 2 or more different departments.

All along I’ve been talking about UNION and UNION ALL as if they are 2 different things altogether. Are they? Not exactly. The reason I’m saying this is that when one of my friends asked me about UNION ALL and I advised him to look into Books Online, he came back complaining to me that “Books Online doesn’t say anything about UNION ALL”. The reason: he was thinking that Books Online must have some separate section dedicated to UNION ALL, as if it were different from UNION.

Friday, July 3, 2009

I know this has been done many times, but still here is something from my side: what is the difference between DELETE and TRUNCATE? Well, one reason I wanted to write this post is that on so many blogs and forum threads I keep seeing statements like “TRUNCATE cannot be rolled back”. This is also one of the most frequently asked questions in interviews. And since my target audience is people who have just started learning SQL (and NOT SQL experts), I thought I should write this one. Instead of just focusing on ROLLBACK, I will try to cover all the differences between DELETE and TRUNCATE.

Remove Data: First things first, both can remove data from a table. But DELETE can be used to remove rows not only from a table but also from a VIEW or the result of an OPENROWSET or OPENQUERY, subject to provider capabilities.

FROM Clause: With DELETE you can also delete rows from one table/view/rowset_function_limited based on rows from another table by using an additional FROM clause. In that FROM clause you can also write normal JOIN conditions. Actually, you can create a DELETE statement from a SELECT statement that doesn’t contain any aggregate functions by replacing SELECT with DELETE and removing the column names. With TRUNCATE you can’t do that.

WHERE: TRUNCATE cannot have a WHERE condition, but DELETE can. That means with TRUNCATE you can’t delete a specific row or a specific group of rows. TRUNCATE TABLE is similar to a DELETE statement with no WHERE clause.

Performance: TRUNCATE TABLE is faster and uses fewer system and transaction log resources. One of the reasons is the locks used by each statement. The DELETE statement is executed using row locks: each row in the table is locked for deletion. TRUNCATE TABLE always locks the table and pages, but not each row.

Transaction log: The DELETE statement removes rows one at a time and makes an individual entry in the transaction log for each row. TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log.

Pages: After a DELETE statement is executed, the table can still contain empty pages. TRUNCATE removes the data by deallocating the data pages used to store the table data.

Trigger: TRUNCATE does not activate the DELETE triggers on the table, so you must be very careful while using it. One should never use TRUNCATE if a DELETE trigger is defined on the table to do some automatic cleanup or logging action when rows are deleted.

Identity Column: With TRUNCATE, if the table contains an identity column, the counter for that column is reset to the seed value defined for the column. If no seed was defined, the default value 1 is used. DELETE doesn’t reset the identity counter. So if you want to retain the identity counter, use DELETE instead.

Replication: DELETE can be used against a table used in transactional or merge replication, while TRUNCATE cannot be used against tables involved in transactional or merge replication.

Rollback: A DELETE statement can be rolled back. TRUNCATE can also be rolled back, provided it is enclosed in a TRANSACTION block and the session is not closed. Once the session is closed you won’t be able to roll back a TRUNCATE.
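A quick demonstration of that point (a sketch; the table and its data are made up):

```sql
-- TRUNCATE rolls back cleanly inside an open transaction.
CREATE TABLE #Demo (id INT);
INSERT INTO #Demo VALUES (1), (2), (3);

BEGIN TRANSACTION;
    TRUNCATE TABLE #Demo;
    SELECT COUNT(*) AS RowsInsideTran FROM #Demo;  -- 0 inside the transaction
ROLLBACK TRANSACTION;

SELECT COUNT(*) AS RowsAfterRollback FROM #Demo;   -- back to 3 after the rollback
```

Because TRUNCATE logs its page deallocations, SQL Server can undo them on ROLLBACK just like any other logged operation.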

Restrictions: The DELETE statement may fail if it violates a trigger or tries to remove a row referenced by data in another table with a FOREIGN KEY constraint. If the DELETE removes multiple rows, and any one of the removed rows violates a trigger or constraint, the statement is canceled, an error is returned, and no rows are removed. And if DELETE is used against a view, that view must be an updatable view. TRUNCATE cannot be used against a table used in an indexed view, and TRUNCATE cannot be used against a table referenced by a FOREIGN KEY constraint, unless it is a table that has a foreign key referencing itself.

Let’s see how to create a linked server with Excel 2007. Most of the stuff you’ll find on the net on how to create a linked server uses the Microsoft Jet 4.0 OLE DB Provider, which is no longer able to connect to Excel 2007.
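The creation script itself is not shown here; with the ACE provider it would look something like the sketch below. The linked server name XLFile and the file path E:\Mangal.xls follow the examples later in the post, and the Microsoft.ACE.OLEDB.12.0 provider must be installed on the server:

```sql
-- Create a linked server named XLFile pointing at an Excel workbook,
-- using the ACE OLE DB provider (the Jet provider cannot read Excel 2007 files).
EXEC sp_addlinkedserver
     @server     = 'XLFile'
    ,@srvproduct = 'Excel'
    ,@provider   = 'Microsoft.ACE.OLEDB.12.0'
    ,@datasrc    = 'E:\Mangal.xls'
    ,@provstr    = 'Excel 12.0';
```

After this, the worksheet can be queried through the four-part name shown below.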

Now you can query your Excel file just like any table: SELECT * FROM XLFile...[Sheet1$]

Or, if you don’t want to create a linked server and just want to query the Excel file directly, then do:
--------------------------------------------------------------------------------------------------------
SELECT * FROM OPENROWSET
(
    'Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0; Database=E:\Mangal.xls; HDR=No',
    'SELECT * FROM [Sheet1$]'
)
--------------------------------------------------------------------------------------------------------
If you execute the above query and get the error: Msg 15281, Level 16, State 1, Line 1 SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off. That means you first need to enable 'Ad Hoc Distributed Queries'.
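Enabling that option is a standard sp_configure change (it requires the ALTER SETTINGS server-level permission, typically sysadmin or serveradmin):

```sql
-- 'Ad Hoc Distributed Queries' is an advanced option, so expose the
-- advanced options first, then turn the feature on.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
```

Once this is enabled, the OPENROWSET query above should run without the Msg 15281 error.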