RAVINDRANATHREDDY - MSBI - BLOG

This blog provides MSBI stuff and Interview Questions.
In particular, you can find links to many blogs on SSIS, SSAS, SSRS, and MOSS from different contributors to MSBI. If you have any queries or concerns, please feel free to contact me at nv.ravindranathreddy@gmail.com

MSBI (SSIS/SSRS/SSAS) Online Training

Monday, January 12, 2015

Microsoft introduced some significant enhancements in SQL Server 2014, especially with In-Memory OLTP. However, as you might expect after such a short release cycle, not every subsystem has been updated; there are no major changes to SQL Server Integration Services (SSIS), SQL Server Replication Services, or SQL Server Reporting Services (SSRS). Nonetheless, there are plenty of notable enhancements. Here are 10 new features in SQL Server 2014.

1. In-Memory OLTP Engine

SQL Server 2014
enables memory optimization of selected tables and stored procedures. The
In-Memory OLTP engine is designed for high concurrency and uses a new
optimistic concurrency control mechanism to eliminate locking delays.
Microsoft states that customers can expect performance to be up to 20 times
better than with SQL Server 2012 when using this new feature. For more
information, check out “Rev Up Application Performance with the In-Memory OLTP Engine.”
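
To make this concrete, here is a minimal sketch of opting a table into the In-Memory OLTP engine. It assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup; the table and column names are illustrative, not from the article.

-- A memory-optimized table with a hash primary key
CREATE TABLE dbo.ShoppingCart
(
    CartId      INT       NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CustomerId  INT       NOT NULL,
    CreatedDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

DURABILITY = SCHEMA_AND_DATA keeps the data recoverable after a restart; SCHEMA_ONLY trades durability for even more speed.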

2. AlwaysOn Enhancements

Microsoft has
enhanced AlwaysOn integration by expanding the maximum number of secondary
replicas from four to eight. Readable secondary replicas are now also available
for read workloads, even when the primary replica is unavailable. In addition,
SQL Server 2014 provides the new Add Azure Replica Wizard, which helps you
create asynchronous secondary replicas in Windows Azure.
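
As a hedged illustration (the availability group and node names below are hypothetical), enabling read-only workloads on an existing secondary replica looks like this:

-- Allow read-only connections on a secondary replica
ALTER AVAILABILITY GROUP [SalesAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));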

3. Buffer Pool Extension

SQL Server 2014
provides a new solid state disk (SSD) integration capability that lets you use
SSDs to expand the SQL Server 2014 Buffer Pool as nonvolatile RAM (NvRAM). With
the new Buffer Pool Extensions feature, you can use SSD drives to expand the
buffer pool in systems that have maxed out their memory. Buffer Pool Extensions
can provide performance gains for read-heavy OLTP workloads.
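
Enabling the feature is a one-line server configuration change; the file path and size below are illustrative:

-- Point the Buffer Pool Extension at a file on an SSD volume
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'E:\SSDCACHE\ExampleBPE.BPE', SIZE = 50 GB);

-- To disable it again:
-- ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION OFF;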

4. Updateable Columnstore Indexes

When Microsoft
introduced the columnstore index in SQL Server 2012, it provided improved
performance for data warehousing queries. For some queries, the columnstore
indexes provided a tenfold performance improvement. However, to utilize the
columnstore index, the underlying table had to be read-only. SQL Server 2014 eliminates this restriction with the new updateable clustered columnstore index, which must include all the columns in the table and can't be combined with other indexes.
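
For example (the fact table name here is illustrative), converting a table to an updateable clustered columnstore is a single statement:

-- The clustered columnstore spans every column, and in SQL Server 2014
-- the table can have no other indexes.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
    ON dbo.FactSales;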

5. Storage I/O Control

The Resource
Governor lets you limit the amount of CPU and memory that a given workload can
consume. SQL Server 2014 extends the reach of the Resource Governor to manage
storage I/O usage as well. The SQL Server 2014 Resource Governor can limit the
physical I/Os issued for user threads in a given resource pool.
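
A minimal sketch, with a hypothetical pool name and limits, using the new per-volume I/O settings:

-- Cap the physical I/Os a workload's resource pool may issue per volume
CREATE RESOURCE POOL ReportingPool
    WITH (MIN_IOPS_PER_VOLUME = 0, MAX_IOPS_PER_VOLUME = 500);

ALTER RESOURCE GOVERNOR RECONFIGURE;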

6. Power View for Multidimensional Models

Power View used to
be limited to tabular data. However, with SQL Server 2014, Power View can now
be used with multidimensional models (OLAP cubes) and can create a variety of
data visualizations including tables, matrices, bubble charts, and geographical
maps. Power View multidimensional models also support queries using Data
Analysis Expressions (DAX).

7. Power BI for Office 365

Power BI for Office 365 is a cloud-based business intelligence (BI) solution that provides data navigation and visualization capabilities. Power BI for Office 365 includes Power Query (formerly code-named Data Explorer), Power Map (formerly code-named GeoFlow), Power Pivot, and Power View. You can learn more about Power BI at Microsoft's Power BI for Office 365 site.

8. SQL Server Data Tools for Business Intelligence

The new SQL Server
Data Tools for BI (SSDT-BI) is used to create SQL Server Analysis Services
(SSAS) models, SSRS reports, and SSIS packages. The new SSDT-BI supports SSAS
and SSRS for SQL Server 2014 and earlier, but SSIS projects are limited to SQL
Server 2014. In the pre-release version of SQL Server 2014, SQL Server Setup
doesn’t install SSDT-BI. Instead, you must download SSDT-BI separately from
the Microsoft Download Center.

9. Backup Encryption

One welcome
addition to SQL Server 2014 is the ability to encrypt database backups for
at-rest data protection. SQL Server 2014 supports several encryption
algorithms, including Advanced Encryption Standard (AES) 128, AES 192, AES 256,
and Triple DES. You must use a certificate or an asymmetric key to perform
encryption for SQL Server 2014 backups.
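
A sketch of an encrypted backup, assuming a database master key and a certificate named BackupCert already exist in the master database (the paths and names are illustrative):

BACKUP DATABASE AdventureWorks2012
TO DISK = N'E:\Backups\AdventureWorks2012_enc.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);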

Wednesday, October 8, 2014

I get this question often: which is the best Data Visualization product? Let's compare some DV platforms, such as Spotfire, Qlikview, Tableau Software, and the Microsoft BI platform (PowerPivot, Excel 2010, SQL Server with its VertiPaq engine, SSAS, SSRS, and SSIS). Two key factors for tool selection are:

·ease of comprehending the data;

·price-performance.

Because modern datasets are huge (or growing very fast!), they are usually best comprehended using Data Visualization (DV), with the ability to interact with data through visual drill-down capabilities and dashboards. There is no single best visualization product; each has its place. For example, Spotfire has the best web client and analytical functionality. On the other hand, Qliktech may be the best visualization product for interactive drill-down capabilities. The Microsoft BI platform provides a better price-performance ratio and is good as a backend for DV (especially with the release of SQL Server 2012) or for people who wish to build their own DV (which would be a mistake). Tableau has the best ability to interact with OLAP cubes. We put the comparison of the 4 platforms into a summary table to help you evaluate DV products based on your needs.

·Qliktech is the vendor of the Qlikview DV platform: the Qlikview Developer, Qlikview Server, Publisher, and Access Point tools, with a leading in-memory columnar database combined with advanced visual drill-down, interactive dashboards, and a comprehensive set of client software that runs even on smartphones.

·Microsoft is the vendor of the most comprehensive set of BI and DV components, including SQL Server; Analysis, Reporting, and Integration Services (SSAS, SSRS, SSIS); SharePoint; Excel 2010 with the PowerPivot add-in (Excel is the most popular BI tool, regardless); and the VertiPaq engine. I did not include PerformancePoint Server, which was discontinued in April 2009, and I cannot recommend PerformancePoint Services for SharePoint because I cannot recommend SharePoint.

·A DV expert can be a cost-effective customizer of all of the above, with the ability to extend a DV solution using development components from leading technology vendors like Dundas, DevExpress, etc.

·I did not include Oracle, because it does not have its own DV tool; however, Oracle is an OEM partner with Tableau and resells it as part of the OBIEE toolset.

A DV expert can help you select the appropriate DV approach for your specific application. Additional services include the following:

·Professional gathering of
business requirements, system analysis of workflow and dataflow, and functional
specifications;

·Design of Custom Data
Visualization and dashboards deployed over the Internet through smart client
and RIA technologies.

Custom DV applications enable the user to perform visual drill-down, fast searches for outliers in large datasets, easy-to-use root-cause and business-logic analysis, interactive data search, and visual and predictive analytics. Some factors and features are very important, and some I did not mention because they are covered on other pages and posts of this blog (e.g., see the "in-memory" page and the DV Tools pages). I perceive that the DV area has 4 super-leaders: Qlikview 10, Spotfire 3.3, Tableau 6.1, and PowerPivot, but for completeness (because their real function is to be "Data Visualizers"), I wish to add Visokio's Omniscope. I do not include vendors who are 1-2 generations behind: SAP, SAS, IBM, Oracle, and Microstrategy, and I could add a dozen more mis-leaders to this list. Many vendors are working on in-memory technology; Tableau 6.X now has a 64-bit in-memory data engine.

Thursday, October 2, 2014

In part 1 we looked at a method to quantify the work that gets done by SQL Server Analysis Services and found that the OLE DB provider with a network packet size of 32767 brings the best throughput while processing a single partition, maxing out the contribution of a single CPU.

In this 2nd part we will focus on how to leverage 10 cores or more (up to 64!) and benefit from every one of the CPUs available in your server while processing multiple partitions in parallel. I hope the tips and approach will help you test and determine the maximum processing capacity of the cubes on your SSAS server and process them as fast as possible!

Quick Wins

If you have more than 10 cores in your SSAS server, the first thing you'll notice when you start processing multiple partitions in parallel is that the Windows performance counter '% Processor Time' for the msmdsrv process is steady at 1000%, which means 10 full CPUs are 100% busy processing. The 'Rows read/sec' counter will also top out and produce a steady flat line, similar to the one below, at 2 million rows read/sec (= 200K rows read/sec per CPU):

In our search for maximum processing performance, we will increase this number to reflect the number of cores by modifying the Data Source properties: change 'Maximum number of connections' from 10 to the number of cores in your server. In our test server we have 32 physical plus 32 hyper-threaded = 64 cores available.

1) # Connections

By default, each cube will open up a maximum of 10 connections to a data source. This means that up to 10 partitions are processed at the same time. See the picture below: 10x status 'In Progress' for the AdventureWorks cube, which has been slightly expanded to span multiple years:

Just by changing the number of connections to 64, processing 64 partitions in parallel results in an average throughput of over 5 million rows read/sec, utilizing 40 cores (yellow line).

This seems like a great number already, but it is effectively (5 million rows / 40 cores =) 125K rows per core, and we still see a flat line when looking at the effective throughput; this tells us that we are hitting the next bottleneck. Also, the CPU usage visible in Windows Task Manager isn't at its full capacity yet!

Time to fire up another Xperf or Kernrate session to dig a bit deeper and zoom into where the CPU ticks are spent by the data provider:

This shows an almost identical result to the profiling of a single partition in part 1.

By profiling around a bit and checking both the OLE DB and some SQL Native Client sessions, you will find, surprisingly, that most of the CPU ticks are spent on… data type conversions.

The other steps make sense and include lots of data validation; for example, while it fetches new rows it checks for invalid characters before the data gets pushed into an AS buffer. But the number one CPU consumer, CDataSource::DataConvert, is an area that we can optimize!

(To download a local copy of the symbol files yourself, just install the Windows Debugger by searching the net for 'windbg download' and run the symchk.exe utility to download all symbols that belong to all resident processes into the folder c:\websymbols\.)

2) Match the data types between source and cube

This is an important topic: if the data types between your data source and the cube don't match, the transport driver will need a lot of time to do the conversions, and this affects the overall processing capacity. Basically, Analysis Services has to wait for the conversion to complete before it can process the new incoming data, and this should be avoided.

Let's go over an AdventureWorksDW2012 Internet_Sales partition as an example:
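
(For illustration only; this query is an assumption, not taken from the post. The shipped AdventureWorks cube slices FactInternetSales into partitions on OrderDateKey ranges, so a partition source query looks roughly like this:)

-- Hypothetical partition source query for one year of Internet sales
SELECT *
FROM   dbo.FactInternetSales
WHERE  OrderDateKey >= 20050101
  AND  OrderDateKey <= 20051231;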

By looking at the table or query that is the source for the partition, we can determine that it uses a range from the FactInternetSales table. But what data types are defined under the hood?

To get at all the data type information, just right-click the SSAS database name and script the entire database into a new Query Editor window.

Search through the xml for
the query source name that is used for the partition, like:
msprop:DbTableName="FactInternetSales"

These should match the SQL Server data types; check especially for unsignedByte, short, String lengths, and Doubles (slow) vs. Floats (fast). (We do have to warn you here about the difference in precision and range between a Double and a Float, so verify the swap doesn't lose data.)
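
A quick way to list the source-side column types for comparison (standard catalog views, nothing cube-specific):

-- List column data types for FactInternetSales on the relational side
SELECT c.name,
       t.name AS data_type,
       c.max_length, c.precision, c.scale
FROM   sys.columns c
JOIN   sys.types   t ON t.user_type_id = c.user_type_id
WHERE  c.object_id = OBJECT_ID(N'dbo.FactInternetSales')
ORDER  BY c.column_id;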

How can we quickly check and align the data types? Going over them all manually, one by one, isn't fun, as you probably just found out. By searching the net I ran into a really nice and useful utility written by John Tunnicliffe called 'CheckCubeDataTypes' that does the job for us; it compares a cube's data source view with the data types/sizes of the corresponding dimensional attributes. (Kudos, John!) But unfortunately, even after making sure the data types are aligned, running Kernrate again shows that DataConvert is still the number one consumer of CPU ticks on the SSAS side.

3) Optimize the data types at the source

To prove that this conversion is our next bottleneck, we can also create a view on the database source side and explicitly cast all fields to make sure they match the cube definition. (This is also an option for test environments where you don't own the cube source and databases.)

Maybe as a best practice, CAST all columns even if you think the data types are right, and also exclude from the view the columns that are not used for processing the Measure Group. (For example, to process the FactInternetSales Measure Group from the AdventureWorksDW2012 cube we don't need [CarrierTrackingNumber], [SalesOrderNumber], [PromotionKey], and [CustomerPONumber].) Every bit that we don't have to push over the wire and process from the database source is a pure win. Just create a view with a name like 'Speed' to give it a try, as sketched below.
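
A minimal sketch of such a view (the column list is abbreviated; align the casts with your own cube definition):

CREATE VIEW dbo.Speed
AS
SELECT CAST(ProductKey    AS INT)      AS ProductKey,
       CAST(OrderDateKey  AS INT)      AS OrderDateKey,
       CAST(CustomerKey   AS INT)      AS CustomerKey,
       CAST(OrderQuantity AS SMALLINT) AS OrderQuantity,
       CAST(SalesAmount   AS MONEY)    AS SalesAmount
       -- ...remaining Measure Group columns, explicitly cast;
       -- CarrierTrackingNumber and friends deliberately left out
FROM dbo.FactInternetSales;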

(Note: always be careful when changing data types! For example, in the picture above, using the 'Money' data type is okay because it is used for FactInternetSales, but Money is not a replacement for all Decimals (it will only keep 4 digits behind the decimal point and doesn't provide the same range), so be careful when casting data types and double-check that you don't lose any data!)

The picture also shows that one of the physical CPU sockets (look at the 2nd line of 16 cores, in NUMA node 1) is completely maxed out:

4) Create a 'Static Speed' View for testing

If you would like to take the database performance out of the equation, something I found useful is to create a static view in the database with all the values pre-populated; this way there will still be a few logical reads from the database, but significantly less physical I/O.

Approach:

1) Copy the original query
from the cube:

2) Request just the SELECT
TOP (1):

3) Create a static view:

Add these values to a view named 'Static_Speed' and cast them all, for example:
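
(A sketch, assuming the single TOP (1) row returned in step 2 had these illustrative values:)

CREATE VIEW dbo.Static_Speed
AS
SELECT CAST(310      AS INT)      AS ProductKey,
       CAST(20050701 AS INT)      AS OrderDateKey,
       CAST(11000    AS INT)      AS CustomerKey,
       CAST(1        AS SMALLINT) AS OrderQuantity,
       CAST(3578.27  AS MONEY)    AS SalesAmount;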

4) Create an additional test partition that queries the new Static_Speed view.

5) Copy this test partition multiple times.

Create at least as many test partitions as the number of cores in your server, or more.

Script the test partition as created in step 4, then create multiple new partitions from it by changing just the partition <ID> and <Name> elements; these will all run the same query against the static view. This way you can test the impact of your modifications to the view quickly and at scale!

6) Processing the test partitions

Process all of these newly created test partitions, which will only query the static view, and select at least as many of them as the number of CPUs you have available in your SSAS server.

If you have a spare moment to check out the workload performance counters of your most demanding cube servers, you may find that there is room for improvement. If you see flat lines during cube processing, I hope your eyes will now start to blink: by increasing the number of connections, or by checking that you don't spend your CPU cycles on data type conversions, you may get a similar improvement of over 3x, as shown in the example above. And if the Task Manager CPU utilization shows just one of the NUMA nodes completely maxed out, it might indicate it's time to start looking into some of the msmdsrv.ini file settings…