Tag: Data Management

SQL Server 2008 | SQL Server 2008 R2 In order to utilize map data in SQL Server the spatial data types and the spatial functions of the SQL Server are important. In my last post I “aggregated” detailed shapes up....weiterlesen

PDW v2 | Big Data In my former post about Big Data, I used a “definition” which can be abbreviated as “data that is too big for analysis within the required time” The key aspects of this phrase are: size....weiterlesen

Recently I found a bug in SQL Server Management Studio 2014. Database user permissions are not granted when creating a new login. We want to add a new login and giving this account, let’s say dbowner permission. In SSMS we....weiterlesen

Last week at customer side we had some problems in SQL Server 2012 Reporting Services with nested multi-value parameter. A report that was working fine in SQL 2008 was deployed to SQL Server 2012 SP1 CU4. When trying to render....weiterlesen

When talking about PDW V2 Performance we have to distinguish between the MPP approach where one process i executed and the Clustered Columnstore Index. Hilmar already explained quite a lot about CCI in PDW but I wan’t to explain in....weiterlesen

Dimensional modeling Tracking historical changes within a dimension is a common task in data warehousing and well covered by Ralph Kimball’s slowly changing dimension (SCD) methods. In short, the SCD methods proposed by Ralph Kimball assume, that the source system....weiterlesen

PDW v1/v2 When connecting to a PDW from PowerPivot, there might be some confusion about what to enter as the server name. Confusion might even start at the very beginning when choosing the proper external data connection. After opening PowerPivot,....weiterlesen

PDW 2012 | SQL Server 2012 For today’s post I didn’t find a really good title. Here’s what this post is about: Sometimes you’ll find event based data in your source system (something happens at a specific point in time)....weiterlesen

PDW v1 | PDW 2012 | APS CTAS (Create Table As Select) is a common way on the Parallel Data Warehouse (PDW) to transform one table into another table for example for calculations or for ELT (Extract Load Transform) processes.....weiterlesen

Historisation of hierarchy structure is not required in all projects especially not from the beginning. But most of time it’s too late to implement it afterwards because source systems do not have any scd2 information about the structure of....weiterlesen

PDW v1/v2 For large tables usually we’re looking for tuning options like creating an index or having a good partition strategy in place. For the Parallel Data Warehouse (PDW) additional decisions have to be made for the table layout. Distributed....weiterlesen

It has been some time that I blogged and the reason is simple: there was too much going on. If you follow the innovations of the last years one can feel that the Singularity is near – as Kurzweil....weiterlesen

Hive 0.1x Apache Hive doesn’t have support for geo spatial functions out of the box. However, it’s easy to extend the functionality by using external libraries, that can be called via user defined functions (UDFs). This post shows how to....weiterlesen

Apache Hive 0.11+ in this post I’d like to compare the different file formats for Hive, as well as the different execution times for queries depending on the file format, compression and execution engine. As for the data, I’m using....weiterlesen

One of the first things in any PDW Project, is the migration of the existing data into the PDW. This could be necessary for POCs, Development Issues or the final move towards the new production system. Since the PDW is....weiterlesen

Sometime in april, Microsoft has decided to rebrand its big data appliance and rename the former Parallel Data Warehouse (PDW) to Analytics Plattform System (APS). Depending on who you ask, they are either more or less happy with yet another....weiterlesen

It is a usual situation when your cube users ask for a new attribute in a cube dimension. You will add the attribute in your project and deploy the changes which in turn get the related measure groups unprocessed.....weiterlesen

In our recent post “MSAS: XSLT for cube management” we presented a template for the cube management infrastructure based on XSL transformations. In the first feedback people were asking about modifications that allow producing several <Parallel> blocks in single <Batch>....weiterlesen

Since the PDW is a fairly new product, at some points you have to help the optimizer to build better execution plans. Within this article we will learn how to read the explain plan and give query hints to speed....weiterlesen

During processing of our AS Cubes we found ourselves with a strange behavior we couldn’t explain. In order to get better performance with split the fact partition into 8 partitions. The first division is done over contract ID with modulo. The....weiterlesen

Microsoft has brought with PDW a great appliance for MPP architectures. In order to get all of this to work, there are a lot of different hardware and software components involved. This article is the foundation for upcoming articles which....weiterlesen

Currently we are doing a migration of a data warehouse to PDW at one of our customers. Here are some important features of the BI-environment to give you the understanding of the project: Data in DWH (MS SQL Server) +....weiterlesen

When one of our clients decided to get a PDW, the newly gained power we got for providing our cubes with data was one of the main reasons. Our Business Warehouse is based on a regular star schema with about....weiterlesen

SQL Server 2016 One of the exciting new features in SQL Server 2016 that is already available in the public preview version (CTP 2) is Polybase, the transparent access bridge from SQL to Hadoop. This functionality has been available in....weiterlesen

Power Query Preview| Hadoop In order to make the common task of data acquisition and preparation available to end users, Microsoft developed a very powerful enhancement for Excel: Power Query. Actually Power Query is part of the new Power BI....weiterlesen

Power Query A powerful function in Power Query is to unpivot a given data set which means to rotate data in columns to rows. This is useful for a lot of statistical data sets that you will find on the....weiterlesen

PDW 2012 | APS 2012 In part 1 of this topic, I showed how basic insert operations in a CCI table are handled internally. As we saw, inserting of large batches causes the CCI to compress the data immediately (apart....weiterlesen

APS/PDW 2012 This post is about table partitioning on the Parallel Data Warehouse (PDW). The topic itself is actually quite simple but there are some differences between the SMP SQL Server compared to the Parallel Data Warehouse. On the SMP....weiterlesen

SQL Server 2012 | SQL Server 2014 Microsoft’s Analytics Platform System (APS) offers built in transparent access to Hadoop data sources through the Polybase technology. This includes bidirectional access not only to Hadoop but also to Cloud services. The SMP....weiterlesen

SQL Server 2012 This post is about the SSIS Data Streaming Destination. I must admit I wasn’t aware that it existed until I saw it in a presentation some weeks ago. Reason enough to write a short blog post about....weiterlesen

PDW 2012 | SQL Server 2012 Removing duplicate rows in large tables can be a challenging operation. As long as the table does not contain too many rows, several approaches will lead to the desired result. Recently, I had a....weiterlesen

Today I’d like to take a look at some design principles and extraction techniques when working with SAP as data source for your Data Warehouse. Layer Architecture In every book about Data Warehouse design you’ll find some sort of layer....weiterlesen

Controlling and Logging Daily Delta Loads from SAP For most of your daily loads of master data tables the best solution is a full load by DeltaQ extraction (see preceeding post). In case something goes wrong in your ETL and....weiterlesen

Parallel DeltaQ Loads and Processing of Delta Records Parallel Data Loads in SSIS with Xtract IS DeltaQ When loading SAP datasources with DeltaQ full load into your Data Warehouse you will most likely end up with 100s of tables that....weiterlesen

Visual AdHoc Analysis with Microsoft Power BI on SAP Suppose you want to analyze your SAP data flexible and without having to load it into your Data Warehouse. Maybe you need a very quick solution or this kind of analysis....weiterlesen

Microsoft BI and/or SAP BI? The previous posts in this small series have been very technical and showed, that there are proven ways to integrate data from SAP Systems into the Microsoft BI world. SAP and Microsoft work on these....weiterlesen

During my work with the PDW there were plenty of times where I needed some information and out of that the following query set was created. It is a useful compilation of different queries, which should help you with your....weiterlesen

Recently I had the problem that the SQL Server which I mainly use is not allowed to connect to an external server from which I have to load data that I then transform and load into a cube. But fortunately....weiterlesen

SQL Server 2012 One of the improvements with SQL Server 2012 are the advanced window functions which allow us for example to refer to a record before or after the current record in a result set. Since databases are getting....weiterlesen

SQL Server 2012 In my last post I explained some of the capabilities of the SQL Server 2012 window functions. Window function allow us to do computations involving other rows in a result set. Much like in my last post,....weiterlesen

SQL Server 2012 | SQL Server 2014 | PDW/APS 2012 Recently we needed to calculate something like a ‘last non empty’ value in a T-SQL query. This blog post is about the solution we ended up with as an alternative....weiterlesen

One of the great benefits of the PDW is the parallel execution on each compute node. With the right distribution and a matching query PDW can execute the query and return the data right away. But there might be some....weiterlesen

One of the most important services within the PDW is the Data Movement Service called DMS. It is part of the control node as well as every compute node. The name itself already gives a hint of what it does,....weiterlesen

PDW v1/v2 Did I say cursor? Isn’t this an evil word? Shouldn’t we try as hard as possible to avoid them in database design and especially in a data warehouse? Yes, sure. But there might be some patterns which make....weiterlesen

In this article I want to show some features about the Group By clause and the Row Number window function that you can use in SQL statements. There are many situations where you want a unique list of items. But....weiterlesen

Hadoop | HDInsight | Power Query Preview For today’s post, I’m showing a scenario where I want to load some text files from my Hadoop cluster into a single Excel Power Pivot table. For this purpose I created some files....weiterlesen

Hadoop | HDInsight | Power Query Preview As in my previous post I’d like to import a bunch of csv-files from a Hadoop cluster into a single Power Pivot table using Power Query. In the last post I used....weiterlesen

One of the most important things for a SQL Server, and this also includes the APS are up-to-date statistics. I would probably take it one step further and say that 80-90% of query execution problems can be solved by creating....weiterlesen

PDW v1/v2 In the context of more and more data and the need of being able to analyze this data, you might also have stumbled over the MPP approaches for large data warehouses. MPP stand for massively parallel processing in....weiterlesen

With APS V2 Microsoft implemented a feature called workload management. The whole purpose of this functionality is to assign a specific amount of memory and there a certain amount of concurrent slots to logins. Whenever this particular login connects to....weiterlesen