Fuzzy matching, similarity matching, or record linkage is the most critical and least understood process in any data warehouse, BI, integration, big data, data quality, MDM, or social network effort, or any other effort that integrates messy data. Over at LinkedIn there is a group for data matching run by Henrik Liliendahl Sorensen. Bill Winkler, principal researcher at the US Census Bureau, has written a series of white papers on record linkage and, in particular, a technique called "blocking indexes". In addition, we will cover the white papers and implementations of William Cohen, Research Professor in the Machine Learning Department at Carnegie Mellon University. We will present our collection of "real world" code examples, and you will leave a master of record linkage and the concepts behind it.
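
As a rough illustration of the blocking idea mentioned above, the sketch below groups records by a cheap key and runs the fuzzy comparison only inside each group, which avoids comparing every record with every other record. It is a minimal sketch and not the presenters' code: the sample records, the choice of blocking key (postal code plus first letter of the name), the 0.85 threshold, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records: (id, name, postal code).
RECORDS = [
    (1, "Jon Smith", "30301"),
    (2, "John Smith", "30301"),
    (3, "Jane Doe", "10001"),
    (4, "Jayne Doe", "10001"),
    (5, "John Smith", "94105"),
]


def blocking_key(record):
    """Assumed blocking key: postal code plus first letter of the name."""
    _, name, postal = record
    return (postal, name[0].upper())


def similarity(a, b):
    """Case-insensitive string similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_pairs(records):
    """Yield only the pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


def match(records, threshold=0.85):
    """Return id pairs whose names are at least `threshold` similar."""
    return [
        (a[0], b[0])
        for a, b in candidate_pairs(records)
        if similarity(a[1], b[1]) >= threshold
    ]
```

With these five records, blocking reduces the ten possible pairs to two candidate pairs. The trade-off is that a true match whose blocking key differs (here, record 5 with a different postal code) is never compared, so the key has to be chosen with care.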

Come to this session and learn why you should always create a data model for enterprise applications. This session will review the benefits and functionality typically provided by most data modeling tools. Learn why a data model needs to be more than just a collection of SQL scripts. Many projects try to version their models manually by generating SQL change scripts and checking them in. This labour-intensive process frequently turns out not to be scalable and does not integrate well into the development process. This session will also share a process that was used to version database changes directly from a data model.

Are your databases being backed up? Are you sure? Are you sure they're being backed up correctly? Maybe you're an "accidental DBA" or a developer who's not quite sure of the answers to those questions. In this session, you'll learn what the different types of basic backups are, how to execute them, and (perhaps most importantly) how to restore from them if and when necessary.

So you have built this awesome Power Pivot model and deployed it to SharePoint. Now you need additional security or better processing strategies. What do you do? Migrate the model to a SQL Server Analysis Services Tabular Model. In this session, we will explore the features that make the Tabular model more robust than a Power Pivot model. We will walk through migrating and updating a solution to take advantage of security, partitioning, and other Tabular model capabilities.

This presentation will introduce big data analytics for the business analyst with PowerPivot and Power View. It will be of interest to those new to the concept of big data, those new to self-service data modelling with PowerPivot, and those interested in understanding what is new for PowerPivot and Power View in Excel 2013.
Demonstrations will include creating a big data solution by using HDInsight Server; producing a PowerPivot model based on the big data solution; using PowerPivot to integrate big data with local data; and analyzing the PowerPivot model data by using Power View.

Learn the patterns for using Azure for Big Data workloads. Learn how to create a cluster, load a cluster, and develop approaches for analyzing the data. This session will draw on customer implementations, sharing the questions asked and problems solved by taking various implementation approaches.

Is there a great difference in the brain chemistry of someone fleeing a hungry mountain lion and someone presenting to a group of colleagues in a corporate board room? The answer is: NO. Over the past decade, a lot has been learned about the chemistry of the brain and why humans react the way we do to events in our environment. The concept of EQ (Emotional Intelligence) is a compelling and growing concept that applies this knowledge in a set of learnable, improvable skills for leading human beings. While EQ is often applied to corporate leadership, the parallels to presenting are fantastic. This session will explain the basics of EQ and demonstrate how you can apply it to make your presentations better in the following areas:
* Crafting better slide decks
* Preparing yourself for presenting
* Delivering your content
* Dealing with the unexpected
Understanding and practicing the concepts of EQ can make your presentations a better experience for everyone in the room--including you.

Do you have dirty data? Most likely yes... This session will take us on the adventure of designing a data cleansing solution and implementing it with Data Quality Services and SSIS. We will walk through the creation of Knowledge Bases, Domains, and Rules. You will learn how to conform and de-duplicate text data and enrich it with information from Windows Azure Data Market.

Data must often be relocated to a different SQL Server when consolidating or upgrading servers or databases. This session details experiences from numerous such moves, using a number of different techniques. We talk about why you might do such a thing, which approaches are available (with what trade-offs), and some of the things that sometimes get forgotten. We pay special attention to how to handle SQL-based apps during a move.

Microsoft SQL Server makes it simple to apply data mining algorithms to a wide variety of data. Applying the results to business decisions without a thorough understanding of how the algorithms work is dangerous to the bottom line of the business, though. This session will take one of the algorithms, the Microsoft Clustering Algorithm, and do a deep dive into the mechanics of how it works. The algorithm is valuable for analyzing data in the fields of marketing, social networks and many others. The session will also examine the types of data that are valid for clustering. A demonstration of building a clustering model using SQL Server Analysis Services and viewing the model using the Excel Data Mining Add-In will be given.

Want to offload some of that reporting workload, make an emergency rollback during a production upgrade faster, or make managing your test databases easier? If any of these appeal to you, then you should be familiar with database snapshots within SQL Server. Available since SQL Server 2005, this feature can make your life a lot easier. We'll spend the first part of this session taking an in-depth look at how database snapshots work. We’ll look at what exactly is happening behind the scenes when you create a snapshot and when you query a snapshot. Once we've covered the basics, we'll spend the rest of our time analyzing and discussing a number of real-world use cases where snapshots can be very beneficial and save you a great deal of time.

This is a lecture and demonstration on how to deploy highly available SQL Server instances in the Amazon EC2 cloud, presented by Microsoft Cluster MVP David Bermingham. Starting with preparing the EC2 environment, including configuring the VPC, routing, and security, Bermingham then shows you how to configure both AlwaysOn Availability Groups and AlwaysOn Failover Clusters for failover across availability zones.

This session will provide a deeper dive into the art of dimensional modeling. We will look at the different types of fact tables and dimension tables, and how and when to use them. We will also cover some approaches to creating rich hierarchies that make reporting a snap. This session promises to be very interactive and engaging, so bring your toughest dimensional modeling quandaries.

Sometimes half the battle in computing is just to see what's happening. We will take a visual tour of physical database storage structures, using live demos with the freeware application SQL Server File Layout Viewer and our old friends the DBCC commands as guides. See what happens in a data file when you convert a table from a heap to a clustered index! See fragmentation and the havoc wrought by Shrink! Marvel at the behavior of multiple files and filegroups! Index Rebuilds! Eureka!

Hadoop has long been a technology focused on crunching the world's data on Linux. With the partnership between Microsoft and Hortonworks, Hadoop has been brought to the Windows platform. This talk will focus on the Hadoop components available in the Hortonworks Data Platform for Windows and discuss integrations with the Microsoft technology stack.

Have a large or potentially large database that you would like to partition? Manually implementing partitioning and the corresponding maintenance can be a lot of work, especially if you have many tables. In this presentation we look at how you would implement a partitioning solution dynamically for all tables with the designated partition column. We set it up to keep a FIXED number of file groups. The partition maintenance will recycle the file groups over time as partitions are dropped and created. We do as much as possible through dynamic scripting. This solution can easily be used for archiving, but in this example we simply keep the most current 6 months of data and drop the rest. This session assumes you have a basic understanding of SQL Server table partitioning.
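
The file group recycling described above can be pictured with a small simulation. This is only a sketch of the bookkeeping, written in Python rather than the dynamic T-SQL the session uses, and it does not touch a database: the class name `RollingPartitions`, the file group names `FG1` to `FG6`, and the month labels are all hypothetical. Each new period takes a free file group; once the retention limit is reached, the oldest partition is dropped first and its file group is reused.

```python
from collections import deque


class RollingPartitions:
    """Toy model of a rolling window over a fixed pool of file groups."""

    def __init__(self, filegroups, partitions_to_keep):
        if len(filegroups) < partitions_to_keep:
            raise ValueError("need at least one file group per retained partition")
        self.free = deque(filegroups)   # file groups not holding a partition
        self.active = deque()           # (period, filegroup), oldest first
        self.partitions_to_keep = partitions_to_keep

    def add_period(self, period):
        """Create a partition for `period`; return the dropped partition, if any."""
        dropped = None
        if len(self.active) == self.partitions_to_keep:
            # Window is full: drop the oldest partition and free its file group.
            dropped = self.active.popleft()
            self.free.append(dropped[1])
        filegroup = self.free.popleft()
        self.active.append((period, filegroup))
        return dropped


def simulate():
    """Run eight hypothetical months through a six-month window."""
    window = RollingPartitions([f"FG{i}" for i in range(1, 7)], partitions_to_keep=6)
    log = []
    for month in [f"2024-{m:02d}" for m in range(1, 9)]:
        dropped = window.add_period(month)
        log.append((month, window.active[-1][1], dropped))
    return window, log
```

In this run the seventh month causes the first month's partition to be dropped and lands on its file group, so the number of file groups never grows. A real implementation would also have to split and merge the partition function boundaries, which this model leaves out.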

Microsoft has provided some great tools for creating and editing SSIS packages: first Business Intelligence Development Studio (BIDS), then SQL Server Data Tools (SSDT). However, when creating an SSIS package you frequently repeat the same steps over and over. BIML is an XML markup language designed for representing the structure of an SSIS package. BIML Script takes that structure and generates an SSIS package. In this session we will use BIDS Helper to create a simple SSIS package using BIML. We will handle a couple of SSIS scenarios with BIML and BIML Script.

One of a DBA's primary responsibilities is managing the performance of a SQL Server environment. When performance problems arise, DBAs need to have the correct tools in place to be able to dig in and discover the issues that are occurring. Although it's one of the newer tools in the DBA toolbox, Extended Events is one of the most powerful tools available. In this session, we will discuss performance management responsibilities for DBAs and provide a foundation, through Extended Events, to understand and resolve performance issues.

This session explores the seven CUBE functions that are natively available in Excel 2013. Unknown to many business analysts, these useful functions can be used to retrieve data model members and values to create parameter-driven report designs.
The session will introduce each of the seven functions. Demonstrations will range from the simple to the more sophisticated, involving dynamic expressions, MDX expressions, integration of data from multiple data models, and macro-driven layouts.
This session is a must for those looking to drive more from Excel when reporting from the BI Semantic Model. Much of the content presented in this session is also applicable to Excel 2007 and Excel 2010.

Cube space: the final frontier. In this Star Trek-themed introduction to MDX, we will discuss the fundamentals of cube structure and vocabulary, including tuples, members, sets, hierarchies, and more. We will introduce and demonstrate the basic syntax of MDX with queries that include navigating hierarchies and even some time-based expressions. This session will give you the tools you need to write simple, yet meaningful, MDX queries in your own environment.