<h1><a href="http://sqlblog.com/blogs/buck_woody/archive/2012/06/26/creating-a-corporate-data-hub.aspx">Creating a Corporate Data Hub</a></h1> <p><em>Buck Woody - Tue, 26 Jun 2012</em></p> <p>The Windows Azure Marketplace has a rich assortment of data and software offerings for you to use – a type of Software as a Service (SaaS) for IT workers, not necessarily for end-users. Among those offerings is the “Data Hub” – a codename for a project that, ironically, actually does what the codename says. </p> <p>In many of our organizations, we have multiple data quality issues. Finding data is one problem, but finding it just once is often a bigger problem. Lots of departments and even individuals have stored the same data more than once, and in some cases made changes to one of the copies. It’s difficult to know which location or version of the data is authoritative.</p> <p>Then there’s the problem of accessing the data. It’s fairly straightforward to publish a database, share or other location internally to store the data. But then you have to figure out who owns it, how it is controlled, and pass out the various connection strings to those who want to use it. And then you need to figure out how to let folks access the internal data externally – bringing up all kinds of security issues. Finally, in many cases our user community wants us to combine data from internal sources with external data, raising the security, connection-string and discovery issues all over again.</p> <p>Enter the Data Hub. This is an online offering, where you assign an administrator and data stewards. You import the data into the service, and it’s available to you – and only you and your organization, if you wish. </p> <p><img src="http://www.microsoft.com/global/en-us/sqlazurelabs/PublishingImages/datahub-image3-large.jpg" width="447" height="376" /></p> <p>The basic steps for this service are to set up the portal for your company, assign administrators and permissions, and then create data areas and import data into them. From there you make the data discoverable, and then you have multiple options for how you or your users access it. You’re then able, if you wish, to combine that data with other data in one location. </p> <p>So how does all that work? What about security? Is it really that easy? And can you really move the data definition off to the Subject Matter Experts (SMEs) who know the particular data stack better than the IT team does?</p> <p>Well, nothing good is easy – but using the Data Hub is actually pretty simple. I’ll give you a link in a moment where you can sign up and try this yourself. Once you sign up, you assign an administrator. From there you’ll create data areas, and then use a simple interface to bring the data in. All of this is done in a portal interface – nothing to install, configure, update or manage. </p> <p>After the data is entered and you’ve assigned metadata to describe it, your users have multiple options to access it. They can simply use the portal – which actually has powerful visualizations you can use on any platform, even mobile phones or tablets.</p> <p><img src="http://i.msdn.microsoft.com/dynimg/IC498608.gif" width="459" height="213" /></p> <p>Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: <a title="http://msdn.microsoft.com/en-us/library/hh921924" href="http://msdn.microsoft.com/en-us/library/hh921924">http://msdn.microsoft.com/en-us/library/hh921924</a></p> <p>You can make raw HTTP calls instead of using a client library, and the data can even be exposed as an OData feed. As you can see, there are a lot of options.</p>
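<p>To make that last option concrete, here is a minimal sketch of pulling rows from an OData feed over plain HTTP in C#. OData feeds of this era return Atom XML, so no client library is strictly required; the account URL, credentials and entity set below are invented for illustration - use the feed address and account key your Data Hub portal shows you:</p>
<pre>
using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

class DataHubFeedSketch
{
    static void Main()
    {
        // Placeholder credentials and feed address - substitute your own.
        var client = new WebClient();
        client.Credentials = new NetworkCredential("user@contoso.com", "ACCOUNT-KEY");
        string atom = client.DownloadString(
            "https://contoso.clouddatahub.net/MyDataArea/Products?$top=10");

        // The response is plain Atom XML; count the entries it returned.
        XNamespace ns = "http://www.w3.org/2005/Atom";
        XDocument doc = XDocument.Parse(atom);
        Console.WriteLine("{0} entries returned", doc.Descendants(ns + "entry").Count());
    }
}
</pre>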
</p> <p><img src="http://i.msdn.microsoft.com/dynimg/IC498608.gif" width="459" height="213" />&#160;</p> <p>&#160;</p> <p>Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: <a title="http://msdn.microsoft.com/en-us/library/hh921924" href="http://msdn.microsoft.com/en-us/library/hh921924">http://msdn.microsoft.com/en-us/library/hh921924</a>&#160;</p> <p>You can make HTTP calls instead of code, and the data can even be exposed as an OData Feed. As you can see, there are a lot of options. </p> <p>You can check out the offering here: <a title="http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx" href="http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx">http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx</a> and you can read the documentation here: <a title="http://msdn.microsoft.com/en-us/library/hh921938" href="http://msdn.microsoft.com/en-us/library/hh921938">http://msdn.microsoft.com/en-us/library/hh921938</a></p>Big Data - A Microsoft Tools Approachhttp://sqlblog.com/blogs/buck_woody/archive/2012/02/20/big-data-a-microsoft-tools-approach.aspxMon, 20 Feb 2012 21:16:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:41832BuckWoody<p><em><span style="color:#c0504d;">(As with all of these types of posts, check the date of the latest update I&rsquo;ve made here. Anything older than 6 months is probably out of date, given the speed with which we release new features into Windows and SQL Azure)</span></em></p>
<p>I don&rsquo;t normally like to discuss things in terms of tools. I find that whenever you start with a given tool (or even a tool stack) it&rsquo;s too easy to fit the problem to the tool(s), rather than the other way around as it should be.</p>
<p>That being said, it&rsquo;s often useful to have an example to work through to better understand a concept. But like many ideas in Computer Science, &ldquo;Big Data&rdquo; is too broad a term for any single example to bring out the multiple processes, use-cases and patterns it covers.</p>
<p>So we turn to a description of the tools you can use to analyze large data sets. &ldquo;Big Data&rdquo; is a term used lately to describe data sets that have the &ldquo;<a href="http://radar.oreilly.com/2012/01/what-is-big-data.html" target="_blank">Four V&rsquo;s</a>&rdquo; as a characteristic, but I have a simpler definition I like to use:</p>
<p align="center"><em><span style="color:#0000ff;font-size:small;">Big Data involves a data set too large to process in a reasonable period of time</span></em></p>
<p>I realize that&rsquo;s a bit broad, but in my mind it answers the question and is fairly future-proof. The general idea is that you want to analyze some data, and using whatever current methods, storage, compute and so on that you have at hand it doesn&rsquo;t allow you to finish processing it in a time period that you are comfortable with. I&rsquo;ll explain some new tools you can use for this processing.</p>
<p>Yes, this post is Microsoft-centric. There are probably posts from other vendors and open-source that cover this process in the way they best see fit. And of course you can always &ldquo;mix and match&rdquo;, meaning using Microsoft for one or more parts of the process and other vendors or open-source for another. I never advise that you use any one vendor blindly - educate yourself, examine the facts, perform some tests and choose whatever mix of technologies best solves your problem.</p>
<p>At the risk of being vendor-specific, and probably incomplete, I use the following short list of tools Microsoft has for working with &ldquo;Big Data&rdquo;. There is no single package that performs all phases of analysis. These tools are what I use; they should not be taken as an authoritative Microsoft statement of the final toolset for a given problem space. In fact, that&rsquo;s the key: find the problem and then fit the tools to that.</p>
<h2>Process Types</h2>
<p>I break up the analysis of the data into two process types. The first is examining and processing the data <em>in-line</em>, meaning as the data passes through some process. The second is a <em>store-analyze-present</em> process.</p>
<h2>Processing Data In-Line</h2>
<p>Processing data in-line means that the data doesn&rsquo;t have a destination - it remains in the source system. But as it moves from an input or is routed to storage within the source system, various methods are available to examine the data as it passes, and either trigger some action or create some analysis.</p>
<p>You might not think of this as &ldquo;Big Data&rdquo;, but in fact it can be. Organizations have huge amounts of data stored in multiple systems. Many times the data from these systems does not end up in a database for evaluation. There are options, however, to evaluate that data in real time and either act on it or perhaps copy or stream it to another process for evaluation.</p>
<p>The advantage of an in-stream data analysis is that you don&rsquo;t necessarily have to store the data again to work with it. That&rsquo;s also a disadvantage - depending on how you architect the solution, you might not retain a historical record. One method of dealing with this requirement is to trigger a rollup collection or a more detailed collection based on the event.</p>
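<p>Before looking at specific products, here is a minimal, framework-free C# sketch of the in-line pattern itself - not any vendor&rsquo;s API, just the shape the tools below industrialize: examine each event as it passes, trigger a more detailed collection when a condition is met, and let the event continue on to its original destination:</p>
<pre>
using System;

class InlineTapSketch
{
    const double AlertThreshold = 98.6;

    static void Main()
    {
        // Stands in for a live stream of sensor events.
        double[] sensorFeed = { 97.1, 98.2, 99.4, 98.0 };

        foreach (double reading in sensorFeed)
        {
            // Act on the event in-line...
            if (reading > AlertThreshold)
                TriggerDetailedCollection(reading);

            // ...then let it continue to wherever it was already going.
            ForwardToSourceSystem(reading);
        }
    }

    static void TriggerDetailedCollection(double reading)
    {
        Console.WriteLine("Threshold crossed: {0} - starting detailed capture", reading);
    }

    static void ForwardToSourceSystem(double reading)
    {
        // The event's original destination in the source system.
    }
}
</pre>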
<p><strong>StreamInsight </strong>- StreamInsight is Microsoft&rsquo;s &ldquo;Complex Event Processing&rdquo; or CEP engine. This product, hooked into SQL Server 2008 R2, has multiple ways of interacting with a data flow. You can create adapters to talk with systems, and then examine the data mid-stream and create triggers to do something with it. You can read more about StreamInsight here: <a title="http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx" href="http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx">http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx</a>&nbsp;</p>
<p><strong>BizTalk </strong>- When there is more latency available between the initiation of the data and its processing, you can use Microsoft BizTalk. This is a message-passing and Service Bus oriented tool, and it can also be used to join together data from systems that normally do not have a direct link, for instance a Mainframe system and SQL Server. You can learn more about BizTalk here: <a href="http://www.microsoft.com/biztalk/en/us/overview.aspx">http://www.microsoft.com/biztalk/en/us/overview.aspx</a>&nbsp;</p>
<p><strong>.NET and the Windows Azure Service Bus </strong>- Along the same lines as BizTalk but with a more programming-oriented design are the Windows and Windows Azure Service Bus tools. The Service Bus allows you to pass messages as well, and opens up web interactions and even inter-company routing. BizTalk can do this as well, but the Service Bus tools use an API approach for designing the flow and interfaces you want. The Service Bus offerings are also intended as near real-time, not as a streaming interface. You can learn more about the Windows Azure Service Bus here: <a href="http://www.windowsazure.com/en-us/home/tour/service-bus/">http://www.windowsazure.com/en-us/home/tour/service-bus/</a> and more about the Event Processing side here: <a href="http://msdn.microsoft.com/en-us/magazine/dd569756.aspx">http://msdn.microsoft.com/en-us/magazine/dd569756.aspx</a>&nbsp;</p>
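<p>As a hedged sketch of that API approach, here is roughly what handing a message through a Service Bus queue looks like with the brokered-messaging client library of this era. The connection string, queue name and payload are placeholders, so verify the exact factory methods against the SDK you have:</p>
<pre>
using System;
using Microsoft.ServiceBus.Messaging;   // Windows Azure Service Bus client library

class ServiceBusSketch
{
    static void Main()
    {
        // Placeholder connection string - take the real one from the Azure portal.
        string connection = "Endpoint=sb://mynamespace.servicebus.windows.net/;" +
                            "SharedSecretIssuer=owner;SharedSecretValue=KEY";
        QueueClient client = QueueClient.CreateFromConnectionString(connection, "orders");

        // Producer side: hand a unit of work to whoever is listening.
        client.Send(new BrokeredMessage("PO-10044"));

        // Consumer side (normally a separate process, possibly in another company).
        BrokeredMessage msg = client.Receive();
        if (msg != null)
        {
            Console.WriteLine(msg.GetBody&lt;string&gt;());
            msg.Complete();   // done - remove it from the queue
        }
    }
}
</pre>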
<h2>Store-Analyze-Present</h2>
<p>A more traditional approach with an organization&rsquo;s data is to store the data and analyze it out-of-band. This began with simply running code over a data store, but as locking and blocking became an issue on a file system, Relational Database Management Systems (RDBMSs) were created. Over time a distinction was made between systems meant to be highly available for writing data online (OLTP) and systems designed for analytical and reporting purposes (OLAP).</p>
<p>Later the data grew larger than these systems were designed for, primarily due to consistency requirements. In analysis, however, consistency isn&rsquo;t always a requirement, and so file-based systems for that analysis were re-introduced from the Mainframe concepts, with new technology layered in for speed and size.</p>
<p>I normally break up the process of analyzing large data sets into four phases:</p>
<ol>
<li><em>Source and Transfer </em>- Obtaining the data at its source and transferring or loading it into the storage; optionally transforming it along the way</li>
<li><em>Store and Process</em> - Data is stored on some sort of persistence, and in some cases an engine handles the acquisition and placement on persistent storage, as well as retrieval through an interface.</li>
<li><em>Analysis </em>- A new layer introduced with &ldquo;Big Data&rdquo; is a separate analysis step. This is dependent on the engine or storage methodology, is often programming language or script based, and sometimes re-introduces the analysis back into the data. Some engines and processes combine this function into the previous phase.</li>
<li><em>Presentation</em> - In most cases the data needs a graphical representation to be comprehensible, especially in a series or trend analysis. In other cases a simple symbolic representation suffices, similar to the &ldquo;dashboard&rdquo; elements in a Business Intelligence suite. Presentation tools may also have an analysis or refinement capability to allow end-users to work with the data sets. As in the Analysis phase, some methodologies bundle the Analysis and Presentation phases into one toolset.</li>
</ol>
<h3>Source and Transfer</h3>
<p>You&rsquo;ll notice in this area, along with those that follow, that Microsoft is adopting not only its own technologies but also those from open source. This is a positive sign, and means that you will have a best-of-breed, supported set of tools to move the data from one location to another. Traditional file copy, File Transfer Protocol and the like are certainly options, but they do not normally deal well with moving large datasets.</p>
<p>I&rsquo;ve already mentioned the ability of a streaming tool to push data into a store-analyze-present model, so I&rsquo;ll follow up that discussion with the tools that can extract data from one source and place it in another.</p>
<p><strong><span style="color:#800000;">SQL Server Integration Services (SSIS)/SQL Server Bulk Copy Program (BCP)</span> </strong>- SSIS is a SQL Server tool used to move data from one location to another, and optionally perform transform or other processes as it does so. You are not limited to working with SQL Server data - in fact, almost any modern source of data from text to various database platforms is available to move to various systems. It is also extremely fast and has a rich development environment. You can learn more about SSIS here: <a href="http://msdn.microsoft.com/en-us/library/ms141026.aspx">http://msdn.microsoft.com/en-us/library/ms141026.aspx</a> BCP is a tool that has been used with SQL Server data since the first releases; it has multiple sources and destinations as well. It is a command-line utility,and has some limited transform capabilities. You can learn more about BCP here: <a href="http://msdn.microsoft.com/en-us/library/ms162802.aspx">http://msdn.microsoft.com/en-us/library/ms162802.aspx</a>&nbsp;</p>
<p><strong><span style="color:#0000ff;"><span style="color:#800000;">Sqoop</span> </span></strong>- Tied to Microsoft&rsquo;s latest announcements with Hadoop on Windows and Windows Azure, Sqoop is a tool that is used to move data between SQL Server 2008R2 (and higher)&nbsp;and Hadoop, quickly and efficiently. You can read more about that in the Readme file here: <a href="http://www.microsoft.com/download/en/details.aspx?id=27584">http://www.microsoft.com/download/en/details.aspx?id=27584</a>&nbsp;</p>
<p><span style="color:#800000;"><strong>Application Programming Interfaces</strong></span> - API&rsquo;s exist in most every major language that can connect to one data source, access data, optionally transforming it and storing it in another system. Most every dialect of&nbsp; the .NET-based languages contain methods to perform this task.</p>
<h3>Store and Process</h3>
<p>Data at rest is normally used for historical analysis. In some cases this analysis is performed near real-time, and in others historical data is analyzed periodically. Systems that handle data at rest range from simple storage to active management engines.</p>
<p><strong><span style="color:#800000;">SQL Server</span></strong> - Microsoft&rsquo;s flagship RDBMS can indeed store massive amounts of complex data. I am familiar with a two systems in excess of 300 Terabytes of federated data, and the <a href="http://pan-starrs.ifa.hawaii.edu/public/" target="_blank">Pan-Starrs</a> project is designed to handle 1+ Petabyte of data. The theoretical limit of SQL Server DataCenter edition is 540 Petabytes. SQL Server is an engine, so the data access and storage is handled in an abstract layer that also handles concurrency for ACID properties. You can learn more about SQL Server here: <a href="http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx">http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">SQL Azure Federations</span></strong> - SQL Azure is a database service from Microsoft associated with the Windows Azure platform. Database Servers are multi-tenant, but are shared across a &ldquo;fabric&rdquo; that moves active databases for redundancy and performance. Copies of all databases are kept triple-redundant with a consistent commitment model. Databases are (at this writing - check <a href="http://WindowsAzure.com">http://WindowsAzure.com</a> for the latest) capped at a 150 GB size limit per database. However, Microsoft released a &ldquo;Federation&rdquo; technology, allowing you to query a head node and have the data federated out to multiple databases. This improves both size and performance. You can read more about SQL Azure Federations here: <a href="http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx">http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">Analysis Services</span></strong> - The Business Intelligence engine within SQL Server, called Analysis Services, can also handle extremely large data systems. In addition to traditional BI data store layouts (ROLAP, MOLAP and HOLAP), the latest version of SQL Server introduces the Vertipaq column-storage technology allowing more direct access to data and a different level of compression. You can read more about Analysis Services here: <a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx">http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx</a> and more about Vertipaq here: <a href="http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx">http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx</a></p>
<p><span style="color:#800000;"><strong>Parallel Data Warehouse </strong></span>- The Parallel Data Warehouse (PDW) offering from Microsoft is largely described by the title. Accessed in multiple ways including using Transact-SQL (the Microsoft dialect of the Structured Query Language), <a href="http://sqlpdw.com/2010/07/what-mpp-means-to-sql-server-parallel-data-warehouse/" target="_blank">This is an MPP appliance</a>&nbsp;scaling in parallel to extremely large datasets. It is a hardware and software offering - you can learn more about it here: <a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx">http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx</a></p>
<p><strong><span style="color:#800000;">HPC Server</span></strong> - Microsoft&rsquo;s High-Performance Computing version of Windows Server deals not only with large data sets, but with extremely complicated computing requirements. A scale-out architecture and inter-operation with Linux systems, as well as dozens of applications pre-written to work with this server make this a capable &ldquo;Big Data&rdquo; system. It is a mature offering, with a long track record of success in scientific, financial and other areas of data processing. It is available both on premises and in Windows Azure, and also in a hybrid of both models, allowing you to &ldquo;rent&rdquo; a super-computer when needed. You can read more about it here: <a href="http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx">http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">Hadoop</span></strong> - Pairing up with Hortonworks, Microsoft has released the Hadoop Open-Source system -&nbsp; including HDFS and a Map/Reduce standardized software, Hive and Pig - on Windows and the Windows Azure platform. This is not a customized version; off-the-shelf concepts and queries work well here. You can read more about Hadoop here: <a href="http://hadoop.apache.org/common/docs/current/">http://hadoop.apache.org/common/docs/current/</a> and you can read more about Microsoft&rsquo;s offerings here: <a href="http://hortonworks.com/partners/microsoft/">http://hortonworks.com/partners/microsoft/</a>&nbsp;and here: <a href="http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx">http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx</a></p>
<p><strong><span style="color:#800000;">Windows and Azure Storage</span></strong> - Although not an engine - other than a triple-redundant, immediately consistent commit - Windows Azure can hold terabytes of information and make it available to everything from the R programming language to the Hadoop offering. Binary storage (Blobs) and Table storage (Key-Value Pair) data can be queried across a distributed environment. You can learn more about Windows Azure storage here: <a href="http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx">http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx</a>&nbsp;</p>
<h3>Analysis</h3>
<p>In a &ldquo;Big Data&rdquo; environment, it&rsquo;s not unusual to have a specialized set of tasks for analyzing and even interpreting the data. This is a new field called &ldquo;Data Science&rdquo;, with a requirement not only for computing skills, but also a heavy emphasis on math.</p>
<p><span style="color:#800000;"><strong>Transact-SQL </strong></span>- T-SQL is the dialect of the Structured Query Language used by Microsoft. It includes not only robust selection, updating and manipulating of data, but also analytical and domain-level interrogation as well. It can be used on SQL Server, PDW and ODBC data sources. You can read more about T-SQL here: <a href="http://msdn.microsoft.com/en-us/library/bb510741.aspx">http://msdn.microsoft.com/en-us/library/bb510741.aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">Multidimensional Expressions and Data Analysis Expressions</span></strong> - The MDX and DAX languages allow you to query multidimensional data models that do not fit well with typical two-plane query languages. Pivots, aggregations and more are available within these constructs to query and work with data in Analysis Services. You can read more about MDX here: <a href="http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx">http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx</a> and more about DAX here: <a href="http://www.microsoft.com/download/en/details.aspx?id=28572">http://www.microsoft.com/download/en/details.aspx?id=28572</a>&nbsp;</p>
<p><strong><span style="color:#800000;">HPC Jobs and Tasks </span></strong>- Work submitted to the Windows HPC Server has a particular job - essentially a reservation request for resources. Within a job you can submit tasks, such as parametric sweeps and more. You can learn more about Jobs and Tasks here: <a href="http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx">http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">HiveQL </span></strong>- HiveQL is the language used to query a Hive object running on Hadoop. You can see a tutorial on that process here: <a href="http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx">http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">Piglatin </span></strong>- Piglatin is the submission language for the Pig implementation on Hadoop. An example of that process is here: <a href="http://sqlblog.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx">http://blogs.msdn.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">Application Programming Interfaces </span></strong>- Almost all of the analysis offerings have associated API&rsquo;s - of special note is Microsoft Research&rsquo;s Infer.NET, a new language construct for framework for running Bayesian inference in graphical models, as well as probabilistic programming. You can read more about Infer.NET here: <a href="http://research.microsoft.com/en-us/um/cambridge/projects/infernet/">http://research.microsoft.com/en-us/um/cambridge/projects/infernet/</a>&nbsp;</p>
<h3>Presentation</h3>
<p>Lots of tools can present the data once you have done the primary analysis. In fact, there&rsquo;s a great video comparing various tools, primarily focused on Business Intelligence, here: <a href="http://msbiacademy.com/Lesson.aspx?id=73">http://msbiacademy.com/Lesson.aspx?id=73</a> That term itself is no longer as crisply defined, but the tools I&rsquo;ll show below can be used in multiple ways - not just traditional Business Intelligence scenarios. Application Programming Interfaces (APIs) can also be used for presentation, but I&rsquo;ll focus here on &ldquo;out of the box&rdquo; tools.</p>
<p><strong><span style="color:#800000;">Excel</span></strong> - Microsoft&rsquo;s Excel can be used not only for single-desk analysis of data sets, but with larger datasets as well. It has interfaces into SQL Server, Analysis Services, can be connected to the PDW, and is a first-class job submission system for the Windows HPC Server. You can watch a video about Excel and big data here: <a href="http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f">http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f</a>&nbsp;and you can also connect Excel to Hadoop: <a href="http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx">http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx</a></p>
<p><strong><span style="color:#800000;">Reporting Services</span></strong> - Reporting Services is a SQL Server tool that can query and show data from multiple sources, all at once. It can also be used with Analysis Services. You can read more about Reporting Services here: <a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx">http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">Power View</span></strong> - Power View is a &ldquo;Self-Service&rdquo; Business Intelligence reporting tool, which can work with on-premises data in addition to SQL Azure and other data. You can read more about it and see videos of Power View in action here: <a href="http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx">http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx</a>&nbsp;</p>
<p><strong><span style="color:#800000;">SharePoint Services -</span></strong> Microsoft has rolled several capable tools in SharePoint as &ldquo;Services&rdquo;. This has the advantage of being able to integrate into the working environment of many companies. You can read more about&nbsp; lots of these reporting and analytic presentation tools here: <a href="http://technet.microsoft.com/en-us/sharepoint/ee692578">http://technet.microsoft.com/en-us/sharepoint/ee692578</a>&nbsp;</p>
<p>This is by no means an exhaustive list - more capabilities are added all the time to Microsoft&rsquo;s products, and things will surely shift and merge as time goes on. Expect today&rsquo;s &ldquo;Big Data&rdquo; to be tomorrow&rsquo;s &ldquo;Laptop Environment&rdquo;.</p> <h1><a href="http://sqlblog.com/blogs/buck_woody/archive/2012/02/14/cloud-computing-patterns-using-data-transaction-commitment-models-for-design.aspx">Cloud Computing Patterns: Using Data Transaction Commitment Models for Design</a></h1> <p><em>Buck Woody - Tue, 14 Feb 2012</em></p> <p>There are multiple ways to store data in a cloud provider, specifically around Windows and SQL Azure. As part of a &ldquo;Data First&rdquo; architecture design, one decision vector &ndash; assuming you&rsquo;ve already done a data classification of the elements you want to store &ndash; is to decide the transaction level you need for each datum. Once you&rsquo;ve decided what level of transactional commitment you need, you can make intelligent decisions about the storage engine, method of access and storage, speed and other requirements.</p>
<p>Although the list below is neither original nor exhaustive, these are the general considerations I use for a given data set. It&rsquo;s important to note that in many on-premises systems the engine already at hand overrides these concerns. If you have a large Relational Database Management System (RDBMS) for instance, you might simply place all data there without further consideration. In a Platform as a Service (PaaS) like Windows and SQL Azure, however, selection of the proper engine for a particular dataset has implications ranging from cost to performance, and selecting the right engine is critical when you want to leverage the data across &ldquo;Big Data&rdquo; analysis like Hadoop or other constructs.</p>
<p><strong>Monolithic Consistent Transactional</strong><br />The first selection is analogous to a local RDBMS system. The dataset is retrieved in a functionally single, monolithic transaction, i.e. kept together with ACID properties in mind. This is the most reliable type of data design for datasets that require a high degree of safety in the read/write pattern. As an example, a bank ATM transaction should be modeled in a monolithic way. If I make a transfer of funds from one account to another, I want the money to be subtracted from one account if and only if it is successfully added to the other. The bank, on the other hand, wants the money added to the second account if and only if it is subtracted from the first. This is a prime example of a monolithic (single atomic transaction), Consistent (if and only if) and Transactional (as a unit, with provision for roll-back and reporting if unsuccessful) data requirement.</p>
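<p>A hedged sketch of that transfer as it might look in ADO.NET (the table and column names are invented): the unit of work commits as a whole or rolls back as a whole.</p>
<pre>
using System.Data.SqlClient;

class TransferSketch
{
    static void Transfer(SqlConnection conn, int fromId, int toId, decimal amount)
    {
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            try
            {
                Adjust(conn, tx, fromId, -amount);   // subtract from the first account...
                Adjust(conn, tx, toId, amount);      // ...and add to the second
                tx.Commit();                          // both changes land together, or...
            }
            catch
            {
                tx.Rollback();                        // ...neither one does
                throw;
            }
        }
    }

    static void Adjust(SqlConnection conn, SqlTransaction tx, int id, decimal delta)
    {
        var cmd = new SqlCommand(
            "UPDATE dbo.Accounts SET Balance = Balance + @delta WHERE AccountId = @id",
            conn, tx);
        cmd.Parameters.AddWithValue("@delta", delta);
        cmd.Parameters.AddWithValue("@id", id);
        cmd.ExecuteNonQuery();
    }
}
</pre>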
<p>The primary engine used for this type of data is often SQL Azure &ndash; an RDBMS in the same datacenters as Windows Azure. Placing the calling application &ndash; whether that is a Data Access Layer-based component or a direct call from a Web or Worker Role &ndash; alongside the database means that data is retrieved quickly and in a monolithic way. The cost for this method is based on overall database size. A consideration is how much data you can store this way. Database sizes have limits, although there are ways of overcoming size issues using technologies such as Sharding or SQL Azure Federations. There is also the consideration of performance. In an RDBMS that conforms to ACID properties, locking and other overhead for safety is in conflict with the highest possible read performance. But in some cases the ACID properties are worth the cost, as in the banking example.</p>
<p>You are not limited to SQL Azure in this model. Windows Azure Table storage, while similar to NoSQL offerings, is different in that it is immediately consistent across all three replicated copies of the data, offering a higher degree of safety. And while Table storage does not offer built-in support for transactions, there are ways to achieve certain transaction levels.</p>
<p><strong>Monolithic Realtime</strong><br />If consistency can be relaxed &ndash; meaning that a guaranteed read/write pattern is not essential &ndash; then more options arise in Windows and SQL Azure. You can still use SQL Azure for this type of storage, with either automatic or programmatic hints allowing for &ldquo;dirty reads&rdquo;. Windows Azure Table storage is still consistent, but methods for querying the data, such as keeping separate copies of read and write data, can be employed. Because of the relaxed transaction nature, higher speeds are possible by querying cached or separate datasets.</p>
<p>An example here is that same bank scenario, but for a statement inquiry. Just after the money is deposited, the user wishes to query the current balance. The current balance &ndash; minus the transaction that just occurred &ndash; is retrieved and shown to the user, perhaps combining that amount with the latest transaction (saved as a local cached object), along with a caveat to the user.</p>
<p><strong>Distributed Realtime</strong><br />At some point, the data becomes too large to fit inside a single processing session, and parallelism is used. In this case, either separate databases in SQL Azure or Windows Azure Tables, local data storage on the Web or Worker Role, or a combination of all with Caches is the right approach for the data design.</p>
<p>The biggest implication in this type of system is speed &ndash; a higher degree of data separation is essential, and so the dataset selection must fit the pattern. It is unacceptable to force an ACID-properties type workload into this environment. Typical examples here are the actual data asset payload for streaming video or music, read-only documents and so on. This pattern is often separated from the meta-data, which is kept in more of a transactional model.</p>
<p>As an example, assume you log on to a website to watch a movie or listen to music. The provider needs to verify your identity and account balance, which are transactional data loads. After that process is complete, the workload shifts to a copy &ndash; perhaps one of several &ndash; of the asset to stream to your location.</p>
<p>In this case, Windows Azure Blob storage, along with the Content Delivery Network (CDN &ndash; a series of servers closer to the user) is employed along with the transactional realtime requirements for the metadata.</p>
<p><strong>Distributed Eventual</strong><br />At the furthest end of the data scale are large datasets that need deeper analysis, but not necessarily in realtime. Examples here are terabytes of data requiring a Business Intelligence view, but with a tolerance of a few seconds to minutes or hours. In this case, Storage, Processing and Query methods, such as the Hadoop offering in Windows Azure, or perhaps the High Performance Computing (HPC) Windows Server in Windows Azure, fit well. Here, the design of the data is often dictated by the source, and more emphasis is placed on the algorithms around processing and re-assembling the data.</p>
<p>There are, of course, other patterns. In many cases a single dataset may have needs in one or more of these categories &ndash; in fact, sitting at 30,000 feet typing this entry, I&rsquo;m having that very design discussion with a gentleman sitting next to me. The key is to design data-first, and fit the technology to the requirement for each datum. Allow each function and engine to handle the data in the most efficient, effective way for cost, performance and utility.</p> <h1><a href="http://sqlblog.com/blogs/buck_woody/archive/2011/12/06/how-microsoft-helps-you-not-break-your-windows-azure-application-storage-services-versioning.aspx">How Microsoft helps you NOT break your Windows Azure Application: Storage Services Versioning</a></h1> <p><em>Buck Woody - Tue, 06 Dec 2011</em></p> <p><font size="2">One of the advantages of using Windows Azure to run your code is that you don’t have to constantly manage upgrades on your platform. While that’s a big advantage indeed, it immediately brings up the question - how do the upgrades happen? Microsoft upgrades the Azure platform in periodic increments, and the components that are affected are documented. </font></p> <p><font size="2">This brings up another question - upgrades mean change, and change can sometimes alter the way you might implement a feature. What if you have taken a dependency on some feature in your code that has been altered by an upgrade? Windows Azure does have an Application Lifecycle Management (ALM) process, which I’ll reference at the end of this post. But beyond that, there are some features we’ve put into place that will help you manage many of these changes. One of those is being able to set the version of the storage features you would like your code to use. </font></p> <p><font size="2">Windows Azure is made up of three main component areas: Computing, Storage and a group of features called the Application Fabric. You can use these components together or separately, depending on what you would like your application to do. In this post I’ll deal with version control in the storage subsystem - in other posts I’ll explain how to track and in some cases control the versions of the other components you work with.</font></p> <p><font size="2">When you send a request to a Windows Azure resource, you’re actually using a <a href="http://en.wikipedia.org/wiki/REST" target="_blank">REST</a> call. That’s a three-part call to the system that has a request (called a URI), a header, and a body you want to send. 
So a typical call, such as to the Table service, might look like this example, which sets the storage service properties: </font></p> <p><font size="2"><strong>URI</strong>: <br /><font color="#0000ff">PUT http://myaccount.table.core.windows.net/?restype=service&amp;comp=properties HTTP/1.1</font></font></p> <p><font size="2"><strong>Header</strong>: <br /><font color="#0000ff"><font style="background-color:#ffff00;">x-ms-version: 2011-08-18</font> <br />x-ms-date: Tue, 30 Aug 2011 04:28:19 GMT <br />Authorization: SharedKey <br />myaccount:Z1lTLDwtq5o1UYQluucdsXk6/iB7YxEu0m6VofAEkUE= <br />Host: myaccount.table.core.windows.net</font></font></p> <p><font size="2"><strong>Body</strong>: <br /><font color="#0000ff">&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt; <br />&lt;StorageServiceProperties&gt; <br />&#160;&#160;&#160; &lt;Logging&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Version&gt;1.0&lt;/Version&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Delete&gt;true&lt;/Delete&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Read&gt;false&lt;/Read&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Write&gt;true&lt;/Write&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;RetentionPolicy&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Enabled&gt;true&lt;/Enabled&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Days&gt;7&lt;/Days&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/RetentionPolicy&gt; <br />&#160;&#160;&#160; &lt;/Logging&gt; <br />&#160;&#160;&#160; &lt;Metrics&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Version&gt;1.0&lt;/Version&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Enabled&gt;true&lt;/Enabled&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;IncludeAPIs&gt;false&lt;/IncludeAPIs&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;RetentionPolicy&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Enabled&gt;true&lt;/Enabled&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;Days&gt;7&lt;/Days&gt; <br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/RetentionPolicy&gt; <br />&#160;&#160;&#160; &lt;/Metrics&gt; <br />&lt;/StorageServiceProperties&gt; <br /></font></font></p> <p><font size="2"><em>(</em><a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh452240.aspx" target="_blank"><em>Source</em></a><em> of this code)</em></font></p> <p><font size="2">You can see that I’ve highlighted a portion of the header block - that’s where you set the version of the Storage Services you would like to use. You can find a list of the <a href="http://msdn.microsoft.com/en-us/library/windowsazure/dd894041.aspx" target="_blank">features introduced in each version here</a>. Adding that element to the header is not a requirement, but it is a best practice to do so. </font></p> <p><font size="2">You don’t have to use REST calls directly, however. It’s more common to use the API in the Software Development Kit to just change the property in your IDE environment - the setting you’re looking for there is the <a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh343266.aspx">Set Storage Service Properties</a> call. </font></p>
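<p>From code, pinning that version on a hand-built REST call is a single header assignment. A hedged sketch (the SharedKey request signing and the XML body shown above are deliberately elided):</p>
<pre>
using System;
using System.Net;

class VersionPinSketch
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create(
            "http://myaccount.table.core.windows.net/?restype=service&amp;comp=properties");
        request.Method = "PUT";
        request.Headers["x-ms-version"] = "2011-08-18";        // pin the storage API version
        request.Headers["x-ms-date"] = DateTime.UtcNow.ToString("R");
        // ...add the SharedKey Authorization header, write the XML body shown
        // above, and then call request.GetResponse() to execute the request...
    }
}
</pre>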
<p><font size="2">Interestingly, rather than a breaking change, you might run into unexpected behavior if you are not aware of these parameters. In some code I recently reviewed, a newer feature of the storage system failed when it was called. On inspection I found that the developer had used an older code block from a previous version of the storage system - he was not aware that you can set the version of storage in the call. We changed the header to the latest version, and everything worked as expected. </font></p> <p><font size="2"><strong>References:</strong></font></p> <p><font size="2">The Storage Services Versioning and the changes for each version: </font></p> <p><a href="http://msdn.microsoft.com/en-us/library/windowsazure/dd894041.aspx">http://msdn.microsoft.com/en-us/library/windowsazure/dd894041.aspx</a></p> <p>Windows Azure Application Lifecycle Management: </p> <p><a href="http://msdn.microsoft.com/en-us/library/ff803362.aspx">http://msdn.microsoft.com/en-us/library/ff803362.aspx</a></p> <p><a href="http://channel9.msdn.com/posts/Windows-Azure-Jump-Start-03-Windows-Azure-Lifecycle-Part-1">http://channel9.msdn.com/posts/Windows-Azure-Jump-Start-03-Windows-Azure-Lifecycle-Part-1</a></p> <p><a href="http://channel9.msdn.com/Events/TechEd/Australia/Tech-Ed-Australia-2011/COS201">http://channel9.msdn.com/Events/TechEd/Australia/Tech-Ed-Australia-2011/COS201</a></p> <h1><a href="http://sqlblog.com/blogs/buck_woody/archive/2011/10/18/big-data-and-the-cloud-more-hype-or-a-real-workload.aspx">Big Data and the Cloud - More Hype or a Real Workload?</a></h1> <p><em>Buck Woody - Tue, 18 Oct 2011</em></p> <p>Last week Microsoft announced several new offerings for “Big Data” - and since I’m a stickler for definitions, I wanted to make sure I understood what that really means. What is “Big Data”? What size hard drive is that? 
After all, my laptop has 1TB of storage - is my laptop “Big Data”?</p> <p>There are actually a few definitions for this term, most notably those involving the <a href="http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data" target="_blank">“Four V’s”: Volume, Velocity, Variety and Variability</a>. Others <a href="http://nosql.mypopescu.com/post/10120087314/big-data-and-the-4-vs-volume-velocity-variety" target="_blank">disagree with this</a> definition. I tend to try and get things into their simplest form, so I’m using this definition for myself:</p> <p align="center"><font color="#c0504d" size="3">Big data is defined as a <em>large set </em>of <em>computationally expensive </em>data that is <em>worked on simultaneously</em>.</font> </p> <p>Let me flesh that out a little. To be sure, “Big Data” has a larger size than, say, a few megabytes. The reason this is important is that it takes special hardware to be able to move large sets of data around, store it, process it and so on. (<font color="#c0504d">large set</font>)</p> <p>If you store a LOT of data, but only use a small portion of it at a time, that really isn’t super-hard to do. It’s mainly a storage issue at that point. But if you do need to work with a large portion of the data at one time, then the memory, CPU and transfer components of the system have to adapt to be responsive - new ways to work with that data (game theory, knot-algorithms, map-reduce, etc.) need to be brought into play. (<font color="#c0504d">computationally expensive</font>)</p> <p>Once that data is loaded into the processing area (memory or whatever other mechanism is used) it must be worked on in parallel to come back in a reasonable time. You have two options here - you can scale the system up with more internal hardware (CPUs, memory and so on) or you can scale it out to have multiple systems work on it at the same time using paradigms such as map/reduce. When you lay this out in an architecture diagram, scale up or out doesn’t actually change the logical structure of the process - in scale-out the network becomes the bus, and the nodes become more RAM and computing power. Of course, there are changes in code for how you stitch the workload back together. (<font color="#c0504d">worked on simultaneously</font>)</p> <p>So back to the original question. Is Big Data, as I have defined it here, a workload for Windows and SQL Azure? Absolutely! In fact, it’s probably one of the main workloads, and I believe it represents the latest, and perhaps also the earliest, frontier of computing. <a href="http://research.microsoft.com/en-us/um/people/gray/" target="_blank">Jim Gray</a>, a former researcher here at Microsoft and a hero of mine, was working on this very topic. I believe as he did - all computing is simply an interface over data. </p> <p>Microsoft has multiple offerings on the topic of Big Data. In posts that follow from myself and my co-workers, we’ll explore when and where you use each one. Whether you are a data professional or a developer, this is the new frontier - <a href="http://www.straightpathsql.com/archives/2011/10/microsoft-loves-your-big-data/" target="_blank">don’t wait to educate yourself</a> on how to leverage Big Data for your organization.</p>
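<p>Here is “worked on simultaneously” in miniature: a hedged C# sketch that fans one aggregation out across cores with PLINQ. Real Big Data systems apply the same map/group/reduce shape across many machines instead of many cores, but the logical structure is the same:</p>
<pre>
using System;
using System.Linq;

class ParallelAggregateSketch
{
    static void Main()
    {
        // Five million fake readings stand in for a large dataset.
        var rnd = new Random(42);
        int[] readings = Enumerable.Range(0, 5000000)
                                   .Select(i => rnd.Next(0, 100))
                                   .ToArray();

        var histogram = readings
            .AsParallel()                     // "map" fans out across cores
            .GroupBy(r => r / 10)             // bucket readings by decade
            .Select(g => new { Bucket = g.Key, Count = g.Count() })   // "reduce" each bucket
            .OrderBy(b => b.Bucket);          // stitch the results back together

        foreach (var b in histogram)
            Console.WriteLine("{0}0s: {1}", b.Bucket, b.Count);
    }
}
</pre>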
<p><strong>Hadoop on Windows Azure and SQL Server</strong> - Microsoft’s <a href="http://www.hortonworks.com/the-whys-behind-the-microsoft-and-hortonworks-partnership/" target="_blank">partnership to include Hadoop workloads on Windows Azure</a> and <a href="http://www.microsoft.com/download/en/details.aspx?id=27584" target="_blank">SQL Server/Parallel Data Warehouse (PDW)</a></p> <p><strong>LINQ to HPC </strong>- The distributed data-processing framework (formerly code-named Dryad) for Microsoft’s High-Performance Computing SKU; <a href="http://blogs.technet.com/b/windowshpc/archive/2011/05/20/dryad-becomes-linq-to-hpc.aspx" target="_blank">HPC is now in Azure</a></p> <p><strong>Windows Azure Table Storage </strong>- A <a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx" target="_blank">key/value pair type of storage with full partitioning</a> that is immediately consistent, able to handle huge loads of data, and works with any REST-compatible language</p> <p><strong>Other offerings </strong>- Including the new <a href="http://www.microsoft.com/en-us/sqlazurelabs/default.aspx" target="_blank">Data Explorer</a>, <a href="http://research.microsoft.com/en-us/news/headlines/daytona-071811.aspx" target="_blank">Project Daytona (with a Big Data Toolkit for Scientists and researchers)</a>, <a href="http://www.microsoft.com/sqlserver/en/us/future-editions/SQL-Server-2012-breakthrough-insight.aspx" target="_blank">Power View</a> and more. </p> <p>The era of Big Data is here. And you can use Windows and SQL Azure to bring it to your organization. </p> <h1><a href="http://sqlblog.com/blogs/buck_woody/archive/2011/09/13/rip-and-replace-or-extend-and-embrace.aspx">Rip and Replace or Extend and Embrace?</a></h1> <p><em>Buck Woody - Tue, 13 Sep 2011</em></p> <p>As most of you know, I don&rsquo;t like the term &ldquo;cloud&rdquo; very much. It isn&rsquo;t defined, which means it can be anything. I prefer &ldquo;distributed computing&rdquo;, which is more technically accurate and describes what you&rsquo;re doing in more concrete terms.</p>
<p>So when you think about Windows and SQL Azure, you don&rsquo;t have to think about an entire product &ndash; you can use parts of the system together or independently to accomplish what you need to do. You can use the computing functions, storage and more &ndash; and increasingly I see folks leverage the Service Bus to enable current applications to expose things to the web.</p>
<p>And that brings up the point of this post. Once you decide that a distributed architecture works to solve a problem, you&rsquo;re faced with a decision: should you completely re-write your architecture to take advantage of the current systems, or should you just fold in new code that makes the data or function available to the web?</p>
<p>Of course, the answer is always &ldquo;it depends&rdquo; on the situation &ndash; and it does. But unless you&rsquo;re fixing a problem with current code, I usually advocate a migration approach. That means at the very least retaining the business logic (again, unless it&rsquo;s not currently working) and as much of the code as you can. In fact, if you follow this paradigm, you&rsquo;re on your way to making a Service Bus out of the functions you currently have. You can expose the results of a system rather than opening the system up. Let&rsquo;s take an example.</p>
<p>Assume for a moment that you have an order-taking system on-premise. That system performs many functions, one of which might be creating a Purchase Order. Your system might be enclosed, meaning that it has an application that talks to a middle tier, and from there to a database system. A query is generated from a screen, and passed along to eventually compute, store and return a Purchase Order Number, along with other information. Imagine now that you wire up the code not only to return the PO number to the client, but to make that number available on an endpoint &ndash; actually not that hard to do.</p>
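<p>A deliberately tiny, hypothetical sketch of that last step - surfacing one computed value on an HTTP endpoint without opening up the system behind it (the address and the lookup logic are stand-ins for your own):</p>
<pre>
using System;
using System.Net;
using System.Text;

class PoEndpointSketch
{
    static void Main()
    {
        // Requires a URL reservation/ACL on Windows; the address is a placeholder.
        var listener = new HttpListener();
        listener.Prefixes.Add("http://+:8080/po/");
        listener.Start();

        while (true)
        {
            HttpListenerContext ctx = listener.GetContext();   // wait for a caller
            byte[] body = Encoding.UTF8.GetBytes(LookupLatestPoNumber());
            ctx.Response.ContentType = "text/plain";
            ctx.Response.OutputStream.Write(body, 0, body.Length);
            ctx.Response.Close();
        }
    }

    // Stands in for the existing middle-tier logic that computes the PO number.
    static string LookupLatestPoNumber() { return "PO-10044"; }
}
</pre>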
<p>Now you can make that PO number available to the web using Azure. You could restrict who can make that call to the system, or open it up to a broader audience. Or instead of the PO Number, you could make a product list available. And you can go further than that &ndash; eBay, for instance, uses the OData protocol (which is very cool in and of itself), which you can query from the web. You could compare your company&rsquo;s product catalog to what is on eBay, and list the items you have there if there are no competitors in that space. And on and on it goes.</p>
<p>So the point is this &ndash; where you can, retain what works. Fold in systems like Azure where they make sense. Extend and Embrace.</p> <h1><a href="http://sqlblog.com/blogs/buck_woody/archive/2011/08/09/should-all-data-be-encrypted-by-default.aspx">Should All Data Be Encrypted By Default?</a></h1> <p><em>Buck Woody - Tue, 09 Aug 2011</em></p> <p>Recently several IT industry information outlets have reported that there has been a 10-year concentrated, organized effort to break through computer security at some of the largest companies in the world. Government sites have also been attacked in multiple countries. Add to this the regular loss of data by banking and other industries, and the fear of “the cloud” as a storage location, and it seems to beg the question asked in the title of this post: “should all data, everywhere, be encrypted by default?” </p> <p>If you’re new to encryption, there’s an excellent video and overview here: <a href="http://blogs.msdn.com/b/plankytronixx/archive/2010/10/23/crypto-primer-understanding-encryption-public-private-key-signatures-and-certificates.aspx">http://blogs.msdn.com/b/plankytronixx/archive/2010/10/23/crypto-primer-understanding-encryption-public-private-key-signatures-and-certificates.aspx</a>&#160;</p> <p>If all data were encrypted, break-ins to websites would still continue, but the value would be lessened for some types of “orthogonal” attacks that only seek the pure stream of data. </p> <p><strong>Data States</strong></p> <p>Computing has two major components - static program elements and data. The program doesn’t change (until it is updated, of course) over the course of a transaction between a user and the ultimate data store. Data is classified as anything that is manipulated by the program. That implies three states in the data interchange: Creation, Transmission, and Storage. In on-premise systems, many times none of these states are encrypted. The entire system from user to data store is viewed as “secure”, which of course evidence has proved it is not. In some cases even laptops are viewed as part of an on-premise system, and so are left unprotected. If all data were treated as “publicly viewable”, that mindset would lead to encrypting the data in all states, even for on-premise systems.</p> <p><em>Creation</em></p> <p>In this phase, a user, device or other input program creates data to send to the program. This can be entries on a web form, input from a weather sensor, or one service (program) sending information to another service. There are multiple ways to encrypt data at this state, most notably using client-side libraries such as the Windows Crypto API, hardware encryption and others. The reference for the Crypto API is here: <a href="http://msdn.microsoft.com/en-us/library/ms867086.aspx">http://msdn.microsoft.com/en-us/library/ms867086.aspx</a></p> <p><em>Transmission</em></p> <p>After the data is created, it needs to be transmitted to the processing and storage system. The references above explain how to secure the communications channel between the client systems and the various components used within the system. In the case of Windows Azure, the session can be protected with a secure session, and all communications within the Azure datacenters are encrypted. The key is that the transmission of data, regardless of method, should be considered to be “in the clear”, and treated as such. Without the decryption algorithm, it’s much harder to get to the ultimate goal.</p>
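<p>A hedged sketch of that first step with the .NET crypto classes: encrypt the bytes at creation, so everything downstream can treat them as opaque (key management, the genuinely hard part, is deliberately out of scope here):</p>
<pre>
using System.IO;
using System.Security.Cryptography;

static class EncryptAtCreationSketch
{
    // Returns AES-encrypted bytes; the caller supplies and protects the key and IV.
    public static byte[] Encrypt(byte[] plaintext, byte[] key, byte[] iv)
    {
        using (var aes = new AesManaged { Key = key, IV = iv })
        using (var ms = new MemoryStream())
        {
            using (var cs = new CryptoStream(ms, aes.CreateEncryptor(), CryptoStreamMode.Write))
                cs.Write(plaintext, 0, plaintext.Length);

            return ms.ToArray();   // now safe to transmit and store as "publicly viewable"
        }
    }
}
</pre>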
<p><em>Storage (data at rest)</em></p> <p>It follows that if the data is encrypted at the source, and the decryption method is retained only with the code that processes the data, then the data “at rest”, if obtained, is far less accessible. If the data is not encrypted at the source, then this step should be put into place at a minimum. In many cloud systems, including Windows and SQL Azure, the data is not encrypted at rest. There are various reasons for this, including performance, physical and logical security already in place, and the fact that the encryption process would expose customer data to the provider while it is being encrypted. In this case, the key is to encrypt the data before it is transmitted and stored. </p> <p><strong>Considerations</strong></p> <p>Encrypting data is a separate process, and must be factored into the original codebase. This means additional effort, more CPU power for the encryption process (although many systems include security hardware that helps with this) and, of course, protecting the keys. If the keys are accessed, the data is considered unencrypted from then on, and all previous encryption with that particular key is now vulnerable. Key rotation and protection are essential. Even so, the benefits of treating all data as being at risk outweigh the effort.</p> <p>You can learn more about general encryption here: <a href="http://msdn.microsoft.com/en-us/library/aa380255(VS.85).aspx">http://msdn.microsoft.com/en-us/library/aa380255(VS.85).aspx</a></p> <h1><a href="http://sqlblog.com/blogs/buck_woody/archive/2011/08/02/windows-azure-security-review.aspx">Windows Azure Security Review</a></h1> <p><em>Buck Woody - Tue, 02 Aug 2011</em></p> <p><em><font color="#d19049">Current as of 08/01/2011 - Check the Resources listed below for more up-to-date information on this topic</font></em></p> <p><strong>Background:</strong></p> <p>Security for any computing platform involves three primary areas:</p> <ol> <li><font color="#ff0000">Principals</font> (users or programs with access to an asset or other program) </li> <li><font color="#ff0000">Securables</font> (objects, data or programs that can be accessed) </li> <li><font color="#ff0000">Channels</font> (methods of access by Principals to Securables) </li> </ol> <p>On-premise systems normally use a central system to control security. In a Windows operating system-based environment, this is <a href="http://technet.microsoft.com/en-us/library/cc758436(WS.10).aspx" target="_blank">often accomplished with Active Directory</a> or other systems that provide sign-on and user identity information. While other networking security paradigms have different terminology, all involve the three areas defined above. </p> <p>In addition to the names and passwords for a user, Active Directory (like other security mechanisms) stores other information about Principals - called <em><a href="http://claimsid.codeplex.com/" target="_blank">Claims</a></em>. These claims can include any custom fields the provider allows. In many networks, these fields are not used heavily, because applications that eventually need to secure the assets they control are not always deployed on the same platforms everywhere. </p> <p>In a single environment, security is often quite simple. A Principal is created, such as a user or group, and then the Principal is granted access to a Securable such as a folder, database or other asset. 
<p>Permissions or Rights (or both) combine to allow a particular Principal to read, write, delete or edit data, or to access or run a particular program.</p> <p><a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/3324.Figure1_5F00_2.png"><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure1" border="0" alt="Figure1" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/5140.Figure1_5F00_thumb.png" width="549" height="398" /></a></p> <p><em><font color="#008000">Figure 1 - On-premise security environment example</font></em></p> <p>The simplicity of this arrangement is due to a single, homogeneous boundary. Even if more than one location is used, the Principals and Securables are grouped into a single logical boundary that is managed from one location. </p> <p>This background serves as the starting point for the Federating Security topic below.</p> <p><strong>Windows Azure Security Boundaries</strong></p> <p>Windows Azure is a series of resources - servers, data and service buses, in addition to other features. Developers write code, and then deploy it to the Azure environment. </p> <p><a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/1665.Figure2a_5F00_2.png"><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure2a" border="0" alt="Figure2a" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/3480.Figure2a_5F00_thumb.png" width="702" height="471" /></a></p> <p><em><font color="#008000">Figure 2 - Azure Components</font></em></p> <p>The code or data can be deployed to use one or more of the services. In other words, the <a href="http://www.31a2ba2a-b718-11dc-8314-0800200c9a66.com/2010/12/how-to-combine-worker-and-web-role-in.html" target="_blank">Web Role in Windows Azure might host a simple website</a>, and no other component need be used. </p> <p><a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/4073.Figure2_5F00_2.png"><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure2" border="0" alt="Figure2" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/1258.Figure2_5F00_thumb.png" width="737" height="252" /></a></p> <p><em><font color="#008000">Figure 3 - Simple Azure Web Role Application - only one feature used</font></em></p> <p>Or, <a href="http://blogs.msdn.com/b/buckwoody/archive/2011/02/22/windows-azure-use-case-hybrid-applications.aspx" target="_blank">a complex mix of Web, Worker and Data Services, along with a Service Bus, RDBMS and even on-site systems</a> can be grouped into a much larger program. 
</p> <p><a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/6136.Figure4_5F00_2.png"><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure4" border="0" alt="Figure4" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/4863.Figure4_5F00_thumb.png" width="735" height="456" /></a></p> <p><em><font color="#008000">Figure 4 - Complex Windows and SQL Azure Application With Multiple Interactions</font></em></p> <p>For a more basic introduction to Windows and SQL Azure, see this link: <a href="http://channel9.msdn.com/Events/TechEd/Europe/2010/COS322">http://channel9.msdn.com/Events/TechEd/Europe/2010/COS322</a>&#160;</p> <p>Windows Azure, like any web-based property, has three general layers of security:</p> <ol> <li><font color="#ff0000">Physical Access</font> </li> <li><font color="#ff0000">Operating Environment (Including the Operating System itself)</font> </li> <li><font color="#ff0000">Data and Programmatic Security</font> </li> </ol> <p>Each of these layers has additional layers within it, and together they form the basis of a secure experience for the end user or program. Some of these layers are the responsibility of Microsoft; others are the responsibility of the architect and developer; still others are a shared responsibility of both Microsoft and the client.</p> <p><em><font color="#0000ff">Layer One: Physical Access</font></em></p> <p>The first layer of security within a web property such as Windows or SQL Azure is a secure facility. The following data points are important to understand about the worldwide facilities that host Windows and SQL Azure:</p> <ul> <li>Microsoft Global Foundation Services (GFS) is responsible for the physical security of the datacenters located worldwide for Windows and SQL Azure. Information on Microsoft datacenters can be found here:&#160; <a href="http://www.globalfoundationservices.com/">http://www.globalfoundationservices.com/</a> </li> <li>The addresses and exact locations of the facilities are not commonly documented, for security reasons. </li> <li>Microsoft runs its own data centers and does not contract this function out. </li> <li>The GFS-controlled facilities hold an ISO/IEC 27001:2005 certification, and are audited to SAS 70 Type II. </li> <li>Standard secure operations protocols are in place, including least-privilege access. </li> </ul> <p><em><font color="#0000ff">Layer Two: Operating Environment</font></em></p> <p>Windows Azure and SQL Azure do not currently hold certifications. Microsoft does not comment on the security certifications being pursued for Windows or SQL Azure. That being said, the Windows Azure environment is based on a modified Windows 2008 R2 Enterprise environment, developed using the Trustworthy Computing Initiative (TCI). </p> <p>The system controlling the host machines and their guest environments that ultimately hold the Web and Worker Roles within Windows Azure is called the Fabric - not to be confused with the Application Fabric feature. The Fabric is not accessible by client code - it controls the inner workings of Windows Azure, including load-balancing, system restarts, maintenance and monitoring. 
</p> <p>Within the host machines that house the Web and Worker Roles, special networking constructs broker all conversations between Virtual Machines. All traffic between Virtual Machines - even ones configured to communicate with each other - moves through this network. Direct machine-to-machine communication is not allowed, protecting one application from another or one data construct from another.</p> <p><a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/8015.Figure5_5F00_2.png"><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure5" border="0" alt="Figure5" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/8182.Figure5_5F00_thumb.png" width="720" height="351" /></a></p> <p><em><font color="#008000">Figure 5 - Windows Azure Fabric</font></em></p> <p>Windows and SQL Azure support only TCP-based communications. Ports commonly used are:&#160; </p> <ul> <li>80 - Default public port used for Web Roles - can be enabled/disabled per configuration </li> <li>443 - Default secure port used for Web Roles - <a href="http://msdn.microsoft.com/en-us/gg271302" target="_blank">can be enabled/disabled per configuration</a> </li> <li>9350-9353 - These ports are used by the Windows Azure AppFabric service bus bindings. Refer to <a href="http://msdn.microsoft.com/en-us/library/ee732535.aspx">http://msdn.microsoft.com/en-us/library/ee732535.aspx</a> for more details </li> <li>1433 - SQL Azure </li> <li>3389 - This port is used for RDP access to VM-based roles, only if enabled </li> </ul> <p><em><font color="#0000ff">Layer Three: Data and Programmatic Security</font></em></p> <p>All internal access is through the use of keys only. Without the proper key, code or data will not transfer. Storage Accounts have individual keys, so different security layers may be applied not only programmatically but also at the account level. </p> <p><a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/6840.Figure6_5F00_2.png"><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure6" border="0" alt="Figure6" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/4370.Figure6_5F00_thumb.png" width="703" height="290" /></a></p> <p><em><font color="#008000">Figure 6 - Windows Azure communications between components</font></em></p> <p>Calls to Windows Azure are made using standard SOAP, XML or REST-based protocols. The communications channel between the client and Windows Azure can be encrypted or left unencrypted, based on security needs. </p> <p>SQL Azure uses the standard SQL Server Tabular Data Stream (TDS) protocol, but only allows encrypted communications.</p> <p>Data is unencrypted within Windows Azure Blob or Table Storage - but is only accessible via the key for a storage account. <a href="http://blogs.msdn.com/b/plankytronixx/archive/2010/10/23/crypto-primer-understanding-encryption-public-private-key-signatures-and-certificates.aspx" target="_blank">Data can be encrypted client-side and stored in Windows Azure in an encrypted fashion</a>. 
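</p> <p>As a hedged sketch of that pattern (the account name, key and container are placeholders, and exact class names vary between versions of the Windows Azure StorageClient library), here is what storing client-side-encrypted bytes in Blob Storage can look like in C#:</p><pre>
// Sketch: upload client-side-encrypted bytes to Blob Storage.
// The storage account key is the only credential involved; the service
// never sees the encryption key, so the blob stays opaque at rest.
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class EncryptedUpload
{
    static void Store(byte[] cipherText)
    {
        // Placeholder connection string - substitute your own account and key.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=...");

        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("protected");
        container.CreateIfNotExist();

        // Only ciphertext crosses the wire (over HTTPS here).
        container.GetBlobReference("secret.bin").UploadByteArray(cipherText);
    }
}
</pre>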
<p>Microsoft does not inspect internal data for validity or encryption enforcement. The key point is that the data is encrypted and decrypted client-side.</p> <p><a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/8203.Figure7_5F00_2.png"><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure7" border="0" alt="Figure7" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/4466.Figure7_5F00_thumb.png" width="702" height="307" /></a></p> <p><em><font color="#008000">Figure 7 - Example data at rest encryption scenario </font></em></p> <p>Alternatively, a hybrid solution can store sensitive data locally and non-sensitive data in Azure Storage. The data can be coalesced at the client level, such that the sensitive data is never transferred over any channel not owned or controlled by the organization.</p> <p><strong>Federating Security:</strong></p> <p>In the case of a single security boundary for Windows Azure, multiple security options are available. Users can be anonymously authorized, such as in the case of a public website for advertisement or informational purposes. </p> <p>Another option is to create an Internet Information Services (IIS) Internal Security Store. This is not a best-practice (although still possible) approach, since the Fabric services within Windows Azure may recycle an instance, and the session between a given role and a client may be severed. Architecting stateless applications is the preferred approach.</p> <p>Using Claims-Based Authentication is a better solution. In this approach, the Principal is authenticated through a trusted party, such as Active Directory, OpenID, OAuth, or LiveID. Many web properties use these methods - Microsoft, Google, Yahoo and Facebook, to name a few. After authenticating with one of these services, the client is issued Claims, using WS-Federation (WS-Fed) or the Security Assertion Markup Language (SAML), that are passed to Windows Azure. At no time does Windows Azure store, transfer or interrogate the Principal’s security token. Claims can be anything from a group or role membership to location or any other settable attribute. Assets are then secured to allow only the required Claim, without regard to the user’s location or access method. In this fashion a single security paradigm covers the Securables, with the Principals being controlled by any number of other mechanisms. This allows single sign-on and/or federated security access from multiple providers. 
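</p> <p>As a hedged sketch of what consuming such a Claim looks like in role code (using the Windows Identity Foundation object model; the claim type URI and value shown are hypothetical), consider:</p><pre>
// Sketch: secure an asset against the Claim itself, not against where
// or how the user signed in. Assumes WIF has already validated the
// incoming token and populated the current principal.
using System.Threading;
using Microsoft.IdentityModel.Claims;

class ClaimsCheck
{
    static bool CallerIsManager()
    {
        IClaimsIdentity identity =
            (IClaimsIdentity)Thread.CurrentPrincipal.Identity;

        foreach (Claim claim in identity.Claims)
        {
            // Only the hypothetical role claim from the trusted provider matters here.
            if (claim.ClaimType != "http://schemas.contoso.com/claims/role")
                continue;
            if (claim.Value == "Manager")
                return true;
        }
        return false;
    }
}
</pre>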
<p>The simplest mechanism for building this environment is the Access Control Services (ACS) feature found in the Windows Azure Application Fabric component. It is a federated authorization management service that simplifies user access authorization across organizations and ID providers, and performs claims transformation to map identities to access levels.</p> <p>ACS can:</p> <ul> <li>Create and manage scopes such as URLs </li> <li>Create and manage claim types </li> <li>Create and manage signing and encryption keys </li> <li>Create and manage rules within an application scope </li> <li>Chain claims rules </li> <li>Manage permissions on scopes or perform delegation </li> </ul> <p><a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/2728.Figure8_5F00_2.png"><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure8" border="0" alt="Figure8" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/5852.Figure8_5F00_thumb.png" width="693" height="410" /></a></p> <p><em><font color="#008000">Figure 8 - Federated Security Example </font></em></p> <p>Full information on the Access Control Service is available at this link:&#160; <a href="http://social.technet.microsoft.com/wiki/contents/articles/windows-identity-foundation-wif-and-azure-appfabric-access-control-service-acs-survival-guide.aspx?wa=wsignin1.0"><u><font color="#0066cc">http://social.technet.microsoft.com/wiki/contents/articles/windows-identity-foundation-wif-and-azure-appfabric-access-control-service-acs-survival-guide.aspx?wa=wsignin1.0</font></u></a></p> <p>Since the Web and Worker Roles within Windows Azure are designed to be stateless, Microsoft created a Certificate Store within the Management area to hold Certificates that can be called from within code. An example of using the Certificate Store is here: <a href="http://blogs.msdn.com/b/jnak/archive/2010/01/29/installing-certificates-in-windows-azure-vms.aspx">http://blogs.msdn.com/b/jnak/archive/2010/01/29/installing-certificates-in-windows-azure-vms.aspx</a>&#160;</p>
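<p>A hedged sketch of retrieving one of those certificates from within role code follows; the thumbprint is a placeholder, and the standard X509Store API is used:</p><pre>
// Sketch: load a certificate that was deployed with the role.
// The thumbprint is a placeholder for your uploaded certificate.
using System.Security.Cryptography.X509Certificates;

class CertificateLookup
{
    static X509Certificate2 Find(string thumbprint)
    {
        X509Store store = new X509Store(StoreName.My, StoreLocation.LocalMachine);
        store.Open(OpenFlags.ReadOnly);
        try
        {
            X509Certificate2Collection matches = store.Certificates.Find(
                X509FindType.FindByThumbprint, thumbprint, false);
            return matches.Count == 0 ? null : matches[0];
        }
        finally
        {
            store.Close();
        }
    }
}
</pre>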
<p><strong>Additional Resources:</strong></p> <p>Official, authoritative security resource list: <a href="http://msdn.microsoft.com/en-us/library/ff934690.aspx">http://msdn.microsoft.com/en-us/library/ff934690.aspx</a> <br />Technical Overview of the Security Features in the Windows Azure Platform: <a href="http://www.microsoft.com/online/legal/?langid=en-us&amp;docid=11">http://www.microsoft.com/online/legal/?langid=en-us&amp;docid=11</a> <br />Windows Azure Security Overview: <a href="http://www.globalfoundationservices.com/security/documents/WindowsAzureSecurityOverview1_0Aug2010.pdf">http://www.globalfoundationservices.com/security/documents/WindowsAzureSecurityOverview1_0Aug2010.pdf</a> <br />Windows Azure Privacy: <a href="http://www.microsoft.com/online/legal/?langid=en-us&amp;docid=11">http://www.microsoft.com/online/legal/?langid=en-us&amp;docid=11</a> <br />Securing Microsoft Cloud Infrastructure: <a href="http://www.globalfoundationservices.com/security/documents/SecuringtheMSCloudMay09.pdf">http://www.globalfoundationservices.com/security/documents/SecuringtheMSCloudMay09.pdf</a> <br />A list of other security resources is here: <a href="http://blogs.msdn.com/b/buckwoody/archive/2010/12/07/windows-azure-learning-plan-security.aspx">http://blogs.msdn.com/b/buckwoody/archive/2010/12/07/windows-azure-learning-plan-security.aspx</a>&#160;</p> <p><font color="#0000ff" size="1"><em>Image Attribution: David Pallmann: </em></font><a href="http://davidpallmann.blogspot.com/2011/07/windows-azure-design-patterns-part-1.html"><font color="#0000ff" size="1"><em>http://davidpallmann.blogspot.com/2011/07/windows-azure-design-patterns-part-1.html</em></font></a></p>Cloud Computing and the Importance of Code Diagramshttp://sqlblog.com/blogs/buck_woody/archive/2011/05/03/cloud-computing-and-the-importance-of-code-diagrams.aspxTue, 03 May 2011 13:59:20 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:35407BuckWoody<p>Most mature development shops use various code diagrams to give a symbolic representation of high-level and database code structures. Standards such as <a href="http://www.bpmb.de/images/BPMN2_0_Poster_EN.pdf" target="_blank">Business Process Model Notation</a> (BPMN), <a href="http://www.informit.com/guides/content.aspx?g=sqlserver&amp;seqNum=62" target="_blank">Entity Relationship Diagrams</a> (ERD) and the <a href="http://uml.org/" target="_blank">Unified Modeling Language</a> (UML) are ones I use all the time. </p> <p>In the Distributed Computing (Cloud Computing) paradigm, these three diagrams (or their equivalents) become essential. In the past, I’ve been able to rely on a single architecture where my code will run. I understand the servers, the networking and the path the code takes between the client and the components within that architecture.</p> <p>With Distributed Computing (DC), the architecture changes. In fact, the reason I use the term “Distributed Computing” instead of “Cloud Computing” most often (except in the title of this post, as you can see) is that I feel it’s more technically accurate about how we write code. I don’t view DC coding as an “all or nothing” exercise – I view it as just another option to solve a computing problem. A “hybrid” approach, where I mix in the strengths of a cloud provider, is often a great way to leverage the best cost, performance and other advantages of each part of your solution. 
It can also help keep data secure, provide options for High Availability and Disaster Recovery, and more.</p> <p>To gain these advantages, we have to think more about the components of the application than about a monolithic stack of components in a single architecture. And that brings us to the title of this post…</p> <p>For us to correctly identify code components, database objects, security paths and other elements, we have to be able to conceptualize them. And that’s where those diagrams come into play. Starting with some sort of business or organizational need, we can use BPMN or UML Actor diagrams to explain what the program needs to do. That helps segregate the security and location requirements. For instance, if the BPMN shows access to Private Information, we can evaluate the need for an on-premise system that is federated to a DC provider. If the business users need global access, we can decide whether to set up a VPN to allow access to an on-premise system, or whether a login component can be used on the web.</p> <p>After determining the flow of the program, move on to the data the system will store. In the case of Windows and SQL Azure, there are several options for storing data. In the past, I’ve often selected a single storage type, such as an RDBMS, and stored program data there. Now we can store in multiple formats, in multiple locations and more. The ERD is pivotal, because it defines data types, which can inform decisions about where things go. Another important aspect of the data decision, which is not covered in an ERD (but perhaps should be), is the estimated size and growth of a datum, since that can also drive the decision on where to put a data component.</p> <p>From there, the UML document helps me understand where each computing element can live. There are strengths to each type of computing, and using the UML diagram I can place each code component in the best environment for speed, security and other considerations.</p> <p>So in the new Distributed Computing world, these graphical documents do much more than just help design the application – they can help define the architecture as well.</p>SQL Azure Use Case: Shared Storage Applicationhttp://sqlblog.com/blogs/buck_woody/archive/2011/04/26/sql-azure-use-case-shared-storage-application.aspxTue, 26 Apr 2011 13:33:50 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:35207BuckWoody<p><span style="font-size:x-small;"><em><span style="font-size:small;">This is one in a series of posts on when and where to use a distributed architecture design in your organization's computing needs. You can find the main post here: </span><a href="http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx"><span style="font-size:small;"><u><font color="#800080">http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx</font></u></span></a><span style="font-size:small;"> </span></em></span></p> <p><strong><span style="font-size:small;">Description:</span></strong></p> <p><span style="font-size:small;">On-premise data will be a part of computing for quite some time – perhaps permanently. Bandwidth requirements, security, or even financial considerations for large data sets often dictate that relational (or non-relational) systems will be maintained locally in many organizations, especially in enterprise computing. </span></p> <p><span style="font-size:small;">But distributed data systems are useful in many situations. 
Organizations may wish to store a portion of data off-site, either for sharing the data with other applications (including web-based applications) or as a supplement to a High-Availability and Disaster Recovery (HADR) strategy.</span></p> <span style="font-size:small;"> <p><strong><span style="font-size:small;">Implementation:</span></strong></p> <p><span style="font-size:small;">SQL Azure can be used to add an additional option to an HADR strategy by copying off portions (or all) of an on-premise database system.</span></p> <p><span style="font-size:small;"><a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/3386.sql_2D00_aHADR_5F00_2.png"><img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="sql-aHADR" border="0" alt="sql-aHADR" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/4265.sql_2D00_aHADR_5F00_thumb.png" width="298" height="181" /></a></span></p> <p><span style="font-size:small;">In this arrangement, on-premise systems remain as they are. Data is replicated to a SQL Azure database using any of several technologies, such as SQL Server Integration Services (SSIS), scripts, or Microsoft’s Sync Framework. This data can be kept “cold”, meaning that a manual process is required to bring the data back, or as a “warm” standby using connection string management in the application.</span></p> <p><span style="font-size:small;">Recently we architected a solution where a company kept a rolling two-week window of data replicated to SQL Azure using the <a href="http://msdn.microsoft.com/en-us/sync/default.aspx" target="_blank">Sync Framework</a>. The application, a compiled EXE running on users’ systems, had a “switch connections” button that allowed the users to take a laptop to another location, select that option, and continue working from anywhere they had Internet connectivity. This required forethought and planning, and did not replace their primary HADR systems, but it did allow them to continue operations in the case of a severe outage at multiple sites. Since they are an emergency services provider, this gave them the highest redundancy.</span></p> <p><span style="font-size:small;">Another option is to amalgamate data from disparate sources. </span></p> <p><span style="font-size:small;"><a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/6320.sql_2D00_aHyb_5F00_2.png"><img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="sql-aHyb" border="0" alt="sql-aHyb" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/2625.sql_2D00_aHyb_5F00_thumb.png" width="342" height="134" /></a></span></p> <p><span style="font-size:small;">In this arrangement, two or more data services (one of which is SQL Azure) are accessed by a single program. The program queries each system independently, and using LINQ a single query can work across all of the data, assuming there is some sort of natural or artificial “key” that can join the data sets together. The user programs simply see a single data source, unaware of the underlying data sets. 
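</span></p> <p><span style="font-size:small;">To make the amalgamation pattern concrete, here is a hedged C# sketch: a LINQ-to-Objects join across two row sets that, in practice, would be filled by independent queries against the on-premise system and SQL Azure. The types, names and values are hypothetical.</span></p><pre>
// Sketch: amalgamate rows from two independent data sources with a
// LINQ join over a shared key. In real code the two arrays would be
// populated from separate queries (ADO.NET, EF, etc.) against the
// on-premise system and SQL Azure.
using System;
using System.Linq;

class Customer { public int CustomerId; public string Name; }
class Order    { public int CustomerId; public decimal Total; }

class Amalgamation
{
    static void Main()
    {
        Customer[] local  = { new Customer { CustomerId = 1, Name = "Contoso" } };
        Order[]    remote = { new Order    { CustomerId = 1, Total = 99.50m } };

        // One query spans both sets; downstream code sees a single source.
        var combined = from c in local
                       join o in remote on c.CustomerId equals o.CustomerId
                       select new { c.Name, o.Total };

        foreach (var row in combined)
            Console.WriteLine("{0}: {1}", row.Name, row.Total);
    }
}
</pre>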
<p><span style="font-size:small;">This allows great flexibility and agility in the downstream program. The upstream data sources can change as long as the key elements are kept consistent.</span></p> <p><span style="font-size:small;">There are performance and security implications to amalgamated data systems, but if architected carefully they provide multiple benefits. A few of these are that other systems can access the individual data sources, reporting is simplified and standardized, and multiple copies of data are eliminated.</span></p> <span style="font-size:small;"> <p><strong><span style="font-size:small;">Resources:</span></strong></p> <p><span style="font-size:small;">You can read more about the Sync Framework and SQL Azure here: <a href="http://social.technet.microsoft.com/wiki/contents/articles/sync-framework-sql-server-to-sql-azure-synchronization.aspx">http://social.technet.microsoft.com/wiki/contents/articles/sync-framework-sql-server-to-sql-azure-synchronization.aspx</a>&#160;</span></p> <p><span style="font-size:small;">If you are new to LINQ, you can find more resources on it here: <a href="http://msdn.microsoft.com/en-us/library/bb308959.aspx">http://msdn.microsoft.com/en-us/library/bb308959.aspx</a>&#160;</span></p> </span></span>