Search results matching tag 'sql server'http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&tag=sql+server&orTags=0Search results matching tag 'sql server'en-USCommunityServer 2.1 SP2 (Build: 61129.1)Data Mining Algorithms – Pluralsight Coursehttp://sqlblog.com/blogs/dejan_sarka/archive/2015/07/30/data-mining-algorithms-pluralsight-course.aspxThu, 30 Jul 2015 07:00:17 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:59233Dejan Sarka<p>This is a bit different post in the series about the data mining and machine learning algorithms. This time I am honored and humbled to announce that my fourth Pluralsight course is alive. This is the <a href="http://www.pluralsight.com/courses/data-mining-algorithms-ssas-excel-r">Data Mining Algorithms in SSAS, Excel, and R</a> course. besides explaining the algorithms, I also show demos in different products. This gives you even better understanding than just reading the blog posts.</p> <p>Of course, I will continue with describing the algorithms here as well.</p>SQL Server 2016 Video Pillshttp://sqlblog.com/blogs/sergio_govoni/archive/2015/07/28/sql-server-2016-video-pills.aspxTue, 28 Jul 2015 20:04:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:59222Sergio Govoni<p>This is my first&nbsp;curation since Microsoft <span>Curah! has been moving to a new Microsoft platform named <a href="https://docs.com/en-us" target="_blank" mce_href="https://docs.com/en-us">docs.com</a></span>.</p><p><a href="https://docs.com/sergio-govoni/8292/sql-server-2016-video-pills-eng" target="_blank" mce_href="https://docs.com/sergio-govoni/8292/sql-server-2016-video-pills-eng">SQL Server 2016 Video Pills</a> is a collection of videos&nbsp;about the most important features of SQL Server 2016; speakers are Engineers and Program Manager of the SQL Server Team!</p><p><a href="https://docs.com/sergio-govoni/8292/sql-server-2016-video-pills-eng" target="_blank" mce_href="https://docs.com/sergio-govoni/8292/sql-server-2016-video-pills-eng"><img width="724" height="285" style="width:724px;height:285px;" src="http://sqlblog.com/files/folders/59225/download.aspx" border="0"></a></p><p>Would you like to discover the new features of SQL Server 2016? Watch these videos!</p><p>Enjoy!</p>SQL Formatting Matters…http://sqlblog.com/blogs/andy_leonard/archive/2015/07/20/sql-formatting-matters.aspxMon, 20 Jul 2015 16:43:56 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:59156andyleonard<p>… to some folks more than others.</p> <p>We likely all have stories about working with folks who care <em>a lot</em> about the format of their SQL statements. I’ve worked with a lot of database developers and analysts. I cannot say SQL formatting distinguishes the good from the bad (or ugly), but I have noticed the sharpest developers have a preferred SQL format. I’ve also noticed the very sharpest are the most dogmatic about the format they prefer.</p> <p>Why is this?</p> <p>I don’t know for sure. I’ve asked, and a few shared that their preferred SQL format helps them visualize or conceptualize the purpose of the statement. I can buy that. When working with folks who have a SQL format preference, I try to only send them SQL formatted as they like. </p> <p>How about you? Does format matter? If so, why?</p> <p>:{&gt;</p> <p>Learn more:&#160; <br /><a href="http://www.sqlservercentral.com/stairway/100550/" target="_blank">Stairway to Biml</a> <br /><a href="http://www.linchpinpeople.com/tag/ssis/" target="_blank">Linchpin People Blog: SSIS</a> <br /><a href="http://www.sqlservercentral.com/stairway/72494/" target="_blank">Stairway to Integration Services</a></p> <p><a href="http://smile.amazon.com/Server-Integration-Services-Design-Patterns/dp/1484200837" target="_blank"><img title="Learn more about SSIS Design Patterns!" border="0" alt="SSIS Design Patterns" src="http://sqlblog.com/blogs/andy_leonard/SSIS2014DesignPatterns200_2A08FE52.jpg" width="163" height="204" /></a></p>SQL Saturday in Italy…English version!http://sqlblog.com/blogs/davide_mauri/archive/2015/06/16/sql-saturday-in-italy-english-version.aspxTue, 16 Jun 2015 13:48:40 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:58874manowar<p>The 2015 is a special year for Italy, because the country hosts Expo 2015, which is the current Universal Exposition. For this reason, the <a href="http://www.ugiss.org/">Italian PASS chapter</a> promotes a special edition of SQL Saturday, a free training event for SQL Server professionals. The <a href="http://www.sqlsaturday.com/454/EventHome.aspx">SQL Saturday #454 in Turin</a> on October 10, 2015 has the following characteristics:</p> <ul> <li>More than 20 sessions, on SQL Server, Business Intelligence and Azure Data Platform. </li> <li><b>All the sessions will be in English language</b>. </li> <li>The venue is in the center of Turin, close to the train station: <ul> <li>You can be at the expo in 40 minute </li> <li>You can travel to Milan in less than 1 hour </li> </ul> </li> <li>Turin is usually less expensive than Milan and you might stop for at least one night, dedicating the Sunday after SQL Saturday to visiting the Expo or Milan. <ul></ul> </li> </ul> <p>We want to provide the best experience to the attendees, and we also want to help those of you traveling with family and/or friends that might not interested to technical content. For this reason, we are planning a web page containing information for side and/or alternative activity during the SQL Saturday. You will get more information about that starting in July.</p> <p>However, we first need a good estimation of the number of attendees, in order to correctly size the venue and to verify the interest in side activities, so we will module the time to allocate in such a section of the upcoming web site. These operations have to be completed months ahead of the event.</p> <p>For this reason, <b>we ask you to fill the survey at <a href="http://www.sqlsatexpo.com/">http://www.sqlsatexpo.com/</a></b>, providing us important information about your intention of visiting Expo 2015 and about the number of people who will travel with you.</p> <p>If you are a speaker, please <a href="http://www.sqlsaturday.com/454/Speakers/Submission.aspx">submit your sessions</a>, considering that the agenda will prioritize three topics: SQL Server 2016, Power BI, and Azure Data Platform.</p> <p>See you in Turin! </p>24 Hours of PASS (June 2015)http://sqlblog.com/blogs/sergio_govoni/archive/2015/05/27/24-Hours-of-PASS-June-2015.aspxWed, 27 May 2015 20:04:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:58773Sergio Govoni<p>The most important free on-line event on SQL Server and Business Intelligence is back!</p><p>The 24 Hours of PASS is coming back with a great edition that will provide best practices, expert tips, and demos, from new and up-and-coming Speakers.&nbsp;D<span class="hps">on't</span>&nbsp;<span class="hps">take commitments on&nbsp;<span class="hps"><b>24</b> and</span> <span class="hps"><b>25</b> of <span class="hps"><strong>June 2015</strong>.</span></span></span></p><p><a href="http://www.sqlpass.org/24hours/2015/goc/Sessions.aspx" target="_blank" mce_href="http://www.sqlpass.org/24hours/2015/goc/Sessions.aspx"><img width="984" height="125" style="width:984px;height:125px;" src="http://sqlblog.com/files/folders/58774/download.aspx" border="0"></a></p><p>Register now at this <a href="http://www.sqlpass.org/24hours/2015/goc/Registration.aspx" target="_blank" mce_href="http://www.sqlpass.org/24hours/2015/goc/Registration.aspx">link</a>, it's free!</p><p>No matter from what part of the world you will follow the event, the important thing is to know that it will be 24 hours of continuous training on SQL Server and Business Intelligence.</p><p>Enjoy!&nbsp;</p>Data Mining Algorithms – EM Clusteringhttp://sqlblog.com/blogs/dejan_sarka/archive/2015/05/12/data-mining-algorithms-em-clustering.aspxTue, 12 May 2015 14:30:32 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:58637Dejan Sarka<p>With the K-Means algorithm, each object is assigned to exactly one cluster. It is assigned to this cluster with a probability equal to 1.0. It is assigned to all other clusters with a probability equal to 0.0. This is hard clustering.</p> <p>Instead of distance, you can use a probabilistic measure to determine cluster membership. For example, you can cover the objects with bell curves for each dimension with a specific mean and standard deviation. A case is assigned to every cluster with a certain probability. Because clusters can overlap, this is called soft clustering. The Expectation-Maximization (EM) method changes the parameters of the bell curve to improve covering in each iteration.</p> <p>The Expectation - Maximization (EM) Clustering algorithm extends the K-Means paradigm in a different way. Instead of assigning each object to a dedicated cluster, it assigns each object to a cluster according to a weight representing the probability of the membership. In other words, there are no strict boundaries between clusters. Therefore, new means are computed based on weighted measures.</p> <p>The EM algorithm iterates between two steps. In the first step—the &quot;expectation&quot; step—the algorithm calculates the cluster membership of each case (i.e., the probability that a case belongs to a given cluster from the initially defined k number of clusters). In the second step—the &quot;maximization&quot; step—the algorithm uses these cluster memberships to re-estimate the models' parameters, such as the location and scale parameters of Gaussian distribution.</p> <p>The algorithm assumes that the data is drawn from a mixture of Gaussian distributions (bell curves). Take a look at the graphics. In the first row, the algorithm initializes the mixture distribution, which is the mixture of several bell curves here. In the second and third rows, the algorithm modifies the mixture distribution based on the data. The iteration stops when it meets the specified stopping criteria—for example, when it reaches a certain likelihood-of-improvement rate between iterations.</p> <p>Step 1: Initializing the mixture distribution</p> <p><a href="http://sqlblog.com/blogs/dejan_sarka/image_30F8E19C.png"><img title="image" style="border-top:0px;border-right:0px;border-bottom:0px;border-left:0px;display:inline;" border="0" alt="image" src="http://sqlblog.com/blogs/dejan_sarka/image_thumb_12448862.png" width="904" height="184" /></a> </p> <p>Step 2: Modifying the mixture distribution</p> <p><a href="http://sqlblog.com/blogs/dejan_sarka/image_3A76C174.png"><img title="image" style="border-top:0px;border-right:0px;border-bottom:0px;border-left:0px;display:inline;" border="0" alt="image" src="http://sqlblog.com/blogs/dejan_sarka/image_thumb_45A2DC6F.png" width="904" height="184" /></a> </p> <p>Step 3: Final modification</p> <p><a href="http://sqlblog.com/blogs/dejan_sarka/image_62AB8B37.png"><img title="image" style="border-top:0px;border-right:0px;border-bottom:0px;border-left:0px;display:inline;" border="0" alt="image" src="http://sqlblog.com/blogs/dejan_sarka/image_thumb_60056037.png" width="904" height="184" /></a> </p> <p>You use the EM Clustering for the same purposes as the K-Means Clustering. </p> <p>In addition, you can search for outliers based on combinations of values of all input variables with the EM algorithm. You check the highest probability of cases over all clusters. The cases where the highest probability is still low do not fit well into any cluster. Said differently, they are not like other cases, and therefore you can assume that they are outliers. See the last figure in this blog post bellow.</p> <p><a href="http://sqlblog.com/blogs/dejan_sarka/image_23DC2540.png"><img title="image" style="border-top:0px;border-right:0px;border-bottom:0px;border-left:0px;display:inline;" border="0" alt="image" src="http://sqlblog.com/blogs/dejan_sarka/image_thumb_48041080.png" width="904" height="263" /></a> </p> <p>The green case belongs to the cluster D with probability 0.95, to the cluster C with probability 0.002, to the cluster E with probability 0.0003, and so on.</p> <p>The red case belongs to the cluster C with probability 0.03, to the cluster B with probability 0.02, to the cluster D with probability 0.003, and so on. The highest probability for the red case is still a low value; therefore, this case does not fit well to any of the clusters and thus might represent an outlier.</p> <p>Outliers can also represent potentially fraudulent transactions. EM Clustering is therefore useful also for fraud detection. Finally, you can use EM Clustering for advanced data profiling to find the rows with suspicious combinations of column values.</p>Classic Music, Attractions, and Presentations: Presenting on Database Design at SQL Saturday Atlanta 2015http://sqlblog.com/blogs/louis_davidson/archive/2015/05/08/classic-music-attractions-and-presentations-presenting-on-database-design-at-sql-saturday-atlanta-2015.aspxSat, 09 May 2015 02:38:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:58609drsql
<p style="margin:0in;font-family:Calibri;font-size:11pt;">This next two weeks
are going to be awesome. May 11, I get to see the Who for probably the last
time ever. Saturday I am <a href="http://www.sqlsaturday.com/392/Sessions/Details.aspx?sid=13494">speaking
on Database Design</a> at <a href="http://www.sqlsaturday.com/392/eventhome.aspx">SQL Saturday in Atlanta</a>,
then heading to Disney World.<span style="mso-spacerun:yes;">&nbsp; </span>It rarely
gets even close to this good for a two week span, and SQL Saturday is a big
part of that. I love going to SQL Saturdays, seeing all of the people who I usually only see as a handle in a twitter feed or blog post.</p><p style="margin:0in;font-family:Calibri;font-size:11pt;">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">
</p><p style="margin:0in;font-family:Calibri;font-size:11pt;">As I considered this
week, I couldn't help but try to find a somewhat interesting way to blog about
it. The presentation I am doing is my oldest, and truly most favorite
presentation. I have done it many times over, and it only changes a little bit
here and there every time I give it. </p><p style="margin:0in;font-family:Calibri;font-size:11pt;">
</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">Baba O'Reilly,
Carousel of Progress, and Database Design?</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;"><a href="http://thewho.com/">The Who</a> is celebrating 50 years, and a few songs
they sing will be close to that age…and they are still relevant today and
connect with the audience young and old. A (begrudgingly) favorite attraction
of mine at Disney World is the <a href="https://www.youtube.com/watch?t=159&amp;v=CmrSiJTMf7s">Carousel of
Progress</a>, largely because it is charmingly old but the message still
relevant (if dated,) particularly to a computer scientist who wants to affect
technology for the future. Relational database design is a fundamental tool for
producing software, even in the 30+ years since it was started. Much like you
won't appreciate music and theme parks without knowing the classics,
understanding relational database concepts will help you design any solution
using a database. </p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">As a speaker and
writer, you want to produce new and interesting stuff, but it may be the oldies
that people want to hear. I love to do this presentation largely because the
topic, and there are plenty of people who still need to learn the fundamentals
of database design.<span style="mso-spacerun:yes;">&nbsp; </span>Of course, I won't
entertain you like The Who,<span style="mso-spacerun:yes;">&nbsp; </span>and the
classroom will almost certainly not rotate after each section of the
presentation, but if you are hearing about design for the first or fiftieth
time, I feel certain you will get something from it.</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">If Database Design interests
you, I hope to see you in my session this Saturday! If you are still on the
fence, here is the abstract:</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">&nbsp;</p>
<p style="font-family:Calibri;font-size:13.5pt;margin-top:7pt;margin-bottom:7pt;"><a href="http://www.sqlsaturday.com/392/Sessions/Details.aspx?sid=13494"><span style="font-weight:bold;">Database Design Fundamentals</span></a></p>
<p style="margin:0in;font-family:Calibri;font-size:11pt;">Data should be easy
to work with in SQL Server if the database has been organized as close as
possible to the standards of normalization that have been proven for many
years, but are often thought of as old-fashioned. Many common T-SQL programming
"difficulties" are the result of struggling against these standards
and can be avoided by understanding the requirements, applying normalization,
as well as a healthy dose of simple common sense. In this session I will give
an overview of how to design a relational database, allowing you to work with
the data structures instead of against them. This will let you use SQL
naturally, enabling the query engine internals to optimize your output needs
without you needing to spend a lot of time thinking about it. This will mean
less time trying to figure out why SUBSTRING(column,3,1) = 'A' is killing your
performance, and more time for solving the next customer problem.</p>Database Design: 3-State Bitshttp://sqlblog.com/blogs/andy_leonard/archive/2015/05/06/database-design-3-state-bits.aspxWed, 06 May 2015 22:45:48 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:58598andyleonard<p align="left">I can hear you thinking, “Andy, you’re off your old rocker. There are only <em>two</em> states for bit values!” And you are correct. Almost.</p> <p align="left">What if I’m storing a truly binary value – a value that possesses only two possible states – but there’s the possibility of a third state to indicate that I do not know which state applies. Consider this single-column example table:</p> <p align="left"><u>[Value]</u> <br />High <br />Low <br />Don’t Know or Don’t Care</p> <p align="left">The two discrete states are High and Low. The 3rd state can be labeled “unknown.” Can I represent these three states using a bit data type?</p> <p align="left"><strong>Yes!</strong></p> <p align="left">I can do it with a NULL:</p> <p align="left"><u>[Value]</u> <br />&#160; 1 <br />&#160; 0 <br />NULL</p> <p align="left">NULL means “the data is missing.” But in this context, I can define NULL to <em>mean</em> “unknown.” One caveat is that I cannot define to NULL to mean both “missing” and “unknown.” I have to pick one and only one meaning for NULL.</p> <p align="left">:{&gt;</p>How to rename a column when CDC is enabledhttp://sqlblog.com/blogs/damian_widera/archive/2015/04/30/how-to-rename-a-column-when-cdc-is-enabled.aspxThu, 30 Apr 2015 18:16:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:58557Damian Widera<p class="MsoNormal"><a name="_GoBack" class=""></a>I have been recently asked if Always
On supports metadata operations on tables. The operation is add, drop and column
rename. I said that it was possible. However I was informed that during column
rename there was an error about “<a name="OLE_LINK3" class=""></a><a name="OLE_LINK2" class=""></a><a name="OLE_LINK1" class=""></a>replication”:</p><p class="MsoNormal"><span style="font-size:10pt;font-family:Consolas;color:blue;">Caution:</span><span style="font-size:10pt;font-family:Consolas;"> Changing <span style="color:gray;">any</span> part
<span style="color:blue;">of</span> an <span style="color:blue;">object</span>
name could <span style="color:blue;">break</span> scripts <span style="color:gray;">and</span> stored <span style="color:green;">procedures</span><span style="color:gray;">.</span></span><span style="font-family:Consolas;font-size:10pt;">Msg 4928</span><span style="font-family:Consolas;font-size:10pt;color:gray;">,</span><span style="font-family:Consolas;font-size:10pt;"> </span><span style="font-family:Consolas;font-size:10pt;color:blue;">Level</span><span style="font-family:Consolas;font-size:10pt;"> 16</span><span style="font-family:Consolas;font-size:10pt;color:gray;">,</span><span style="font-family:Consolas;font-size:10pt;"> </span><span style="font-family:Consolas;font-size:10pt;color:blue;">State</span><span style="font-family:Consolas;font-size:10pt;"> 1</span><span style="font-family:Consolas;font-size:10pt;color:gray;">,</span><span style="font-family:Consolas;font-size:10pt;"> </span><span style="font-family:Consolas;font-size:10pt;color:blue;">Procedure</span><span style="font-family:Consolas;font-size:10pt;"> </span><span style="font-family:Consolas;font-size:10pt;color:maroon;">sp_rename</span><span style="font-family:Consolas;font-size:10pt;color:gray;">,</span><span style="font-family:Consolas;font-size:10pt;"> Line 655.&nbsp;</span><span style="font-family:Consolas;font-size:10pt;">Cannot </span><span style="font-family:Consolas;font-size:10pt;color:blue;">alter</span><span style="font-family:Consolas;font-size:10pt;"> </span><span style="font-family:Consolas;font-size:10pt;color:blue;">column</span><span style="font-family:Consolas;font-size:10pt;"> </span><span style="font-family:Consolas;font-size:10pt;color:red;">'Name'</span><span style="font-family:Consolas;font-size:10pt;"> because
it </span><span style="font-family:Consolas;font-size:10pt;color:gray;">is</span><span style="font-family:Consolas;font-size:10pt;"> </span><span style="font-family:Consolas;font-size:10pt;color:red;">'REPLICATED'</span><span style="font-family:Consolas;font-size:10pt;color:gray;">.</span></p><p class="MsoNormal"><span style="font-size:10pt;">I know that replication is not used in the scenario so my next
guess was that this must be caused by the CDC. And I was right. Let’s do a
simple example to see how the CDC works and what happened if you try to rename
a column.</span></p><p class="MsoNormal"><o:p></o:p></p>
<p class="MsoNormal">I downloaded the AdventureWorks2014 database from the <a name="OLE_LINK6" class=""></a><a name="OLE_LINK5" class=""></a><a name="OLE_LINK4" class=""></a>codeplex page
and enabled the CDC in this database. I created also a role that could be used
to have access to the CDC data. However it is not important in our example.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:'Minion','serif';">&nbsp;</span><span style="font-size:10pt;font-family:Consolas;color:blue;">USE</span><span style="font-size:10pt;font-family:Consolas;"> AdventureWorks2014<span style="color:gray;">;</span></span></p><p class="MsoNormal"><span style="font-size:10pt;font-family:Consolas;"><span style="color:gray;"></span></span><span style="color:blue;font-family:Consolas;font-size:10pt;">GO</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">CREATE</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:blue;">ROLE</span> [cdc_admin]</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">GO</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">EXEC</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:green;">sys</span><span style="color:gray;">.</span><span style="color:maroon;">sp_cdc_enable_db</span></span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">GO</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;">&nbsp;</p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">Next I configured that all CDC-related objects are stored in
the separate filegroup – that is for performance reason:</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">ALTER</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:blue;">DATABASE</span>
AdventureWorks2014</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">ADD</span><span style="font-size:10.0pt;font-family:Consolas;"> <span style="color:blue;">FILEGROUP</span> [CDC_Objects]<span style="color:gray;">;</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">GO</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">ALTER</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:blue;">DATABASE</span>
AdventureWorks2014</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">ADD</span><span style="font-size:10.0pt;font-family:Consolas;"> <span style="color:blue;">FILE </span><span style="color:gray;">(</span>&nbsp;&nbsp;&nbsp; NAME <span style="color:gray;">=</span> <span style="color:red;">N'CDC_Objects'</span><span style="color:gray;">,</span> <o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="color:blue;">FILENAME</span> <span style="color:gray;">=</span> <span style="color:red;">N'C:\Temp\Adwentureworks_cdc.ndf'</span><span style="color:gray;">,</span> <o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SIZE
<span style="color:gray;">=</span> 1024MB <span style="color:gray;">,</span>
FILEGROWTH <span style="color:gray;">=</span> 1024MB <span style="color:gray;">)</span>
<o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">TO</span><span style="font-size:10.0pt;font-family:Consolas;"> <span style="color:blue;">FILEGROUP</span> [CDC_Objects]<span style="color:gray;">;</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">GO</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:Minion, serif;font-size:10pt;">The last configuration step is to choose the table that will
be under the control of the CDC feature.</span><span style="font-family:Minion, serif;font-size:10pt;">&nbsp;
</span><span style="font-family:Minion, serif;font-size:10pt;">You might indicate the role, capture instance name (which is important
in case where there are two CDC tables) and </span><a name="OLE_LINK14" style="font-family:Minion, serif;font-size:10pt;" class=""></a><a name="OLE_LINK13" style="font-family:Minion, serif;font-size:10pt;" class=""></a><a name="OLE_LINK12" style="font-family:Minion, serif;font-size:10pt;" class=""></a><a name="OLE_LINK11" style="font-family:Minion, serif;font-size:10pt;" class=""></a><a name="OLE_LINK10" style="font-family:Minion, serif;font-size:10pt;" class=""></a>filegroup <span style="font-family:Minion, serif;font-size:10pt;">name at the moment.</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">EXEC</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:green;">sys</span><span style="color:gray;">.</span><span style="color:maroon;">sp_cdc_enable_table</span></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span><span style="font-size:10.0pt;font-family:Consolas;">@source_schema <span style="color:gray;">=</span> <span style="color:red;">N'Sales'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @source_name
<span style="color:gray;">=</span> <span style="color:red;">N'Store'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @role_name
<span style="color:gray;">=</span> <span style="color:red;">N'cdc_admin'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @capture_instance
<span style="color:gray;">=</span> <span style="color:red;">N'SalesStore'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @filegroup_name
<span style="color:gray;">=</span> <span style="color:red;">N'CDC_Objects'</span><span style="color:gray;">;</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">GO</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">As the CDC configuration is done I was trying to rename on of
the column that belongs to the table Sales.Store:</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">EXEC</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:green;">sys</span><span style="color:gray;">.</span><span style="color:maroon;">sp_rename</span><span style="color:blue;"> </span><span style="color:red;">'Sales.Store.Name'</span> <span style="color:gray;">,</span> <span style="color:red;">'SomeNewBetterName'</span><span style="color:gray;">,</span> <span style="color:red;">'COLUMN'</span></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">&nbsp;</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">I was immediately notified by the error:<o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">Caution:</span><span style="font-size:10pt;font-family:Consolas;"> Changing <span style="color:gray;">any</span> part
<span style="color:blue;">of</span> an <span style="color:blue;">object</span>
name could <span style="color:blue;">break</span> scripts <span style="color:gray;">and</span> stored <span style="color:green;">procedures</span><span style="color:gray;">.</span></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">Msg 4928<span style="color:gray;">,</span> <span style="color:blue;">Level</span> 16<span style="color:gray;">,</span> <span style="color:blue;">State</span> 1<span style="color:gray;">,</span> <span style="color:blue;">Procedure</span> <span style="color:maroon;">sp_rename</span><span style="color:gray;">,</span> Line 655<o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">Cannot <span style="color:blue;">alter</span> <span style="color:blue;">column</span> <span style="color:red;">'Name'</span> because
it <span style="color:gray;">is</span> <span style="color:red;">'REPLICATED'</span><span style="color:gray;">.</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="color:blue;font-family:Consolas;font-size:10pt;">&nbsp;</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">I checked the sys.columns view and that was true – the flag
is_replicated was set to all columns that are configured for the CDC. <o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">It makes perfect sense for me as the CDC feature uses log
reader agent </span><span style="font-family:Wingdings;mso-ascii-font-family:Minion;mso-hansi-font-family:Minion;mso-char-type:symbol;mso-symbol-font-family:Wingdings;">J</span><span style="font-family:'Minion','serif';"> to detect changes.</span><span style="font-family:'Minion','serif';mso-bidi-font-family:'Segoe UI';"> <o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';mso-bidi-font-family:'Segoe UI';">Change data capture and
transactional replication always use the procedure<span class="apple-converted-space">&nbsp;</span></span><span style="font-family:'Minion','serif';"><a href="https://msdn.microsoft.com/en-us/library/ms186983.aspx"><span style="mso-bidi-font-family:'Segoe UI';color:windowtext;text-decoration:none;text-underline:none;">sp_replcmds</span></a></span><span style="font-family:'Minion','serif';mso-bidi-font-family:'Segoe UI';"> to read changes from the transaction log. <span style="color:#2A2A2A;"><o:p></o:p></span></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">&nbsp;</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">SELECT</span><span style="font-size:10.0pt;font-family:Consolas;"> is_replicated <span style="color:blue;">FROM</span>
<span style="color:green;">sys</span><span style="color:gray;">.</span><span style="color:green;">columns</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">WHERE</span><span style="font-size:10.0pt;font-family:Consolas;"> <span style="color:fuchsia;">object_id</span> <span style="color:gray;">=</span> <span style="color:fuchsia;">object_id</span><span style="color:gray;">(</span><span style="color:red;">N'Sales.Store'</span><span style="color:gray;">)</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:gray;">AND</span><span style="font-size:10.0pt;font-family:Consolas;"> name <span style="color:gray;">=</span> <span style="color:red;">N'Name'</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">What should we do in case of column rename? There is only one
way I am aware of. First the CDC must be disabled for the table. But be careful
before you run the statement because all related CDC tables will be <a name="OLE_LINK19" class=""></a><a name="OLE_LINK18" class=""></a>dropped
immediately and without any warning </span><span style="font-family:Wingdings;mso-ascii-font-family:Minion;mso-hansi-font-family:Minion;mso-char-type:symbol;mso-symbol-font-family:Wingdings;">J</span><span style="font-family:'Minion','serif';"> </span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">EXEC</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:green;">sys</span><span style="color:gray;">.</span><span style="color:maroon;">sp_cdc_disable_table</span></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span><span style="font-size:10.0pt;font-family:Consolas;">@source_schema <span style="color:gray;">=</span> <span style="color:red;">N'Sales'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @source_name
<span style="color:gray;">=</span> <span style="color:red;">N'Store'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @capture_instance
<span style="color:gray;">=</span> <span style="color:red;">N'all'</span><span style="color:gray;">;</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">GO</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">Finally it is possible to rename the column.</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">EXEC</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:green;">sys</span><span style="color:gray;">.</span><span style="color:maroon;">sp_rename</span><span style="color:blue;"> </span><span style="color:red;">'Sales.Store.Name'</span> <span style="color:gray;">,</span> <span style="color:red;">'SomeNewBetterName'</span><span style="color:gray;">,</span> <span style="color:red;">'COLUMN'</span></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><a name="OLE_LINK26" class=""></a><a name="OLE_LINK25" class=""></a><a name="OLE_LINK24" class=""></a><a name="OLE_LINK23" class=""></a><span style="font-family:'Minion','serif';">If
all changes are done the CDC should be enabled again</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10pt;font-family:Consolas;color:blue;">EXEC</span><span style="font-size:10pt;font-family:Consolas;"> <span style="color:green;">sys</span><span style="color:gray;">.</span><span style="color:maroon;">sp_cdc_enable_table</span></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span><span style="font-size:10.0pt;font-family:Consolas;">@source_schema <span style="color:gray;">=</span> <span style="color:red;">N'Sales'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @source_name
<span style="color:gray;">=</span> <span style="color:red;">N'Store'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @role_name
<span style="color:gray;">=</span> <span style="color:red;">N'cdc_admin'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @capture_instance
<span style="color:gray;">=</span> <span style="color:red;">N'SalesStore'</span><span style="color:gray;">,</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; @filegroup_name
<span style="color:gray;">=</span> <span style="color:red;">N'CDC_Objects'</span><span style="color:gray;">;</span><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-size:10.0pt;font-family:Consolas;color:blue;">GO<o:p></o:p></span></p><p class="MsoNormal">&nbsp;</p><p class="MsoNormal"><span style="font-family:Minion, serif;font-size:13.3333330154419px;">And that's all for today!</span><span style="font-family:'Minion','serif';">&nbsp;</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">Cheers,<o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;"><span style="font-family:'Minion','serif';">Damian</span><span style="font-size:10.0pt;font-family:Consolas;"><o:p></o:p></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;">
</p><p class="MsoNormal"><span style="font-family:'Minion','serif';">&nbsp;</span></p>Data Mining Algorithms – K-Means Clusteringhttp://sqlblog.com/blogs/dejan_sarka/archive/2015/04/17/data-mining-algorithms-k-means-clustering.aspxFri, 17 Apr 2015 06:03:00 GMT21093a07-8b3d-42db-8cbf-3350fcbf5496:58471Dejan Sarka<p>Hierarchical clustering could be very useful because it is easy to see the optimal number of clusters in a dendrogram and because the dendrogram visualizes the clusters and the process of building of that clusters. However, hierarchical methods don’t scale well. Just imagine how cluttered a dendrogram would be if 10,000 cases would be shown on it.</p> <p>K-Means is a distance-based partitioning algorithm that divides data set in predetermined (“k”) number of clusters around the average location (“mean”). In your mind, you intuitively know how to group people or any other cases. Groups do not need to have an equal number of members. You can do grouping according to one or more attributes.</p> <p>The algorithm comes from geometry. Imagine record space with attributes as dimensions. Each record (case) is uniquely located in space with values of the attributes (variables). </p> <p>The algorithm initially creates k fictitious members and defines them at the means of the clusters. These fictitious cases are also called centroids. The values of the input variables for these centroids could be selected randomly. Some algorithms use also a bit of heuristics and use the marginal distributions of the attributes as a starting point and randomly perturb from there.</p> <p>The algorithm then assigns each record to nearest centroid. This way, you get the initial clusters. When the clusters are defined, the algorithm can calculate the actual centroids of clusters and get new centroids. After the new centroids are calculated, the algorithm reassigns each record to the nearest centroid. Some records jump from cluster to cluster.</p> <p>Now the algorithm can calculate new centroids and then new clusters. The algorithm iterates last two steps until cluster boundaries stop changing. You can stop the iterations when there is less than the minimum number of cases defined as a parameter that can jump from cluster to cluster.</p> <p>Here is a graphical representation of the process. You can see the cases in a two-dimensional space. The dark brown cases are the fictitious centroids. The green case is the one that will jump between clusters.</p> <p><a href="http://sqlblog.com/blogs/dejan_sarka/image_1ACFC0C6.png"><img title="image" style="border-top:0px;border-right:0px;border-bottom:0px;border-left:0px;display:inline;" border="0" alt="image" src="http://sqlblog.com/blogs/dejan_sarka/image_thumb_37D86F8E.png" width="604" height="391" /></a> </p> <p>After the centroids were selected, the algorithm assigns each case to the nearest centroid.</p> <p><a href="http://sqlblog.com/blogs/dejan_sarka/image_12DAAF15.png"><img title="image" style="border-top:0px;border-right:0px;border-bottom:0px;border-left:0px;display:inline;" border="0" alt="image" src="http://sqlblog.com/blogs/dejan_sarka/image_thumb_7085AA4C.png" width="604" height="392" /></a> </p> <p>Now we have our three initial clusters. The algorithm can calculate the real centroids of those three clusters. This means that the centroids move.</p> <p><a href="http://sqlblog.com/blogs/dejan_sarka/image_0B51D059.png"><img title="image" style="border-top:0px;border-right:0px;border-bottom:0px;border-left:0px;display:inline;" border="0" alt="image" src="http://sqlblog.com/blogs/dejan_sarka/image_thumb_16EA1E49.png" width="604" height="392" /></a>&#160; <p></p> <p>The algorithm has to recalculate the cluster membership. The green case jumps from the middle cluster to the bottom left cluster.</p> <p><a href="http://sqlblog.com/blogs/dejan_sarka/image_71EC5DCF.png"><img title="image" style="border-top:0px;border-right:0px;border-bottom:0px;border-left:0px;display:inline;" border="0" alt="image" src="http://sqlblog.com/blogs/dejan_sarka/image_thumb_73BCB396.png" width="604" height="392" /></a> </p> <p>In the next iteration, no case jumps from a cluster to a cluster. Therefore, the algorithm can stop.</p> <p>K-means clustering scales much better than hierarchical methods. However, it has drawbacks as well. First of all, what is the optimal number of clusters? You can’t know in advance. Therefore, you need to create different models with different number of clusters and than select the one that fits your data the best. </p> <p>The next problem is the meaning of the clusters. There are no labels for the clusters that would be known in advance. Once the model is built, you need to check the distributions of the input variables in each cluster to understand what kind of cases constitute each cluster. Only after this step you can label the clusters.</p></p>