Problem

Would you like some help passing the Implementing a Data Warehouse exam for the MCSA SQL Server 2012 certification? If not, could you find value in a collection of great references for populating and managing the contents of a data warehouse? If not, are you interested in a database application for tracking references to any set of topics and subtopics? If you answered yes to at least one of these questions, then this tip is for you.

Solution

This is an extension of a previous tip on passing the Implementing a Data Warehouse exam -- one of the qualifications you need to earn a MCSA certification for SQL Server 2012. The prior tip presented a collection of URLs for each topic and subtopic for the exam. This tip extends the prior one in a couple of respects.

The collection of URLs is refined and enlarged from the initial set, and it is more focused on the exam topics.

The URLs in this improved collection are in a database so that you can query, add to, and delete URLs from the collection. Therefore, it is an open data source that any SQL Server BI professional can update according to their own personal preferences for the benefit of the SQL Server professional community.

The guidelines for including URLs in the collection are first that a URL must relate to one of the 20 subtopics delineated by Microsoft to denote content within the exam's scope. The URL collection for the prior tip had many URLs on topics that are beyond the precise scope of the exam. There is also a definite bias in the URL selection process to favoring Microsoft web sites -- particularly, the msdn web site. This is because Microsoft determines through its certification questions what is important for you to know about its product. Finally, all other things equal, URLs with brief relatively easy to digest content were typically chosen over otherwise equally meritorious references.

The group of 20 exam subtopics is an extension of 5 major exam topics. The database available as a download with this tip has one table for the major topics and a second table for their subtopics. A third table contains the collection of URLs. All three tables are integrated into a relational database. The relational database uses primary and foreign keys to support a table's role in the model.

While the database that you can download has the tables linked and populated according the rules described above, you can extend or modify the database with other URLs for the Implementing a Data Warehouse exam. Alternatively, you can truncate all three tables and re-populate them for another exam or content area about which you want to build a collection of URLs; you may also need to add or remove rows from the topics and subtopics tables. Of course, you can perform a radical re-design to the database structure as well as the contents within the database. This tip is meant to be of value as much for the URL references that it organizes as well as the database design for storing and exposing the references.

An Overview of the Database Design

Because a picture is often worth many words, I am starting with a SQL Server Database Diagram view of the database of URLs. Notice the MajorTopic table has a primary key based on the MajorTopicID column values. This column has an identity property. The MajorTopicTitle column in the table identifies with text each major topic, such as "Design and Implement a Data Warehouse" or "Extract and Transform Data". You can list the complete set of MajorTopicTitle column values with a SELECT statement that displays the column values for each row in the MajorTopic table.

The SubTopic table relates to both the MajorTopic table and the URL table. Individual rows in the SubTopic table are tracked by a primary key based on a pair of values -- the MajorTopicID column value and the SubTopicID column value. The MajorTopicID column in the SubTopic table also has a foreign key relationship with the MajorTopicID column in the MajorTopic table. In this way, each row in the SubTopic table relates uniquely to a single row in the MajorTopic table. The SubTopicTitle column values, such as "Design and implement dimensions" and "Design control flow" denote the title for each subtopic. Again, you can list the complete set of SubTopicTitle column values with a SELECT statement that displays the column values for each row in the SubTopic table.

The URL table has no primary key to identify individual rows. Instead, each row in the URL table belongs to a collection relating to a single row in the SubTopic table. Therefore, the MajorTopicID and SubTopicID columns in the URL table have a foreign key relationship with columns of the same names in the SubTopic table. Aside from the foreign keys, each row in the URL table also contains a URL as well as descriptive title. This title is often based on the title at the top of the page to which the URL points, but I modified some titles in the table to better represent why the corresponding URLs were included in the database. A script in the next section illustrates how to enumerate each URL for every subtopic in the database.

Some Queries for the Database

One of the major advantages of the database is that you can discover the URLs within it through simple queries. You can also modify the content of the database through other queries.

The following script returns each MajorTopicTitle and SubTopicTitle pair in the database along with the collection of URLs belonging to the pair. The query's result set also lists MajorTopicID and SubTopicID values for a row, which is especially handy for reminding you about column values for managing the content in the tables. In the database version that you can download with this article, the query lists 155 rows. The INNER JOIN elements within the query's FROM clause links the three tables by their primary and foreign key values.

Speaking of managing the database's content, one thing that I especially encourage readers to do is to add URLs to the database. In this way, you can customize the database so that it contains URLs that have special interest or value in your opinion. For example, it is entirely possible that I omitted any references at all to one of your favorite blogs. The following script shows how to add a URL to the URLs table. The result set from the preceding query is one way to obtain appropriate values with which to populate MajorTopicIDValue and SubTopicIDValue. You can script the URL table to obtain more precise information about how to format values for the script below. For example, the URL table's Title column has a varchar data type with a maximum length of 150 characters and the URL column has a varchar data type with a maximum length of 250 characters. You can optionally change the maximum number of characters per column or change the data types from varchar to nvarchar to accommodate a wider range of characters.

As you start to reference the URLs in the database, you may observe that some URLs are present in the collection more than one time. This can be reasonable in the sense that a single URL can provide useful information about more than one subtopic. However, the same URL should not occur more than once for a single subtopic. After you added a URL to the collection for a subtopic, there is no additional value gained by adding the same URL to a subtopic's collection a second time.

Because the database has no constraint to restrict a URL from being included more than once for a single subtopic, it is possible to add inadvertently the same URL more than once to the same topic. I did this at least once as I was initially populating the database. I discovered the problem with following T-SQL script that enumerates all URL references that occur more than once. The query was not designed to detect duplicate URL references within a subtopic, but the query's result set can expose the issue if it is preset. Each URL reference appears in the result set for as many times as it is duplicated; duplicate URL references appear one after another within a set of rows. By scanning MajorTopicID and SubTopicID column values in the result set, you can verify if the same URL reference occurs multiple times within a single subtopic.

If you can encounter a URL appearing two or more times within a single subtopic, you can delete all references to the URL within the subtopic, and then insert the URL just once for the subtopic. I recommend you copy the subtopic foreign keys as well as the title and URL reference before deleting any rows. After deleting multiple duplicate URL references within a single subtopic, use the Insert sample code to re-insert the URL just once within the subtopic. The following listing shows the code for deleting one or more URLs references within a subtopic. The assignments for MajorTopicIDValue, SubTopicIDValue, and Title are the similar to those for the Insert script.

DELETE
FROM [ImplementingDataWarehouseExamTopicURLs].dbo.URL
WHERE MajorTopicID = MajorTopicIDValue AND SubTopicID = SubTopicIDValue
AND Title = 'Title the URL reference you want to delete'

In addition to downloading the whole database file for this tip, you will also be able to download a .sql file with more sample code examples about how to query the database.

An Overview of URLs in the Database

The design of the database is important because it will help to make the database an open exam resource tool that SQL Server professionals can modify for their own personal needs. However, what's in the database is important because it is can serve as a starting point for your changes or just help focus your study if you prefer to use it as is.

For many, a good place to start studying for the exam or just about implementing a data warehouse even if you do not plan on taking the exam will be from the full set of URL references associated with each subtopic. The result set from the first query in the preceding section presents this resource. Recall that the result set contains 155 rows; it is not shown here because of its length. While there are 155 rows in the result set, there are not 155 distinct URLs in the database because some URLs relate to more than one subtopic. In fact, there are just 138 distinct URLs in the database.

Another good place to start benefitting from the database may be by focusing on the URLs that are referenced by more than one exam subtopic. URL references that apply to more than one subtopic are likely to provide a better overall perspective than those that apply to just one subtopic. You may also find additional links in them for drilling down further and learning more precise content. There are 14 URLs that are listed for two or more subtopics. The following screen shot shows these 14 URLs along with their number of repeats. The result set is ordered by the number of repeats.

By focusing just on those web sites contributing two or more distinct URLs, you can learn about web sites that are likely to have other URL references that may be worthy of your study time. The following screen shot shows just web sites with two or more distinct URL references and it also orders the web sites by the count of distinct references.

As you can see, I was true to my promise of favoring the msdn web site. Seventy-four of the 138 distinct URL references are from the msdn web site (msdn.microsoft.com). The blogs.msdn.com web site is also from msdn, but it appears separately from the msdn web site. The technet.com web site is another Microsoft web site with several distinct URLs. In the browser used for the screen shot (IE 9), following either technet.microsoft.com or social.technet.microsoft.com took me to the same URL. Across this set of web sites, 85 (74 + 5 + 4 + 2) of the 138 distinct URL references came from a Microsoft web site. The main point of this URL counting summary for web sites is to encourage you to make sure that you take full advantage of the msdn and other Microsoft web sites as you prepare for the exam. They are rich in content pertaining to the exam.

I also want to definitely encourage you to examine non-Microsoft web sites as well. As an occasional MSSQLTips.com contributor, I make a point of searching the MSSQLTips.com web site whenever I need to learn about a SQL Server topic. In my search for appropriate URL references, the web site with the second most distinct URL references was MSSQLTips.com. Other web sites having three or more distinct URL references include sqlblog.com and stackoverflow.com.

If you or someone else did a search for appropriate URLs, the set of URLS in the database would like differ at least somewhat and the consequently the ordering of web sites would differ. The fact that the database is published as an open data source makes it available for you to improve it for you and others. The URLs in the current database are not meant to denote the definitive list of URLs for exam preparation on or learning about implementing a database warehouse. The database's URLs are merely those that I discovered in two tries to compile a list of relevant URLs to help me and others prepare for the exam.

Summary

This tip builds on a prior tip that drills down on the type of content that you can expect to encounter on the SQL Server 2012 Implementing a Data Warehouse exam. That previous tip included a bunch of URLs to help you learn more about the exam content. This follow-up tip starts with the URL collection, but drops all references to non-exam content. In addition, many new URLs are added, and some deficiencies, such as a duplicate listing of a single URL for one subtopic, are removed.

The most dramatic improvement of this follow-up tip is the inclusion of all URLs in a relational database model. As a result, you can query for particular information from the URL collection and you can easily add and remove URL references. You can even perform more dramatic changes, such as modify the relational database design. Furthermore, if you have a really great idea you can share your modification with the whole community of SQL Server professionals on the web.

You can download the .mdf file for the database by clicking one of the links for this tip. After downloading the database, you can attach the database to your SQL Server 2012 installation with a variation of the following script. Before you can attach the .mdf file, you'll need to unzip the file that you download. Then, you can query the database and learn more about its URL references or modify the relational database model in any way that serves your purposes.