SQL

Funny that it’s these T-SQL Tuesday posts that lead me to post a blog. I have a bunch of unfinished drafts, but now I have the chance to rework an entry about a particular hobby or two. So let’s talk about it.

Well, considering another calendar year is coming, and there’s a T-SQL Tuesday post theme about learning goals (something I haven’t tackled in a while), it’s about time I write another blog post. Time to shift the motivator to HYPERTHRUST.

I’m going to take part in T-SQL Tuesday this month. Why? Well, because I’m now starting to venture into speaking on beginner topics and lessons learned as I perfect my technical chops and learn new skills. This month, the subject is on new speakers, and the veterans who have advice for those of us starting out.

Talk #1 is already happening

Funny thing is…I’m already going to make my speaking debut on November 15, giving the microsession at our Raleigh-Durham PASS User Group. I’ll be discussing the TRY conversion functions (TRY_CAST, TRY_CONVERT, TRY_PARSE) over a 10-15 minute time frame. The talk was inspired by a case at my job involving error reporting, and some manually entered date values that led to exceptions in the original procedures. For this talk, I’ll be using some basic examples to explain each one: how CAST and CONVERT throw exceptions when a value can’t be converted, while TRY_CAST and TRY_CONVERT return NULL instead, and how PARSE is something most people don’t touch for good reason. It should be interesting to start out, and I’ve received some good advice from a few professionals before I rehearse. It does help that I’ll be working on rehearsal with partners as well for critiques.
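Since I also tinker in Python, here is a rough analogy of the difference between the strict and the TRY functions. This is only a sketch and not T-SQL: the function names and the sample values are made up for illustration, and it assumes dates arrive as text in YYYY-MM-DD form.

```python
from datetime import date, datetime


def cast_to_date(text):
    """Strict conversion, in the spirit of CAST: raises an error on bad input."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def try_cast_to_date(text):
    """Forgiving conversion, in the spirit of TRY_CAST: returns None on bad input."""
    try:
        return cast_to_date(text)
    except (ValueError, TypeError):
        return None


# Hypothetical manually entered values, two of which are not valid dates.
entered = ["2016-11-15", "2016-02-30", "next Tuesday"]

converted = [try_cast_to_date(value) for value in entered]

# The None results point straight at the rows that need error reporting.
bad_rows = [value for value, result in zip(entered, converted) if result is None]

print(converted)
print(bad_rows)
```

The useful part for error reporting is the last step: instead of the whole batch failing on the first bad value, the bad values can be collected and reported.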

Talk #2 is a longer talk – a Saturday submission

This March is SQL Saturday Raleigh, and it’s not a secret to some of the members that I’m strongly considering throwing my hat in the ring for a slot this year. I’ve been considering two topics to begin. One is about merging Python and T-SQL together. Of course, I’m picking up R and the Revolution Analytics acquisition has led to SQL and R beginning a beautiful relationship. However, Python is still a popular enough language that for the longest time I struggled to consolidate with SQL. I want to investigate this further to see if this is possible as a data tool.

Another topic that has gotten some traction is about moving from analyst to developer. I started my career as an analyst, then became a Database & BI Developer, and now I’m in a bit of a hybrid role. My view is that there are plenty of analysts who queried all the time, like myself, who might want to work more in the development space. Maybe from there someone will want to be a DBA, or a data scientist (CHEERS). So I’m looking into the high level tools if an analyst goes from having read permissions to having write & execute as well, and might have to use Reporting & Integration Services more often. It’s experience that I have been through, and now I think I can touch on this experience again in my current full-time role.

How does this get moving?

It is partially spectrum-related, but I am unsure where to start and how to approach someone for advice directly. A general blog post with references sounds like a perfect place to begin, and I’d like to see where I can go, and how some veterans learned their speaking skills. I’m nervous of talking about a subject that I don’t know much on, as I’ve had those over-my-head moments in front of fellow professionals. Now I have to ask myself, and others, how to target the audience and let everyone know that I’m impersonating an expert. Then there’s the matter of adjusting presentations and turning feedback into something constructive for another technical topic (or even the same one, refined for a different crowd).

I read Steve Jones’ post for this month as I was writing, and it made me think of one of my esteemed colleagues (yet another Steve who is part of PASS, affectionately known as SQL Steve in our company’s department) who did a series of weekly tips and tricks for our team, which included some practical examples we could use internally. It reflected the way of getting started with a small group. The other aspect that stuck out to me from the post was this:

[Your first technical speaking opportunity] can be hard as you need to be open to debate, accept you might get some things wrong, and that you might learn something. There is stress in trying to come across as an expert but I’d rather you try to teach something, with the idea that this is your understanding of a topic, but you are open to admitting you might be wrong if someone else knows more.

I don’t consider myself an expert at all. Intermediate in some areas, beginner in a few others, and court jester in the last one or two. It seems that a key for my microsession will be that experts in the house will have critiques, but it does not mean that they believe I’m ignorant here. I’ll have to keep my skin tough and stay aware that there will always be learning. The interaction and discussion will allow everyone to learn and explain how they came to their conclusions.

I’m excited about reading some of the other posts that are in the very long comment section, and what advice can be gleaned. Let me know that I can reach out to you (as Andy did on his host post), and I’ll do just that.

First, Microsoft’s Joseph Sirosh provided the keynote address. The focus was on the power of data and analytics in this changing world. I didn’t write anything down during this session, so instead I’ll link to Kevin Kline’s recap of the talk.

My focus was on sessions to help with query writing. I first attended Andy Yun’s session, Why Your Data Type Choices Matter. Awesome session. Andy first focused on the internals (how data is stored). While I was aware that a table was not created in the order of the columns listed, I did not know much about the FIXVAR format. If using variable-length columns (e.g., VARCHAR), the extra two bytes per column represent that column’s entry in the variable column offset array. We need more metadata to read the actual data. One thing that stuck out was the question of using non-Unicode versus Unicode. If you use the former (VARCHAR), then data could be lost and refactoring may be required. If you use the latter (NVARCHAR), the storage requirement is doubled. The key is to right-size data types (even in temp tables…if a certain person is reading, the person is being called out!), match data types, and recall the 8KB page size limit.
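To make the VARCHAR versus NVARCHAR trade-off concrete for myself, here is a small Python sketch of the arithmetic. It is an approximation under stated assumptions: Latin-1 stands in for a single-byte code page, UTF-16 stands in for NVARCHAR storage, and the helper names are mine, not anything from the session.

```python
# Two bytes of per-column overhead for a variable-length value.
VARIABLE_LENGTH_OVERHEAD = 2


def varchar_bytes(text):
    """Approximate VARCHAR storage: one byte per character plus overhead.

    Assumes a single-byte code page; Latin-1 is used here as a stand-in.
    """
    return len(text.encode("latin-1")) + VARIABLE_LENGTH_OVERHEAD


def nvarchar_bytes(text):
    """Approximate NVARCHAR storage: two bytes per character plus overhead."""
    return len(text.encode("utf-16-le")) + VARIABLE_LENGTH_OVERHEAD


city = "Raleigh"
print(varchar_bytes(city))   # 7 characters + 2 bytes overhead
print(nvarchar_bytes(city))  # 14 bytes of characters + 2 bytes overhead

# The data loss side: characters outside the single-byte code page cannot
# be stored. Python's "replace" handler swaps them for question marks.
lossy = "Łódź".encode("latin-1", errors="replace").decode("latin-1")
print(lossy)
```

The character data doubles while the overhead stays the same, which is why the choice matters most on wide or very long tables.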

During lunch, I sat with a table of developers and DBAs talking about query tuning. The major advice I received came during a conversation about date functions that related to a project I had worked on previously. Basically, I was told about how scalar user-defined functions can be a terrible thing in many situations. This may be open for debate, but scalar UDFs can drag down I/O performance. Now I’ll have to investigate that further.

My next session was Optimizing SQL Server and Databases for Large Fact Tables, presented by Thomas Grohser. The idea was that sometimes we have to pull data from large fact tables, and we can apply some tricks to make this happen. Say there’s a need to read a fact table for a report which runs for two hours: how do you help it? This session did go over my head a bit, as it seemed more performance-based than BI-development-based. Though I did take away a few things. First, it’s important to choose a clustered index key wisely, such as using a lookup table for common parameters. Second, table partitioning comes into play, which I haven’t tried to do much myself, but our engineering unit has. Third, row compression. We have some tables at my company with one billion rows, and Thomas explained that 1 byte less on 1 billion rows can save 1GB. A solid session, though I may have been surprised by the deeper infrastructure content.
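The one-byte-per-row math is easy to check. Here is a quick sketch of the arithmetic in Python; the page count is a rough figure that assumes full 8KB (8192-byte) pages and ignores page headers and any compression.

```python
import math

ROWS = 1_000_000_000   # one billion rows
PAGE_BYTES = 8192      # 8KB page size


def bytes_saved(rows, bytes_per_row):
    """Total bytes saved by trimming bytes_per_row from every row."""
    return rows * bytes_per_row


saved = bytes_saved(ROWS, 1)

gigabytes = saved / 10**9   # decimal gigabytes
gibibytes = saved / 2**30   # binary gigabytes, what most tools report
pages = math.ceil(saved / PAGE_BYTES)

print(f"{gigabytes:.2f} GB, or {gibibytes:.2f} GiB")
print(f"roughly {pages:,} fewer pages to read")
```

So a single byte trimmed from a data type is worth about a gigabyte of storage at that scale, and it repeats in every index and backup that carries the column.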

Final session of the day was on Biml for Beginners, by Cathrine Wilhelmsen – a great example of the next generation of data platform stars already coming to fruition. So Biml is something I had brought up in passing at work and read a few blog articles about, but I had not followed through. Here was my chance. Basically, Biml (Business Intelligence Markup Language) is helpful to many a business, including my company, because it uses business logic to generate repetitive SSIS packages easily. It’s essentially XML. I learned about the beta BimlOnline as well, which does reverse engineering to help us see what the Biml looks like. We can even extend Biml with C# or VB code blocks to import structures. Cathrine even explained tiered Biml files for reusing attributes like admin, source, and destination. To use a practical example, there are some weekly reports to automate coming up, and this could be relatively useful to set these packages up at a base level. Biml is not meant for deployment, but it can save us valuable development time. Easy to learn more as well. I’m adding it to the personal projects list.

Lighten up, Francis. Biml is a good thing.

Exhibitors had a fun reception with appetizers and light dinner. I’m bringing home more Idera ducks. Then came more karaoke for the night, put on by Pragmatic Works at the Hard Rock Cafe. Same drill as the previous night with networking opportunities.

Earlier this year, I came upon the opportunity to go to PASS Summit. Some of you who clicked on this link probably know what that is, but I’ll let the about section of the Summit site explain to you what the event is and why it is worthwhile. I had previously gone in 2013 when it was held in Charlotte (easy commute from Raleigh), and having the chance to go back was a no-brainer for myself and my company. At the conference, I get to network with industry professionals/mutual Twitter followers, learn new skills to advance my career, and get advice on problems and possibilities facing my department at my company to allow us to become better. Plus, I can personally thank Redgate for the awesomeness of their SQL Prompt tool, which our BI team actively uses.

Anyway, this is where I start blog recaps.

Day 0 (Monday)

Flight arrived after 5:00 (PDT), so I did not get to pick up my registration badge. I will note that it would be nice if that area was open later on that day like it is on the surrounding days. At first, introverted autistic me was concerned about showing up to events without a badge. I was due to attend a networking dinner hosted by veterans Steve Jones and Andy Warren, where people could converse and meet. I was able to talk with a few professionals, including a group of developers at stamps.com, about what they do, and we shared challenges we each face on a broad scale. While I liked how the dinner gave us a chance to sit down, I sometimes hope that there’s more opportunity to move about the dinner area and talk to other tables. When someone speaks to me first and asks about me, it is much easier for me to talk, or if I’m placed at a table with others I’m driven to make that conversation. That was advantageous.

After I returned to the hotel to handle a task for my work, and struggled with it (public props to my colleague for figuring it out quickly the next morning), I decided that I might want to talk with others on handling disparate data…while singing karaoke. Thanks to the magic of Twitter, the PASS community is constantly in touch about events, which led me to join many others (both first-timers and community notables) at a Chinese restaurant hosting nightly karaoke. That was more of a chance to have fun and practice “Purple Rain” in front of an audience outside Raleigh-Durham or Philadelphia. I’ve learned that singing a song in front of people who may not know you is a good opening for conversation once the song is done. It’s no secret that I have trouble approaching someone I’ve never spoken with and making small talk with that person, but there were some openings created through the power of singing songs someone else made famous.

A key takeaway professionally that night was simple: as soon as you are stuck on a problem involving data sources, ask someone who is more seasoned with the data on their thought process, and how they became acquainted with the data. If they solve the problem, find out how they did it so you can apply it next time. It can help someone doing work on the analyst side to get better. I admit it’s an area I still need to be more consistent about when trying to play hero.

Day 1 (Tuesday)

I wasn’t signed up for a pre-con because money, so it has really been a day of exploring and also following up on previous work tasks. Let’s just say I saw touristy stuff and took advantage of the #sqlsummit hashtag to meet random others. Again, PASS does a great job utilizing Twitter.

Then we got to the networking dinner. There’s a lot of standing around awkwardly for a person like myself, so this event isn’t necessarily easy. I found moving around and striking up conversation with someone else who was also by themselves to be an effective way of connecting. I was able to talk to a DBA or three about how their systems worked, and got advice that the system administrator should not be the database administrator. Can’t treat them both the same. I can agree to some extent, but I would think it depends on whether the sys admin has been trained as a DBA.

I like the networking dinner for the open and social aspect, but I do wonder if PASS could put together a networking event similar to the first-timers one for people who generally just want to meet other people but have trouble saying the first word. The first-timers one is set up speed dating style, and maybe it could be expanded to others in the future. Though with everything considered, I met plenty of new people by focusing on them directly.

Then came the fun of karaoke, yet again. How we do the connect part. Or reconnect, as I encountered professionals I had not seen in a long time…three years in a few cases.

And that was everything through Tuesday. Days 2-4 will provide even more learning.

Well, I should first note that I intended to blog more than I have, but was struggling to find the right topics that didn’t involve any ‘woe is me’ complaints or struggles. However, then I found inspiration in talking to others about the steps I took when I got started working on my technical skills in my time away from work (mostly T-SQL and some Python would be covered), and how to cover the basics for a lot of people looking to get started.

One situation I run into more frequently than I previously wanted to admit is being unable to explain the most basic of concepts after I do them. Part of it, as I came to realize, was not allowing myself to practice what I learned outside of the office anymore. By the end of last year, I figured out that I needed to actually use a home database if I was going to perfect my technical chops, let alone speak on a subject in front of an audience where examples are crucial. So my first thought is…what about the constructs of Microsoft’s own sample databases? There’s a new one for 2016, and I had to get it, and post some rudimentary thoughts.

Get the database file, of course

I figured this was a time to trace my steps and add my first database in my shiny new Developer edition instance. So where can you find Wide World Importers? Here it is on the shiny GitHub page. There is both a transactional backup file and an analytical backup file (WideWorldImportersDW-Full.bak). I downloaded both, and moved the backup files to my local backup folder. In my case the location was Microsoft SQL Server\MSSQL13.[instance name]\MSSQL\Backup. My thought is that it’s an easier spot to keep the originals. I read that some folks advised placing it directly in the C: drive, however, and there may be a good reason for me to do so in the future.

So within SQL Server Management Studio, I decided to use the menu commands rather than typing out the T-SQL RESTORE DATABASE command. I am not at all DBA level (show compassion for us little developers, peeps), but pretended to be one by asking to restore a database. There are instructions on the Microsoft site, but I’ve got pictures for how I followed along (also because I couldn’t get video to happen).

Yes, the Restore Database command was the droid I was looking for in this particular instance.

Once at the backup screen, I got the database loaded pretty quickly. The key is looking for a file after clicking the Device radio button. Also, the backup folder appeared instantly when I clicked Add and it allowed me to easily choose the database.

Well, hitting OK a bunch of times allowed for a very quick “restore” of a database I never had to start. Then I was able to do the same for the DW/analytical version, and I even put AdventureWorks2014 in there solely to be experimental. Had no problems with a 2014 database brought into a 2016 system, in case some of you were like me some time ago and thought compatibility issues could occur if you don’t set it to 2016 in advance.

I should also note that I used the -full version because I have Developer edition, and -full works on that and everything Enterprise. If you don’t have one of those editions, you’ll need to stick with -standard.

Is Wide World Importers special in any way?

It’s hard for me to say while toying around with the DB so far. The business has changed slightly, including more tables based on delivery locations. However, some of the big differences to me are more about the DB practices and configuration for new SQL Server 2016 features.

I immediately noticed many more system-versioned tables (temporal, maybe?) in this edition. Those are the clocks in the corner of each table. I expanded the next level, and there was a history archive. Pretty cool that it’s finally come over this way.

What’s up with the clocks? Now they can find out where I messed around and I can’t cover my tracks anymore?

Even some of the code itself is more detailed and also slightly different in format, in the way lines are split up. I do still notice cursors, which will show others how to do it…but I wonder if all those folks who convinced me of the badness of cursors would take issue. The views are pretty simplistic to say the least, and may have some use considering they are concentrated to three areas.

The analytical database this time uses dimension, fact, and integration (staging tables) as the prefixes. I think it’s an easier way to teach folks about the data warehouse schema by using these tables. The schema is also set up that way, with fact tables having many a foreign key and the dimensions having identities across the board that link accurately. I even saw a proc called GetLastETLCutoffTime, which gave me ideas that I can bring to my day job for some of what we run during our off hours.

AdventureWorks won’t be updated any more, but it really feels more so like WideWorldImporters is a promotion of sorts, with more integrated features and better key systems. The documentation is about the same, but the data itself is improved. For people using this edition longer than I have, fresh data is a good thing, I’m sure.

Hold up…weren’t you going to mention more about 2016 features?

Oh yeah…those 2016 features. I notice that I haven’t yet tried to stretch the database, but it appears this one is configured so that it can be done. Same goes with R Services, which I’ve only scratched the surface on when using a release candidate.

I’ll have to go into more detail in a second part once I play around with these features, and maybe do a comparison against other public databases that are SQL Server compliant. Always learning, you know.

So I’ve really slacked on blogging since starting my own website. My thought was that with my own domain I would do this more. I have a lot of partial drafts right now that are eager for publication. Well, hopefully I have motivated myself merely by typing that.

Moving on…

This past Saturday, I took part in a SQL Server 2016 Discovery Day in Raleigh-Durham, to learn about the new features in this year’s edition in a hands-on environment with some of the PASS data available to the public. It was a pretty solid way to spend a really hot weekend afternoon, geeking out over bells and whistles and data we don’t always get to access. First, our chapter leader (or as Kevin refers to himself, Grand Poobah) gave a presentation concerning all of the new features that made the 2016 edition the best one yet for all of us who spend a ton of time working within the Microsoft stack. I particularly looked forward to putting the Transaction Performance Analysis Overview into action, along with checking out R Services. The second presentation was one on columnstore indexes delivered by MVP Rick Heiges (fun fact…he’s related to a former president of my alma mater, which is how we first made acquaintance some time ago). Indexes aren’t my strong suit, so I can only say that he showed us how this index focus will make it easier for folks like me to improve in that facet. Then it was time to form teams and start hacking.

In five hours, and with about ten hours worth of PowerBI under my belt, look what I made in school today.

We were given some really dirty data sets (IMO, at least) and would use these to create analyses of the geographical locations of SQL Saturday and Summit attendees, and correlations between the topics presented and virtual chapter membership. Our group (Team Cheetah) had a diverse background from database engineers to report writers. We really focused strongly on the PowerBI aspect, and finding ways to clean up some of the data related to tracks and zip codes/addresses. After the fact, I realized on my part that I could have made more use of the columnstore index feature, but I’m guessing that the leaders liked my decision to use lookup tables to make the above graphic work. I think I got a bit excited explaining how we found that topics presented within an 18-24 month span were the most frequent of Summit presentations in the next year, going on…and on…and on about the dashboards and the graphic types. After the fact, our team found more we could have done with the hours we had, but considering our rookie status we felt alright with what we would present and be judged on (use of new features, dashboards, analysis, etc). So maybe we would get points for utilizing the presentation and talking about the muddied data we had. Plus, the engineers were good at the geospatial aspect and cleansing.

Well, our team won the contest, with gift cards as prizes, so obviously that was a cool feather in the cap.

I will say that an exercise like this was rather interesting and cool, but I’m wondering what we could do for the PASS Community if we were to take this on for longer than an afternoon. If we had a team with BI Developers (like myself), ETL and data quality experts, analysts, and data scientists, then we could have some very useful information for trending. That may be the ultimate goal of this exercise, but it can become a stepping stone for a true hacking session.

Plus, this is an opportunity for a lot of the members of our local chapter to try their hand at civic hacking…myself included. One of my recent goals/fascinations has been civic hacking. The opportunity to take public data and improve a community using this information. In the Triangle alone we have many open data events, including CityCamp and Datapalooza. I’ve had a chance to sit in on some of the ideas and even add an idea in there, but a hackathon is something I’d like to try in full. Today gave me an idea on how it may work in a controlled environment, and what I could improve on when presenting findings. I would love to see more of this happen, and knowing that the community can have its own members help out with the solutions to make things better. Then, in turn, we can go out and use open data to help our communities.