PASS Summit 2016: Wednesday

First, Microsoft’s Joseph Sirosh provided the keynote address. The focus was on the power of data and analytics in this changing world. I didn’t write anything down during this session, so instead I’ll link to Kevin Kline’s recap of the talk.

My focus was towards some sessions to help with query writing. I first attended Andy Yun’s session, Why Your Data Type Choices Matter. Awesome session. Andy first focused on the internals (how data is stored). While I was aware that a table was not created in the order of the columns listed, I did not know much about the FIXVAR format. If using variable columns (ex. VARCHAR), the extra two bytes represent the variable column offset array. We need more metadata to read the actual data. One thing that stuck out was the question of using unicode versus non-unicode. If you use the former (VARCHAR), then data could be lost and refactoring may be required. If you use the latter (NVARCHAR), the storage requirement is doubled. The key is to right-size data types (even in temp tables…if a certain person is reading, the person is being called out!), match data types, and recall the 8KB page size limit.

During lunch, I sat with a table of developers and DBAs talking about query tuning. The major advice I received was during a conversation about date functions that related to project I had worked on previously. Basically, I was told about how scalar user-defined functions can be a terrible thing in many situations. This may be open for debate, but scalar UDFs can bring down the IO. Now I’ll have to investigate that further.

My next session was Optimizing SQL Server and Databases for Large Fact Tables, presented by Thomas Grosher. The idea was that sometimes we have to pull data from large fact tables, and we can apply some tricks to make this happen. Say, if there’s a need to read a fact table for a report which runs for two hours, how do you help it. This session did go over my head a bit, as it seemed more performance-based than BI-development-based. Though I did take away a few things. First, it’s important to choose a clustered index key wisely, such as using a lookup table for common parameters. Second, table partitioning comes into play, which I haven’t tried to do much myself, but our engineering unit has. Third, row compression. We have some tables at my company with one billion rows, and Thomas explained that 1 byte less on 1 billion rows can save 1GB. A solid session, though I may have been surprised by the deeper infrastructure content.

Final session of the day was on Biml for Beginners, by Cathrine Wilhelmsen – a great example of the next generation of data platform stars already coming to fruition. So Biml is something I had brought up in passing at work and read a few blog articles about, but I had not followed through. Here was my chance. Basically, Biml (Business intelligence markup language) is helpful to many a business, including my company, by using business logic to easily repeat SSIS packages. It’s essentially XML. I learned about the beta BimlOnline as well, which does reverse engineering to help us gather what the Biml looks like. We can even extend Biml with C# or VB code blocks to import structures. Cathrine even explained tiered Biml files to repeat such attributes like admin, source, and destination. To use a practical example, there are some weekly reports to automate coming up, and this could be relatively useful to set these packages up at a base level. Biml is not meant for deployment, but it can save us valuable development time. Easy to learn more as well. I’m adding it to the personal projects list.

Lighten up, Francis. Biml is a good thing.

Exhibitors had a fun reception with appetizers and light dinner. I’m bringing home more Idera ducks. Then came more karaoke for the night, put on by Pragmatic Works at the Hard Rock Cafe. Same drill as the previous night with networking opportunities.