Category Archives for Data Analysis

I spend most of my time working with Microsoft web technologies, or as I like to call them, “.NET and Friends”. While I’m a big fan of the web, I’m always looking into new areas of development. One of those areas is data analysis. We are awash in data, and learning how to process it is a valuable skill. Until recently, there wasn’t much in the Microsoft ecosystem for doing this kind of work. That isn’t a bad thing in itself, but it’s nice to be able to use familiar tools to learn new things.

Fortunately, Microsoft has made some serious investments in the data analysis space. You aren’t going to be using C#, but Visual Studio now supports R. R is a language made “by statisticians for statisticians”. It’s one of the premier data science technologies and a great way to learn statistics. Microsoft also has R support in SQL Server.

In this post, I’m going to cover a few of the reasons R is worth a look, even if you are not planning on donning the data scientist hat anytime soon.

The Power of Polyglot

This is sometimes forgotten in the .NET world, but different languages are good for different things. If you build web applications, you already know this. For example, if you want to build a modern web application, you need at least three different languages (JavaScript, CSS, and HTML). More likely you’re looking at six or more (some mix of JavaScript, TypeScript, Sass, CSS, C#, HTML, XML, and Markdown).

Every language does certain things better. You should use the language that does the job best, rather than trying to shoehorn in your language of choice. In the data analysis space, this is no different. The two most popular languages for data analysis are R and Python. While Python is a viable option (and supported in Visual Studio as well), R is purpose-built for data analysis. You can do data analysis in either, but R often does it with less code.
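To give a feel for how little code a typical analysis takes, here is a minimal sketch. It uses only base R and the built-in mtcars data set, which is my choice of example rather than anything tied to Visual Studio. It averages fuel economy by cylinder count and fits a simple linear model.

```r
# mtcars ships with base R: fuel economy and specs for 32 cars.
# Average miles per gallon (mpg) for each cylinder count (cyl).
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Fit a simple linear model of mpg against weight (wt)
# and show the intercept and slope.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```

Grouping and modeling are each one call here because data frames and model formulas are built into the language.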

In addition to the productivity benefits of using the right tool for the right job, it’s good for your personal development to learn new programming languages. The Pragmatic Programmer recommends learning a new one each year. Learning new languages improves your thinking and makes you better at your primary development stack.

Data Is The New Oil

“There’s gold in them servers.”

Data is money. Large companies are using the data you generate as a goldmine. Uses range from optimizing advertising to making even more addictive products. In addition to user-generated data, we also have the mountains of data generated by IoT devices. Sometimes we can use it for small gains, like using a Nest Thermostat to optimize your heating and cooling, but sensor networks can have a much greater impact. We have access to more data than at any other point in human history. If you can figure out how to mine insights from that data, you will be rewarded handsomely.

If You Care About Truth, Data is For You

“There are three kinds of lies: lies, damned lies, and statistics.”

With plenty of data comes plenty of people using that data to manipulate you. Every political cause has a stable of statistics behind it. Even if those statistics fall apart under scrutiny, people believe them because numbers sound fancy. People trying to sell you something use numbers to appear more credible. If you want to thrive in our data-soaked economy, it’s essential to become data literate, so you can spot these manipulations.

R is for Learners

R has several features that make it a great tool for learning about data analysis. First, it’s easy to get started with. R is a simple language, and you can pick up the basics in a few hours. Additionally, R has an easy-to-use, built-in help system. If you need info on any command or function, it’s a few keystrokes away. R also comes with a lot of built-in data sets for trying out statistical techniques, including many demo data sets that are well known in the statistics community.
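As a minimal sketch of what that looks like in practice, the snippet below uses only base R. The choice of mean and iris is mine, picked because both ship with every standard R installation.

```r
# Open the help page for a function; ?mean is shorthand for help("mean").
help("mean")

# Run the examples from that help page.
example("mean")

# List the names of the data sets bundled with the datasets package.
available <- data(package = "datasets")$results[, "Item"]
print(head(available))

# iris is one of the best-known demo data sets in statistics.
str(iris)
print(summary(iris$Sepal.Length))
```

If you don’t know the exact name of a function, help.search("some phrase") (or the ?? shorthand) searches the installed documentation instead.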

Playing Nice With Others

As data analysis becomes more prevalent in the enterprise, you’re probably going to wind up working with data analysts and data scientists. Learning about some of the tools and techniques they use gives you common ground. It’s the same reason software developers should develop business and industry knowledge. Being able to connect with your team members on their terms makes you more than a run-of-the-mill software developer.

Conclusion

If you’re an enterprise developer, R is worth a look. You can use R to learn valuable new skills using familiar tools. With a little effort, you’ll be able to slice and dice data for fun and profit.

In an effort to improve my data analysis skills, I’ve been learning and speaking about the R programming language. Even if you don’t want to be a data scientist (whatever the hell that means this week), learning some analysis skills can pay dividends. Data literacy is an essential skill in our data-soaked economy, and R is a good learning tool for analysis skills.

One of the harder things to do when starting in a new area is finding useful resources. It’s tough to find the digital needle in the web-powered haystack. To make your life a little easier, here’s a list of the R resources I found useful.

Setting Up R

There are three paths to getting R set up on your machine. If you’re a Visual Studio 2017 user, the easiest way to get R is to install the Data Science workload in Visual Studio. This will get you the Microsoft flavor of R and R Tools for Visual Studio.

If you’re not into Visual Studio, you can also install an R interpreter and RStudio. RStudio is a free R IDE. For interpreters, you can go with either the Microsoft flavor or the standard CRAN flavor of R.

If neither of those options works for you, you can also run R in a Jupyter Notebook. Jupyter is a web-based environment that makes it really easy to mix text and code. It’s used in many contexts, including scientific research and virtual textbooks. To set up a local copy, start off by installing Anaconda. Anaconda is a data science environment that includes a plethora of handy analysis tools. After you install Anaconda, you’ll need to install R using the conda package manager. Then you can run Jupyter using the “jupyter notebook” command.

I’m really enjoying using R Tools for Visual Studio. It’s nice to learn something new (R) with something familiar (Visual Studio). I did, however, hit a snag when trying out the new IDE.

Upon installing the Data Science workload in Visual Studio (which is how you install R Tools), I couldn’t open or create a project. File -> New Project just hung indefinitely. Usually, you expect these things to actually work, so I filed a bug on the R Tools for Visual Studio GitHub page. To my surprise, within about thirty minutes (on a Sunday night), someone asked me about my issue. While they didn’t give me an exact answer, they gave me the hint I needed to fix it. I was impressed by the speed and helpfulness of their response.

As some of you already know, our good friends at Microsoft maintain their own version of R. This version is faster, but it’s a point release behind the latest one. It’s a basic trade-off between shiny and speedy. It turns out R Tools for Visual Studio doesn’t yet support the latest version. I had previously installed the non-Microsoft R, which was at v3.4, and Visual Studio defaulted to that version.

There are two ways to solve the issue. The easiest way is to just uninstall R 3.4 and use the Microsoft version of R. If you are using RStudio as well, Microsoft’s R works fine there. The second way is to go to R-Tools -> Windows -> Workspaces. From there, you can pick the version of R that’s being used by Visual Studio.
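Whichever route you take, it helps to confirm which interpreter a session is actually running. Here is a minimal sketch using base R only. You can run it in any R console, and the “3.4.0” threshold is just an illustrative value taken from the version mentioned above.

```r
# Version string of the interpreter running this session.
print(R.version.string)

# Installation directory of that interpreter, handy when several
# copies of R live side by side on one machine.
print(R.home())

# Compare against a version of interest; "3.4.0" is only an example.
if (getRversion() >= "3.4.0") {
  message("Running R 3.4 or later")
} else {
  message("Running an R release older than 3.4")
}
```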

Regardless of which solution you go with, this issue, while vexing, is easy enough to fix.

Happy data hunting.

Dustin Ewers
