Archive for: November, 2017

I’m an R programmer. To me, R has been great for data exploration, transformation, statistical modeling, and visualizations. However, there is a huge community of Data Scientists and Analysts who turn to Python for these tasks. Moreover, both R and Python experts exist in most analytics organizations, and it is important for both languages to coexist.

Many times, this means that R coders will develop a workflow in R but then must redesign and recode it in Python for their production systems. If the coder is lucky, this is easy, and the R model can be exported as a serialized object and read into Python. There are packages that do this, such as pmml. Unfortunately, many times, this is more challenging because the production system might demand that the entire end to end workflow is built exclusively in Python. That’s sometimes tough because there are aspects of statistical model building in R which are more intuitive than Python.

Python has many strengths, such as its robust data structures such as Dictionaries, compatibility with Deep Learning and Spark, and its ability to be a multipurpose language. However, many scenarios in enterprise analytics require people to go back to basic statistics and Machine Learning, which the classic Data Science packages in Python are not as intuitive as R for. The key difference is that many statistical methods are built into R natively. As a result, there is a gap for when R users must build workflows in Python. To try to bridge this gap, this post will discuss a relatively new package developed by Microsoft, revoscalepy.

Having worked with both, my loyalties tend to lie with R for a couple of reasons. But this might help some people bridge the gap.

Pro Tip: A quick test is to see if the log transformation increases the magnitude of the correlation between “TotalCharges” and “Churn”. We’ll use a few dplyr operations along with the corrr package to perform a quick correlation.

correlate(): Performs tidy correlations on numeric data

focus(): Similar to select(). Takes columns and focuses on only the rows/columns of importance.

ASP.NET session state enables you to store and retrieve values for a user as the user navigates the different ASP.NET pages that make up a Web application. Currently, ASP.NET ships with three session state providers that provide the interface between Microsoft ASP.NET’s session state module and session state data sources:

InProcSessionStateStore, which stores session state in memory in the ASP.NET worker process

OutOfProcSessionStateStore, which stores session state in memory in an external state server process

SqlSessionStateStore, which stores session state in Microsoft SQL Server database

This blog post focuses on the SqlSessionStateStore provider and describes how you can configure it to use SQL Server In-Memory OLTP as the storage option for session data. You can either use the latest ASP.NET async version of the SQL Session State provider (which is the recommended approach), or configure an earlier version of the provider to work with In-Memory OLTP by downloading and running the In-Memory OLTP SQL scripts from our sql server samples github repo.

The me of seven years ago really needed this. But with the strong shift against session-based data collection and back to stateless or client-held state paradigms, I’m not sure how many people this helps.

No matter how hard the dbatools; team tries, there’s always someone who wants to do things we’d never thought. This is one of the great things with getting feedback direct from a great community. Unfortunately a lot of these ideas are either too niche to implement, or would be a lot of complex code for a single use case.

As part of the Restore-DbaDatabase stack rewrite, I wanted to do make things easier for users to be able to get their hands dirty within the Restore stack. Not necessarily needing to dive into the core code and the world of GitHub Pull Requests, but by manipulating the data flowing through the pipeline using standard PowerShell techniques, all the while being able to do the heavy lifting without code.

So hooray! We have found word vectors again, a bit faster, with clearer and easier-to-understand code. I do argue that this is a real benefit of this approach; it’s based on counting, dividing, and matrix decomposition and is thus much easier to understand and implement than anything with a neural network. And the results?

Click through to see the new method, as well as some basic analogy testing.

Data preparation for machine learning requires business domain expertise, bias awareness and an experimental thought process. Before preparing your data, you’ll first define a business problem solve. During that exercise, you’ll select an outcome metric and brainstorm potential input variables that influence it from many varied perspectives. From there you will begin identifying, collecting, cleaning, shaping and sampling data to run through automated machine learning model processes.

Note that it is also not unusual for relevant machine learning input data to occur outside of existing transactional processes. If that is the case, you can still start creating a first-generation machine learning model with existing data and continue to build new model versions over time as supplementary data is acquired.