Python samples for MicrosoftML

02/16/2018

This article describes and links to MicrosoftML samples written in Python to help you get started quickly with Microsoft Machine Learning Server. The first samples cover tasks that are quick to run. More advanced samples are described in the sections that follow them.

Sentiment analysis with a pre-trained model

The Sentiment analysis sample is a text analytics example that shows how to use the featurize_text transform to featurize text data. The featurized data is then used to train a model that predicts whether a sentence expresses positive or negative sentiment. This is a supervised binary classification task.

More specifically, the example shows how to use the featurize_text transform in the MicrosoftML package to produce a bag of counts of n-grams (sequences of consecutive words) from the text for classification. The sample uses the Sentiment Labelled Sentences Data Set from the UCI repository, which contains sentences labeled with positive or negative sentiment.
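In the sample, featurize_text does this work inside MicrosoftML. To illustrate what a "bag of counts of n-grams" contains, here is a minimal pure-Python sketch. The tokenize and ngram_counts helpers are hypothetical and are not part of the MicrosoftML API, and the lowercase, regex-based tokenization is a simplifying assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(text, max_n=2):
    """Return a bag of counts of all n-grams of length 1..max_n.

    Each n-gram is a tuple of consecutive tokens, mapped to the number of
    times that sequence occurs in the text.
    """
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Unigrams and bigrams for a short, hypothetical sentence.
bag = ngram_counts("Not good, not good at all.", max_n=2)
for gram, count in sorted(bag.items()):
    print(" ".join(gram), count)
```

Bigrams such as ("not", "good") let a linear classifier see negation that unigram counts alone would miss. This is one reason the samples compare different n-gram lengths.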

The sentiment analysis quickstart uses a pre-trained model. Pre-trained models are installed through setup as an optional component of Machine Learning Server or SQL Server Machine Learning Services. To install them, select the ML Models checkbox on the Configure the installation page. For more information on pre-trained models, see Pre-trained machine learning models.

The sample then shows how to improve predictions by choosing optimal hyperparameters for the learner. Every learner has hyperparameters that affect how a model is trained. Their default values work reasonably well on most datasets, but they are rarely the best values for any particular dataset. This sample shows how to find optimal values for this dataset.
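The general idea of hyperparameter tuning can be sketched without MicrosoftML. The following example assumes a hand-written logistic regression trained by stochastic gradient descent, which stands in for the sample's learner and is not that learner. It also assumes an exhaustive grid search scored on a held-out validation set. The function names, the grid values, and the six-sentence toy dataset are all hypothetical.

```python
import itertools
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(weights, bias, features):
    # Bias plus the dot product of the weights with sparse features.
    return bias + sum(weights.get(f, 0.0) * v for f, v in features.items())


def train_logistic(examples, learning_rate, l2, epochs):
    """Fit logistic regression with plain stochastic gradient descent.

    examples is a list of (features, label) pairs. features maps a feature
    name to its value, and label is 0 or 1. For simplicity, the L2 penalty
    is applied only to the features present in the current example.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            # Gradient of the log loss with respect to the linear score.
            error = sigmoid(linear_score(weights, bias, features)) - label
            for f, v in features.items():
                w = weights.get(f, 0.0)
                weights[f] = w - learning_rate * (error * v + l2 * w)
            bias -= learning_rate * error
    return weights, bias


def accuracy(model, examples):
    # Fraction of examples whose predicted class (score >= 0) matches the label.
    weights, bias = model
    correct = sum(
        int((linear_score(weights, bias, features) >= 0) == (label == 1))
        for features, label in examples
    )
    return correct / len(examples)


def grid_search(train, valid, grid):
    """Train one model per hyperparameter combination and keep the one
    with the highest accuracy on the validation set."""
    names = sorted(grid)
    best_params, best_acc = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        acc = accuracy(train_logistic(train, **params), valid)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


def bag_of_words(text):
    # Unigram counts, used here as a stand-in for featurize_text output.
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical toy data, for illustration only.
train = [(bag_of_words(t), y) for t, y in [
    ("great movie", 1), ("loved it", 1), ("great acting", 1),
    ("terrible movie", 0), ("hated it", 0), ("terrible acting", 0),
]]
valid = [(bag_of_words(t), y) for t, y in [("great fun", 1), ("terrible plot", 0)]]
grid = {"learning_rate": [0.01, 0.1, 1.0], "l2": [0.0, 0.01], "epochs": [5, 20]}

best_params, best_acc = grid_search(train, valid, grid)
print("best hyperparameters:", best_params, "validation accuracy:", best_acc)
```

Exhaustive grid search is the simplest tuning strategy. It trains one model per combination, so its cost multiplies with each hyperparameter you add. Scoring on a separate validation set, rather than on the training data, keeps the search from simply rewarding overfitting.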

Sentiment analysis with text featurization

Text featurization is a machine learning technique that converts text into numerical values that capture features of interest. The Plot text featurization sample is a text analytics example that creates feature columns containing n-gram probabilities for positive and negative sentiment, computed from the sentences. The featurized text data is then used to train a model that predicts whether a sentence expresses positive or negative sentiment. The sample repeats training and prediction with n-grams of different lengths and compares their speed and performance using ROC curves. It also shows how to combine two sets of features and plot the results. For more information, see text featurization.
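Comparing models with an ROC curve comes down to sweeping a decision threshold over each model's scores and measuring the area under the resulting curve. This plain-Python sketch shows that computation under some assumptions. roc_curve and auc are hypothetical helper names, not MicrosoftML functions. Tied scores are added in a single step. The two score lists are made up for illustration and are not results from the sample.

```python
def roc_curve(labels, scores):
    """Return ROC points as (false positive rate, true positive rate).

    labels are 0/1 and higher scores mean "more likely positive". The
    threshold sweeps from the highest score down, and examples with tied
    scores are added together. Both classes must be present.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example whose score equals the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under an ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scores from two models on the same labeled sentences.
labels = [1, 0, 1, 0]
unigram_scores = [0.9, 0.8, 0.7, 0.1]
bigram_scores = [0.9, 0.3, 0.8, 0.1]
for name, scores in [("unigrams", unigram_scores), ("bigrams", bigram_scores)]:
    print(name, "AUC =", auc(roc_curve(labels, scores)))
```

An AUC of 1.0 means every positive sentence is scored above every negative one, and 0.5 is what random scores give on average. In practice, a library routine such as scikit-learn's sklearn.metrics.roc_auc_score computes the same quantity.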

Next steps

Now that you've tried some of these examples, you can start developing your own solutions using the MicrosoftML packages and APIs for R and Python: