Welcome!

Introduction to FastScore

Difficulty: Beginner

Estimated Time: 10 minutes

In this first scenario, you'll be introduced to FastScore's components and walk through the deployment process with an example model.

FastScore is composed of Docker containers customized to execute analytics and is agnostic to the language, compute environment, and data sources. See the architecture documentation for details. FastScore executes each model or piece of code in an individual FastScore Engine that can be scaled on demand. FastScore Manage communicates with the underlying storage location (for example, Git) of models and other assets required to deploy into a FastScore Engine.

In this scenario, we will provide one model in Python and the associated assets needed to score data using that model through the CLI.

The model in this scenario is a Gradient Boosting Machine (GBM) model. It consumes data about cars and predicts a risk factor for each car on a scale of -3 to 3, where -3 is the most risky and 3 is the least risky.

Congratulations!

You've completed the scenario!

Scenario Rating

We just scored a gradient boosting machine model written in Python in batch mode, then quickly switched it to streaming.

We also covered all of the critical components needed to score a model in FastScore. The scoring happened in a FastScore Engine customized for that model, which can be saved as an image and referenced by any orchestration tool on any infrastructure that supports Docker.

Steps

Introduction to FastScore

Step 1 of 5

Let's Look at the Scoring Models

Models are created in a variety of environments, such as RStudio or Jupyter Notebooks, and were most likely trained in a different environment from the one in which they will run in production. FastScore enables data science teams to test, deploy, and monitor their models in a way that is portable and scalable. This scenario introduces all of the components needed to deploy a model with FastScore after it has been created.

Let's start with the Python model we will be using in this example.

This model is stored in a MySQL database, which is the default backing store configured for FastScore. It could just as easily be stored in a code repository that FastScore Manage connects to, exposing the model for use in a deployment configuration.

This model is specifically a scoring model. It uses the weights created during the training phase to predict a score from the attributes in the incoming data.

Let's take a look at our model:

fastscore model show gbm_python

There are a few things to notice here:

The model references its data schemas, "gbm_input" and "gbm_output", in smart comments at the top; we will discuss these in the next step.

The begin function performs preparation work before the model executes, loading any custom libraries and/or preparing the model coefficients (or weights) to be used.

The action function uses a specified method to calculate a score for each incoming record.
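To make these parts concrete, here is a minimal sketch of the shape a FastScore Python model takes. This is illustrative only, not the actual gbm_python model: the smart comments, begin, and action functions are the pieces the engine looks for, and the stand-in scorer replaces the real GBM weights so the sketch stays self-contained.

```python
# Illustrative sketch of a FastScore Python model -- not the real
# gbm_python model. Smart comments bind schemas to engine slots:
# fastscore.schema.0: gbm_input
# fastscore.schema.1: gbm_output

model = None  # populated once, in begin()

def begin():
    # Preparation work: import custom libraries and load the trained
    # coefficients/weights here (e.g. unpickle a fitted GBM).
    # A stand-in scorer is used so this sketch is runnable on its own.
    global model
    model = lambda record: 0.0

def action(datum):
    # Called once per input record; yield the score to the output slot.
    yield model(datum)
```

The engine calls begin once at load time and action once per record, so all expensive setup belongs in begin.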

Let's Look at the Data Structure

Our models expect to receive very specific data fields that have restrictions on the data types. To ensure data being consumed by the model is valid, we will associate data schemas for the input data and for the data produced by the model.

Let's view the data schemas that our models need!

fastscore schema show gbm_input

This schema contains a list of the specific attributes that the model expects in order to produce a score.

Similarly, the output data is designed to have a specific structure. Let's look at the output schema:

fastscore schema show gbm_output

Since the model produces just one number (a double between -3 and 3), this schema is quite simple.
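FastScore schemas are written as Avro schemas. The exact contents of gbm_output live in the environment, but for a model that emits a single double, an Avro schema can be as minimal as the following (shown purely as an illustration):

```json
{ "type": "double" }
```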

Let's Look at the Data Transports

It is best practice to separate the model from how the data will be transported in and out of the model.

Let's take a look at the different transports that will used in this scenario!

In the testing phase, a static file may be used to ensure the models execute properly in FastScore.

Let's look at the first row of data this model will consume (this example file only contains 5 rows of data):

head -1 ./katacoda/fastscore-intro/data/input_data.jsons

If the data structure does not match the input schema, we will receive an error. This ensures that the data structure expectations are met at all times.
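The idea behind that check can be sketched in a few lines of Python. This is a hypothetical illustration of the kind of validation the engine performs, with made-up field names; the real engine validates against the Avro schema itself.

```python
# Hypothetical illustration of schema checking: a record whose fields
# or types don't match the declared schema is rejected before scoring.
def conforms(record, schema):
    # schema: mapping of field name -> expected Python type
    return set(record) == set(schema) and all(
        isinstance(record[k], t) for k, t in schema.items()
    )

schema = {"make": str, "price": float}       # illustrative fields only
good = {"make": "audi", "price": 18920.0}
bad  = {"make": "audi", "price": "18920"}    # wrong type -> rejected

print(conforms(good, schema))  # True
print(conforms(bad, schema))   # False
```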

Let's look at the data transport to feed data into the model from the file:

fastscore stream show file_in

The data transports specify where the data is located, how to move the data, and how the data is encoded.
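As an illustration of that idea, a stream descriptor is a small JSON document. The sketch below follows the general shape of a FastScore file descriptor, but the field values and path are assumptions, not the actual contents of file_in:

```json
{
  "Transport": {
    "Type": "file",
    "Path": "/data/input_data.jsons"
  },
  "Encoding": "json"
}
```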

This scenario will also switch the model from batch scoring to streaming by changing the data transports from a file to Kafka (a platform for building real-time data pipelines).

Let's look at the Kafka transport:

fastscore stream show kafka_in
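For comparison, a Kafka descriptor swaps the file transport for a broker address and a topic. Again, this is a hedged sketch of the general shape, not the actual kafka_in descriptor:

```json
{
  "Transport": {
    "Type": "kafka",
    "BootstrapServers": ["kafka:9092"],
    "Topic": "input"
  },
  "Encoding": "json"
}
```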

Let's Batch Score a Model with FastScore!

FastScore requires at least one model and an associated schema for every data transport. In this scenario, we will deploy our model with one input data transport and one output data transport.

Each model is deployed in its own FastScore Engine. When we set up FastScore, we initialized 4 FastScore Engines. Let's deploy our Python model in engine-1!

fastscore use engine-1

fastscore model load gbm_python

The engine is now waiting for data transports to send data so that it can produce scores. FastScore supports multiple input and output data transports, but we will be using only one of each. Let's set the transports!

We will read the data from a file and write the scores to an output file:

fastscore stream attach file_in 0

The 0 here is the index of the Engine input slot we are using. See the FastScore documentation for more information on multiple input and output slots.

Let's set our output data stream:

fastscore stream attach file_out 1

Now, let's see the scores from the model:

head -5 ./katacoda/fastscore-intro/data/output_data.jsons

Let's Change the Model to Run as a Streaming Model!

Let's detach the file data transports from the engine and attach the Kafka data transports to turn it into a streaming model! The scores will also be produced and sent over Kafka.

fastscore stream detach 0

fastscore stream detach 1

fastscore stream attach kafka_in 0

fastscore stream attach kafka_out 1

We are going to use a utility called kafkaesq that sends data row by row from a specified file over a Kafka topic, which we will use as the input of the model. The utility also receives data over the output topic and writes it to the console.

Let's score the Python model! We will specify the file location, the input topic name, and the output topic name (cleverly named "input" and "output").

Debugging Scenarios

Help

Katacoda offers an interactive learning environment for developers. This course provides a command line and a pre-configured sandboxed environment for you to use. Below are useful commands for working with the environment.

cd <directory>

Change directory

ls

List directory

echo 'contents' > <file>

Write contents to a file

cat <file>

Output contents of file

Vim

Some exercises will require you to edit files or text. The best approach is to use Vim. Vim has two modes: one for entering commands (Command Mode) and one for entering text (Insert Mode). You switch between these modes depending on what you want to do. The basic commands are: