How to create a Machine Learning workflow

This article introduces a machine learning workflow on Treasure Data, using the Chicago Energy Benchmarking dataset to predict future energy consumption. The City of Chicago publishes the measured energy efficiency of each building to encourage participants to improve it.

Through this article, you will learn how to:

create a machine learning model for regression with Hivemall

create a machine learning workflow with Treasure Workflow

Writing a workflow matters for production use, because it lets you execute the same ML code repeatedly.

Each feature in Hivemall consists of an index (i.e., a feature name) and a value:

Numerical value: <index>:<value>

e.g., price:600.0

Categorical value: <index>#<value>

e.g., gender#male

The feature index and feature value are separated by a colon. When the colon and value are omitted, the value is taken to be 1.0. So the categorical feature gender#male is a one-hot representation with index := gender#male and value := 1.0. Note that # is not a special character for categorical features; it is simply part of the index string.
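To make the format concrete, here is a minimal Python sketch of how a feature string in the <index>:<value> form could be split into its index and value. The function name parse_feature is hypothetical and not part of Hivemall, and splitting on the last colon is an assumption made for illustration; Hivemall's own parser may differ in edge cases.

```python
from typing import Tuple


def parse_feature(feature: str) -> Tuple[str, float]:
    """Split a Hivemall-style feature string into (index, value).

    Illustrative helper only, not a Hivemall API. Assumes the value,
    when present, follows the last colon in the string.
    """
    index, sep, value = feature.rpartition(":")
    if not sep:
        # No colon: the whole string is the index and the value defaults to 1.0.
        return feature, 1.0
    return index, float(value)


# A feature vector is an array of feature strings.
feature_vector = ["price:600.0", "gender#male"]
parsed = [parse_feature(f) for f in feature_vector]
```

Under these assumptions the numerical feature keeps its explicit value, while the categorical feature keeps gender#male as its whole index and receives the implicit value 1.0.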

Each feature is a string value and "feature vector" means an array of strings as follows: