Gradient Trader Part 4: Build Training Set with Rust for Python

Nov 16, 2017

Preparing a dataset for machine learning is a CPU-heavy task. To keep GPU utilization high during training, it is imperative to process the data before training begins. What is the proper way to approach this? It depends on how much data you have.

Recently, I trained a new model. I had 60GB of compressed order book data, from which I needed to generate a 680GB training set.

This post is about the awkward situation where the dataset is not big enough to warrant Spark, but would take too long to process on your own computer. A top-end deep learning box has at most 32 cores, while AWS offers 128 cores on demand for $13.388/hr. A simple back-of-the-envelope calculation: if a task takes 24 hours on an Intel i7 with 8 threads, that is 24 × 8 = 192 thread-hours, so assuming linear scaling it would take ~1.5 hours on a 128-core instance, for roughly the price of a burrito or two. Another pro is the huge memory (1,952GB), which should fit most datasets entirely in RAM.

I use Rust to do the heavy lifting, and in this post I will cover two aspects:

Using multiple cores

Saving to Numpy format

Parallel Programming

I suck at writing parallel code. About 4 months ago, I wrote this monstrosity. Feel free to skip it.
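The core idea is simpler than that snippet suggests: split the input into chunks and fan the chunks out across OS threads, one per core. Here is a minimal sketch using `std::thread::scope`; `process_chunk` is a hypothetical stand-in for the real per-chunk feature extraction (here it just squares each value):

```rust
use std::thread;

// Hypothetical per-chunk work; the real version would build features
// from raw order book rows. Here it just squares each value.
fn process_chunk(chunk: &[f32]) -> Vec<f32> {
    chunk.iter().map(|x| x * x).collect()
}

// Split `data` into roughly equal chunks and process them on
// `n_workers` scoped threads, preserving the original order.
fn process_parallel(data: &[f32], n_workers: usize) -> Vec<f32> {
    let chunk_size = (data.len() + n_workers - 1) / n_workers;
    thread::scope(|s| {
        // Spawn one thread per chunk; scoped threads may borrow `data`.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || process_chunk(chunk)))
            .collect();
        // Join in spawn order so the output order matches the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}
```

Because each chunk is independent, this scales close to linearly with core count, which is exactly what makes the 128-core instance worthwhile.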

Using Numpy format

Since I’m dealing with spatio-temporal data (for an RNN), I need to generate feature tensors of shape [batch_size, time_step, input_dim]. To do this, I wrote a serializer for the npy format.

```rust
// write_npy.rs
use byteorder::{BE, LE, WriteBytesExt};
use std::io::Write;
use record::*;

// "\x93NUMPY": magic string from the npy spec
static MAGIC_VALUE: &[u8] = &[0x93, 0x4E, 0x55, 0x4D, 0x50, 0x59];

fn get_header() -> String {
    let mut header = format!(
        "{{'descr': [('data', '>f4')], 'fortran_order': False, 'shape': ({}, {}, {})}}",
        BATCH_SIZE, TIME_STEP, INPUT_DIM
    );
    // The spec requires the 10-byte preamble (magic + version + length
    // field) plus the header to total a multiple of 64 bytes: pad with
    // spaces and terminate with a newline.
    let pad = (64 - (10 + header.len() + 1) % 64) % 64;
    header.push_str(&" ".repeat(pad));
    header.push('\n');
    header
}

/// These values are just from the spec.
pub fn write(wtr: &mut Write, record: &Record) {
    let _ = wtr.write(MAGIC_VALUE);
    let _ = wtr.write_u8(0x01); // major version
    let _ = wtr.write_u8(0x00); // minor version
    let header = get_header();
    let _ = wtr.write_u16::<LE>(header.len() as u16); // header length, little-endian
    let _ = wtr.write(header.as_bytes());
    // payload: big-endian f32 values in C order, matching '>f4'
    for batch in record.iter() {
        for step in batch.iter() {
            for input in step.iter() {
                let _ = wtr.write_f32::<BE>(*input);
            }
        }
    }
}
```

The resulting file can then be loaded on the Python side with numpy.load.
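The header is the easiest part to get subtly wrong, so it pays to make it testable on its own. A standalone sketch of the same header logic, with the dimensions passed as parameters instead of taken from the record module's constants:

```rust
// Build an npy v1.0 header for a 3-D big-endian f32 tensor.
// The npy spec wants the 10 preamble bytes (magic + version + length
// field) plus the header text to total a multiple of 64 bytes, so we
// pad with spaces and terminate with '\n'.
fn npy_header(batch: usize, steps: usize, dim: usize) -> String {
    let mut header = format!(
        "{{'descr': [('data', '>f4')], 'fortran_order': False, 'shape': ({}, {}, {})}}",
        batch, steps, dim
    );
    let pad = (64 - (10 + header.len() + 1) % 64) % 64;
    header.push_str(&" ".repeat(pad));
    header.push('\n');
    header
}
```

With the dimensions factored out, a unit test can assert the alignment invariant directly without touching any real data.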