Contents

Exploring value-at-risk analysis with Apache Spark and Jupyter

Introduction

This application characterizes historical returns of stock data and then
simulates many possible outcomes for a given portfolio over a given time
horizon. The idea is that you can characterize the value at risk for a given
percentage of simulated outcomes (e.g., "there is a 5% chance we’d lose more
than $1M USD in the next two weeks"). While you wouldn’t want to use the
techniques in this application to guide real investment decisions, it also
demonstrates how to use a Jupyter notebook with Apache Spark in OpenShift and
you can modify the basic simulation to model security returns in a more
sophisticated manner.

Architecture

The driver for the value-at-risk application is a
Jupyter notebook. A Jupyter notebook is an interactive
compute environment that interleaves documentation, code examples, and
graphical output. This particular notebook uses Spark to orchestrate
simulations of the stock market.

We’ll use the notebook the way a data scientist might for initial
experimentation, with a single Spark executor embedded within the application.
Running a notebook on OpenShift provides a lot of advantages even if we aren’t
using scale-out compute:

if we have a project accessible from a public cloud, we can work on it from
anywhere, with any device;

we can easily share our work with colleagues; and

our results will be easily reproducible, since repeatable deployments mean they
won’t depend on any details of our environment.

Install and run the notebook

Installing our notebook is very simple: we just need to create an OpenShift project and install and run the notebook application. Make sure you’re logged in to OpenShift and have selected the right project, and then execute the following two commands:

Check the OpenShift web console. Once the notebook application is running, click on its route. Jupyter will ask you to log in; use developer for a password (or the password you specified if you used a different one when you ran oc new-app). You’ll be presented with a list of notebooks that are installed in the workshop-notebook image. The one we’re interested in is var.ipynb, so click on it. (There are some other files in this image: pyspark.ipynb is an introduction to Apache Spark and ml-basics.ipynb introduces some basic machine learning techniques.)

Interacting with the Jupyter notebook is very simple. Select a cell with Python
code and then execute it, either by pressing the "run cell" button in the
toolbar or by pressing shift + enter. You can edit the code in the notebook
cells and re-run them, and results from cells you’ve already run will be
available to new cells. The notebook interface provides a great way to
experiment with new techniques. Try it out!

Once you’ve worked through the whole notebook, consider trying out the
self-guided exercises at the end of the notebook. If you want to run your own
Jupyter notebooks in OpenShift, consider extending the
base-notebook image.

Videos

You can see a demo of installing and running the notebook application in the
following video: