skill set

coding

As a developer, writing code is the major component of my regular workflow. I have experience building software in languages ranging from system-level C to Python and Java. For analysis and system administration, I have used scripting languages like awk and Python to process and clean raw data files for input into databases or data-analysis environments like Jupyter and RStudio.
A detailed list of familiar languages and software tools can be found on
my experience timeline.
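A minimal sketch of the kind of cleaning pass described above, in Python: normalizing messy delimited records before loading them into a database. The file contents and three-column layout are invented for illustration.

```python
import csv
import io

def clean_records(raw_text):
    """Parse delimited text, skipping blank or malformed rows and
    normalizing whitespace and numeric fields."""
    cleaned = []
    for row in csv.reader(io.StringIO(raw_text)):
        if len(row) != 3:          # drop blank or malformed rows
            continue
        name, count, score = (field.strip() for field in row)
        try:
            cleaned.append((name, int(count), float(score)))
        except ValueError:         # drop rows with unparsable numbers
            continue
    return cleaned

raw = "alice, 3, 0.75\n\nbob, oops, 1.0\n carol ,7,0.5\n"
print(clean_records(raw))  # [('alice', 3, 0.75), ('carol', 7, 0.5)]
```

The same pass is often a one-liner in awk; Python wins once the validation logic grows.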

visualization

Looking at the data is a critical step in almost any data analysis. As part of the InfoVis group at UBC, I have discussed visualization techniques and evaluations since 2005. In my own work, I use Tableau or dplyr and ggplot2 to explore static tables, and GNU Octave/Matlab, RStudio, and IPython notebooks for iterative exploration while developing algorithms.
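A small sketch of the "look at the data first" habit: a quick tabular summary with pandas before any modeling. The column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "host":    ["a", "a", "b", "b", "b"],
    "latency": [1.2, 1.4, 9.8, 10.1, 9.9],
})

# Group-level summaries often reveal structure (here: host "b" is
# roughly an order of magnitude slower) before any plot is drawn.
summary = df.groupby("host")["latency"].agg(["mean", "max"])
print(summary)
```

The equivalent dplyr pipeline would be `group_by(host) %>% summarise(mean(latency), max(latency))`.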

stats/machine learning

Statistics and machine learning provide a suite of techniques for building predictive models of data distributions, along with guidance about how much confidence to place in those models. As head of analytics at Coho Data, I have employed time-series analysis and clustering to analyze customer usage patterns. As a doctoral student, I devised "unsupervised learning" techniques for data exploration.
As a quant, I have used a variety of supervised learning techniques for fitting model parameters and statistical hypothesis testing for confirmatory analyses.
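A hedged sketch of clustering usage patterns as mentioned above, using k-means. The feature vectors (reads/sec, writes/sec) and cluster count are invented; the real analysis would be built on measured telemetry.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy per-customer usage profiles: (reads/sec, writes/sec).
usage = np.array([
    [100.0, 5.0],    # read-heavy workloads
    [110.0, 4.0],
    [10.0, 200.0],   # write-heavy workloads
    [12.0, 190.0],
])

# Workloads with similar profiles land in the same cluster.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(usage)
print(km.labels_)
```

In practice, choosing the number of clusters and normalizing the features matter as much as the algorithm itself.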

computer graphics

Familiarity with computer graphics APIs like WebGL/OpenGL has yielded at least two benefits for me as a researcher/analyst. First, I have leveraged the graphics pipeline to build novel visualization techniques that scale to large datasets. Second, I have exploited
the parallelism of graphics processors (GPUs) to speed up existing analysis techniques.

domains

data storage

My team and I have designed and developed an analytics system for storage product performance, events, and alerts. The system is deployed in AWS, is designed to scale easily with demand, and uses a flexible Elasticsearch back end.

I have also developed different methods for analyzing the workload statistics of computer storage systems. I am interested in analyzing block-level storage traces for:

classifying workloads

optimizing flash provisioning

predicting usage patterns

empowering administrators
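A minimal sketch of block-level trace analysis of the kind listed above: computing a read/write ratio and working-set size from a stream of (op, block) records. The trace format is invented for illustration.

```python
def trace_stats(trace):
    """Summarize a block trace of (op, block) pairs, where op is
    "R" (read) or "W" (write)."""
    reads = sum(1 for op, _ in trace if op == "R")
    writes = len(trace) - reads
    # Working-set size: number of distinct blocks touched.
    working_set = len({block for _, block in trace})
    return reads, writes, working_set

trace = [("R", 1), ("W", 2), ("R", 1), ("R", 3), ("W", 2)]
print(trace_stats(trace))  # (3, 2, 3)
```

Statistics like these feed directly into classification and provisioning questions: write-heavy workloads with small working sets are the natural candidates for flash.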

financial

My exposure to finance is on the trading side of things. I have researched the following:

Order-book-level market-impact analysis

Long-term trading system development (monthly)

Short-term trading system development (daily)

Trading algorithm ("Algo") development

Automated trading system ("Bot") development
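A hedged sketch of one simple trading rule of the kind listed above: a moving-average crossover signal. The price series and window lengths are invented; a real system adds transaction costs, risk limits, and out-of-sample validation.

```python
def sma(prices, window):
    """Simple moving average over the last `window` prices."""
    return sum(prices[-window:]) / window

def crossover_signal(prices, fast=3, slow=5):
    """Return +1 (long) when the fast average is above the slow
    average, -1 otherwise, and 0 if there is too little data."""
    if len(prices) < slow:
        return 0
    return 1 if sma(prices, fast) > sma(prices, slow) else -1

uptrend = [10, 11, 12, 13, 14, 15]
downtrend = [15, 14, 13, 12, 11, 10]
print(crossover_signal(uptrend), crossover_signal(downtrend))  # 1 -1
```

The same skeleton serves both monthly and daily systems; only the data frequency and the parameter-fitting procedure change.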

text analysis

I have been involved in several projects to help researchers navigate unordered collections of documents.

scientific computing

My experience with scientific computing has focused on numerical linear algebra and nonlinear optimization, having taken graduate courses in both topics and applied them in my own research.

bioinformatics

I consulted with the BC Cancer Agency to develop software for analyzing DNA copy number alterations. The project involved collaborating with a lead researcher to design a visual console for analyzing copy numbers and labels across chromosomes.

projects

The Counter Stack is a compressed representation of a request stream. With Counter Stacks, one can calculate cache-usage statistics for any time interval in sub-linear space, without storing the entire trace.
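For intuition, here is an exact (uncompressed) reference for the quantity Counter Stacks approximate: each request's LRU stack distance, i.e. the number of distinct blocks accessed since that block's previous access. This brute-force version stores the whole trace; the compressed structure replaces the exact distinct counts with probabilistic counters to reach sub-linear space.

```python
def stack_distances(trace):
    """Exact LRU stack distance of each request in the trace.
    None marks a cold miss (first access to that block)."""
    dists = []
    for i, block in enumerate(trace):
        # Find the previous access to this block, if any.
        prev = max((j for j in range(i) if trace[j] == block), default=None)
        if prev is None:
            dists.append(None)
        else:
            # Distinct blocks seen strictly between the two accesses.
            dists.append(len(set(trace[prev + 1:i])))
    return dists

print(stack_distances(["a", "b", "a", "c", "b"]))
# [None, None, 1, None, 2]
```

From the distribution of these distances one can read off the miss ratio of an LRU cache of any size, which is what makes the compressed representation useful for cache provisioning.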