­ portfolio to contain assets with high volatility and yet have low volatility overall, since you can assemble assets that are negatively correlated — when some move up, others move down.

This article describes my effort to use stocks from the S&P500 to optimize a portfolio using math and data science.

The Model

The purpose is to determine what fraction of a portfolio to invest in each of several possible assets, with the goal of minimizing the volatility of the portfolio subject to a target return.

To frame the question mathematically, suppose f is an n-dimensional vector of the fractions that I’ll invest in each of n financial assets.

And let C denote the covariance matrix of the daily returns of the assets, an n x n matrix.

Let r be the n-dimensional vector of the expected returns of each of the assets.

The target return is denoted by r*.

The optimization problem is to minimize the volatility of the portfolio (the risk level), subject to three constraints:

- The fractions should add up to one.
- The portfolio should attain the target return, r*.
- Each stock’s fraction should be no more than 100%.
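Written out with the symbols defined above, the problem can be sketched as follows. The objective f^T C f is the portfolio variance, so minimizing it also minimizes the volatility (its square root). Two points are assumptions rather than something the text states: the lower bound f_i ≥ 0 (no short selling), and reading "attain the target return" as an equality.

```latex
\begin{aligned}
\min_{f \in \mathbb{R}^n} \quad & f^{\top} C f \\
\text{subject to} \quad & \sum_{i=1}^{n} f_i = 1, \\
& r^{\top} f = r^{*}, \\
& 0 \le f_i \le 1, \quad i = 1, \dots, n.
\end{aligned}
```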

Because the objective function that we want to minimize is a quadratic, this class of problem is called a quadratic program in the operations research and numerical optimization community.

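It’s important to note that because a covariance matrix is positive semidefinite, the objective is convex, so any local minimum is also a global minimum. The minimizer is unique when C is positive definite and the constraints can be satisfied.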

The Data

I used a dataset² of stock prices between February 20 and April 18, 2019.

I selected a few stocks in the tech industry: AMZN, GOOGL, MSFT, IBM, FB and NFLX (Amazon, Google, Microsoft, IBM, Facebook and Netflix). I then created a chart to visualize their correlation.

Most of the stocks are strongly correlated, with the exception of Netflix (NFLX), but remember this is only a 2-month period.
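For readers who want to reproduce a correlation view like this, here is a minimal sketch using pandas. The prices below are synthetic random walks, not the real dataset, so the printed numbers mean nothing. Only the tickers and the date range come from the article; everything else is a placeholder.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real price table (rows = trading days, columns = tickers).
rng = np.random.default_rng(0)
tickers = ["AMZN", "GOOGL", "MSFT", "IBM", "FB", "NFLX"]
dates = pd.bdate_range("2019-02-20", "2019-04-18")
fake_daily_moves = rng.normal(0.0005, 0.01, size=(len(dates), len(tickers)))
prices = pd.DataFrame(
    100 * np.cumprod(1 + fake_daily_moves, axis=0), index=dates, columns=tickers
)

# Daily percentage returns; the first row has no previous day, so it is dropped.
daily_returns = prices.pct_change().dropna()

# Pairwise Pearson correlation of the daily returns (6 x 6, ones on the diagonal).
correlation = daily_returns.corr()
print(correlation.round(2))
```

With real prices loaded into `prices`, the last two statements are the only part that matters.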

The Optimal Portfolio Allocation

To solve the problem, I needed to compute the covariance matrix C of the daily returns and the estimated returns r for each of the stocks in the S&P500 index.

I decided to use 9% as my target return.

Once I computed the expected returns and the expected volatilities (and covariances) of the daily returns, I was ready to solve the optimization problem.
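As a rough illustration of this step, the sketch below estimates r and C from a price table. It assumes the expected return is the sample mean of the daily returns, which is one common choice and not necessarily what the repo does. The function name `estimate_inputs` and the tiny price table are made up for the example.

```python
import numpy as np
import pandas as pd

def estimate_inputs(prices: pd.DataFrame):
    """Return (r, C): mean daily return per asset and the covariance of daily returns."""
    daily_returns = prices.pct_change().dropna()
    r = daily_returns.mean()  # assumption: expected return = sample mean of daily returns
    C = daily_returns.cov()   # sample covariance matrix, n x n
    return r, C

# Made-up prices for two hypothetical tickers over four days.
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0, 103.0], "BBB": [50.0, 49.5, 50.5, 50.0]}
)
r, C = estimate_inputs(prices)

# Per-asset volatility is the square root of the diagonal of C.
volatility = np.sqrt(np.diag(C.to_numpy()))
print(r)
print(C)
print(volatility)
```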

Visit this repo to see my Python code, which will run as-is on Watson Studio Desktop. (You will just need to get the dataset into your project.)

The optimization model chose a total of 28 stocks out of the 500 stocks in the S&P500 index, including these as the top 3:

- COTY: 11.14%
- SRE: 9.14%
- CHD: 8.74%

Figure: Portfolio Optimal Allocation

To solve the optimization problem, I used the decision optimization tool CPLEX from Python, inspired by a notebook from the CPLEX GitHub repo³.
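If you don’t have CPLEX at hand, the same kind of problem can be sketched with SciPy’s general-purpose SLSQP solver. This is a swapped-in technique, not the CPLEX model from the repo. The function name `min_variance_weights`, the three-asset covariance matrix and the return vector are all made up for illustration. The 0 to 1 bounds assume no short selling.

```python
import numpy as np
from scipy.optimize import minimize

def min_variance_weights(C, r, target):
    """Minimize f @ C @ f subject to sum(f) = 1, r @ f = target, 0 <= f_i <= 1."""
    n = len(r)
    constraints = [
        {"type": "eq", "fun": lambda f: np.sum(f) - 1.0},  # fractions add up to one
        {"type": "eq", "fun": lambda f: r @ f - target},   # attain the target return
    ]
    bounds = [(0.0, 1.0)] * n                              # assumed: long-only weights
    start = np.full(n, 1.0 / n)                            # equal weights as a first guess
    result = minimize(
        lambda f: f @ C @ f,
        start,
        jac=lambda f: 2 * C @ f,
        method="SLSQP",
        bounds=bounds,
        constraints=constraints,
        options={"ftol": 1e-12},
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x

# Made-up inputs for three hypothetical assets.
C = np.array([[0.040, 0.006, 0.000],
              [0.006, 0.010, 0.002],
              [0.000, 0.002, 0.020]])
r = np.array([0.12, 0.06, 0.09])

f = min_variance_weights(C, r, target=0.09)
print("weights:", f.round(4))
print("portfolio variance:", f @ C @ f)
```

SLSQP is fine for a handful of assets. For all 500 stocks, a dedicated quadratic programming solver such as CPLEX is the more natural fit.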

Final Thoughts

The expected returns I used in this blog came simply from the 2-month period observed in the data set, but you could also determine them using machine learning and AI techniques.

If you’re interested in the predictive side of the project, please reach out and we can collaborate!

The data contains only stocks that are part of the S&P500 index.

Obviously, stocks are very risky assets.

Depending on your risk aversion, you could choose to bring short- and medium-term bonds into the mix to decrease the volatility of the portfolio.

Again, I only used data from a 2-month period.

I certainly wouldn’t recommend making any serious decisions based on this data set or any data set with such a short time frame.

Each target return has a corresponding portfolio volatility.

Run this same exercise for different target returns in order to draw the so-called Markowitz efficient frontier.

If interested, please read this great post on plotting the efficient frontier with Python.
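To give a feel for that sweep, here is a simplified sketch that traces frontier points with plain NumPy. It drops the per-stock bounds (so short positions are allowed) and solves the remaining equality-constrained problem through its KKT linear system. That is a deliberate simplification, not the model described above. The function name `frontier_point` and the inputs are made up.

```python
import numpy as np

def frontier_point(C, r, target):
    """Minimum-variance weights and volatility for one target return (bounds ignored)."""
    n = len(r)
    A = np.vstack([np.ones(n), r])  # rows: budget constraint, return constraint
    # KKT system for min f'Cf s.t. A f = b:  [2C  A'] [f  ]   [0]
    #                                        [A   0 ] [lam] = [b]
    kkt = np.block([[2 * C, A.T], [A, np.zeros((2, 2))]])
    rhs = np.concatenate([np.zeros(n), [1.0, target]])
    solution = np.linalg.solve(kkt, rhs)
    f = solution[:n]
    return f, np.sqrt(f @ C @ f)

# Made-up inputs for three hypothetical assets.
C = np.array([[0.040, 0.006, 0.000],
              [0.006, 0.010, 0.002],
              [0.000, 0.002, 0.020]])
r = np.array([0.12, 0.06, 0.09])

# One (target return, volatility) pair per target: these are the points to plot.
targets = np.linspace(0.06, 0.12, 7)
frontier = [(t, frontier_point(C, r, t)[1]) for t in targets]
for t, vol in frontier:
    print(f"target {t:.2%} -> volatility {vol:.4f}")
```

Plotting volatility on the x-axis against target return on the y-axis gives the familiar curve. With the bounds restored, each point would need a quadratic programming solve instead of a single linear system.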