I propose that the most natural business differentiations should mirror the distinctions made in accounting regulations. It’s not that accounting rules are a natural economic differentiator, but rather that they can and do dramatically affect the presentation of operating results in financial disclosures.

On the other hand, accounting rules distort results even within the exact same vertical. For example, there are several ways “extractive industries” can account for PP&E under both US and Int’l GAAP. Other accounting methods (e.g., inventory and revenue recognition) also distort results within industries, but those factors affect all industries about equally.

Despite the limitations, on balance, I’d say that accounting rules exert more influence on a typical quant’s day-to-day investment decisions than any other business/economic differentiator.

Independently from that earlier thought, what about selecting candidates from within the same sector/sub-sector based on their correlations?

The rationale is to let the market do your homework for you. The more similar companies are, the more likely their stocks will move together. Selecting from within the same industry grouping likely mitigates the spurious correlation risks.

K-means clustering is probably a good stats tool to get the ball rolling. Moreover, K-means is heavily used in AI/ML applications, so there’s some buzzwordiness there.
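A rough sketch of the "let the market do your homework" idea, using synthetic data only. The tickers, the noise levels, and the `rank_peers` helper are all hypothetical and are not from P123 or from anyone's actual screen; the assumption is simply that every name already sits in the same sector and that peers are ranked by the correlation of their daily returns with a target stock.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 750

# One shared sector driver plus stock-specific noise (all synthetic).
sector_factor = rng.normal(0, 0.01, n_days)

# Hypothetical tickers; a larger noise scale means the stock tracks the
# sector driver less closely, so it should correlate less with the target.
noise_scale = {"TARGET": 0.004, "PEER_A": 0.004, "PEER_B": 0.010, "PEER_C": 0.030}
returns = {
    ticker: sector_factor + rng.normal(0, scale, n_days)
    for ticker, scale in noise_scale.items()
}


def rank_peers(returns, target):
    """Return (ticker, correlation) pairs, most correlated with `target` first."""
    scores = [
        (ticker, float(np.corrcoef(returns[target], series)[0, 1]))
        for ticker, series in returns.items()
        if ticker != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranked = rank_peers(returns, "TARGET")
for ticker, corr in ranked:
    print(f"{ticker}: {corr:.2f}")
```

With real data the same ranking would be run on actual return series, and restricting the candidates to one industry grouping beforehand is what is supposed to cut down on spurious matches.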

Hi David,

Agree. I would have said CART (classification and regression trees). But once you had the data in a spreadsheet you could try both, and play with a support vector machine method too, all in an afternoon. The problem would be getting the data out of P123, or getting what you did in Python (or R) back into P123 for backtesting.

Nodes CAN have a lot of similarities to principal component analysis (e.g., both are linear weightings of factors). I contend that some focused optimization of a node can duplicate the results of PCA: maximizing the variance and even reducing the dimensionality if desired. Even better would be if optimizing a node mimicked Principal Component Regression (could it?). Maybe most of us have been doing some of this already, whether we gave it a fancy statistical term or not. Maybe we do not appreciate everything P123 is already doing.
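To make the PCA comparison concrete, here is a small sketch on synthetic data. It does not use P123 at all: it assumes a "node" can be treated as a plain weighted sum of standardized factor scores (P123 works with ranks, so this is only an analogy), and the factor data are made up. It shows that the first principal component is itself just a set of linear weights, and that no other unit-length weighting produces a combined score with more variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stocks, n_factors = 500, 4

# Hypothetical factor scores: a shared driver makes the columns correlated.
common = rng.normal(size=(n_stocks, 1))
factors = 0.7 * common + rng.normal(size=(n_stocks, n_factors))

# Standardize each factor so no single column dominates by scale alone.
z = (factors - factors.mean(axis=0)) / factors.std(axis=0)

# PCA via eigen-decomposition of the covariance matrix.
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
pc1_weights = eigvecs[:, -1]            # weights of the first component
pc1_var = (z @ pc1_weights).var(ddof=1)

# Try many random unit-length weightings (stand-ins for hand-tuned node
# weights) and record the largest variance any of them achieves.
best_random_var = 0.0
for _ in range(1000):
    w = rng.normal(size=n_factors)
    w /= np.linalg.norm(w)
    best_random_var = max(best_random_var, (z @ w).var(ddof=1))

print("PC1 weights:", np.round(pc1_weights, 3))
print("variance of PC1 score:", round(pc1_var, 4))
print("best variance from random weights:", round(best_random_var, 4))
```

An optimizer that searched over the weights to maximize variance would be chasing the same target as the first component, which is the sense in which node optimization could mimic PCA. Whether it could mimic principal component regression is a separate question this sketch does not address.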

But does P123 have the specific data we (or each of us) would want regarding the topic of this post?

You can get SOME correlation data with a custom series. You mentioned correlation data. I would probably look there too.

I don’t know if any of this helps and I do not think I will be focusing on this going forward—unless some of the problems with uploading or downloading of data can be addressed.

I was kind of skeptical of the usefulness of this idea when I read the first post in this thread. But I was reading about unsupervised machine learning today (what this is) and it is a rich topic. And there are clear examples where this method can work.

Just a few random ideas on an interesting topic.

-Jim

We are drowning in information and starving for knowledge. — John Naisbitt.

Jan 6, 2019 6:22:48 PM

Edited 7 times, last edit by Jrinne at Jan 6, 2019 7:36:44 PM

yuvaltaylor

UNITED STATES
Joined: Apr 11, 2015
Post Count: 976
Offline

Re: beyond sectors: ways to classify industries

Good topic. I've been thinking about this a lot recently, specifically how a "technology" sector is basically meaningless as technology is ubiquitous across all sectors. If Disney is streaming video into your home for a subscription and Netflix is streaming video into your home for a subscription, why does Netflix get put in the technology sector, worthy of a 105 PE, while Disney has a 13 PE?

Well, both businesses are now in the Communication Services sector, so that particular problem has been solved. I thought the revisions to the GICS system in rethinking this sector were very sensible.

These delineations are for US GAAP -- IFRS GAAP has similar standards, but there are also notable differences.

Aside from these topics, I think there are opportunities for consolidating similar groups and then disaggregating important differences within sector groups. For example, I think I would either drop or consolidate accounting plans. In addition, I would--for financial statement screening purposes--create four possible groups for extractive enterprises based on how they capitalize exploration and drilling: those that capitalize the full costs of exploration/development and those that use successful efforts to capitalize exploration/development. While these things might seem trivial (and, to be fair, they are trivial over full economic cycles), they have a big impact on operating results over quarters and years.

Independently from that earlier thought, what about selecting candidates from within the same sector/sub-sector based on their correlations?

The rationale is to let the market do your homework for you. The more similar companies are, the more likely their stocks will move together. Selecting from within the same industry grouping likely mitigates the spurious correlation risks.

K-means clustering is probably a good stats tool to get the ball rolling. Moreover, K-means is heavily used in AI/ML applications, so there’s some buzzwordiness there.

I just wanted to say that I've spent the last week or two doing exactly this, with some fascinating initial results. Your suggestion of k-means is perfect, by the way--I tried a lot of other clustering algorithms, but k-means worked the best.

I went into this expecting an affirmation of GICS sector classification, and that, for example, health care stocks would correlate well, as would tech stocks, etc. I found the opposite. Health care stocks ended up in four different clusters. The only GICS sectors that remained intact were energy and staples.

Here are my results, in order from relatively undifferentiated to strongly differentiated. I'll be writing an article on my blog and on Seeking Alpha outlining how I arrived at these industry clusters and offering some additional thoughts, but it'll take a few weeks.

Jan 28, 2019 1:51:45 PM

primus

UNITED STATES
Joined: Aug 9, 2013
Post Count: 940
Offline

Re: beyond sectors: ways to classify industries

K-means clustering is probably a good stats tool to get the ball rolling. Moreover, K-means is heavily used in AI/ML applications, so there’s some buzzwordiness there.

I just wanted to say that I've spent the last week or two doing exactly this, with some fascinating initial results. Your suggestion of k-means is perfect, by the way--I tried a lot of other clustering algorithms, but k-means worked the best.

Good to hear. And thanks for sharing this! I’d be interested in hearing more about time-periods and sampling frequencies you used in your analysis.

I used the maximum time period for P123, from 1999 to today, buying all the stocks in an industry above a certain liquidity limit and rebalancing to equal weight annually. I then used the correlation of daily returns for my correlation matrix.
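For anyone who wants to try something along these lines, here is a minimal sketch of clustering industries from a daily-return correlation matrix. It is not the actual procedure or data described above: the returns are synthetic, the industry names are hypothetical, each industry is represented by its row of the correlation matrix (one possible choice of features), and the small NumPy k-means with a farthest-point start is a stand-in for a library implementation such as scikit-learn's.

```python
import numpy as np


def farthest_point_init(points, k):
    """Pick k starting centers: the first row, then repeatedly the row
    farthest from all centers chosen so far (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    return np.array(centers)


def kmeans(points, k, n_iter=100):
    """Minimal Lloyd's algorithm; returns one cluster label per row."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Assign each row to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic daily returns: three hidden drivers, three industries per driver.
rng = np.random.default_rng(42)
n_days = 1000
names = [f"industry_{g}{i}" for g in "abc" for i in range(1, 4)]  # hypothetical
drivers = [rng.normal(0, 0.01, n_days) for _ in range(3)]
returns = np.column_stack([
    drivers[g] + rng.normal(0, 0.005, n_days)
    for g in range(3) for _ in range(3)
])

# Correlation matrix of daily returns (columns are industries).
corr = np.corrcoef(returns, rowvar=False)

labels = kmeans(corr, k=3)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

On real data the number of clusters is a judgment call, and results may shift with the sample period and return frequency, which is why those details matter.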