I’ve said before that I’m not an expert on Search Engine Optimisation, but it certainly is an interesting area where businesses of all sizes can reap definite benefits from applying a little analysis, maths and science to the problem.

Have a look at this article explaining, in reasonably simple terms, some of the background behind PageRank.

Economic data out for South Africa is painting a good, if slightly confusing, picture. The good news is that the economy is still growing healthily (not compared with China’s official growth though) and inflation has dipped very slightly. Neither of these is truly unexpected though. Exchange rate appreciation and a relaxation in the dollar oil price will have had a muting impact on inflation.

CPIX (the basket excluding things like mortgage repayments and a few others) increased by 5% year-on-year in October. We only have data for October available now because of delays in collecting, collating and analysing data. This was slightly above forecasts, but nobody’s panicking yet. The same figure for September was 5.1%.
Meanwhile, growth in GDP is also continuing (4.7% annualised) in the third quarter, without showing much impact of the interest rate increases Tito Mboweni has put in place this year. Having said that, if one digs down into the sector details, the interest rate increases have not been completely ignored, with property-related sectors showing markedly reduced growth from earlier levels.
Now for the confusing part. The previous quarter’s GDP growth has been upped to 5.5% from 4.9%. First quarter figures have been upped substantially from 4% to 5%, but last year’s growth was adjusted only slightly from 4.9% to 5.1%.

So why all the changes? Well, before everyone goes on about “Lies, Damned Lies, and Statistics”, one should understand that estimating GDP is a tricky task in a developed economy, let alone in one with a significant contribution from an informal sector with limited records and reporting. As it is, there are discrepancies between information such as VAT receipts, money flowing through banks and the official GDP figures. While these measurements will be affected differently by different things, they should have a strong relationship to each other.

I’m going to dig into this as well over the next few months, but any comments or inputs are very welcome!

I’ve had an idea to analyse some of the many topics that come up in conversation time and time again: chances of winning the lotto, FNB’s Million a Month account and the randomness of the iPod shuffle, to name a few. I’ll try to get hold of some interesting datasets and perform some basic analysis. My aim?

To find out whether any of the often-claimed techniques might actually work

And so discover what sort of randomness is hidden within these events

And also see which of these datasets I can easily get my hands on. Any thoughts on my chances of getting FNB to chat to me about how they select winners?

Anybody have some data they’d like me to look at? Any more questions to add to this list? I’ve been away from blogging for a while, so let me know when your office Christmas party is so I can prepare your answers in time!

SEOmoz.org provide some great resources on search engine optimisation (“SEO”). Recently, they performed a really interesting analysis comparing actual site traffic for 25 sites that volunteered their data against indicators from a range of competitive intelligence metrics from sources such as Google PageRank, Technorati Rank, Alexa Rank and SEOmoz.org’s very own Page Strength Tool. The stated goal of the project is described in this quote from their page:

This project’s primary objective is to determine the relative levels of accuracy for external metrics (from sites like Technorati, Alexa, Compete, etc.) in comparison to actual visitor traffic data provided by analytics programs. 25 unique sites, all in the search & website marketing niche, generously contributed data to this project. Through the statistics provided, we can also get a closer look at how the blog ecosphere in the search marketing space receives and sends traffic

Now, I’m not yet an expert on SEO, but I do know a few things about data analysis. While their results indicate that none of the measures is particularly useful, I have three points to add:

1 Significance of correlation coefficients

A correlation coefficient does not need to be 0.9 or 0.95 to be significant, contrary to this claim:

Technorati links is actually an almost usable option at this point, though any scientific analysis would tell you that correlations below 90-95% shouldn’t be used.

Roughly speaking, correlation coefficients greater than about 0.7 or 70% explain approximately half the variability in the observed variable (actual page visits). Whether or not this is “significant” depends on the amount of data used to measure the correlation. There are some very specific tests for measures of significance for correlation coefficients – I have summarised the results of one of the standard tests here:
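For the curious, the standard test is easy to reproduce. Here is a minimal sketch (plain Python, no statistics package) of the t-test for a Pearson correlation coefficient, with n = 25 chosen to match the size of the SEOmoz sample and r = 0.7 purely as an illustrative value:

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for testing whether a Pearson correlation r,
    estimated from n data points, differs significantly from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# With 25 sites, the two-sided 5% critical value of Student's t
# with n - 2 = 23 degrees of freedom is roughly 2.07.
t = correlation_t_stat(0.7, 25)
print(round(t, 2))  # about 4.7, comfortably above 2.07
```

So with 25 observations, a correlation of 0.7 is easily significant at the 5% level, even though it “only” explains about half the variability.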

Beyond the technical statistical tests though, I would imagine that there is a great deal of value in estimating a large part of the practical popularity of a website (and presumably page visits is a sensible measure of this) through freely available “competitive intelligence metrics”. On the other hand, if you are looking for a near-exact replica of actual visits, then a much higher correlation coefficient is required.

2 Extending analysis to multiple regression rather than single correlations

OK, this does take the analysis beyond the original stated goal, but it is interesting to see how good a model of actual site popularity we can develop based on freely available “competitive intelligence metrics”. But first, it is useful to consider the correlation matrix between all variables (the “dependent variable” and all independent variables). In an ideal regression model, the independent variables will be uncorrelated with each other. On the other hand, if these metrics are any good, we would expect them to be strongly correlated with each other.
As can be seen from the table above, there are several strong correlations between the independent variables. This can lead to problems with “multicollinearity” for multiple regression techniques, but since I am trying to keep this post non-technical, I’ll leave that alone for now. It is also interesting that while all the large correlations (loosely defined here as greater than 70% or less than -70%) are positive, there are many negative correlations as well. Thus, some measures appear to be using different information or approaches to provide the metrics. Most interesting to me is that TR Rank and TR Link have a correlation coefficient of -50%. This will be a hint to our multiple regression results…
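To show how such a correlation matrix is computed (the actual 25-site dataset belongs to SEOmoz, so the figures below are made up purely for illustration), here is a sketch in Python with NumPy. Note how a “rank” metric, where a lower number is better, naturally correlates negatively with visits:

```python
import numpy as np

# Hypothetical figures for six sites -- NOT the SEOmoz dataset.
avg_visits = np.array([1200.0, 3400, 560, 8900, 2100, 4300])
tr_rank = np.array([9000.0, 4000, 15000, 800, 6000, 3000])  # lower is better
tr_links = np.array([150.0, 600, 40, 2200, 300, 900])

# np.corrcoef returns the full matrix; row 0 holds each metric's
# correlation with actual visits.
corr = np.corrcoef([avg_visits, tr_rank, tr_links])
print(np.round(corr, 2))
```

Because a better Technorati Rank is a smaller number, its correlation with visits comes out negative even when the metric is informative; that is a sign flip, not a contradiction.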
I decided to use only very basic tools for the analysis so that interested readers can perform the same analysis on their own with only MS Excel (generally a fairly weak statistics platform, even with the Data Analysis add-in activated). My aim was to find a model that explained more of the Average Visits than Technorati Links alone by combining several variables together. I had to exclude Compete Rank and Ranking Rank due to the limitations of Excel’s regression tools. I judged “good” models by a high adjusted R-squared, together with significant and sensible estimates for the individual variables. The results of a “good” model (although not necessarily the best, since I did fairly quick and dirty model selection) are given below:

The model has a “Multiple R” (which is intuitively analogous to the normal Pearson correlation coefficient) of 89%, and the model explains 80% of the variability in Average Visits. Other measures of goodness of fit include a high adjusted R-squared (relative to other models fitted) of 71%, an F-statistic for overall model significance of 9.5, which gives a significance level or p-value of 0.00008, and low p-values for most independent variables included in the model. The intercept itself is not significant, but we leave it in to improve the overall fit of the model. Similarly, while the significance level for Alexa Page Views is relatively high at 17%, it does add to the overall model in terms of fitting the data well.
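The mechanics Excel performs behind the scenes can be reproduced directly. This sketch fits an ordinary least squares model to synthetic data (again, not the SEOmoz dataset; the variables and coefficients are invented) and computes R-squared and adjusted R-squared from the residuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for three competitive intelligence metrics
# across 25 "sites" -- illustrative only.
n, k = 25, 3
X = rng.normal(size=(n, k))
y = 2.0 + X @ np.array([1.5, -0.8, 0.5]) + rng.normal(scale=1.0, size=n)

# Add an intercept column and solve ordinary least squares.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Goodness of fit: R-squared, and the adjusted version, which
# penalises adding variables that don't earn their keep.
resid = y - A @ beta
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 3), round(adj_r2, 3))
```

The adjusted figure is always a little lower than the raw R-squared, and the gap widens as variables are added, which is exactly why it is the better yardstick for comparing candidate models.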

Again, very interestingly but not surprising by now, many of the coefficients are negative. This implies that, at least once adjusting for the other variables, these measures are associated with lower rather than higher Average Visits. This suggests more analysis and more data is needed to understand the dynamics here properly!

3 Quality and quantity of data
This leads me to my final comment. Twenty-five websites, while it is great to have even this much data, is nowhere near enough to analyse this problem properly. This isn’t because of the small size of 25 sites in relation to the total number of websites on the ’net, but rather to do with the spread of sites across the different types of websites and the potential to fit the model too closely to the exact data provided rather than to some underlying reality. Again, this is a difficult area to discuss correctly and thoroughly without becoming very technical, so I’ll leave that well alone too.

Final comments

This analysis and presentation of results is very light for something this interesting. There is an enormous amount more that could be done with time, energy, more data and, for my part, a better understanding of how each of these competitive intelligence metrics is intended to work. I’d welcome any comments on what analysis would be desired (time series? Non-linear models? More detailed regression? Rank correlation?) and whether there is any chance of getting more data. I’d be very happy to dig deeper and post the results here and/or directly on SEOmoz.org.

One of my favourite quotes is by George Box: “All models are wrong, but some are useful”. If you work with models and understand their place in the universe, you may already agree with this too. However, there is more than one type of wrong, and while it is not always possible to tell which is which when the milk has been spilt, the difference is important.

Models are always wrong in that they aren’t a perfect replica of the “real thing” being modelled. Some may argue exceptions, claiming that certain models do perfectly capture the underlying reality; I haven’t been convinced yet. The fundamental point is: if the model is the same as reality, what is the need for the model?

The purpose of most models is to provide a useful way of understanding an extremely complex system. Extremely complex systems are difficult to understand in their entirety. Economists are regularly getting bashed for throwing dangerous phrases like ceteris paribus around in their commentary and conclusions. Why the insistence on holding all other things equal? Because their model is only complex enough to understand a few components of reality and so is wrong when it comes to those other areas. This is problematic when those other things turn out to be important and unequal. The technical term for these models is “not useful”. I’ll give George the credit for this term too.

Nobody said it was going to be easy…

To build a useful model, that is. Understanding the benefits of modelling specific components requires an in-depth, often intuitive feel for the problem at hand. A consultant brought in from the outside won’t necessarily have this unless the problem is a common or generic one. A good consultant will spend a significant amount of time listening and understanding the problem, the environment and the broader issues that will influence the real benefit drivers. Recognising the costs of modelling individual pieces of the problem is more of a technical problem. Knowledge of model-building approaches, computer systems and applications, statistical techniques and actuarial projections, database management and data mining, logical thought and system building all come into the process. Knowledge is required, but there’s often little substitute for experience too. Throw in some serious academic training and we can start to hit Excel.

But what about the other Wrong?

The wrong I’ve discussed so far is a pretty mild sort of wrong. Intended, required, carefully thought through and ultimately useful. But what about Wrong in a simpler form? Wrong because a mistake was made? Wrong because a spreadsheet included errors? The real-world experience of model errors, small and very, very large, is compelling. Mistakes do happen. This post doesn’t deal with how to prevent or reduce errors (plan, document, independent review etc.), but rather with how one classifies an error once it has been discovered.

A recent example I experienced was where a mistake had been made. Unfortunately for everyone, it was one of the large, conspicuous and nasty types. The cause of the mistake could have been anything from incorrect proprietary models to incompetence, with lack of judgement, lack of review, weak control processes and lack of ownership of risk management protocols floating around somewhere in between. It is impossible to tell what was intended at the date the mistake was originally made, since there is no record of what was intended, why it was done, how the decision was made, what checks were performed and who gave the thumbs up to go ahead. Unobserved and unrecorded history makes for compelling spy stories and thrillers, but not for dry high school textbooks.

The little-known Wrong before other wrongs

Given the story above, the Wrong seems to be the lack of a clear objective, clearly understood and documented at the outset. So often, the simple act of framing a problem correctly makes giant leaps towards its resolution. This is often the Wrong that precedes other wrongs:

Know what you are trying to do;

Make sure you understand why; and

Be clear and specific about describing it so that you and everyone else are on the same page.

Another of my favourite quotes is by George Bernard Shaw: “The single biggest problem in communication is the illusion that it has taken place”. That last bullet above isn’t as simple as it seems.

I haven’t discussed measurable marketing initiatives yet, which is a shame because marketing is one of the most important “fuzzy business decisions”, one that most companies assume doesn’t require hard analysis, measurement and good business judgement. There is a long-standing (and important) debate around whether creative advertisements achieve the stated objectives of the advertising campaign. I don’t know the answer to this question, and I suspect the answer is frustratingly along the lines of “it depends”. However, an easier debate is:

Are the stated objectives of marketing campaigns sufficiently closely related to critical business goals?

The answer is often “What business goals? This is just marketing.” This answer simply isn’t good enough.

Larry Bodine, who writes a blog that attempts to coax and drag law firms into the New Age of marketing professional services, had an interesting piece on the importance of measurable results from Chief Marketing Officers for law firms. His post is written more from the perspective of the budding new CMO, but it’s an interesting read for anyone looking at measuring marketing success. His article made me realise that I need to give this topic some more attention over the next few weeks. More on this later then.

The book “Freakonomics” has become something of a pop icon amongst certain groups. The Stevens (well, Steven and Stephen; Levitt and Dubner respectively) cleverly show how economic analysis can shed light on some interesting everyday (and not so everyday) problems and observations. I thoroughly enjoyed it while not agreeing with each and every word. Entertaining, yes. Insightful, definitely. A work of pure genius, probably not. Overall, anyone interested in either economics or life (and I’m not tying myself down to admitting that these may be mutually exclusive) would benefit from reading it. If it does tend to get a little slow after a while, persevere through to the end, even if only to make sure you can laugh and nod in the appropriate places during the next cocktail party with that same certain group.

On the other hand, if you are looking for a reminder of what economics is all about on a slightly more technical note, Tim Harford’s book, “The Undercover Economist” is definitely worth a read. I gather it’s partly derived from articles written for the Financial Times magazine where Mr Harford has a column. I enjoyed Freakonomics enough (and have enough respect for Steven Levitt) that when I read on the front cover “‘Required Reading’ Steven D. Levitt” I picked it off the shelf from Exclusive Books with hardly a glance inside.

The Undercover Economist takes the reader from the basics of economics (allocation of scarce resources) through perfect competition and into the bowels of market failures. Some of the examples that particularly resonated with me were second-hand automobiles and health insurance; one of these is an area I know a fair bit about, and it’s not cars.

When should companies hedge their exposure to financial and commodity risks? Is it always wrong, always right, or somewhere in between? This topic arises every now and again when a goldmine is found to have made losses (and we’ll distinguish shortly between different types of “losses”) as a result of gold price increases, because gold has been sold forward or the mine has an effective short position in gold through derivative exposures. Sasol has also been in the news for making apparent losses on hedges put in place.

What is a hedge and how do companies hedge?

A hedge can be constructed in several ways.

The commodity may be sold forward such that the price is agreed today but the goods are only transferred in future.

A future may be sold (or shorted) on the OTC market or on an exchange which achieves a similar overall effect.

A put option on the commodity may be purchased such that the company is protected from price decreases.

Other structures may be created in conjunction with banks, hedge funds or other capital market participants.

The net result is usually that disadvantageous price movements (usually price decreases for a commodity producer) result in profits on the hedging instrument that offset the losses from lower revenue on the sale of the commodities. When commodity prices decrease, the company doesn’t make as large a loss as it would without the hedge. The catch? When commodity prices increase, the company doesn’t make as much profit as it would without the hedge. So the company ends up with lower volatility of earnings.
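A stylised numerical sketch (made-up prices and volumes) shows the mechanics: with a perfect forward hedge, net revenue is locked in whichever way the spot price moves.

```python
def net_revenue(volume, spot, forward_price, hedged_volume):
    """Revenue from selling `volume` at the spot price plus the payoff
    of a forward sale of `hedged_volume` struck at `forward_price`.
    A stylised model: one period, no basis risk, no margin calls."""
    revenue = volume * spot
    hedge_pnl = hedged_volume * (forward_price - spot)
    return revenue + hedge_pnl

# 100 units fully hedged at a forward price of 600:
print(net_revenue(100, 500, 600, 100))  # spot falls: 60000
print(net_revenue(100, 700, 600, 100))  # spot rises: still 60000
```

The hedge gives up the upside in exchange for removing the downside, which is exactly the reduced earnings volatility described above.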

The problems with hedging

Hedging accurately requires exact knowledge of future volumes and dates of sales. Even the most stable goldmine has some variation in output from month to month. When predicting volumes several years into the future, the uncertainty increases in what is often described as an expanding funnel of doubt. If the hedge is put in place to offset the exposure of 100 units of gold and only 50 units of gold are available to be sold, then the hedge will overcompensate for price changes to the extent that the company will, perversely, likely make a loss when the price rises and a profit when the price drops! Similarly, if there is actually 150 units of gold available to be sold, the company will only have partial protection against commodity price movements.
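The same stylised payoff (made-up figures again) makes the perverse over-hedged outcome concrete: hedge 100 units, produce only 50, and the mine is effectively short gold.

```python
def net_revenue(volume, spot, forward_price, hedged_volume):
    # Stylised: spot sale of actual output plus the forward payoff.
    return volume * spot + hedged_volume * (forward_price - spot)

# Hedged 100 units at a forward price of 600, but only 50 units mined.
low_price = net_revenue(50, 500, 600, 100)    # price falls
high_price = net_revenue(50, 700, 600, 100)   # price rises
print(low_price, high_price)  # 35000 25000: a rising price now hurts
```

With output at half the hedged volume, the unmatched half of the forward position dominates, so the company profits when the price drops and loses when it rises, just as described above.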

Secondly, movements in the price of the hedging instrument may not exactly offset movements in the value of the underlying commodities to be sold. This is known as basis risk. An easy example to understand this is with wheat futures – the exact quality, type and transportation costs for the specified future may not be the same as the company’s own crop. Alternatively, if the hedge does not have the same maturity date as the sale date for the commodity, deviations in price for the hedging instrument on the market due to liquidity or other issues could result in differences too.

A more complicated issue reflects accounting rules for hedges and natural resources. IFRS requires derivatives to be carried on the balance sheet at market value in most instances. However, the extent to which gold deposits still in the ground are recognised as an asset depends on several other factors. The scenario can arise where the full loss on the derivative must be taken through the income statement in a particular period, but the increase in the value of the gold deposits in the ground is not recognised because the resources haven’t been proved with sufficient accuracy. This has often been the argument put forward by executives looking to explain accounting losses on hedging transactions. There is also the potential for this to be used as an excuse to hide the real economic losses.

The usual arguments against hedging

The most common argument against companies hedging their financial or commodity risks is that there is no need: investors can manage their own exposures to various risk factors more effectively and adjust their exposure themselves. Many investors, wanting geared exposure to a particular commodity, will look for unhedged, marginal mines that are extremely highly exposed to changes in the price of the underlying commodity. If the gold mining company itself hedges the risk, it is no longer of interest to these investors.

The argument follows that management of a gold mine should worry about mining gold, and that maize farmers should worry about maize farming. Leave the sophisticated financial and derivative decisions up to the experts.

Can hedging be good? Aka why the previous argument is flawed.

There are three clear flaws in the usual argument against hedging.
Firstly, if investors are so adept at adjusting their risk exposures with derivatives and fancy structures, why do so many of them like marginal mines to get their geared exposure? Chat to many resources punters and investors and, while many will stick with the tried and trusted, there is significant demand for marginal gold mines too. If all investors could get the same exposure through derivatives and other hedges, this demand wouldn’t exist.

Secondly, surely the gold miner has the best information relating to the volume and timing of gold sales and can thus hedge more effectively than an outside investor? If the company is hedging 100 units of gold exposure, it is likely that each individual investor is hedging close to 1 or maybe 10 units of exposure. Economies of scale do play a role in getting the best price in hedging transactions (admittedly it can work the wrong way round too). Thus, it is not clear that investors are in a better position to be able to hedge the resources company’s exposure.

Finally, and most importantly, pulling gold out of the ground and planting maize require long-term planning. Planning acreage, capital investment, labour quantities and transportation contracts accurately requires solid estimates of future volumes and revenues. For a gold miner to concentrate on mining gold, he or she will need to know what rate of return can be expected from sinking a new shaft or reopening an existing one, hiring more staff and importing machinery from Canada. This is difficult with the uncertainty of a volatile gold price. How can a farmer choose what mix of crops to plant without knowing what the final revenue from each unit of crop will be? The very argument that advises the natural resource companies not to hedge and to stick to their knitting is, in my view, the strongest argument for them to hedge.

The question that needs to be asked is, “Can the company more efficiently focus on its critical success factors by hedging?” If the answer is yes, then hedging is the way forward. If the answer is no, then it might be better left to investors.

This is by no means the final word on whether hedging is appropriate or not. It’s hardly even scratching the surface. However, every time a journalist or analyst gives a gold mine a hard time about hedging when hedging was the right thing to do at the time, we discourage our miners from hedging. A little more insight and more carefully thought-through analysis should lead to better decisions overall.