July 25, 2014

Pricing benchmarks have been one of the casualties of the financial crisis. Not because the benchmarks-like Libor, Platts’ Brent window, ISDA Fix, the Reuters FX window or the gold fix-contributed in any material way to the crisis. Instead, the post-crisis scrutiny of the financial sector turned over a lot of rocks, and among the vermin crawling underneath were abuses of benchmarks.

Every major benchmark has fallen under deep suspicion, and has been the subject of regulatory action or class action lawsuits. Generalizations are difficult because every benchmark has its own problems. It is sort of like what Tolstoy said about unhappy families: every flawed benchmark is flawed in its own way. Some, like Libor, are vulnerable to abuse because they are constructed from the estimates/reports of interested parties. Others, like the precious metals fixes, are problematic due to a lack of transparency and limited participation. Declining production and large parcel sizes bedevil Brent.

But some basic conclusions can be drawn.

First-and this should have been apparent in the immediate aftermath of the natural gas price reporting scandals of the early-2000s-benchmarks based on the reports of self-interested parties, rather than actual transactions, are fundamentally flawed. In my energy derivatives class I tell the story of AEP, which the government discovered kept a file called “Bogus IFERC.xls” (IFERC being an abbreviation for Inside Ferc, the main price reporting publication for gas and electricity) that included thousands of fake transactions that the utility reported to Platts.

Second, and somewhat depressingly, although benchmarks based on actual transactions are preferable to those based on reports, in many markets the number of transactions is small. Even if transactors do not attempt to manipulate, the limited number of transactions tends to inject some noise into the benchmark value. What’s more, benchmarks based on a small number of transactions can be influenced by a single trade or a small number of trades, thereby creating the potential for manipulation.

I refer to this as the bricks without straw problem. Just like the Jews in Egypt were confounded by Pharaoh’s command to make bricks without straw, modern market participants are stymied in their attempts to create benchmarks without trades. This is a major problem in some big markets, notably Libor (where there are few interbank unsecured loans) and Brent (where large parcel sizes and declining Brent production mean that there are relatively few trades: Platts has attempted to address this problem by expanding the eligible cargoes to include Ekofisk, Oseberg, and Forties, and by some baroque adjustments based on CFD and spread trades and monthly forward trades). This problem is not amenable to an easy fix.

Third, and perhaps even more depressingly, even transaction-based benchmarks derived from markets with a decent amount of trading activity are vulnerable to manipulation, and the incentive to manipulate is strong. Some changes can be made to mitigate these problems, but they can’t be eliminated through benchmark design alone. Some deterrence mechanism is necessary.

The precious metals fixes provide a good example of this. The silver and gold fixes have historically been based on transaction prices from an auction that Walras would recognize. But participation was limited, and some participants had the market power and the incentive to use it, and have evidently pushed prices to benefit related positions. In the recent allegation against Barclays, for instance, the bank was able to trade in sufficient volume to move the fix price enough to benefit related positions in digital options. When a large enough amount of derivatives positions has payoffs tied to a benchmark, someone has the incentive to manipulate that benchmark, and many have the market power to carry out those manipulations.
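The arithmetic of that incentive is simple to sketch. The following back-of-the-envelope is purely illustrative (none of the figures come from the Barclays matter, and the linear price-impact assumption is mine): a fixed digital payout is weighed against a trading cost that scales with how far the fix must be pushed.

```python
# Hypothetical sketch: is "banging the fix" profitable?
# A trader is short a digital option paying out if the fix prints at or
# above the strike. Selling into the auction pushes the fix down, but the
# price-moving trades themselves lose money. All parameters are invented.

def banging_the_fix(payout, qty, impact_per_oz, half_spread):
    """Return (net gain, price move) from selling `qty` ounces at the fix.

    payout        -- digital payout avoided if the fix is pushed below strike
    impact_per_oz -- assumed linear price impact: $/oz of move per oz sold
    half_spread   -- execution cost per ounce, $/oz
    """
    price_move = impact_per_oz * qty  # how far the fix is pushed down
    # the manipulative sales execute, on average, half the total move below
    # "fair" value, plus the half spread
    trading_cost = qty * (half_spread + 0.5 * price_move)
    return payout - trading_cost, price_move

net, move = banging_the_fix(payout=10_000_000, qty=150_000,
                            impact_per_oz=0.0001, half_spread=0.05)
# With these invented numbers the fix moves $15/oz at a trading cost of
# roughly $1.1m -- a large net gain if the strike is within $15 of the
# unmanipulated price.
```

The point of the sketch is the asymmetry: the derivatives payout is fixed, while the cost of manipulation scales with the distance the price must be moved, so a large enough book of fix-linked derivatives can make even costly trading worthwhile.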

The problems with the precious metals fixes have led to their redesign: a new silver fix method has been established and will go into effect next month, and the gold fix will be modified, probably along similar lines. The silver fix will replace the old telephone auction that operated via a few members trading on their own account and representing customer orders with a more transparent electronic auction operated by CME and Reuters. This will address some of the problems with the old fix. In particular, it will reduce the information advantage that the fixing dealers had that allowed them to trade profitably on other markets (e.g., gold futures and OTC forwards and options) based on the order flow information they could observe during the auction. Now everyone will be able to observe the auction via a screen, and participants will be less vulnerable to being picked off in other markets. It is unlikely, however, that the new mechanism will mitigate the market power problem. Big trades will move markets in the new auction, and firms with positions that have payoffs that depend on the auction price may have an incentive to make those big trades to advantage those positions.

Along these lines, it is important to note that many liquid and deep futures markets have been plagued by “bang the close” problems. For instance, Amaranth traded large volumes during the settlement period of expiring natural gas futures in three months of 2006 in order to move prices in ways that benefited its OTC swap positions. The CFTC recently settled with the trading firm Optiver, which allegedly banged the close in crude, gasoline, and heating oil in March 2007. These are all liquid and deep markets, but they are still vulnerable to “bullying” (as one Optiver trader characterized it) by large traders.

The incentives to cause an artificial price for any major benchmark will always exist, because one of the main purposes of benchmarks is to provide a mechanism for determining cash flows for derivatives. The benchmark-derivatives market situation resembles an inverted pyramid, with a large volume of cash flows from derivatives trades resting on a relatively small number of spot transactions used to set the benchmark value.

One way to try to ameliorate this problem is to expand the number of transactions at the point of the pyramid by expanding the window of time over which transactions are collected for the purpose of calculating the benchmark value: this has been suggested for the Platts Brent market, and for the FX fix. A couple of remarks. First, although this would tend to mitigate market power, it may not be sufficient to eliminate the problem: Amaranth manipulated a price that was based on a VWAP over a relatively long 30-minute interval. In contrast, in the Moore case (a manipulation case involving platinum and palladium brought by the CFTC) and Optiver, the windows were only two minutes long.

Second, there are some disadvantages to widening the window. Some market participants prefer a benchmark that reflects a snapshot of the market at a point in time, rather than an average over a period of time. This is why Platts vociferously resists calls to extend the duration of its pricing window. There is a tradeoff in sources of noise. A short window is more affected by the sampling error inherent in the smaller number of transactions that occur in a shorter interval, and by the noise resulting from greater susceptibility to manipulation when a benchmark is based on a smaller number of trades. However, an average taken over a time interval is a noisy estimate of the price at any point of time during that interval, due to the random fluctuations in the “true” price driven by information flow. I’ve done some numerical experiments, and either the sampling error/manipulation noise has to be pretty large, or the volatility of the “true” price must be pretty low, for it to be desirable to move to a longer interval.
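Experiments of that flavor can be sketched in a few lines. This is my own toy reconstruction with invented parameters, not the actual experiments: the benchmark is a mean of noisy trade prints spread over a window, and we measure how well it estimates the “true” price at mid-window.

```python
# Toy Monte Carlo of the window-length trade-off. The "true" price follows
# a random walk (information flow); each trade prints at the true price
# plus idiosyncratic noise (microstructure effects, or a manipulator
# leaning on the prints). All parameters are illustrative.
import random

def benchmark_mse(window_minutes, trades_per_minute, trade_noise_sd,
                  vol_per_minute, n_sims=5_000, seed=42):
    """Mean squared error of the window-average benchmark as an estimate
    of the 'true' price at the middle of the window."""
    rng = random.Random(seed)
    n = window_minutes * trades_per_minute
    step_sd = vol_per_minute / trades_per_minute ** 0.5  # per-trade volatility
    total = 0.0
    for _ in range(n_sims):
        p, path = 0.0, []
        for _ in range(n):
            p += rng.gauss(0, step_sd)  # information-driven price move
            path.append(p)
        bench = sum(x + rng.gauss(0, trade_noise_sd) for x in path) / n
        total += (bench - path[n // 2]) ** 2
    return total / n_sims

# Noisy trades, quiet "true" price: the wide window wins
short = benchmark_mse(2, 5, trade_noise_sd=0.5, vol_per_minute=0.05)
wide = benchmark_mse(30, 5, trade_noise_sd=0.5, vol_per_minute=0.05)
```

Flipping the parameters (small trade noise, a volatile “true” price) reverses the ranking, which is the trade-off in the text: a longer window averages away trade-level noise and manipulation, but its average drifts away from the price at any given instant.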

Duffie and Stein suggest encouraging a diversity of benchmarks. Color me skeptical. They themselves recognize that the market has a tendency to concentrate on a single benchmark: it is easier to get into and out of positions in a contract which is similar to what everyone else is trading. This leads to what Duffie and Stein call “the agglomeration effect,” which I would refer to as a “tipping” effect: the market tends to tip to a single benchmark. This is what happened with Libor. Diversity is therefore unlikely in equilibrium, and the benchmark that survives is likely to be susceptible to either manipulation or the bricks without straw problem.

Of course not all potential benchmarks are equally susceptible, so it would be good if market participants coordinated on the best of the possible alternatives. As Duffie and Stein note, there is no guarantee that this will be the case. This brings to mind the as yet unresolved debate over standard setting generally. Some argue that the market’s choice of VHS over the allegedly superior Betamax technology, or the dominance of QWERTY over the purportedly better Dvorak keyboard (or Word vs. WordPerfect), demonstrates that the selection of a standard by a market process routinely results in a suboptimal outcome. Others (notably Stan Liebowitz and Stephen Margolis) argue that these stories of market failure are fairy tales that do not comport with the actual histories. So the relevance of the “bad standard (benchmark)” market failure is very much an open question.

Darrell and Jeremy suggest that a wise government can make things better:

This is where national policy makers come in. By speaking publicly about the advantages of reform — or, if necessary, by using their power to regulate — they can guide markets in the desired direction. In financial benchmarks as in tap water, markets might not reach the best solution on their own.

Putting aside whether government regulators are indeed so wise in their judgments, there is the issue of how “better” is measured. Put differently: governments may desire a different direction than market participants.

Take one of the suggestions that Duffie and Stein raise as an alternative to Libor: short term Treasuries. It is almost certainly true that there is more straw in the Treasury market than in any other rates market. Thus, a Treasury bill-based benchmark is likely to be less susceptible to manipulation than one based on any other rates market. (Though not immune altogether, as the Pimco episode in June ’05 10-year T-notes, the squeezes in the long bond in the mid-to-late ’80s, the Salomon two-year squeeze in ’92, and the chronic specialness in some Treasury issues demonstrate.)

But that’s not of much help if the non-manipulated benchmark is not representative of the rates that market participants want to hedge. Indeed, when swap markets started in the mid-80s, many contracts used Treasury rates to set the floating leg. But the basis between Treasury rates and the rates at which banks borrowed and lent was fairly variable, so a Treasury-based swap contract had more basis risk than Libor-based contracts. This is precisely why the market moved to Libor, and when the tipping process was done, Libor was the dominant benchmark not just for derivatives but also for floating rate loans, mortgages, etc.

Thus, there may be a trade-off between basis risk and susceptibility to manipulation (or to noise arising from sampling error due to a small number of transactions or averaging over a wide time window). Manipulation can lead to basis risk, but that risk can be smaller than the basis risk arising from a quality mismatch (e.g., a credit risk mismatch between default risk-free Treasury rates and a defaultable rate that private borrowers pay). I would wager that regulators would prefer a standard that is less subject to manipulation, even if it has more basis risk, because they don’t internalize the costs associated with basis risk. Market participants may have a very different opinion. Therefore, the “desired direction” may depend very much on whom you ask.
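The hedger’s side of that trade-off can be put in numbers. In this stylized sketch (all parameters are invented), a bank hedging a credit-risky funding rate compares the tracking error of a Treasury benchmark, which is manipulation-free but carries a wide credit basis, with a Libor-like benchmark, which tracks funding closely but is occasionally nudged by a manipulator:

```python
# Stylized comparison of hedge error (in basis points) under two benchmarks.
# The error is the gap between the hedger's funding rate and the benchmark:
# a random credit basis, plus an occasional manipulation bump. All numbers
# are invented for illustration.
import random

def hedge_error_var(basis_sd, manip_prob, manip_size, n=50_000, seed=7):
    """Sample variance (bp^2) of the funding-rate-minus-benchmark gap."""
    rng = random.Random(seed)
    errs = [rng.gauss(0, basis_sd) +
            (manip_size if rng.random() < manip_prob else 0.0)
            for _ in range(n)]
    mean = sum(errs) / n
    return sum((e - mean) ** 2 for e in errs) / n

# Treasury benchmark: effectively manipulation-free, but a wide credit basis
treasury = hedge_error_var(basis_sd=25, manip_prob=0.0, manip_size=0)
# Libor-like benchmark: small basis, occasionally nudged 10bp by a manipulator
libor_like = hedge_error_var(basis_sd=5, manip_prob=0.05, manip_size=10)
```

With these invented parameters the occasionally manipulated benchmark still leaves the hedger with far less variance than the clean-but-mismatched one, which is why market participants and regulators can rank the same two benchmarks differently.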

Putting all this together, I conclude we live in a fallen world. There is no benchmark Eden. Benchmark problems are likely to be chronic for the foreseeable future. And beyond. Some improvements are definitely possible, but benchmarks will always be subject to abuse. Their very source of utility-that they are a visible price that can be used to determine payoffs on vast sums of other contracts-always provides a temptation to manipulate.

Moving to transactions-based mechanisms eliminates outright lying as a manipulation strategy, but it does not eliminate the potential for market power abuses. The benchmarks that would be least vulnerable to market power abuses are not necessarily the ones that best reflect the exposures that market participants face.

Thus, we cannot depend on benchmark design alone to address manipulation problems. The means, motive, and opportunity to manipulate even transactions-based benchmarks will endure. This means that reducing the frequency of manipulation requires some sort of deterrence mechanism, either through government action (as in the Libor, Optiver, Moore, and Amaranth cases) or private litigation (examples of which include all the aforementioned cases, plus some more, like Brent). It will not be possible to “solve” the benchmark problems by designing better mechanisms, then riding off into the sunset like the Lone Ranger. Our work here will never be done, Kimo Sabe.*

* Stream of consciousness/biographical detail of the day. The phrase “Kimo Sabe” was immortalized by Jay Silverheels-Tonto in the original Lone Ranger TV series. My GGGGF, Abel Sherman, was slain and scalped by an Indian warrior named Silverheels during the Indian War in Ohio in 1794. Silverheels made the mistake of bragging about his feat to a group of lumbermen, who just happened to include Abel’s son. Silverheels was found dead on a trail in the woods the next day, shot through the heart. Abel (a Revolutionary War vet) was reputedly the last white man slain by Indians in Washington County, OH. His tombstone is on display in the Campus Martius museum in Marietta. The carving on the headstone is very un-PC. It reads:

Here lyes the body of Abel Sherman who fell by the hand of the Savage on the 15th of August 1794, and in the 50th year of his age.

Here’s a picture of it:

The stream by which Abel was killed is still known as Dead Run, or Dead Man’s Run.

@srp-Thanks. Real time might be a stretch, because post-manipulation price movements can help determine whether a particular set of trades was manipulative or not. It comes down to the typical signal extraction problems, and the trade-off between Type I and Type II errors. My preference would be for conditioning on more information/data, which improves accuracy, and imposing substantial penalties ex post. In my view, this is the most cost-effective way to reduce the frequency and severity of manipulative conduct.