Research Paper Finds Fix Tracking Error Increased Post-Reform

A new research paper that looks at trading around the WM/Reuters benchmark fix between 2012 and 2017 argues that while the mechanism has been made more robust and less open to manipulation, the shift to a five-minute window has made actually achieving the benchmark harder for some market participants.

The paper, authored by Martin Evans from Georgetown University, Peter O’Neill from the UK’s Financial Conduct Authority, and Dagfinn Rime and Jo Saakvitne from BI Norwegian Business School, uses what the authors term “a unique dataset that allows us to identify the actions of individual traders”.

They add that these data (from Thomson Reuters Matching) provide new insights into how trading decisions affect the properties of the fix benchmark, and how the presence of the fix affects trading patterns. The analysis seeks to assess the impact of, first, the revelations around dealer misconduct at the Fix and, second, the consequent reform of the benchmark mechanism recommended by the Financial Stability Board in 2014.

The paper classifies and measures the usefulness of the fix rate along three dimensions: how closely it represents rates throughout the day (representativeness); the extent to which market participants can replicate the fix rate through their own trading (attainability); and how resilient it is to manipulation (robustness).

It finds that the representativeness of the benchmark has increased after the lengthening of the benchmark window in 2015 and that, after this lengthening, its robustness also increased. The research also argues, however, that this has come at the cost of a reduction in attainability.

In terms of the Fix being representative, the paper finds that short-term price reversals around the fix decrease steadily throughout the sample period, and disappear from 2015 onwards. This coincided with changes in the trading behaviour of several types of market participants – most notably, dealer banks began doing relatively less trading before the fix and more during the fix. The total trading volume of dealers that were subsequently fined for rigging decreased by one-fifth, and direct trading costs in the largest currencies in the sample decreased by 5 to 10% relative to other times of the day.

Perhaps the most controversial of the paper’s findings is that attainability – a particular concern for trade-based benchmarks, and often measured in terms of tracking error – was reduced after the FSB’s recommendations were implemented. “We find that the change to lengthen the reference window, which was recommended by the Financial Stability Board and implemented by WM/R, reduced attainability (or tracking error) by a magnitude of between 2 and 5 times for the largest currencies in our sample,” the paper states. “This significantly increases the tracking error of market participants, and thus trading costs, for those participants that use the benchmark for rebalancing purposes.”

On a more positive note, the paper does show that the changes implemented in 2015 to lengthen the fix window have increased robustness. “We show that the introduction of outlier trades in a simulated price series, has half the impact with a 5-minute fix window in comparison to a 1-minute window,” the paper states. “However, we also show that the impact in both settings is economically small, at less than 1 basis point. We suggest that this is because the existing benchmark design – its sampling method and use of medians – is highly robust to our method of simulating outlier (manipulative) trades.”
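
The logic of that robustness result can be illustrated with a small simulation. The Python sketch below is not the authors' code or data. It assumes one sampled trade price per second that follows a Gaussian random walk, takes a plain median as the fix (the actual WM/R calculation is more involved), and replaces a fixed number of sampled trades with prices far above the market. The function name and every parameter are hypothetical, and the price units are arbitrary.

```python
import random
import statistics


def simulate_outlier_impact(window_secs, n_outliers=5, outlier_size=50.0,
                            trials=2000, seed=0):
    """Average upward shift of a median fix when `n_outliers` sampled
    trades are replaced by manipulated (far-off) prices."""
    rng = random.Random(seed)
    total_shift = 0.0
    for _ in range(trials):
        # Assumption: one sampled trade per second, Gaussian random walk.
        price, clean = 0.0, []
        for _ in range(window_secs):
            price += rng.gauss(0.0, 1.0)
            clean.append(price)
        # Replace a few randomly chosen samples with prices far above the market.
        dirty = list(clean)
        for i in rng.sample(range(window_secs), n_outliers):
            dirty[i] = clean[i] + outlier_size
        # Gap between the "dirty" and "clean" median benchmarks.
        total_shift += statistics.median(dirty) - statistics.median(clean)
    return total_shift / trials


if __name__ == "__main__":
    print("1-minute window:", simulate_outlier_impact(60))
    print("5-minute window:", simulate_outlier_impact(300))
```

Under these assumptions, the same handful of outlier trades moves the median less, on average, in the longer window, since they account for a smaller share of the sampled trades. The absolute numbers depend entirely on the assumed parameters and say nothing about the basis-point magnitudes reported in the paper.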

The paper finds that, after the revelations of rigging in 2013, trading costs during the fix decreased, in the form of lower quoted spreads. After the lengthening of the fix window in 2015, quoted spreads and price impact rose, while order book depth decreased. “These aggregate effects coincided with changes in the trading patterns of participants, particularly an increase in ‘aggressive’ or ‘liquidity-taking’ trading behaviour of high frequency traders around the fix window,” the paper states.

Reinforcing the existing viewpoint, the paper documents how, despite “much controversy” following the dealer collusion revelations in 2013, the benchmark is still “very important”.

Both trading volume and the composition of participant types are broadly unchanged over the sample period; however, the paper does observe significant adjustments in trading patterns after the two key events in the sample.

The authors note that their findings highlight the general trade-off that exists between attainability and robustness. For example, the benchmark calculation method ensures uncertainty about which trades are selected in its sample, which makes the benchmark harder to manipulate but also harder to attain.

While the authors observe that any proposed change to a benchmark “should not be examined in isolation, without taking into account the likely adaptations by market participants”, they do argue that their findings have several implications for the design of the 4PM Fix, and for benchmarks more generally. These relate to appropriate window length, minimum trade sizes, and sampling and weighting decisions.

As noted, they find that an increase in the size of the window of inputs used to calculate a benchmark results in increased tracking error (or reduced attainability) for participants trying to replicate a benchmark price. “Therefore, benchmark administrators and regulators should be mindful that efforts to increase robustness must be weighed against attainability costs,” they state. “We find that participants can significantly reduce their tracking error by splitting their fix orders over the reference window, but they may be unable to do so due to the large minimum trade size requirement of the reference market…the average trade size in the fix is between 1m and 2m, for participants that utilise it. This means that participants are already splitting orders as much as the minimum trade size of 1m USD allows them to. The large minimum trade size also means that smaller trading participants experience larger tracking error than larger participants.”
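
A rough way to see both effects, the cost of a longer window and the benefit of splitting, is to simulate a participant who trades at a few seconds inside the window and compare the average price achieved with a median fix. The sketch below is illustrative only and does not reproduce the paper's method. It assumes a Gaussian random walk with one price per second, equal-sized child trades executed at randomly chosen seconds, and a plain median as the benchmark. The helper names, the order sizes and the use of the 1m minimum trade size as a simple cap on the number of child trades are all assumptions.

```python
import random
import statistics


def max_child_orders(order_size, min_trade_size=1_000_000):
    """Largest number of child trades an order can be split into,
    given a minimum trade size (assumed here to be 1m)."""
    return max(1, order_size // min_trade_size)


def tracking_error(window_secs, n_child, trials=2000, seed=0):
    """RMS gap between the average execution price of `n_child`
    equal-sized trades and a median fix over the window."""
    rng = random.Random(seed)
    squared_sum = 0.0
    for _ in range(trials):
        # Assumption: one price per second, Gaussian random walk.
        price, path = 0.0, []
        for _ in range(window_secs):
            price += rng.gauss(0.0, 1.0)
            path.append(price)
        fix = statistics.median(path)
        # Child trades land on distinct, randomly chosen seconds.
        times = rng.sample(range(window_secs), n_child)
        achieved = statistics.fmean([path[t] for t in times])
        squared_sum += (achieved - fix) ** 2
    return (squared_sum / trials) ** 0.5


if __name__ == "__main__":
    print("1-minute window, 1 trade:", tracking_error(60, 1))
    print("5-minute window, 1 trade:", tracking_error(300, 1))
    # A hypothetical 5m order can be split five ways; a 2m order only two.
    for size in (5_000_000, 2_000_000):
        n = max_child_orders(size)
        print(size, "split into", n, "->", tracking_error(300, n))
```

In this toy setting, a single trade tracks the fix less closely in the longer window, and splitting narrows the gap. A participant whose order is only slightly larger than the minimum trade size has fewer child trades available, which is the mechanism behind the authors' point about smaller participants.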

The authors go on to argue that their results for attainability and robustness highlight the tension that exists between these properties, and note that they have shown that a lengthening of the calculation window decreases attainability substantially, but only improves their robustness measure by a small amount. “Several of our results stem from the use of a median in the WM/R benchmarking methodology, and it is therefore natural to ask whether the trade off between attainability and robustness can [be] improved upon by using another location estimator in the benchmarking procedure,” they write. They add that the same simulation methodology can be used to quantify the choice: instead of studying the deviation between the ‘clean’ and ‘dirty’ benchmarks using only medians, different benchmarking procedures can be compared.

They also argue that the results highlight that the median is, in a sense, “an extreme choice of benchmarking methodology – it has good robustness properties but very poor efficiency, and that alternatives [exist] with almost equally good robustness but much better efficiency.”
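
The article does not say which alternative estimators the authors have in mind. As a hedged illustration of the general idea, the sketch below compares the mean, the median and a 25% trimmed mean, which is chosen here purely as an example of a robust location estimator and is not taken from the paper. It assumes independent Gaussian noise around a true price of zero, measures efficiency as the variance of each estimator on clean data, and measures robustness as the average shift when some observations are replaced by outliers. All names and parameters are hypothetical.

```python
import random
import statistics


def trimmed_mean(values, trim=0.25):
    """Mean after discarding the lowest and highest `trim` share of values."""
    ordered = sorted(values)
    cut = int(len(ordered) * trim)
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def compare_estimators(n=300, n_outliers=15, outlier_size=50.0,
                       trials=2000, seed=1):
    """Variance on clean data and average shift under contamination
    for three location estimators."""
    rng = random.Random(seed)
    estimators = {"mean": statistics.fmean,
                  "median": statistics.median,
                  "trimmed_mean": trimmed_mean}
    clean_estimates = {name: [] for name in estimators}
    shift = {name: 0.0 for name in estimators}
    for _ in range(trials):
        # Assumption: i.i.d. Gaussian noise around a true price of 0.
        clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # "Dirty" sample: the first few observations pushed far upwards.
        dirty = [x + outlier_size for x in clean[:n_outliers]] + clean[n_outliers:]
        for name, estimate in estimators.items():
            clean_value = estimate(clean)
            clean_estimates[name].append(clean_value)
            shift[name] += (estimate(dirty) - clean_value) / trials
    return {name: {"variance": statistics.pvariance(clean_estimates[name]),
                   "outlier_shift": shift[name]}
            for name in estimators}


if __name__ == "__main__":
    for name, stats in compare_estimators().items():
        print(name, stats)
```

In this setup the mean is the most efficient estimator but is dragged furthest by the outliers, while the median resists them at the cost of a noisier estimate. The trimmed mean sits between the two on efficiency while remaining far less sensitive to outliers than the mean, which is the kind of trade-off the authors describe.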

The paper also adds that the choice of sampling only one trade per second for the benchmark improves the robustness of the benchmark, as a would-be manipulator cannot guarantee that their trades are selected, but this choice also diminishes attainability. “We think that increasing the number of trades sampled within a second would not improve attainability significantly, as the intra-second volatility is small relative to the inter-second volatility, and trades that consume multiple levels of liquidity are rare. The same is true of the choice to not use volume weighting in the benchmark, though we argue that the large number of single share executions means that this does not impact attainability significantly.”

The authors argue that reductions in attainability are important, as they may lead to participants deciding not to trade at the benchmark, which then results in negative liquidity agglomeration effects that in turn diminish the benchmark’s representativeness. “There is already evidence of this in the case of the popularity of NEX market’s ‘eFIX’ pre-fix netting product,” they continue. “This means that flow that would have been executed within the fix window is instead executed outside of it. The netting facility relies on, but does not contribute to, fix price discovery. This is a similar case to dark pool venues in equity markets that reference the lit market price to match orders.”