Silence from Corethree as app outages cause travel chaos

Public transport users across the country have been experiencing problems with their ticketing apps because of technical problems at software developer Corethree. Beate Kubitz reports

28 September 2018

The Twitter-sphere has been full of people complaining about being unable to use their apps, including Transport for Greater Manchester’s ‘getmethere’

Corethree bills itself as the “world’s leading mobile ticket provider”. It issued its 50 millionth m-ticket in February this year – sold through apps developed for transport operators large and small across the UK. Corethree’s customers include a wide range of operators and authorities including First Bus, Arriva, Santander Cycles, Translink, Lothian Buses, My Get Me There (Transport for Greater Manchester), McGills Buses and Go Ahead.

But on 3 September, everything went wrong. Reports of apps ‘going down’ started circulating. Things were no better on 4 September, as people up and down the country found that previously bought tickets had vanished when they tried to board buses (or that the app simply crashed when they tried to hire bikes).

Chaos ensued as drivers demanded cash fare payments, people were unable to travel, and customers complained bitterly on social media. Wrong-footed operators suggested various tactics: carrying (and presumably using) cash, showing emailed receipts as proof of purchase, and sending in claims for any additional tickets people had been forced to purchase.

Transport for Greater Manchester resorted to asking people to register for the smart card version of the My Get Me There card (which, incidentally, does not link to the back office of the app, so people cannot use tickets previously purchased on the app using the smart card).

This wasn’t a momentary failure. For over two weeks, travel and ticketing apps across the UK – from Santander Cycles in London to Lothian Buses in Edinburgh, and including bus companies large (First) and smaller (McGills) – were provoking complaints. Over the course of the two weeks the advice morphed into ‘buy your tickets out of peak hours and put your phone onto airplane mode so that you can see the ticket whilst using the bus’.

By 12 September First Bus was advising passengers to update their app to a version that would address the issues. Complaints to Lothian seem to have subsided somewhat around 13 September. On 18 September TfGM said it was still “continuing to work with our supplier” and noted a “significant improvement in the service we have been able to provide over the course of the last week”. But even the improvements haven’t all been plain sailing, with customers confused by having to re-enter credit card details.

It’s been a public relations disaster for operators besieged by angry travellers, particularly as the shift to mobile ticketing has been seen to speed up boarding times and provide invaluable traveller information. It’s also a salutary moment for those who advocate the shift to mobile. If we can’t get a ticketing app to work consistently and reliably, what hope is there for Mobility as a Service?

Before giving up on m-ticketing, let’s examine what may have gone wrong.

The short answer is that we can’t know for certain. Corethree has remained resolutely silent on the subject. We invited the company to comment by email then followed up by phone. The receptionist firmly said they wouldn’t comment and put the phone down. We followed this up by a tweet just to check that this was company policy – and received complete silence.

There are several things about this that are worrying. One is practical. None of this technology operates in a vacuum. Mobile ticketing requires a number of companies’ systems to work with each other – and if one element goes down, the best way to get things going again and prevent a repeat is to be transparent about the hows, the whys and the fixes.

The second is that secrecy is generally a bad way to run a company. Good practice in crisis communications focuses on transparency and openness in order to re-establish trust. This applies to reassuring operators, other partners and travellers.

A few years ago, apps could be sold to operators as a ‘nice to have’ feature. We’ve now reached a stage where customers are reliant on this technology – and operators benefit from them using it. There needs to be consumer confidence and trust in apps, which can only be earned by a rock steady track record and a consumer-focused attitude to service provision. By comparison, glitches with Transport for London’s Oyster have been met with open barriers and automatic refunds for any passengers charged the maximum fare because they’d only touched in or out in one direction. Lastly, if there isn’t openness about how a system failure happened, how can it be avoided in future?

Why did the system break down?

In the absence of information from the company, there has been considerable speculation about the failure: was it some element of hardware, software or an inherent system design flaw?

The suspicion, however, is that this is a failure by the company to plan and invest for scale. Despite the lack of detailed information, there are clues in the manner of the failure. Operators have issued statements, including this from TfGM: “The issue was caused by a technical fault with the app relating to unprecedented high levels of customer demand that coincided with the return to work and education for a lot of people following the summer break.”

Developers working with big databases that handle a lot of transactions have to design them with sufficient capacity to cope with daily peaks and with seasonal or other huge increases in use. For instance, shifts in travel patterns in response to severe weather can increase usage manyfold. This means that databases – and therefore the servers hosting them – can suddenly experience massive surges in the volume of queries per second. If they don’t have enough capacity to manage these queries, the whole system can be put in existential danger.
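To make the capacity argument concrete, here is a minimal back-of-envelope sketch in Python. Every number and function name in it is a hypothetical assumption chosen for illustration; none of it comes from Corethree or any operator. It simply compares an estimated peak query rate with what a server is assumed to handle.

```python
def peak_queries_per_second(active_users, queries_per_user, window_seconds):
    """Average query rate if each active user sends queries_per_user
    queries within a peak window of window_seconds."""
    return active_users * queries_per_user / window_seconds


def has_headroom(peak_qps, capacity_qps, safety_factor=2.0):
    """True if capacity covers the peak with a safety margin on top."""
    return capacity_qps >= peak_qps * safety_factor


# Hypothetical figures: an ordinary morning peak versus a
# 'first day back' surge with four times as many active users.
CAPACITY_QPS = 150  # assumed server capacity, not a real figure
normal_qps = peak_queries_per_second(50_000, 4, 3600)
surge_qps = peak_queries_per_second(200_000, 4, 3600)

print(f"normal: {normal_qps:.0f} qps, ok={has_headroom(normal_qps, CAPACITY_QPS)}")
print(f"surge:  {surge_qps:.0f} qps, ok={has_headroom(surge_qps, CAPACITY_QPS)}")
```

Under these assumed figures the system copes comfortably on an ordinary day but falls well short during the surge, which is the kind of gap that capacity planning is meant to expose in advance.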

M-ticketing relies on more than one company – the ticket is just one part of a system of hardware, APIs (the interfaces through which systems exchange data) and apps. An app depends on many different components and services, and any one of them going offline could stop it working.

It’s quite possible that Monday 3 September – the first day back for many school students and people returning from holiday – had a sufficiently big impact on one element or service to imperil the entire system.

A second indication that this is an issue around scaling (whether server capacity or system architecture) is in the advice operators gave to frustrated customers. First they asked people to buy tickets outside peak hours (thereby smoothing the demand on servers), and then to put their phone into flight mode on boarding the bus to display the ticket (obviously this means they won’t be able to use their phone for anything else!).

We assume flight mode would trigger the app’s offline mode so that the app cannot then constantly contact the server (which would add further queries to its burden). These elements imply a capacity issue with servers unable to deal with the number of queries per second when in use.
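The effect of that assumed offline mode can be sketched as follows. This is an illustrative toy in Python, not Corethree’s implementation (which is not public); the class and method names are hypothetical. It shows how an app that displays a locally cached ticket while offline stops adding queries to the server’s load.

```python
class TicketServer:
    """Stand-in for a ticketing back end that counts incoming queries."""

    def __init__(self):
        self.queries = 0
        self._tickets = {"T1": "Day ticket"}  # hypothetical ticket store

    def fetch(self, ticket_id):
        self.queries += 1
        return self._tickets[ticket_id]


class TicketApp:
    """Toy app: contacts the server when online, uses its cache when offline."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.offline = False  # flight mode is assumed to set this

    def show_ticket(self, ticket_id):
        if self.offline:
            # No network call; raises KeyError if never downloaded.
            return self.cache[ticket_id]
        ticket = self.server.fetch(ticket_id)
        self.cache[ticket_id] = ticket
        return ticket


server = TicketServer()
app = TicketApp(server)
app.show_ticket("T1")      # online: one query, ticket cached
app.offline = True         # passenger switches to flight mode
for _ in range(3):
    app.show_ticket("T1")  # shown from the cache, no further queries
print(server.queries)
```

In this sketch the server sees a single query however many times the ticket is displayed on the bus, which is consistent with the reasoning behind the operators’ advice.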

How can this be prevented in future? Checking through the download numbers for the apps it’s obvious that we’re looking at many hundreds of thousands of users. It seems that, whilst Corethree has been successful at selling its services to operators, the increase in people actually using the apps has come as something of a surprise.

This really shouldn’t be the case – a quick look at social media for the past year shows that Corethree apps suffered failures on the first Monday of September last year, although with a smaller number of users the impact was less severe and less noticeable.

At that point, the company should surely have invested time and effort in building more capacity to deal with both year-on-year growth and seasonal peaks.

With silence from Corethree in response to questions about how it intends to prevent this happening again, other suppliers are more willing to talk about the best practices to avoid capacity issues arising.

Dave Hulbert, engineering director at Passenger, said: “Operators need to ensure their suppliers are using industry best practices. In software reliability engineering, this includes, at minimum: redundancies to balance load; fault tolerant software to allow for imperfect computer networks; practiced disaster recovery procedures; and the ability to scale to demand.”

Without a statement from Corethree it is very difficult to assess whether any of this is in place.

In the meantime, operators are looking at their options. McGills has announced the launch of a new app with a different supplier in October and Transport for Greater Manchester has suggested it’s reviewing sales channels in a statement: “Such is the pace at which technology develops and improves we continually review the channels that we use to ensure they meet our customers’ needs and have recently started a market sounding exercise. In addition, we are due to introduce some significant changes on Metrolink early next year that will set the foundations for a better integrated, simpler and easier to use transport network.”

Perhaps now is a good time to think about contracts in the industry. As apps become more integral to operators’ businesses, their failure becomes more critical. Whilst we have not had access to specific contracts, current app provisions are likely to include penalties for downtime but less likely to include penalties for lost revenue (unlike the costs of TfL Oyster outages, which are covered by contractual penalty clauses).
