Ericsson's Centralized User Database has been fingered by O2 for a second network outage which hit the operator last week, and will thus be given the boot despite the £10m cost of a replacement.
Last week's outage wasn't as serious as the 21-hour downtime which hit O2 customers in July, but it was down to the same bit of kit …

O2 were relying on a 3rd party 'bit of kit' that went wrong once (acceptable).....2nd time (*looks for ways for it to never happen again*)....3rd time (*kicks old kit out and uses 'proven alternative solution'*).

Re: So...

...you'd splash out £10million on new kit on a single outage. Please, do NOT ever work for us.

I think you need to put this into perspective.

Their annual upgrade & expansion costs are in the region of £550M. Spending £10M on one piece of kit that has caused two big outages (and maybe causing internal headaches) plus a massive dent in customer (and investor) confidence is a reasonable business decision.

Re: So...

"Their annual upgrade & expansion costs are in the region of £550M. Spending £10M on one piece of kit that has caused two big outages (and maybe causing internal headaches) plus a massive dent in customer (and investor) confidence is a reasonable business decision."

Yes, now it's a reasonable decision - but the OP was suggesting replacing it after ONE incident.

Also, be under no illusions this is one piece of kit - it'll be one system.

Re: So...

When that "bit of kit" cuts off 10% of your customer base (that's gotta be what, 100k people?) for 24 hrs or so, a single spend of £10 million looks essential to me. If people get cut off for a 3rd time for an extended period, customers, especially business customers, will start to jump ship. It's not like O2 are any cheaper than the other mobile companies...

Re: So...

> ...you'd splash out £10million on new kit on a single outage. Please, do NOT ever work for us.

Quite conceivably it won't cost them a penny - I expect a contract for supply of this shiny box comes with KPI definitions and hefty penalties for missing those KPIs. I imagine at least five 9's uptime is one of the contractual clauses, so O2 are probably within their rights to withhold payments.
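For a sense of scale, "five 9's" is a very tight allowance. A quick back-of-envelope sketch (illustrative only; the actual KPI figures in any O2/Ericsson contract are not public):

```python
# Allowed annual downtime at various availability levels.
# Purely illustrative; real contract KPIs are unknown.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    downtime_min = MINUTES_PER_YEAR * (1 - availability)
    print(f"{nines} nines ({availability}): "
          f"{downtime_min:.1f} minutes/year allowed downtime")
```

At five nines that works out to roughly 5.3 minutes of downtime a year, so a 21-hour outage would blow through the allowance many times over.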

Similarly, Ericsson will be on damage limitation so will be keen to placate their customer - particularly one spending a million quid a day on kit that presumably Ericsson sell them (maybe not all of it, but probably around 50%, I'd guess).

Re: So...

"...you'd splash out £10million on new kit on a single outage. Please, do NOT ever work for us."

Get real. This was a huge public embarrassment, and a failure in their core business of supplying reliable communication services. Not "people couldn't access their bills online", not a remote mast failure cutting off two crofters and their dog, but core service failures affecting millions of people. For the water industry this would be a taps-not-working moment, for the electricity industry it would be lights out, for car makers it would be recall time, Toyota style.

Even if it had so far been a single instance affecting 7m customers, O2 need to spend whatever it costs to make sure it doesn't happen again, with an upper expenditure limit probably approaching £100m, maybe higher.

How come £100m? With 7m customers officially affected in July, that's what, 1.3m contracts ending in the next six months who have been affected? If a typical contract has £7 a month gross profit (I think it's more like £12, but I'll work on the cautious side), and they lose an incremental 10% of those 1.3m affected, then that's lost income of around £11m a year, £22m on typical two-year contracts. And that's assuming that 90% of those affected decide to stay with O2, and those with more than six months on contract forget about it at renewal time. And if they chose not to fix it at my up-to-£100m, what happens next time? Another 7m punters persuaded that O2 can't be trusted? Another £22m of income lost, making £44m of lost income, and still with a dodgy system waiting to do the same again?
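That arithmetic is easy to re-run; every figure below is the comment's own assumption, not an actual O2 number:

```python
# Re-running the back-of-envelope churn maths from the comment above.
# All inputs are the commenter's assumptions, not real O2 figures.
contracts_ending = 1_300_000   # affected contracts ending within six months
extra_churn = 0.10             # incremental churn among that group
profit_per_month = 7           # cautious gross profit per contract, in £

lost_customers = contracts_ending * extra_churn
lost_per_year = lost_customers * profit_per_month * 12
lost_two_years = lost_per_year * 2  # typical two-year contract

print(f"Lost customers: {lost_customers:,.0f}")
print(f"Lost income per year: £{lost_per_year / 1e6:.1f}m")
print(f"Over a two-year contract: £{lost_two_years / 1e6:.1f}m")
```

That reproduces the "around £11m a year, £22m on typical two-year contracts" figures (strictly £10.9m and £21.8m before rounding).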

Certainly they need to differentiate between one-offs and systematic failures, but business always claims any failure is a one-off. People in O2 must have known that the July mess-up had a high likelihood of repeating itself, but somebody like you decided that £10m was too much for a one-off, and look where it has got them.

Three's main issue is not placing new masts to fill dead spots indoors and outdoors (less than 10% signal), but that could be more an issue with not having much bandwidth compared to all of the other networks, and they use the 2100 band as well, so walls really hurt the Three network (I am sitting between 3 T-Mobile/Three masts; the problem is 10% to no signal indoors, so it tends to stall the radio on the phone a lot as it switches between the masts; it's been like that for 5 years).

I can't fault their customer support though (I know more than most customer service reps do normally; with Three they are basically at my level when talking to them, so that's nice; too bad the network coverage is not). The only issue is when you want to leave: it takes you an hour to get connected, and they try to keep you by lowering the contract price (at least the call is free).

I am impressed. O2 suffered some failures, which is bound to have damaged their reputation. This brute-force honesty of accepting the failure, identifying the cause and stating they are changing out the problem kit for a proven solution should not only restore their reputation; I actually see them as better than I did before.

It would be interesting to see a survey of how much people's confidence in the network has been hit, and whether O2's reputation is seen as better or worse after disclosing the facts and attempting to fix the problem.

Even if their new solution hits some bumps I will have far more respect and loyalty to them if they continue to be factual about any issues.

IFFFFFFF they were factual! Although O2 claim they had the problem fixed after 24 hours, the reality was that users were left struggling well beyond 48 hours, and in many cases 72 hours. As a user it was immaterial to me that the failed kit was replaced after 24 hours; a problem is not fixed until the last user is back to full functionality. The HONEST approach would have been to recognise the length of time users struggled, rather than to tell users all was well to avoid paying compensation.

Funny that...

... I have a Sony Ericsson phone (on the 3 network) that seldom has a signal, and when it does it's very weak. It also crashes regularly (probably due to lack of signal) and drops calls, even when it does have a signal. I'm in dispute with 3 at the moment about it, as I have had one replacement and two repairs, and the damn thing still won't work properly. I will never again get a Sony Ericsson, and will never again get a contract phone on 3.

Re: Funny that...

I've had Ericsson and Sony Ericsson phones, and also dealt with Ericsson on a professional basis. The phones were the most reliable of any I've owned, and among Scandinavian telco suppliers they're far and away the nicest and most competent ones to work with. I do hope that this blows over for them quickly.

Re: Funny that...

Have O2 UK announced their LTE (I assume that is what is meant by "next gen") core or access vendor? I know they have done trials, and O2 DE has gone Huawei. But the UK is a big Ericsson shop (and rightly so; it's good kit).

Does this mean that the Erlang language doesn't work well?

In my experience O2 internationally has massively under-invested in DB admin tools, ignored the few good DBAs that have fleetingly passed through, and been driven by the needs of its service suppliers, which in some countries are diametrically opposed to the aims of O2 itself.

The CUDB should be installed on a distributed cluster. How many O2 folk understand that level of technology?

It doesn't matter what software, hardware or DB architecture is used; a budget installation will always be prone to failure. Most of us can recall HLR failures (through poor design and planning) over the last 12 years across several of the O2/Telefonica operators. Real-life testing of failover should be a regular procedure.

well

They spent a few million on the project; do you think they have some basic hardware? I doubt it. This is just a question of bad software/hardware, wrong strategy, wrong product, and salesmen who overpromise to make their money!! Totally useless.

The Consequences of Failure

... of an HLR/(whatever they're calling them this week) are so fundamental that networks are rightly scared of messing with them.

At Vodafone they are still entirely in-house kit, running on OpenVMS and then Linux - and with so much network-specific customisation over their twenty-year lifespan that the supposed 12-month replacement project by Alcatel-Lucent is now approaching its fifth year.

For O2 to let this happen twice is pretty inexcusable, although as a Giffgaff subscriber I was completely unaffected by virtue of the pretty big coverage areas on modern 3G MSCs. Stay put and you'll be OK.

One of the flaws of GSM-ish networks

They were never designed for reliability. That's why they have single points of failure like the HLR (which apparently failed here). There were other network architectures which did have different approaches. For example, AMPS (in the US) always accepted the first call from a new phone; it then looked up the identity and wouldn't allow a second call if you were unknown. The B-Netz in Germany only punched call data onto punch cards, making no real-time verification of the user at all. The German A-Netz had an operator who called you back if you wanted to make outgoing calls.

It's mind-boggling to see how much one could save in complexity and cost if they didn't have to bill you for the service. You could use the modern RF interfaces of LTE and simply run Ethernet over it. No MSC, no HLR, no VLR and so on.

You only had to look at the O2 and giffgaff forums, plus Facebook, to see that there's a good chance that more than the stated 10% of customers were experiencing issues. Plus, they claimed that the outage started at lunchtime, when in reality service started to slide long before then.