Updates, issues and planned works

Update

This is a notification of a low-severity “at risk” issue. We believe the risk is minimal, as we also have our own UPS units covering the services & systems located at HEX.

This is the most recent update from HEX:

“The planned UPS works have been unsuccessful. Temporary UPS units are available should we need them. Further replacement parts are being sourced and a more detailed action plan is being drafted in order to resolve the issue as soon as possible.

Resiliency level remains at N and there is no anticipated interruption to your services.

We will provide a further update once resolved or if there is any change to the current situation.”

One of our interconnect suppliers will be upgrading their capacity with BTW for DSL services in Telehouse North. This will involve migrating traffic on a pair of interconnects to new, higher-capacity ones.

Customer Impact:

We do not anticipate any impact to Merula customer traffic; however, all BT DSL services delivered from our Telehouse North node should be considered AT RISK during this maintenance window.
If you have any queries about this work, please raise a support ticket with our helpdesk.

Yesterday was Patch Tuesday from Microsoft. In addition, last night Apple released large updates to both iOS and OS X. As a consequence, all supplier back-haul networks (BT, TalkTalk, Vodafone etc.) are seeing a dramatic spike in traffic across their links, which is impacting our services to customers on ADSL/FTTC-type circuits.

We apologise for this. The extra traffic should slowly die away over the course of the day, so download speeds will start to improve.

Please bear with us. If you’re still seeing speed problems tonight or tomorrow, raise a support ticket in the usual manner.

COMPLETED: All work was completed during the evening of 26th November and the data centre now has full access to mains power, generator power and the UPS. There was no outage to hosted servers during the work.

UPDATE: we will have engineers on site on Thursday 26th to replace parts in our automatic transfer panel. Most of the work will not be service-affecting; however, there will be a short period where the data centre will be supplied by UPS alone. While we don’t anticipate any disruption, the power to our racks should be considered at risk during this period. All “at risk” work will be carried out outside core business hours, and we will have staff and external electricians on site as needed to monitor and resolve any issues found.

UPDATE: the engineers are about to isolate the mains supply and generator and move us over to UPS power. The UPS has been tested to support the data centre for at least 30 minutes, but while the mains and generator are isolated we are at risk.

We are aware of an issue with our changeover panel here, and an engineer is en route to work on this.

If the panel needs to go off-line, resilience will be reduced for a short period. However, the generator and UPS remain available and will cut in automatically; both have just been fully tested as part of our normal weekly maintenance regime. As a reminder, the generator has fuel for at least 24 hours of continuous running.

We plan to make some further changes to a couple of our core network routers to isolate and correct a service setting that is causing a few BGP issues.

This may mean a few blips in routing tables as the changes are disseminated, and an occasional uptick in latency as routes converge and stabilise, but we expect no downtime or any significant service issues.
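If you would like to see whether these convergence blips reach your own connection, a minimal sketch along the lines below logs round-trip times during the window so that short spikes or dropped probes stand out. It is illustrative only: it assumes a Linux host with the standard iputils ping, and the target address is a documentation placeholder rather than one of our hosts.

#!/usr/bin/env python3
# Minimal latency watcher: one ICMP echo every few seconds, with the
# round-trip time logged so convergence blips show up as brief spikes.
import re
import subprocess
import time

TARGET = "192.0.2.1"   # placeholder address; substitute a host behind the affected routers
INTERVAL = 5           # seconds between probes

while True:
    # Single echo request with a 2-second reply timeout (iputils ping syntax)
    result = subprocess.run(["ping", "-c", "1", "-W", "2", TARGET],
                            capture_output=True, text=True)
    match = re.search(r"time=([\d.]+) ms", result.stdout)
    stamp = time.strftime("%H:%M:%S")
    if match:
        print(f"{stamp}  {TARGET}  rtt={match.group(1)} ms")
    else:
        # No reply: either a momentary convergence blip or real packet loss
        print(f"{stamp}  {TARGET}  no reply")
    time.sleep(INTERVAL)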

15:15 The problem has been traced to one of our core routers, which ‘hung’ without notifying the automatic monitoring system as it should have. This in turn affected routing for some customers connected to our Telehouse data centre. Some other important switches and routers were also impacted: they were unable to see that this router was down and likewise failed silently.

We have now restored service to the router and are monitoring for any further issues.

This should not have happened, but despite our planning it did, and we can only apologise. We will shortly be reconfiguring the core layout to make sure a single hung router cannot cause such cascading problems for our customers again.
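For context on the kind of change involved, one common safeguard against a device that hangs without raising an alarm is to poll it actively rather than rely on the device reporting its own faults. The sketch below is purely illustrative: the hostnames, port and alert action are placeholders, not our actual monitoring configuration.

#!/usr/bin/env python3
# Active liveness polling: probe each device on a schedule and alert after
# several consecutive failures, so a router that hangs silently is still
# noticed even though it never reports a problem itself.
import socket
import time

DEVICES = ["core1.example.net", "core2.example.net"]  # placeholder hostnames
PORT = 22            # any TCP service the device always runs, e.g. SSH
FAILS_TO_ALERT = 3   # consecutive failed probes before alerting
INTERVAL = 30        # seconds between polling rounds

failures = {device: 0 for device in DEVICES}

def reachable(host, port, timeout=3.0):
    # True if a TCP connection to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

while True:
    for device in DEVICES:
        if reachable(device, PORT):
            failures[device] = 0
        else:
            failures[device] += 1
            if failures[device] == FAILS_TO_ALERT:
                # Placeholder alert action: page the on-call engineer here
                print(f"ALERT: {device} unreachable after {FAILS_TO_ALERT} probes")
    time.sleep(INTERVAL)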

13:30 We believe that we have resolved these service issues. That said, we are still monitoring and looking for the root cause. Again, our apologies for this loss of service; updates will continue to be posted here once we’ve had a chance to check logs etc.

UPDATE: we’re working with our link team, as this is mainly affecting services out of our London data centres. Apologies for this extended down-time; we’re all working on this problem and will update here as we know more.

We’re aware of, and are investigating the cause of, outages affecting a number of services including some leased lines, Ethernet circuits and broadband lines. As soon as we know the root cause & likely time to fix, we’ll update here.

We’ve been advised by BT Openreach that since around 09:46 today they’ve been experiencing connectivity problems with a number of their ADSL lines, especially in these two counties, caused by what they believe is a broken or failed back-haul cable. No time to fix has been given yet.

Service restored: FINAL: The onsite cable repair team completed their work by 03:55. All broadband services were prioritised and restored at either 00:15 or 00:30.

[UPDATE] A cable incident has now been raised, as the fault has been proved from both ends (Colchester & Ipswich). Precision Testing Officers (PTOs) are working on localising the fault, which is 4.72 km from the Ipswich telephone exchange. A PTO is in the field, driving to Bourne Hill in the Wherstead area of Ipswich. The fault has yet to be confirmed, but initial findings suggest a 96-fibre cable has been affected.

We’ve been advised by Cogent Network, one of our transit suppliers, of work to be undertaken on some of their core routers. This should not mean any loss of service to us or our end-users, as we have multiple diverse routes into and out of our data centres; there may be a few moments of route instability until routes re-converge and stabilise.

Date and Time of Maintenance: 20/08/2015 00:00 to 04:00 GMT (01:00 to 05:00 BST)