Applecore - Emergency Restore

One of the disks in the raid array on Applecore has failed; however, because of the way the array degraded, it took the whole raid array down with it. To correct this we must perform a full restore on Applecore using the most recent backups available, which are roughly 10 hours old.

We apologize for any inconvenience this issue has clearly caused. Updates will be posted as the restore progresses.

" If you are the web site owner, it is possible you have reached this page because:
* The IP address has changed.
* There has been a server misconfiguration.
* The site may have been moved to a different server.
If you are the owner of this website and were not expecting to see this page, please contact your hosting provider."

...when trying to access them?

Also I cannot access any of my sites via CPanel or FTP - username and/or password doesn't work

This is because the accounts are being restored now. We really appreciate your kind cooperation and patience while we complete the restore process.

Can someone tell me exactly how things are impacted on our end as far as data is concerned? What time were the backups taken, and what happens to any data that has been added to our MySQL databases? If the backup was from 8am, I take it anything added after that will be lost? Please confirm.

Also, it seems like applecore has been down so many times in the last few months. What do we tell our clients, who have grown weary of the quality of our web site and e-mail service?

Obviously businesses nowadays rely on their email extensively. Is there some other type of email service we can use that does not rely on this server?

Why is there no mirror in place in case issues like this occur?

These are several of the questions from our clients that I have received and I would like to give them a quality response. I've been with TC for 5 yrs now and the recent issues are making our service look extremely bad.

One of my sites is back (blog), but none of the posts or comments put up since February 5th are extant. Another site is in the same boat, suggesting the backups used to restore these two (most of my other customer sites are still not operating) are more than ten hours old. More like 5-6 days.

I realize this is no fun for anyone, but the increasingly frequent outages my customers have experienced the last month have taken their toll, and one is pulling their site and moving to another host.

What is the status? I have been told everything from "within the hour" to "just a few more minutes" and, as we can tell, none of it was true. I have clients who are really upset and I can't tell them anything because you won't inform us. STATUS!

Please note that services like Apache, MySQL and Exim will not work until the restore completes, even though cPanel/WHM is up; that is why the sites are not working now. We are working hard to restore the services as soon as possible.

Please accept our apologies for the delay.

One of my sites is up and running... I take backups in a directory DAILY and the last file there was from Feb 3rd. Am I going to get the files from FEB 4-10th?

My MySQL databases have been restored from Feb 3rd. Are they going to get RESET to the 10 hour time period originally mentioned?!

My sites are still down. I can't access email. And I will probably lose clients over this. So any answers to this statement and questions?

"Can someone tell me exactly how things are impacted on our end as far as data is concerned? What time were backups, what happens to any data that has been in our MySql Databases. If the backup was from 8am, I take it anything added after that will be lost? Please confirm

Also, it seems like applecore has been down so many times in the last few months. What do we tell our clients, who have grown weary of the quality of our web site and e-mail service?

Obviously businesses nowadays rely on their email extensively. Is there some other type of email service we can use that does not rely on this server?

Why is there no mirror in place in case issues like this occur?

These are several of the questions from our clients that I have received and I would like to give them a quality response. I've been with TC for 5 yrs now and the recent issues are making our service look extremely bad."

At present the data restore is complete. We ultimately had to fall back to a backup snapshot from "Feb. 6th 02:30 EST". Though we do have viable data in our 12h incremental backups, the speed at which they were restoring was putting us in an untenable position on a realistic, or more importantly a sane, restore timeline. Having the snapshot from the 6th restored at the very least gets everyone back online and the server on an even footing so we can begin work on getting data in from the incremental backups.

There are still a number of tasks we are wrapping up on the server, but as of this moment most sites should be operating as intended. We do apologize for the clear inconvenience this whole situation has caused and thank everyone for their continued patience.

I will provide a full recap of what happened with applecore and where we stand on the restore shortly.

Alright let me recap what happened with applecore and where we are now...

Over the last couple of weeks we have had a number of issues with applecore. They were initially thought to be hardware induced, so we replaced the server's power supply, swapped out the memory and bumped it up to 6GB, swapped out the disks in the raid array for new ones and replaced the raid card. Things were looking good for a couple of days: there were no notable outages, things were snappy on the server and we thought the worst of it was behind us.

On Thursday afternoon around 2PM EST the server started to throw file system errors and then hard crashed. The server would not come back online, so I unracked it for review. It was quickly apparent that the file system on the main hard disk had catastrophically failed; a run of fsck (the file system checker for Linux) was performed but it was unable to recover anything meaningful. The decision was made to restore the server from backups as opposed to fussing around with disk utilities, which would be time consuming and unlikely to yield any productive results given the state of the server's file system.

Now, some context about the hard disks in our servers -- although we do have raid mirroring (using Areca enterprise level raid) on all our servers, this only protects against (physical) hardware failures of the disks. The raid array has no knowledge of the (logical) file system or its current state; it will mirror data across the array no matter what file system operation takes place, be it bad or good. So by the time the file system had failed on the server, the damage had already been mirrored across the array in real time.
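To make the distinction above concrete, here is a minimal Python sketch of a two-disk mirror. It is a toy model only, not how the Areca controller works internally; `MirroredArray` and its methods are hypothetical names invented for this illustration.

```python
class MirroredArray:
    """Toy model of a two-disk mirror: every write goes to every live disk."""

    def __init__(self, size):
        self.disks = [bytearray(size), bytearray(size)]

    def write(self, offset, data):
        # The mirror copies bytes blindly; it cannot tell good data from bad.
        for disk in self.disks:
            if disk is not None:
                disk[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Serve the read from the first disk that is still present.
        for disk in self.disks:
            if disk is not None:
                return bytes(disk[offset:offset + length])
        raise IOError("no disks left in the array")

    def fail_disk(self, index):
        # Simulate a physical failure of one disk.
        self.disks[index] = None


# Case 1: a physical disk failure. The surviving disk still has the data.
hw_case = MirroredArray(16)
hw_case.write(0, b"GOODDATA")
hw_case.fail_disk(0)
survived_hw = hw_case.read(0, 8)

# Case 2: a logical (file system) fault writes garbage. Both disks get it.
fs_case = MirroredArray(16)
fs_case.write(0, b"GOODDATA")
fs_case.write(0, b"\x00" * 8)
corrupted_everywhere = all(bytes(d[0:8]) == b"\x00" * 8 for d in fs_case.disks)
```

In the first case the data survives the loss of a disk; in the second, the bad write lands on both disks at once, which is why only a backup taken before the fault can help.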

Once we had the basics of the restore out of the way - OS reinstall and server prep kits provisioned - we immediately started to restore data from backups. Initially we decided to try to restore data from our 12h R1 incremental backups; however, after a few hours we recognized that this was proving to be an immensely time consuming task. We made the decision to restore data from our weekly local snapshots instead, which are stored on a physical hard drive rather than in a disk safe across our network.

After the decision was made to restore data from the weekly snapshot (dated Feb 6th 0230EST), the rest of the restore was very straightforward, as we have done this type of restore countless times - it was more of a tedious waiting time sink than anything else. That brings us to the here and now: all data has been restored and all services are online and operating as intended.

At the moment we are able to restore data from the 12h incremental backups taken the morning of Feb. 11th 0145EST upon request and for /home data only, we acknowledge that this is not ideal but that is the situation we are in and we express our sincere apologies.

We understand we have inconvenienced you and your customers through this incident, we can promise that a full review of all TCH procedures will be undertaken to ensure this situation never happens again - we expect better from ourselves and you deserve better.

Many of my accounts are back, but some are not. In two cases the sites seem to be up, but MySQL is not up yet (cannot find database) and email is not up either. I submitted a ticket for those, but I am checking here to see whether you are 100% done with the restore or not.

QUESTION: do you know for sure (or close to it) that what happened yesterday (what you describe above) was the cause of all the problems this server has had starting January and that the fix done overnight will take care of all of that?

Your other servers do not seem to have had such problems so hopefully Applecore is now as good as any of them?

Thank you support for getting the server back up and restored. Hopefully this will be the last of the issues we've had with this server and it will get back to being the rock solid performer it has been since I started hosting my sites here.

As for you whiners you really should cut the support staff some slack. Hardware fails, it's a fact of life. But I am confident that the support staff will get everything back to the way it was.

And if you haven't realized by now, the support staff does not monitor this forum for a reason: they are busy working the help desk, which is where you should be directing questions about your accounts.

Thanks again support for working through the night getting the server back up and restored. As much as I hate downtime I also know that computers do fail eventually. My hat's off to you for your hard work.

I am a Forum Moderator. While I can assist in answering most of your hosting related questions, I am unable to answer questions about specifics relating to your account such as billing and server related issues. Should you need assistance in these areas, please contact our Help Desk or our many other options. Another good place to find answers is with our help pages, tutorials and movie tutorials.

QUESTION: do you know for sure (or close to it) that what happened yesterday (what you describe above) was the cause of all the problems this server has had starting January and that the fix done overnight will take care of all of that?

Your other servers do not seem to have had such problems so hopefully Applecore is now as good as any of them?

Since I do not have access to the servers I can't answer this. But if you ask the help desk maybe someone there can give a definitive answer.

I will say this, if it wasn't the cause the support staff will continue to work on this server until it is.

A word about the quotas: when scripts are installed, files created by those scripts are assigned to the user "nobody". When accounts are restored, all the files in the account are set to the correct user accounts. This is why some accounts may be over quota: these files were not being counted as part of the quota before, but they should have been and now are.
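As a rough illustration of the ownership point above, the following Python sketch adds up file sizes per owning user under a directory. It is an assumption-laden approximation, not the server's actual quota accounting (real quotas count disk blocks, and `usage_by_owner` is a hypothetical helper name), but it shows how files owned by a different user would fall outside an account's total until their ownership is corrected.

```python
import os
from collections import defaultdict


def usage_by_owner(root):
    """Return {uid: total_bytes} for files under root.

    Symlinks are not followed; a symlink contributes only its own size.
    Directory entries themselves are not counted.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            totals[info.st_uid] += info.st_size
    return dict(totals)
```

Running this against an account's home directory before and after a restore would show bytes moving from the uid of "nobody" to the uid of the account owner, which matches the jump in reported usage.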

Hardware failures are 100% understandable and a simple fact that we all have to deal with. That alone doesn't bother me and I appreciate that TCH worked hard to get everything back up, but that still leaves me with this concern: This hurt all of us on applecore. Support communication about the ongoing issues with applecore has not addressed anything, in my opinion. Over the past month I have asked many questions of the help desk: "Is there anything I can do? Is it my sites causing this? Are there any upgrades I should make? Should I move to another machine?" The boilerplate "We are sorry, it is being worked on" response did nothing to either answer my questions or give any assurances.

I can't rightly remember how many years I have been hosting here, and one thing I appreciated is that you guys did such a great job that, up until the past month or so, I rarely had to contact support, and generally when I did, it ended up being because of my own screw ups/ignorance more than anything else.

This applecore situation, when we needed some real solutions and some assurances the most, left me feeling hung out to dry. This is the TCH Family Forum. In lieu of making a hasty decision to divorce, maybe it's time for a little family counseling here. How about it TCH? What can we do to help your service run smoothly, and how can you help us feel confident that it won't be such a travesty should something like this happen again?

Can TCH offer some sort of redundant backup server to resellers? Having an "applecore2" set up, that could be flipped on in an emergency like this, would be awesome. I don't know about the rest of you guys who were affected, but I think it would be worth every penny to pay an extra monthly fee to cover the added costs of such a service.

The e-mail protocol is a resilient one. Typically, when mail fails to deliver due to an outage, the remote server will queue the message for up to 72h and attempt to resend it at 6 or 12h intervals. In other cases the remote server might be configured to fail immediately when messages do not deliver; in that situation the sender of the message will receive an error indicating there was a problem and that they should try again later.
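The queue-and-retry behaviour described above can be sketched in a few lines of Python. This is a simplified model using the 72h queue lifetime and fixed 6h interval mentioned in this post; real mail servers each have their own retry schedule, and the function names and example times here are hypothetical.

```python
from datetime import datetime, timedelta


def retry_schedule(first_attempt, interval_hours, max_queue_hours=72):
    """List the retry times a sending server would use before giving up."""
    step = timedelta(hours=interval_hours)
    deadline = first_attempt + timedelta(hours=max_queue_hours)
    attempts = []
    when = first_attempt + step
    while when <= deadline:
        attempts.append(when)
        when += step
    return attempts


def first_delivery(first_attempt, server_back_at, interval_hours,
                   max_queue_hours=72):
    """Return the first retry at or after the server's return, else None."""
    for when in retry_schedule(first_attempt, interval_hours, max_queue_hours):
        if when >= server_back_at:
            return when
    return None  # the message expired in the queue and would bounce


# Hypothetical example: mail first sent shortly after the outage began.
sent = datetime(2010, 2, 11, 14, 0)
back = datetime(2010, 2, 12, 4, 30)
delivered = first_delivery(sent, back, interval_hours=6)
```

Under these assumptions a message queued by a well-behaved sender is delayed rather than lost, as long as the receiving mail service returns within the sender's queue lifetime.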

Here is the straight deal: with applecore we dropped the ball - there are no excuses - we screwed up. Normally, when we have issues with a server as consistently as we have had with applecore, we schedule a migration of the server. That would have involved restoring fresh backups of applecore to a new system, then, once that completed, doing a final sync of mysql/account data and "flipping" the IP routing from applecore to the new system. This would have provided, and usually does provide, a very seamless migration with no downtime and little in the way of support issues for anyone (do not confuse this with the migrations from when we moved into the TCH DC; that was a different kind of migration). This is something we have done in the past on multiple servers when they exhibited signs of failure, but in this case with applecore it was simply not done, for whatever reason, and we take responsibility for that.

We are undertaking a full review of our restore/migration and backup procedures with changes effective very soon. In addition we will be implementing a stand-alone MySQL backup system that will perform full MySQL data dumps of all our shared, reseller and dedicated servers every X hours (we are thinking every 4 or 6h), this would provide a much more reliable and consistent platform for MySQL data to be retrieved from.
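For anyone who wants to run something similar on their own account in the meantime, here is a minimal Python sketch of a timed full dump. It is not the system described above, only an illustration under stated assumptions: a 6h interval (one of the two values mentioned), credentials supplied through a MySQL option file, and hypothetical helper names and file naming.

```python
from datetime import datetime, timedelta

# Assumption: the post mentions "every 4 or 6h"; 6 is picked for illustration.
DUMP_INTERVAL_HOURS = 6


def dump_command(now, backup_dir):
    """Build a mysqldump command line for a timestamped full dump.

    Credentials are assumed to come from an option file such as ~/.my.cnf.
    """
    name = now.strftime("all-databases-%Y%m%d-%H%M.sql")
    return [
        "mysqldump",
        "--all-databases",
        # Consistent snapshot without locking, for transactional tables
        # such as InnoDB; it does not give that guarantee for MyISAM.
        "--single-transaction",
        f"--result-file={backup_dir}/{name}",
    ]


def next_runs(start, count, interval_hours=DUMP_INTERVAL_HOURS):
    """Return the next `count` scheduled dump times after `start`."""
    step = timedelta(hours=interval_hours)
    return [start + step * i for i in range(1, count + 1)]
```

The command list can be passed to `subprocess.run(cmd, check=True)` from a cron job or similar scheduler. With dumps at a fixed interval, the worst-case loss of MySQL data after a failure is roughly one interval.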

Additionally, we are working on a clustered MySQL project which might also be expanded into a high-availability hosting project where should you desire you can sign up to a hosting solution where everything is redundant from the web server, account data, mysql data, e-mail service down to the hardware side of having raided hard disks, redundant power supplies, redundant memory and so on.

We recognize that hosting is evolving and we intend to work towards a next generation hosting platform for all our clients - not an exclusive service - where 0 downtime & total redundancy is the standard. Please keep an eye out for forum updates as we will be posting some blogs in the coming days/weeks with exact details on how we have improved our backup system and what our high-availability hosting project will look like.

We need to recompile apache to correct some of the settings and add a couple of php modules. Since this involves apache downtime of about 30 minutes, we are scheduling this for the service interval of 12am - 6am EST, when the load and traffic on the server are at their minimum. We will update this thread as we move along.

Using a new method to add php extensions, we were able to complete the required tasks on applecore with no downtime.

Just curious, why does Applecore not show up in the uptime statistics log as being down for all those hours?

Edit...Just saw the down time on the 2/11/10 Logs. But still it only shows 1 1/2 hours down...is that right?

Unlike our internal system that monitors all services, the external system is only used to monitor the availability of the web server (Apache). While all services were not fully available to all sites, once the OS and cPanel were restored Hyperspin saw the server as up. This is not normally an issue, with the exception of rare cases such as a restore.

We do our best to ensure the statistics we provide are as accurate as possible; however, in this case that's just not possible.

As for the entire time frame from failure to repair to restore, the issues with this server began at 11Feb2010 01:57 PM EST and the primary restore of accounts was completed at 12Feb2010 04:30 AM EST.
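For reference, the elapsed time between those two timestamps can be worked out with a short Python snippet. The start and end times are the ones given in this post; the 1 1/2 hour figure is the downtime quoted from the external uptime log earlier in the thread, and the helper name is only illustrative.

```python
from datetime import datetime, timedelta

# Times from this post (both EST, so no timezone conversion is needed).
failure_start = datetime(2010, 2, 11, 13, 57)
restore_done = datetime(2010, 2, 12, 4, 30)
outage = restore_done - failure_start


def as_hours_minutes(delta):
    """Split a timedelta into whole hours and remaining minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


hours, minutes = as_hours_minutes(outage)
print(f"Failure to primary restore: {hours}h {minutes}m")

# Downtime shown in the external log, as quoted earlier in the thread.
reported = timedelta(hours=1, minutes=30)
gap_hours, gap_minutes = as_hours_minutes(outage - reported)
print(f"Not reflected in the external log: {gap_hours}h {gap_minutes}m")
```

That works out to 14 hours 33 minutes from failure to the primary restore, of which about 13 hours are not visible in the external statistics.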

I just tried to open all the forums on my databases, and then tried to open my domain site, and there's nothing there, which is why I'm here looking for some answers.
It looks as if there was a problem a few days ago but it didn't affect me until now.
Is the fact that I can't access my sites today related to what's been talked about in this thread?