I didn’t have my load-monitoring script running on the front end because I wasn’t that worried about it. But I am now.

My htop session had died, but the load was spiked over 21.5 when it did. For a general idea: a load of 1 == 1 loaded CPU thread. We have a capacity of 6 threads, so a load of 6 == the server is at capacity. A load OVER 6 == we're over capacity, but if things slow down a bit, there is a chance to catch up.
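That rule of thumb is easy to express in code: compare the 1-minute load average against the number of CPU threads. This is a minimal sketch, not anything actually running on the server, and `load_status` is a name I'm making up for illustration:

```python
import os

def load_status():
    """Compare the 1-minute load average to the number of CPU threads."""
    load1, _, _ = os.getloadavg()    # (1min, 5min, 15min) load averages
    threads = os.cpu_count() or 1    # logical CPUs, e.g. 6 on our front end
    if load1 < threads:
        return "under capacity"
    return "over capacity"

print(load_status())
```

With 6 threads, a load of 21.5 is more than triple what the box can actually execute at once; everything past 6 is just work queued up and waiting.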

As best I can tell, at 3:11am Central time (4:11am Eastern/my time) LogRotate ran and HUP'd all the services so that it could properly rotate the log files, and Apache threw this error:

I think the CPU load was spiked way over 21 by that point. My htop session died about 12:53am, when it showed 21.49 for the immediate average and 14.50 for the 15-minute aggregated average load. It was climbing, and I don't have any confidence that it would have been able to drop far enough in 2 hours for this to not be the issue when LogRotate ran.

One front end is NOT going to be enough for BigCloset while using PHP 5.6, and we aren't going to be able to get PHP7/HHVM working for probably a couple weeks (possibly months) at best. And it won't be an off-the-shelf solution. At best it would be me tracking down code errors, profiling them, submitting patches to various Drupal code projects, hoping they get accepted into the mainstream, and us running custom-patched code till they do. We can either load a 2nd front end, or load more CPU cores onto the one we have.
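For reference, the HUP at rotation time usually comes from a `postrotate` script in the Apache logrotate entry. The stanza below is a sketch of a typical stock entry (paths and PID file location are assumptions, not necessarily what this server runs):

```
/var/log/httpd/*log {
    daily
    missingok
    notifempty
    sharedscripts
    postrotate
        # Send HUP so Apache reopens its log files after rotation
        /bin/kill -HUP `cat /var/run/httpd/httpd.pid 2>/dev/null` 2>/dev/null || true
    endscript
}
```

The HUP itself is harmless on a healthy server; the problem is asking an already drowning Apache to re-read config and reopen logs on top of everything else.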

I will tweak and install my load-monitoring script later. It was originally written for a situation just like this: we were throwing too much at a dying server, so I wrote a script to watch for dead processes and high loads and kill things appropriately. We eventually fixed that problem by opening our Bridgewater, NJ PoP.
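The approach described above, watch the load and kill runaway processes when it climbs too high, can be sketched roughly like this. This is a hypothetical reconstruction, not the original script; the threshold, process name, and CPU floor are all assumptions:

```python
import os
import signal
import subprocess

LOAD_LIMIT = 12.0   # assumed threshold: 2x our 6-thread capacity

def runaway_pids(name="httpd", cpu_floor=90.0):
    """Return PIDs of `name` processes using more than cpu_floor percent CPU."""
    try:
        out = subprocess.check_output(
            ["ps", "-C", name, "-o", "pid=,%cpu="], text=True
        )
    except subprocess.CalledProcessError:
        return []    # no matching processes
    pids = []
    for line in out.splitlines():
        pid, cpu = line.split()
        if float(cpu) > cpu_floor:
            pids.append(int(pid))
    return pids

def check_and_act():
    """If the 1-minute load is past the limit, reap the worst offenders."""
    load1, _, _ = os.getloadavg()
    if load1 > LOAD_LIMIT:
        for pid in runaway_pids():
            os.kill(pid, signal.SIGTERM)   # ask politely first
```

Run from cron every minute or so, something this simple can keep a box limping along through spikes that would otherwise snowball.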

-Piper

Front End – Load Spiking
http://storyportal.net/2016/02/w-htm/
Sat, 27 Feb 2016 15:07:32 +0000

Quick note: we are DEFINITELY spiking usage on the front end. I've seen load average spikes over 15.0, which means 15 cores' worth of usage, basically. But it only has 3 real cores and 3 fake cores (6 total threads). It's been recovering mostly on its own, but I'm afraid that with PHP 5.6 we WILL need the 2nd front end.

HHVM and PHP7 do a lot of redundant-call optimizations, which is what makes them MUCH leaner process-wise.

I have officially downgraded the cloud cluster. It's no longer using PHP7 or HHVM, since neither was providing a stable environment in which we could properly operate BigCloset/TopShelf.

Belle (the front end server) is now operationally running PHP 5.6.

Belle can be switched over to HHVM for limited testing as needed as we try to track down HHVM issues and fix them in code, but right now the full/operational stack is as follows.

Belle (Front End):

- Apache 2.2
- Memcached
- PHP 5.6 (remi stable release)

Ariel (Back End/DB Server):

- Percona 5.6 (release 76.1, revision 5759e76)

Things seem to be operating within acceptable limits at the moment. I've had things switched over for maybe 10 minutes at this point. The site is still fast in places, a bit slower in others. It is DEFINITELY stressing the front end server more, jumping the load average from under 0.5 consistently to over 2.0 consistently.

-Piper

Issues Found
http://storyportal.net/2016/02/issues-found/
Fri, 26 Feb 2016 13:07:43 +0000

We tracked down what seems to be crashing HHVM and are researching possible fixes.
Update on BigCloset – Cloud Edition
http://storyportal.net/2016/02/update-on-bigcloset-cloud-edition/
Thu, 25 Feb 2016 05:07:28 +0000

We are currently up and running using HHVM via Cloud in a "cobbled together" way. It is a less-than-ideal setup, so I still need the VM re-imaged.

The method we have cobbled together right now seems to "fail" for HHVM every 8 hours or so in our initial tests, so I've written my own set of scripts to monitor the site and take automated action based on the results. It's kinda primitive, but it works.
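A minimal version of that kind of watchdog, poll the site and restart the service when the check fails, might look like the following. The URL, restart command, and service name are all assumptions for illustration, not the actual scripts:

```python
import subprocess
import urllib.request
import urllib.error

SITE_URL = "http://localhost/"                 # assumed front-end health URL
RESTART_CMD = ["service", "hhvm", "restart"]   # assumed restart command

def site_is_up(url=SITE_URL, timeout=10):
    """Return True if the site answers with an HTTP response below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        return False

def watchdog():
    """Take automated action when the health check fails."""
    if not site_is_up():
        subprocess.call(RESTART_CMD)
```

Dropped into cron, a check like this turns an "HHVM dies every 8 hours" problem into a few seconds of blip instead of an outage that waits for a human.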

So the Site is up, but it's also fixing itself.
http://storyportal.net/2014/09/so-the-site-is-up-but-its-also-fixing-itself/
Thu, 25 Sep 2014 13:13:34 +0000

One of the longest steps in this whole process is one known as "Rebuild Permissions". The BIGGEST issue with it was that it seemed DESTINED to fail while the "WALL" was up, so to do it properly, we felt the need to take down the wall and start letting people in.

Until the permissions database is fully rebuilt, you will notice some stuff missing. Some OLDER stories will show up on the front page with new dates. Same with comments and such. But it IS being worked on.

The three of us pulled a marathon 2 to 3 days getting this done, bulldozing every roadblock we ran into at full speed, and then rebuilding using the pieces left behind at each step.

I will post again here, and on BigCloset itself, when it seems to be finished with its permission rebuild, and hopefully we can then finish the lengthy process of fixing any lingering errors.

-Piper, Erin & Cat.

Sorry about the Delay….
http://storyportal.net/2014/09/sorry-about-the-delay/
Thu, 25 Sep 2014 00:42:33 +0000

We are on the downhill run at this point. I'm actually hoping to get some sleep tonight!

Not all the new features will be available at launch, but the site should be back shortly.

BigCloset Upgrades, Day 2
http://storyportal.net/2014/09/bigcloset-upgrades-day-2/
Wed, 24 Sep 2014 13:10:41 +0000

We're still hard at work behind the scenes. I managed to grab about 3 hours of sleep (not three hours straight through, but basically 3 hours where I blacked out enough that I'm going to call it sleep).

Things are progressing on the database, and we are simultaneously working on other issues as well.

We are aiming for mid-day west-coast (USA) time for the site to come back online, and while it could be sooner, I want to warn everyone that it very easily could take longer depending on how everything goes.

Right now, we aren’t anticipating any issues that will cause major delays.

-Piper, Cat, Erin and the BigCloset Band

BigCloset Closed for Upgrades
http://storyportal.net/2014/09/bigcloset-closed-for-upgrades/
Tue, 23 Sep 2014 19:47:35 +0000

TopShelf is closed while we upgrade the software. This shouldn't take more than… oh, maybe most of a day? Two days? Frankly, we're not sure. Check back here now and then for progress reports.

Part of the equipment package I'm delivering includes infrastructure improvements, namely new network cabling.

While the downtime should not be noticeable (less than 30 secs at a time), you may have periods where the site is shown as offline, or the database is offline, or everything may LOOK fine but, because the slave database server is offline, you get a 404 error when you click on a story.

The outages should be less than 30 secs at a time, and the whole process shouldn’t take me very long at all, so please bear with us!