When I get to the pfhorums, what I do is open every unread thread (view new posts) in its own tab. The titles in the tabs don't help, they all read "The Pfh...", and I have to scroll all the way up to check what the topic actually is.

ukimalefu wrote: Ok, I'm sorry, I was wrong, put back the thread's title on every post.

When I get to the pfhorums, what I do is open every unread thread (view new posts) in its own tab. The titles in the tabs don't help, they all read "The Pfh...", and I have to scroll all the way up to check what the topic actually is.

It is in every page title, but I agree, it would be more helpful if the thread title came first and the site title followed.

Switch wrote: Saturday night the server locked up (MySQL crashed, established SSH connections were immediately dropped, and it ignored the reboot command) and had to be rebooted manually. I was unable to get to the data center until this morning. From this morning to 17:00 CST, cocide was performing a remote backup, after which I started the web server again.

Hopefully this was a one-time glitch (though my cynicism doubts it), and the server doesn't cause any more trouble. Regardless, all the Pfhorums data is fully backed up and will continue to be backed up remotely.

Cocide thinks yesterday's server crash was caused by the same thing as before, only this time he was able to connect to the machine after it crashed, and I was able to extract the logs last night. The filesystem was unmounted, and since the HDDs are perfectly fine when rebooted, he thinks it was the SATA cables. I replaced the cables and drove the server back out to the datacenter this morning.

No, it is not RAM-related, or at least not only the RAM. If we continue to have problems I will ask switch to run a RAM test on it, but the symptoms do not agree with that being the cause.

Since last time it was giving I/O errors that caused the SSH connection to close (it could not read bash or any other program from the HDD, so when it failed to execute anything it closed the connection), I went ahead and put sshd and bash into a ramdisk, and consequently I was able to connect after this last system failure. All the bash builtin commands ran fine and I was able to work within the ramdisk, but the kernel had lost the mount point for its root filesystem.
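For anyone curious, here is a rough sketch of that ramdisk setup, written as a Python script for clarity rather than the exact commands I ran - the mount point, tmpfs size, and binary paths are just illustrative:

# Stage bash and sshd (plus the libraries they need) into a tmpfs so they stay
# usable even if the root filesystem drops out. Run as root. Paths are assumptions.
import re
import shutil
import subprocess
from pathlib import Path

RAMDISK = Path("/mnt/rescue")               # hypothetical tmpfs mount point
BINARIES = ["/bin/bash", "/usr/sbin/sshd"]  # what we want to survive a rootfs loss

# Mount a small tmpfs; everything copied here lives in RAM, not on the HDDs.
RAMDISK.mkdir(parents=True, exist_ok=True)
subprocess.run(["mount", "-t", "tmpfs", "-o", "size=64m", "tmpfs", str(RAMDISK)],
               check=True)

def shared_libs(binary):
    """Ask ldd which shared libraries the binary needs."""
    out = subprocess.run(["ldd", binary], capture_output=True, text=True,
                         check=True).stdout
    return re.findall(r"=>\s+(/\S+)", out)

# Copy each binary plus its libraries into the ramdisk. To actually run the
# copies you would point LD_LIBRARY_PATH at the ramdisk (or mirror the original
# directory layout) so they resolve their libraries from RAM.
for exe in BINARIES:
    shutil.copy2(exe, RAMDISK / Path(exe).name)
    for lib in shared_libs(exe):
        shutil.copy2(lib, RAMDISK / Path(lib).name)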

Since the replacement server is identical to the original, I am ruling out driver-related problems. The HDDs passed a filesystem check, the RAID is reporting that it needs to rebuild after the incident because the drives were lost at different times (on all partitions, not just the rootfs), and the SMART status is all good on both drives, so I am really hoping that it was a problem with the SATA cables. I know it is unlikely for two SATA cables to be problematic at the same time, but these boxes are old, very very old, and they have not always used the cables that were in the case, so it is possible that they could have been slightly damaged or had some dust buildup on the contacts or something like that - I have seen stranger.
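For reference, those checks boil down to something like this (the device and array names are assumptions, the real box may differ):

# Post-incident health checks: SMART health on both drives, md RAID state, and a
# report-only filesystem check. Device and array names are illustrative.
import subprocess

def run(cmd):
    print("$ " + " ".join(cmd))
    subprocess.run(cmd, check=False)    # keep going even if one check complains

for disk in ("/dev/sda", "/dev/sdb"):   # the two mirrored drives (assumed names)
    run(["smartctl", "-H", disk])       # overall SMART health verdict
run(["cat", "/proc/mdstat"])            # RAID state; shows a rebuild in progress
run(["mdadm", "--detail", "/dev/md0"])  # per-array detail (assumed array name)
run(["fsck", "-n", "/dev/md0"])         # report-only fsck, makes no changes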

I will be moving the backups to every 12 hours instead of every 24; this last failure happened an hour before the backups were supposed to run - go figure.
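In crontab terms that is a one-line change, something like this (the script path is made up):

# old: once a day at 03:00
0 3 * * * /usr/local/bin/pfhorums-backup.sh
# new: every 12 hours, at 03:00 and 15:00
0 3,15 * * * /usr/local/bin/pfhorums-backup.sh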

If anyone wants to donate money for new hardware I won't complain; the RAM in this box is slower than most SD cards and a free smartphone is probably faster by a ton. But hey, it's free to host and on a decent enough connection, so as long as it works, right?

cocide wrote: Since the replacement server is identical to the original, I am ruling out driver-related problems. The HDDs passed a filesystem check, the RAID is reporting that it needs to rebuild after the incident because the drives were lost at different times (on all partitions, not just the rootfs), and the SMART status is all good on both drives, so I am really hoping that it was a problem with the SATA cables. I know it is unlikely for two SATA cables to be problematic at the same time, but these boxes are old, very very old, and they have not always used the cables that were in the case, so it is possible that they could have been slightly damaged or had some dust buildup on the contacts or something like that - I have seen stranger.

Cables are just wishful thinking; it's the controller.

If anyone wants to donate money for new hardware I won't complain; the RAM in this box is slower than most SD cards and a free smartphone is probably faster by a ton. But hey, it's free to host and on a decent enough connection, so as long as it works, right?

Whether you're exaggerating about the RAM speed or just that clueless, why don't you list the specs? Maybe some of us have some old crap lying around that's better.

Ya, that's my big fear; the controller is on the motherboard, and that board was previously working though... granted, it was a few years ago, but when it was turned off it was fully functional.

treellama wrote: Whether you're exaggerating about the RAM speed or just that clueless, why don't you list the specs? Maybe some of us have some old crap lying around that's better.

And yes, I am exaggerating, but doing a crowd-sourced old-crap upgrade instead of building a $400-500 box seems kinda silly. I know the kind of things switch has access to, and if we need another old-crap solution we can find one.

Anyway, this box has a Celeron D 336 with 3 GB of DDR2-533 ECC, and speed-wise it is fine for this site and everything else it hosts. Its biggest limitation is the SATA 1 interface; this could be solved with an add-on controller, which would replace the current (failing?) controller on this machine, but that would involve buying things and we are cheap bastards.

These servers were decent servers when they were new, so their age is only becoming a problem due to parts failure; the previous server had been running just about every day for almost 10 years. While that really is not a long time for a server, it has reached the age where the first few parts are failing. Unfortunately I have not had the opportunity to look at the failed server. For about 6 months before it fully died, it had problems with randomly booting - not rebooting or halting or kernel panicking, just booting up from an off state when it was previously on. My gut tells me that the PSU on that machine failed: the machine was shutting off as the PSU began to crap out, and then the BIOS was automatically booting the machine back up, as it is set to do. Instead of fiddling with the PSU, it was just easier to pull the drives and slide them into an identical server that was last known to work flawlessly. Worst case scenario, we combine both boxes into one working computer.

Sorry switch for all the driving, but hey at least we know you can get into the datacenter now!

But hey, that still doesn't change the fact that if each one of the 97 users who have logged into the new site since it was put up donated $5, we could easily replace the server with something modern - it's not like the box needs a video card or new HDDs, it's just mobo/RAM/CPU/PSU.
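Quick math: 97 users x $5 = $485, which covers the $400-500 box mentioned above with a little to spare.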