Comments

Looks like memory errors. Try taking out some sticks, or replacing them, until the errors go away. It could also be a bad motherboard, memory controller, or something like that, but I strongly suspect a hardware problem.
M

Extremist conservative user, I wish to preserve human and civil rights, free speech, freedom of the press and worship, rule of law, democracy, peace and prosperity, social mobility, etc. Now you can draw your guns.

From the looks of it, the errors fall in the same address range, so it is probably a bad memory stick. There are two error areas, at b4 and b7, but that is not a sure indication. Can you run a memory test?
Then you will be sure.
Something like this: http://linux.m2osw.com/memory-test-on-live-system
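
To give an idea of what a test on a live system can look like, here is a minimal sketch in Python. It is not the script from the link above, just an assumed stand-in for the general idea: fill a chunk of memory with random bytes, hash it, then hash it again a few times and see whether the digest ever changes. The function name `check_buffer`, the chunk size and the number of rounds are all made up for illustration.

```python
import hashlib
import os


def check_buffer(size_bytes: int, passes: int = 3) -> bool:
    """Fill a buffer with random bytes and re-hash it several times.

    Returns True if every pass gives the same digest. A mismatch means the
    bytes changed while sitting in memory, which hints at faulty RAM.
    """
    data = bytearray(os.urandom(size_bytes))
    reference = hashlib.sha256(data).hexdigest()
    for _ in range(passes):
        if hashlib.sha256(data).hexdigest() != reference:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical values: 10 MiB per chunk, 5 rounds. Tune to the free memory.
    chunk = 10 * 1024 * 1024
    for round_no in range(5):
        ok = check_buffer(chunk)
        print(f"round {round_no}: {'ok' if ok else 'MISMATCH'}")
```

Keep in mind that a user-space program only sees whatever memory the OS hands it and cannot pick physical addresses, so a clean run proves little. A mismatch, on the other hand, is a strong hint.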
M


Yeah, sorry, I just thought of this later and edited my previous post.
M


The test shouldn't render the server unusable. However, if the memory is bad and data that the OS needs in order to function passes through the bad areas, the kernel might hang.
If you only have one site, moving it is the best thing to do, though.
M


I think the original command was right. The larger the file, the more likely it is to catch the error quickly. Only if you have very little spare memory should you do a large count on a small file at a time.
So set bs to something like 100 MB and run it 40 times if you have a lot of unused memory (likely if you offload the site and the system is idle), or use 10 MB 400 times if your memory is at the limit. I would go with 100 MB.
M
P.S. count=4294967296: that should be about 1000 times lower. You already take chunks of 1 K, so the iteration count should be about 4 million, not 4 billion.
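
The arithmetic behind that can be checked with a few lines of Python. This is only a sketch under two assumptions: that the goal was to push 4294967296 bytes (4 GiB) in total, and that a 1 K block means 1024 bytes. The helper name `dd_count` is made up for illustration.

```python
def dd_count(total_bytes: int, block_size: int) -> int:
    """Number of blocks needed so that block_size * count covers total_bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-total_bytes // block_size)  # ceiling division


MIB = 1024 ** 2
TOTAL = 4294967296  # assumed target: 4 GiB in total

print(dd_count(TOTAL, 1024))       # 4194304, about 4 million
print(dd_count(TOTAL, 100 * MIB))  # 41, roughly the "100 MB, 40 times" case
print(dd_count(TOTAL, 10 * MIB))   # 410, roughly the "10 MB, 400 times" case
```

So with a 1 K block the count comes out at about 4 million, and the two block sizes suggested above move the same total amount of data.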



That might cook it. Just testing memory with bogus random data will probably not crash the kernel, but if it also does many other things that feed memory to the kernel, the kernel may die if the memory is bad.
M


Looks OK, try the other script too. By the looks of it, stress might not be so dangerous either.
M


Your support is very much appreciated!
I will now fully move the website to another server and wait until DNS propagation is over before I proceed. There will surely be a downtime if I find some hardware error so it seems wise to me to have the website at another place then already.

The server is at OVH (duck and cover) by the way, so I will use their rescue boot mode, which offers all kinds of testing methods to check the server's health.

Honestly, I did not mess with the kernel at all, and to be even more honest: I would not even dare to try to do anything with it. I would consider myself some kind of advanced amateur when it comes to Linux, but some things (like kernel compilation) still scare the hell out of me.

But another thing: I did not want to wait and already started the rescue mode and the hardware tests. Looking forward to the results...

Heat. Could be a problem with OVH. I do not know much about their data center design...
I have looked through my logs and the problem seems to be there since the first logwatch email that I have received. Obviously, I have just ignored that until now.

Runs like a charm (except for the kernel errors) by the way. And I cannot even complain about their network or the German support team. I receive answers to tickets within 4 hours and they have been friendly so far. But I cannot trust this server with those error messages. And they will not do anything about the hardware as long as the tests that I have run do not report any problem. What would you do in my shoes?

Try upgrading the kernel and put some monitoring in place for the temperature of the CPU, motherboard, and hard drive.
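
For the temperature part, a rough sketch in Python could look like the one below. It assumes the kernel exposes thermal zones under /sys/class/thermal (values there are in millidegrees Celsius), which not every server does. It does not cover hard drive temperature, which needs a separate tool. The function names and the 80 C limit are made up for illustration.

```python
import glob
import os


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as fh:
                # sysfs reports millidegrees Celsius as a plain integer
                temps[zone] = int(fh.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return temps


def too_hot(temps, limit_c=80.0):
    """Return only the zones at or above limit_c (hypothetical threshold)."""
    return {zone: t for zone, t in temps.items() if t >= limit_c}


if __name__ == "__main__":
    readings = read_zone_temps()
    for zone, temp in readings.items():
        print(f"{zone}: {temp:.1f} C")
    hot = too_hot(readings)
    if hot:
        print("WARNING: over limit:", hot)
```

Run it from cron every few minutes and keep the output, so you can see whether the errors line up with temperature spikes.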
It might be some bug. Since you have already offloaded the site, put everything back clean and upgrade to the latest stable versions of the software you are using. Before putting it into production, make sure everything runs OK under some stress.
If it does, you are OK to go.
At times, errors are just impossible to track, or even misleading. To be on the safe side, do what is under your control (software upgrades and such), since the hardware is out of reach or expensive to check.
If you still get errors, then it must be some hardware problem. If you don't, then even if it is a hardware problem, it could run like this indefinitely. At worst, keep your backups current.
Good luck, it is indeed a good deal you have there :P
M


Okay, there was no way to upgrade the kernel, as it apparently was already the latest version available. However, I did a complete reinstall. It now runs CentOS 6.2 instead of 5.8, and the errors are gone...

So it was some software bug...
I wouldn't rule out hardware problems, though. Modern chipsets are extremely complicated, and something as simple as a power fluctuation caused by a half-millimeter error in some capacitor could generate noise inside and push some signals out of the safe zone, creating random errors that are almost impossible to catch in the manufacturer's QA.
This is why I never put anything into production directly. I have had way too many kernel panics occurring randomly due to hardware problems.
It is true that I don't buy top-notch hardware either... But instead of spending 100 dollars on one computer (as an example), better buy two at 50 dollars each and have a spare if one goes bad. It is, obviously, a lower manufacturing class, but if it works for a couple of weeks under stress, it is likely to work for a couple of years, and you have a spare too. If the expensive one dies, you have no parts and no complete copy, and those deaths usually occur on Friday night...
This is what I've done all my life with all kinds of products, and I have never regretted it. Actually, the faster the tech world moves, the lower the incentive for quality: you don't need it to last 10 years when you will have to upgrade it in 2, unless you really run mission-critical systems. Even then, I go with a cheap HA scheme...
M


@Maounique said: It is true I dont buy top notch hw either... But, instead of spending 100 dollars on a computer (example) better buy 2 and you have a spare if one goes bad for 50 dollars each.

This is what I did. :)
I now have two servers in the lower price range and use one of them only as a failover machine if the other one runs into problems. Works nicely. Better than having only one "superduper" high-end machine that can fail brilliantly too...

@Amitz said: Right, but as I said above: Kernel Compilation? There might be Dragons... ;-)

Actually, it is not that hard. When I finally got a 386 machine, it was the first thing I did. Sure, it was a 20 MHz SX and it took something like 2 days, but it finished without problems. At that time I didn't have that many power failures. I needed the smallest kernel possible because I was trying to save memory: 4 MB was not much, even for that time (about 1994). O tempora :)
Bottom line: since you have a spare machine, do it :)
M


I didn't look at it; it depends on the virtualization, though. For the first time, I would suggest you do this on a real machine, or at least under full virtualization such as KVM or Xen HVM.
M
