(Update: Today, tar works without segfault without known changes on the system. The "rm" command is now segfaulting though. Argh! Maybe this is being caused by ESXi? Or, maybe the hardware.)

I have a Centos 5.3 system running on VMWare ESXi. It's been running for quite a while without much problem. Recently, however, I noticed that the tar command and the rpm command gives me a segmentation fault when I run them.

So, I tried using strace (found someone suggested this online) and below is what I get when using strace. I'd like to know how to repair my system and how I can prevent this from happening in the future.

strace is not a suitable tool for determining the cause of a segfault. Install the debugging symbols for the binary and all dependent libraries, run it under gdb, and when it segfaults run 'bt' to get a backtrace, and try asking about that.
–
wombleJul 11 '09 at 9:22

4 Answers
4

Even without much investigation, from your description that a few programs started to segfault, then they began to work correctly while others started failing, you have either bad memory modules, or a broken prelink.

First, stop all your virtual machines, reboot the host and run a memory test. You have to do this from outside ESXi. If you find any memory defects, that is your problem. Replace the hardware.

If no memory defects were found, check that you are running the latest available kernel from CentOS for your architecture.

Boot the host and the server again, the server in single-user mode (pass "single" to grub kernel parameters) and run:

prelink -avf

When prelink finishes, you should reboot the server. You could also do telinit u && init 3 to resume booting, but it is better to reboot in order to assure that all binaries will be reloaded with their new memory mappings.

se linux is disabled. Its a production environment. tar and rpm worked fine at some point but now they fail. I suspect an update is the culprit so I was hoping someone had seen this before and knew the answer. Rebuilding tar from source (and dependencies) with debug symbols is not something I'd do. If it came to that, I would reinstall but that wouldn't tell me why this happened to begin with.
–
JR LawhorneJul 11 '09 at 19:33

Ok, so not as complex as I thought. The server was compromised and the genius hacker screwed up the root kit install. So, the affect was binaries segfaulting. Another affect was unexpected network traffic from the server. Thanks all who responded!