Unixbhaskar's Blog

Tuesday, August 31, 2010

Administering production boxes for a corporation is a daunting task. One has to be well aware of what is going on inside the boxes/servers by looking into them with the right tools. The package I am talking about here is called "sysstat", which has many important tools to disclose all the information an administrator needs.

I do not issue any guarantee that this will work for you.

For this article I am using Arch Linux. sysstat does not come with the base installation, so I have to get it separately.

Once installed, it puts a cron entry in place to run daily on the system, although you can control the schedule according to your choice.

bhaskar@bhaskar-laptop_07:13:35_Tue Aug 31:/etc/cron.daily> cat sysstat
#!/bin/sh
# Generate a daily summary of process accounting. Since this will probably
# get kicked off in the morning, it would probably be better to run against
# the previous days data.
/usr/lib/sa/sa2 -A &

This package comes with many binaries and all of them are very useful tools. I will explain them one by one. The first tool is called sar, and its output looks like this:

Now a bit of explanation is required for the fields it shows, which I have enumerated below:

sar is the system activity reporter.

%user and %nice refer to your software programs, such as MySQL or Apache. %system refers to the kernel's internal workings. %iowait is time spent waiting for Input/Output, such as a disk read or write. Finally, since the kernel accounts for 100% of the runnable time it can schedule, any unused time goes into %idle.
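Since the percentage columns account for all of the schedulable time, a quick sanity check is to add them up for one row. Here is a minimal sketch of that idea. The sample row is made up by me, the column order (time, CPU, %user, %nice, %system, %iowait, %steal, %idle) is an assumption about your sar version, and the helper names sum_cpu_fields and busy_percent are hypothetical.

```shell
#!/bin/sh
# Made-up sample row in the assumed layout of a "sar -u" report:
# time, CPU, %user, %nice, %system, %iowait, %steal, %idle
sample='07:20:01 all 3.10 0.00 1.20 0.50 0.00 95.20'

# Print the sum of all percentage columns (field 3 up to the last field).
sum_cpu_fields() {
    awk '{ s = 0; for (i = 3; i <= NF; i++) s += $i; printf "%.2f\n", s }'
}

# Print the busy share, i.e. 100 minus the last column (%idle).
busy_percent() {
    awk '{ printf "%.2f\n", 100 - $NF }'
}

echo "$sample" | sum_cpu_fields
echo "$sample" | busy_percent
```

If your sar prints the time with a separate AM/PM column, the percentages start one field later, so adjust the starting field accordingly.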

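It comes along with two more binaries related to sar, called sa1 and sa2. What do these fellows do for sar? sa1 collects the binary data and stores it in the daily data file, and sa2 writes the daily report from that data.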

Now let's talk about two very important tools which provide different ways to view things. They are called "sadc" and "sadf". I will cover them one after another below.

SADC: It is the system activity data collector. We can even use it manually too! The sadc command is intended to run behind the sar command. It writes the statistics it collects, day by day, in binary format into a file named /var/log/sa/sadd, where dd stands for the particular day. As the man page says, it can only report local activity, meaning it runs on the same host it is installed on.

I am putting here an example straight out of the manual page for clear understanding. Here we go:

/usr/lib/sa/sadc 1 10 /tmp/datafile

Write 10 records of one second intervals to the /tmp/datafile binary file.

SADF: This tool displays the data collected by sar in different formats. Which is wonderful, because you can feed your data into various places to get a lot more information out of it. It can essentially provide the data in XML and CSV-style formats.

Once again I am putting examples straight out of the manual page for easy understanding. Here we go:

sadf -d /var/log/sa/sa21 -- -r -n DEV

Extract memory, swap space and network statistics from system activity file 'sa21', and display them in a format that can be ingested by a database.

sadf -p -P 1

Extract CPU statistics for processor 1 (the second processor) from current daily data file, and display them in a format that can easily be handled by a pattern processing command.
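To show why a database-friendly format is handy, here is a small sketch that averages the last column of semicolon-separated records with awk. I am assuming the records look like sadf -d CPU output with %idle as the last field. The rows below are made up for illustration, and avg_idle is a hypothetical helper name.

```shell
#!/bin/sh
# Average the last field (assumed to be %idle) of semicolon-separated
# records, skipping the "#" header line and empty lines.
avg_idle() {
    awk -F';' '!/^#/ && NF { sum += $NF; n++ }
               END { if (n) printf "%.2f\n", sum / n }'
}

# Made-up records in the assumed layout; on a real box you would pipe
# the output of sadf -d into avg_idle instead.
avg_idle <<'EOF'
# hostname;interval;timestamp;CPU;%user;%nice;%system;%iowait;%steal;%idle
bhaskar-laptop;600;2010-08-31 07:20:01 UTC;-1;3.10;0.00;1.20;0.50;0.00;95.20
bhaskar-laptop;600;2010-08-31 07:30:01 UTC;-1;6.40;0.00;2.10;0.70;0.00;90.80
EOF
```

The same records could just as well be loaded into a database table or a spreadsheet, since every row has the same fields in the same order.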

Thursday, August 19, 2010

Consider working in a multi-admin environment, where more than one administrator controls the servers, as is often the case in most big corporates. There you need a mechanism which keeps the administrators from overlapping each other's work and keeps track of who is firing what. Sudo is that kind of tool, and it is quite indispensable in a multi-admin production environment.

I do not issue any guarantee that this will work for you.

Most GNU/Linux distributions come with sudo. If yours does not, please install it through the OS's package manager; it should be in the repository of that distribution.

Once installed, its configuration file is placed in /etc and named sudoers. You need to edit it according to your requirements to get things going with this tool.

The tool to edit that file is called "visudo", which is nothing but a vi/vim editor with a lock, meaning that while someone is editing the file, nobody else is allowed to edit it. Clear? Right.

You need to call it like this:

root@bhaskar-laptop_08:37:05_Thu Aug 19:/home/bhaskar # visudo

and the file /etc/sudoers should open in it, as a temporary copy and with a lock in place.

OK, now a few internal entries need a visit for the sake of clarity about its function. So here we go:

Suppose we want to allow specific users to use sudo on some specific host. Did I confuse you with the last statement? Not to worry, I will explain it in detail. Read on:

The careful reader will note that there was a bit of a change here. The line used to read "jim ALL=(ALL) ALL", but now there's only one ALL left. Reading the man page can easily leave you quite confused as to what those three ALL's meant. The first ALL refers to machines; the assumption is that this is a network wide sudoers file. In the case of this machine (lnxserve) we could do this:

jim lnxserve= /bin/kill, /usr/sbin/jim/

Let me explain: on a host/machine called "lnxserve" there is a user called "jim", and he is entitled to run what is listed on the right side of the "=": /bin/kill and, because of the trailing slash, any command in the directory /usr/sbin/jim/.

So what was the (ALL) for? Well, here's a clue:

jim lnxserve=(paul,linda) /bin/kill, /usr/sbin/jim/

Yes, this line brings another twist to the previous one. It says that on a machine called "lnxserve" the user called "jim" is able to run the listed commands as paul or linda.

That says that jim can (using sudo -u) run commands as paul or linda. Yes, it is sometimes necessary to do this for various reasons in a production environment. I am not going into those details, because that might take another whole article to talk about.

This is perfect for giving jim the power to kill paul's or linda's processes without giving him anything else. There is one thing we need to add though: if we just left it like this, jim would be forced to use sudo -u paul or sudo -u linda every time. To avoid that we can set the runas_default option in a Defaults entry.
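A sudoers fragment with such a default could look like the sketch below. The user, host and commands are the ones from the example above. Picking paul as the default target is purely my assumption for illustration, so choose whichever user fits your setup.

```
# /etc/sudoers fragment, always edit it through visudo.
# Assumption: paul is picked as jim's default run-as user.
Defaults:jim runas_default=paul
jim lnxserve=(paul,linda) /bin/kill, /usr/sbin/jim/
```

With this in place a plain sudo from jim runs the command as paul, while sudo -u linda is still needed whenever he has to act as linda.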

Monday, August 16, 2010

Dealing with low level things in the server architecture is an important issue. A GNU/Linux administrator/NOC/Ops person has to have a clear cut understanding of what they are doing, because handling a production box requires a lot of common sense and in-depth knowledge of the platform/OS.

So without much ado let's play with it, or rather let me show you the simple tricks.

I do not issue any guarantee that this will work for you.

So the first question that comes to mind is: why the hell do you need to check the filesystem? Especially the root (/) part of it... sounds pretty dull and boring, huh? Please don't ignore this. You know ignorance is a sin, so do not commit it.

Now a filesystem can get corrupted in various ways. A few common ways are:

1) The server was not shut down properly (although in most cases journaling will do the healing)

2) A sudden power cut brought your system down with a lot of processing going on

3) Somebody has done something special (in the bad sense) to corrupt the data on that particular partition.

It is a bad idea and not recommended to run fsck (yes, this is the built-in tool you need to use) on a mounted partition or drive. So don't do that.

Now, running fsck on other partitions like /home, /var, /usr ...

The first and foremost thing to do is get into single user mode. How do you do that?

OK, once you type init 1 at the terminal prompt you will be taken to single user mode. From there simply unmount the partitions as shown below:

root@bhaskar-laptop_08:16:36_Mon Aug 16:/home/bhaskar # init 1 ---> this will bring to the single user mode

y------> it will try to detect and fix any filesystem related corruption without manual intervention.

f-----------> this will force a check even if the filesystem is marked clean.

v--------> it will provide a verbose explanation on the terminal screen of what the command is going through.
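Because running fsck on a mounted partition is the one thing to avoid, a small guard in front of the command does not hurt. The sketch below only prints the fsck command with the three options described above, and refuses when the device shows up in the mounts table. The helper names is_mounted and plan_fsck, the device names and the sample table are all hypothetical. Note also that a device can be mounted under another name (for example a by-uuid path), which this simple check would not catch.

```shell
#!/bin/sh
# Return 0 if device $1 is listed in the mounts table $2
# (default /proc/mounts), 1 otherwise.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Print the fsck command for device $1, or refuse if it is still mounted.
# This only prints; run the printed command yourself as root.
plan_fsck() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "refusing: $1 is mounted, unmount it first" >&2
        return 1
    fi
    echo "fsck -y -f -v $1"
}

# Demo against a made-up mounts table rather than the live system.
table=$(mktemp)
printf '%s\n' '/dev/sda1 / ext4 rw 0 0' '/dev/sda3 /home ext4 rw 0 0' > "$table"
plan_fsck /dev/sda5 "$table"    # not in the table, so the command is printed
plan_fsck /dev/sda3 "$table" || echo "unmount /dev/sda3 before checking it"
rm -f "$table"
```

On a real system you would leave out the second argument so that /proc/mounts is consulted.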

Now we have a major problem in our hands: we find out that the root (/) partition of the filesystem got corrupted for some reason. So we need to fix that issue to get the system back on track as soon as possible.

For this kind of problem it is significant that you just cannot run fsck on a mounted filesystem, as I said earlier, because it will corrupt the data on it. So we need an installation CD/DVD to come to our rescue. The first installation CD/DVD will do the job for us, or get a SystemRescueCd to do that.

Once you boot from one of those CDs/DVDs, put the text below at the command prompt it presents:

#linux rescue nomount

Once you fire that one you are at a prompt, so you can begin to work. The first thing we need to do is fire a mknod command. Now ask me why we need to do that?

Because we passed the option nomount in the last step, it will not mount or initialize any filesystem, nor create any device files to operate on. If you try to run fsck now it will fail.

So to run fsck correctly on a filesystem we need to create a device file for it. For that we need to run mknod. But to use mknod we need to know the major number and minor number of the device. Let's get those numbers... wait, before that I need to tell you a few things about what the major number and minor number of a device are and what they signify.

What are the Major Number and Minor Number?

Traditionally, the major number identifies the driver associated with the device. For example, /dev/null and /dev/zero are both managed by driver 1, whereas virtual consoles and serial terminals are managed by driver 4; similarly, both vcs1 and vcsa1 devices are managed by driver 7. Modern Linux kernels allow multiple drivers to share major numbers, but most devices that you will see are still organized on the one-major-one-driver principle.

The minor number is used by the kernel to determine exactly which device is being referred to. Depending on how your driver is written, you can either get a direct pointer to your device from the kernel, or you can use the minor number yourself as an index into a local array of devices. Either way, the kernel itself knows almost nothing about minor numbers beyond the fact that they refer to devices implemented by your driver.

So it's clear? Right. Let's move on: we need to find out the major number and minor number of the device to run mknod.
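One place where the kernel lists these numbers is /proc/partitions, assuming it is available in your rescue environment. The sketch below looks up a partition by name in a table of that layout and prints the matching mknod command instead of running it. The helper name dev_numbers and the sample table are made up for illustration, so substitute your own device name.

```shell
#!/bin/sh
# Print "major minor" for partition name $1 from a /proc/partitions
# style table $2 (default /proc/partitions).
dev_numbers() {
    awk -v name="$1" '$4 == name { print $1, $2 }' "${2:-/proc/partitions}"
}

# Demo against a made-up table so it runs anywhere.
table=$(mktemp)
cat > "$table" <<'EOF'
major minor  #blocks  name

   8        0  488386584 sda
   8        1     204800 sda1
   8        2  488180736 sda2
EOF

# Split the "major minor" answer into $1 and $2.
set -- $(dev_numbers sda1 "$table")
# Only print the mknod command; "b" asks for a block device node.
echo "mknod /dev/sda1 b $1 $2"
rm -f "$table"
```

On a running system ls -l on an existing device file shows the same pair of numbers, separated by a comma, in the place where the file size normally appears.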