More info

Tips & tricks

Red Hat's customer service and support teams receive technical support questions from users all over the world. Red Hat technicians add the questions and answers to Red Hat Knowledgebase on a daily basis. Access to Red Hat Knowledgebase is free. Every month, Red Hat Magazine offers a preview into the Red Hat Knowledgebase by highlighting some of the most recent entries.

Tips from RHCEs

Wiping a hard drive

by Dominic Duval, Red Hat Certified Engineer®

Ever needed to completely wipe critical data off a hard drive? As we
all know, mkfs doesn't erase much (you already knew this, right?). mkfs
and its variants (such as mkfs.ext3 and mke2fs) only get rid of a few
important data structures on the filesystem, but the data is still
there! For a SCSI disk connected as /dev/sdb, a quick:

dd if=/dev/sdb | strings

will let anyone recover text data from a supposedly erased hard drive.
Binary data is more complicated to retrieve, but the same basic
principle applies: the data was not completely erased.
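The principle behind the dd | strings pipeline can be tried harmlessly on an ordinary file: strings simply extracts runs of printable characters from raw bytes (the file name here is arbitrary):

```shell
# Write a mix of readable text and binary bytes to a scratch file.
printf 'secret-password\000\001\002more-text' > /tmp/strings-demo.bin

# strings pulls the printable runs back out of the raw bytes.
strings /tmp/strings-demo.bin
```

Run against a raw disk device instead of a file, the same command recovers whatever readable fragments survive on the platters.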

To make things harder for the bad guys, an old trick was to use the 'dd'
command as a way to erase a drive (note that this command WILL erase
your disk!):

dd if=/dev/zero of=/dev/sdb

There's one problem with this: newer, more advanced techniques make it
possible to retrieve data that was replaced with a bunch of 0's. To make
it more difficult, if not impossible, for the bad guys to read data that
was previously stored on a disk, Red Hat ships the 'shred' utility as
part of the coreutils RPM package. Launching 'shred' on a disk or a
partition will write repeatedly (25 times by default) to all locations
on the disk (be careful with this one too!):

shred /dev/sdb

This is currently known to be a very safe way to delete data from a hard
drive before, let's say, you ship it back to the manufacturer for repair
or before you sell it on eBay!
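shred also accepts ordinary files, which makes for a safe way to experiment with its options before pointing it at a real disk (flags as documented for GNU coreutils shred; the file name is arbitrary):

```shell
# Create a throwaway file with sensitive-looking contents.
echo "confidential" > /tmp/shred-demo.txt

# Overwrite it 3 times (-n 3), finish with a pass of zeros (-z)
# to hide the shredding, then truncate and delete it (-u).
shred -n 3 -z -u /tmp/shred-demo.txt

# The file no longer exists once shred completes.
test ! -e /tmp/shred-demo.txt && echo "wiped"
```

Without -u, shred overwrites the contents in place but leaves the file (or device) in existence, which is exactly what you want for a whole disk.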

What are the different utilities used to manage the System V initialization?

by Ryan Del Rosario

Red Hat Enterprise Linux includes several utilities that facilitate the management of System V initialization:

ntsysv - a console-based interactive utility that controls which services run when entering a given run level. It is used during system installation, but can also be run from the command line. By default it configures the current run level; with the --level option, other run levels can be configured.

serviceconf - an X client that displays which services are started and stopped at each run level. Services can be added, deleted, or re-ordered in run levels 3 through 5 with this utility.

chkconfig - a command-line utility. When passed the --list switch, it displays all System V scripts and whether each one is turned on or off at each run level. Scripts can be added to or removed from run levels with the --add and --del switches, or toggled with the "on" and "off" directives if the script does not define default run levels.

service - starts or stops a standalone service immediately; most services accept at least the arguments 'start', 'stop', 'restart', 'reload', 'condrestart', and 'status'.

The "serviceconf" and "chkconfig" commands will start or stop an xinetd-managed service as soon as it is configured on or off. Standalone services won't start or stop until the system is rebooted or the "service" command is run.
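As a sketch of how these utilities combine in practice (commands assume a Red Hat Enterprise Linux system with the httpd package installed):

```
# Show the run levels at which httpd currently starts:
chkconfig --list httpd

# Turn httpd on for run levels 3 and 5, then start it immediately:
chkconfig --level 35 httpd on
service httpd start

# Check that it is running:
service httpd status
```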

What is a Caching-only Name Server and how do I configure it to run in chroot environment?

by Liju Gopinath

Caching-only Name Server

A caching-only name server looks up zone data and caches (stores) the results that are returned. It can then answer subsequent queries from the cached information.

A caching-only server is authoritative only for the local host (i.e., the 0.0.127.in-addr.arpa zone), but it can automatically send requests to the Internet hosts handling name lookups for the domain in question.

In most situations, a caching-only name server sends queries directly to the name server that contains the answer. Because of its simplified nature, a DNS zone file is not created for a caching-only name server.

Running the caching-only name server in a chroot environment is the more secure approach: if named is ever compromised, the attacker is confined to the chroot directory tree rather than the whole filesystem.

Configuration

The packages that need to be installed are:

bind-9.2.4-16.EL4.i386.rpm

bind-chroot-9.2.4-16.EL4.i386.rpm

caching-nameserver-7.3-3.noarch.rpm

These packages can be installed from the CD using the command:

# rpm -ivh <PACKAGE NAME>

or using the up2date command:

# up2date <PACKAGE NAME>

The configuration files associated with the caching name server are:

/etc/sysconfig/named

/var/named/chroot/etc/named.conf

/var/named/chroot/var/named/named.local

/var/named/chroot/var/named/named.ca

/var/named/chroot/var/named/localhost.zone

/var/named/chroot/var/named/localdomain.zone

Edit /etc/sysconfig/named and ensure that the following entry is present in the file; it tells named to run in the chroot environment.

ROOTDIR=/var/named/chroot

Note: /etc/named.conf is a symbolic link to /var/named/chroot/etc/named.conf file.

To configure the /etc/named.conf file for a simple caching name server, use this configuration on any server that does not act as a master or slave name server. Setting up a simple caching server for local client machines reduces the load on the network's primary server; many users on dialup connections use this configuration with BIND for exactly that purpose. Ensure that /etc/named.conf contains the entries described below:

With the forwarders option, A.B.C.D and W.X.Y.Z are the IP addresses of the Primary/Master and Secondary/Slave DNS servers on the network in question. They can also be the IP addresses of the ISP's DNS server and another DNS server, respectively. With the forward only option set in the named.conf file, the name server does not try to contact other servers if the forwarders do not give it an answer.
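Putting those options together, the caching-only portion of /var/named/chroot/etc/named.conf might look like the following sketch (A.B.C.D and W.X.Y.Z are placeholders to replace with real addresses; the zone file names follow the caching-nameserver package defaults):

```
options {
        directory "/var/named";
        // Send queries to the forwarders first...
        forwarders { A.B.C.D; W.X.Y.Z; };
        // ...and never try other servers if they fail to answer.
        forward only;
};

// Root hints, consulted when forwarding is not in effect.
zone "." IN {
        type hint;
        file "named.ca";
};

// Reverse zone for the local host.
zone "0.0.127.in-addr.arpa" IN {
        type master;
        file "named.local";
};
```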

Running nslookup now asks named to look up the machine www.redhat.com. named then contacts one of the name server machines listed in the root cache file and asks its way from there. It might take a while before the result is shown, as named searches all the domains listed in /etc/resolv.conf. When the query is repeated, the result should be similar to this example:
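The transcript below is purely illustrative (the address shown is from the 192.0.2.0/24 documentation range, not a real Red Hat address):

```
$ nslookup www.redhat.com
Server:  localhost
Address:  127.0.0.1

Non-authoritative answer:
Name:    www.redhat.com
Address:  192.0.2.10
```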

Note the Non-authoritative answer in the result this time. It means that named did not go out on the network to ask; instead, it looked the name up in its cache and found it there. But the cached information might be out of date, so the user is warned of this danger by the words Non-authoritative answer. When nslookup reports this the second time a host is asked for, it is a sign that named is caching the information and working correctly. Exit nslookup by giving the command exit.

Why is X11 forwarding not working on my system when my ssh daemon is correctly configured with 'X11Forwarding yes'?

by Eduardo Damato

The easiest way to debug the ssh connection is to run ssh with increased verbosity, which normally reveals errors and problems that would otherwise pass silently.
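For example (the host name is illustrative), each additional -v adds more detail; look for lines mentioning X11 forwarding or xauth in the debug output:

```
# One -v is usually enough; -vvv gives maximum detail.
$ ssh -v -X root@server.example.com
```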

In this case, the xauth program is not installed on the system, so the ssh target system cannot add itself to the X authentication database of the X server, and X forwarding is silently denied. To resolve the problem, install the xauth package:

# up2date xorg-x11-xauth

Then ssh in to the machine and verify that the DISPLAY environment variable is correctly set up by the tunnel:

$ ssh -X -l root <HOSTNAME>
# echo $DISPLAY
localhost:10.0

Why do the ext3 filesystems on my Storage Area Network (SAN) repeatedly become read-only?

by Chris Snook

When ext3 encounters possible corruption in filesystem metadata, it
aborts the journal and remounts the filesystem read-only to prevent further damage to the metadata on disk. This can occur due to I/O errors while reading metadata, even if there is no metadata corruption on disk.

If filesystems on multiple disk arrays or accessed by multiple clients are repeatedly becoming read-only in a SAN environment, the most common cause is a SCSI timeout while the Fibre Channel HBA driver is handling an RSCN event on the Fibre Channel fabric.

An RSCN (Registered State Change Notification) is generated whenever the configuration of a Fibre Channel fabric changes, and is propagated to any HBA that shares a zone with the device that changed state. RSCNs may be generated when an HBA, switch, or LUN is added or removed, or when the zoning of the fabric is changed.

Resolution:

Some cases of this behavior may be due to a known bug in the interaction between NFS and ext3. For this reason, users experiencing this problem on NFS servers are advised to update their kernel to at least version 2.6.9-42.0.2.EL. See the related Bugzilla entry: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199172

The lpfc driver update in Red Hat Enterprise Linux 4 Update 4 includes a change to RSCN handling which prevents this problem in many environments. Users of Emulex HBAs experiencing this problem are advised to update their kernel to at least version 2.6.9-42.EL. See the related Bugzilla entry: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179752

The lpfc and qla2xxx drivers also have configuration options which cause the driver to handle RSCNs in a less invasive manner, which often prevents timeouts during RSCN handling. These options must be set in the /etc/modprobe.conf file:

options lpfc lpfc_use_adisc=1
options qla2xxx ql2xprocessrscn=1

After making these changes, the initrd must be rebuilt and the system must be rebooted for the changes to take effect.
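On Red Hat Enterprise Linux 4 the initrd is typically rebuilt with mkinitrd, for example (back up the existing image first; the kernel version is taken from uname -r):

```
# cp /boot/initrd-$(uname -r).img /boot/initrd-$(uname -r).img.bak
# mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
# reboot
```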

Recommendation:

This problem may be prevented or mitigated by applying SAN vendor recommended configurations and firmware updates to HBAs, switches, and disk arrays on the fabric, as well as recommended configurations and updates to multipathing software. This particularly applies to timeout and retry settings.

The architecture of Fibre Channel assumes that the fabric changes infrequently, so RSCNs can be disruptive even on properly configured fabrics. Events which generate RSCNs should be minimized, particularly at times of high activity, since a busy fabric makes RSCN handling take longer than it would on a mostly idle one.

In multipathed environments with separate fabrics for different paths, zone changes to the fabrics should be made far apart in time. It is not uncommon for complete handling of a zone change to take many minutes on a busy fabric with many systems and LUNs. Performing zone changes separately minimizes the risk of all paths timing out due to RSCN handling.

What does the /proc/cluster/lock_dlm/drop_count file do and why do some nodes exceed this value?

by Wade Mealing

The lock_dlm lock manager keeps a certain number of DLM locks for GFS files in memory, even after the GFS files are closed. This DLM "lock cache" boosts performance for GFS files that are frequently opened
and closed. The /proc/cluster/lock_dlm/drop_count file is used to tune the number of locks that lock_dlm keeps in its cache.

When /proc/cluster/lock_dlm/drop_count is not zero, the lock_dlm lock manager attempts to keep the number of local locks for GFS below this level (on a per-node basis). This is not a hard, fixed value, but when the number of locks on a node exceeds it, DLM begins to release some of those locks. As a result, a minor performance penalty may occur when accessing files or POSIX locks that are no longer in the cache.

The current DLM implementation in Red Hat Enterprise Linux 4 defaults to 50000 locks per node. This may be modified but must be done so before the GFS file system is mounted on the node.

To change this value, use the following command:

/bin/echo "12345" > /proc/cluster/lock_dlm/drop_count

Where "12345" is the upper limit on the number of cached DLM locks.

Changes to this file will not take effect on currently mounted file systems. If the value is set to zero, DLM will never purge locks from its cache.

This value is not persistent across reboots, so the command should be executed on a node after each time it has been rebooted or fenced. It may also be automated by adding the command to the gfs service init script, /etc/rc.d/init.d/gfs.

The current number of cached locks for a single mount point may be obtained with the command:

# gfs_tool counters /mount/point

(where /mount/point is the location where the GFS file system is mounted)

This command will return the total number of locks cached by all nodes. This may often be greater than the number of nodes multiplied by the number of open files per node. This is not unusual or dangerous.

Carefully consider whether this value should be changed on any node. Modifying this value only works on nodes using the lock_dlm lock manager. All changes should be tested in a test environment with a production workload before being made in the production environment.

How do I set up single master replication in Directory Server?

by Andrew Ryan

Replication is the process by which a Directory Server receives information from another Directory Server and allows clients to access that information. This article assumes that two instances of Red Hat Directory Server are already configured.

The server containing the data that is to be replicated is the master server and the server receiving the data is the slave server.

Follow these steps to configure the system:

Create a replication user

The first step is to create a user that has access to the part of the directory
that needs replicating. To do this, start up the Directory Server console on the slave server:

Select the Directory tab, then right-click on the
config tab.

Select New -> User.

Fill in the mandatory fields on the dialog box. Remember the user ID that is chosen, and specify a password for the user. The DN (needed later) for this user is uid=username,cn=config.

Enable Changelog on master server

Open the Directory Server console on the master server.

Select the Configuration tab.

Click on the Replication item.

Check the box marked Enable Changelog.

Click the button marked Use Default.

Select Save.

Enable master server as a single master replica

Open the Directory Server console on the master server.

Select the Configuration tab.

Expand the Replication item.

Click on userRoot item.

Click on Enable Replica.

Select Single Master.

Choose a unique Replica ID.

Click on the Save button.

Enable slave server as a dedicated consumer

Open the Directory Server console on the slave server.

Select the Configuration tab.

Expand the Replication item.

Click on userRoot item.

Click on Enable Replica.

Enter the DN of the Replication User set up in Step 1 as a
supplier DN, in the box marked "Enter a new Supplier DN".
You may need to scroll down to see this box.

Click on the Add button next to this box.

Ensure that the Replica type is Dedicated
Consumer

Click on the Save button.

Configure a replication agreement

Open the Directory Server console on the master server.

Select the Configuration tab.

Expand the Replication item.

Right-click on userRoot item.

Select New Replication Agreement...

Enter a name and description for the agreement, and click Next

Select the slave server from the drop-down list or, if it is not present, click on Other... and enter the slave's details.

In the Uid field, enter the DN for the replication user. For example,
uid=Replication User,cn=config.

Enter the password for the replication user.

Select Next.

Choose attributes to replicate, or select Next
for a full replica.

Choose a Synchronisation schedule, or select
Always Keep in Sync.

Select Next.

Select Initialise Consumer now. If you do not want the slave server to be updated immediately, or want to configure the slave server manually, select another option, such as exporting to LDIF.

Select Finish.

Configure clients to handle failover

Edit /etc/ldap.conf, changing the host line to add the slave server. In the example, the slave server is used as the primary server, with failover to the master if the slave is down.

host slave.example.com master.example.com

Test the configuration

Make sure the openldap-clients package is installed.
Using the ldapadd command, check that the slave is a read-only replica and that attempts to add information to it result in referrals to the master server:
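An illustrative check (the entry, suffix, and bind DN here are hypothetical; the exact output varies with the Directory Server version):

```
$ ldapadd -x -h slave.example.com -D "cn=Directory Manager" -W <<EOF
dn: uid=testuser,ou=People,dc=example,dc=com
objectClass: inetOrgPerson
uid: testuser
sn: User
cn: Test User
EOF
```

Instead of adding the entry, the slave should answer with a referral pointing at ldap://master.example.com, confirming that writes are only accepted by the master.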

Why should manual fencing be avoided in a production cluster?

by Demosthenes Mateo

Global File System (GFS) manual fencing should be avoided in production clusters. It is meant to be used for testing purposes only when an appropriate fencing device is not yet available. Red Hat recommends a network power switch or a fiber channel switch fencing device for production clusters to guarantee filesystem integrity.

Outlined below is a scenario that explains how manual fencing might lead to filesystem corruption:

A node stops sending heartbeats long enough to be dropped from the
cluster, but it has not panicked and its hardware has not failed. There are a
number of ways this could happen: a faulty network switch, gulm
hanging while writing to syslog, a rogue application on the system locking out other applications, etc.

The fencing of the node is initiated by one of the other cluster
members. fence_manual is called, and lock manager operations are put on hold until the fencing operation is complete. (NOTE: Existing locks are still valid, and I/O continues for those activities not requiring additional lock requests.)

The administrator sees the fence_manual operation and immediately runs fence_ack_manual to get the cluster running again, prior to checking on the status of the failed node.

Journals for the fenced node are replayed and locks are cleared for those
entries so that other operations can continue.

The fenced node continues to perform read/write operations based on its last
lock requests. The filesystem is now corrupt.

The information provided in this article is for your information only. The origin of this information may be internal or external to Red Hat. While Red Hat attempts to verify the validity of this information before it is posted, Red Hat makes no express or implied claims to its validity.