Disclaimer: Any use of information and code found on this page
is at your own risk. You may write to
me if you have any problems,
but I will not promise to help you.

Unofficial comp.os.linux.development.* FAQ

On this page I will try to answer a few of
the questions that I often see on
comp.os.linux.development.apps
and
comp.os.linux.development.system.
If anybody knows an official FAQ for
these newsgroups please tell
me about it. This page is in
urgent need of cleanup and reorganization,
but I just don't feel I
have the time to do that right now. You may also mail
me if you have suggestions
for additions or some of the currently missing answers.

Did you find this page on google? I
can see in the webserver log that a lot of people reach this page that
way. Some of you will find the answer to your question on this page
and some of you will not. If you have read the entire page without
finding the answer to your question, the obvious next step would be to
come along and ask on the appropriate of the two newsgroups. If you
have additional questions after reading this page, go ahead and ask. I
cannot answer questions unless you ask. Don't be like the visitor who
first found this page by searching for information about the return
value of kernel_thread, only to find out that it was the same as
sys_fork. Three hours later he was back searching for information
about the return value of sys_fork. Did he really waste three hours
searching for the answer to a question I could have answered in five
minutes?

Unless stated otherwise code snippets on this page are copyrighted by
me, and may be used under the GPL version 2.

You forgot to include the header defining a macro or inline
function you are trying to use.

You didn't use optimization; some of the kernel headers require
optimization to work. Use -O2 on the gcc command line to optimize. This
is needed for some function inlining, and to remove some dummy
references used for detecting certain programming errors.

The symbol is not defined in the kernel. Notice that most standard
C library functions do not exist in the kernel.

The symbols are defined but not exported. A symbol can only be
used by modules if it is exported by the kernel somewhere. Many of the
exported symbols are exported by kernel/ksyms.c. Notice that symbols
exported by code that could potentially be kernel modules must be
exported by the module itself.

Where do the messages from my printk statements go?
If you are using a textmode VC they will usually go directly to the
screen. If you are using X this will not be the case. In most cases
debugging kernel modules is best done without X. If you insist on
debugging your kernel modules under X, there are a couple of places to
look for the output:

You can use xconsole, which can display the information.

You can read the system log files. Usually the file to look in is
named /var/log/messages. The command "tail -f /var/log/messages" could
be handy in a separate xterm.

You can use the "dmesg" command.

You can use the command "cat /proc/kmsg".

You can use a serial console, notice you will
need two computers to do that.

Whether a message is logged directly to the screen depends on the
severity of the message. There are eight levels; the most important is
level 0, which is used only if the kernel has essentially crashed. The
least important is level 7, which is used for debugging output. The
level is indicated with three characters at the start of each line:
"<N>", where N is the level digit. There are defines in
<linux/kernel.h> for every level. If no level is specified, a
default will be used. The kernel has a variable specifying how
important a message must be to get logged directly to the screen.
Usually the klogd process will change this variable so fewer messages
make it to the screen, but all messages will be sent to the syslogd
process for logging in files.

A kernel module is a piece of code placed in the kernel address
space. This is similar to shared libraries. The module can contain
data and code as you wish. Being in shared memory, the module and its
code can be accessed by any thread in kernel mode. When the module is
loaded a special initialization function will be called and should
quickly return success or failure. Before the module is unloaded an
optional cleanup function is called, this must do whatever cleanup
needs to be done.

A kernel thread is a process without a user address space. When a
process is in kernel mode there is little difference, they all share
the same kernel address space. If a module has registered functions in
the kernel, they can be called by any process and even by multiple
processes at the same time.

Can I start kernel threads from a kernel module?
Yes you can do that. Most modules don't need to create any kernel
threads. If you are writing a driver and think you need a kernel
thread you could easily be wrong. Anyway, if you find that you really
need a kernel thread, it can easily be created with the
kernel_thread function:
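A minimal untested sketch of such a call; the 2.4-era API and the clone flags shown are assumptions, not requirements:

```
/* Untested sketch, 2.4-era API assumed.  The clone flags shown
 * are a common choice; pick the ones your thread needs. */
static int my_thread_fn(void *arg)
{
        /* the thread body goes here */
        return 0;
}

static int start_my_thread(void)
{
        /* returns a pid on success, a negative errno on failure */
        return kernel_thread(my_thread_fn, NULL,
                             CLONE_FS | CLONE_FILES | CLONE_SIGHAND);
}
```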

This will create a new kernel thread; the return value from
kernel_thread is interpreted in the same way as the return
value from sys_fork. That is, a positive number is the pid of
the created thread, and a negative number indicates an error
condition. A return value of zero should be impossible, because the
child terminates after calling fn and does not return to the caller of
kernel_thread.
The action performed by the new thread
is equivalent to the statement exit(fn(arg));. The
arg parameter is often used to pass a pointer to a struct to
the newly created thread; take care to ensure that the struct still
exists when the thread needs it. If the struct is a local variable, the
function calling kernel_thread could easily have returned
before the struct has been read. If arg is not needed it is
usually just filled with the value NULL. The first thing you
usually want to do in a new kernel thread is this:
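An untested sketch of that startup ritual, assuming the 2.4-era API (daemonize() and the comm name are the usual pieces):

```
/* Untested sketch (2.4-era API assumed): detach the new thread
 * from the user context of the process that created it. */
lock_kernel();
daemonize();                         /* release user space resources */
sprintf(current->comm, "mythread");  /* the name shown by ps */
/* block all signals so the thread is not killed by accident */
spin_lock_irq(&current->sigmask_lock);
sigfillset(&current->blocked);
recalc_sigpending(current);
spin_unlock_irq(&current->sigmask_lock);
unlock_kernel();
```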

Notice that like all other processes kernel threads will become
zombies until the parent has seen the status. If a kernel thread is
created in the initialization function the parent will be the module
loader. When the module loader terminates the process will get init as
its new parent. In that case init will take care of waiting for the
terminated threads. In all other cases you have to ensure that the
zombies are taken care of. Kernel threads cannot ignore their children
like userspace programs can, when a userspace program wants to ignore
its children the kernel will do the waiting and just not tell the
userspace program.

Removing a module using kernel threads is very difficult to do without
creating race conditions. The safest solution is to create a module
that cannot be removed. In the initialization function, call MOD_INC_USE_COUNT
to get a use count of 1. Don't change the use count anywhere else. For
an example that can be removed, look at the kernel usb driver
linux/drivers/usb/hub.c.
It has also had its race conditions, but they have presumably been
fixed. Here is another (untested) example which I believe is race
free:
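An untested sketch of what I mean, assuming the 2.4 module API; the semaphore and flag names are mine:

```
/* Untested sketch, 2.4 module API assumed.  The cleanup function
 * waits on a semaphore that the thread releases just before it
 * exits, so the module text is not freed under the running thread.
 * (Strictly the thread still executes a few instructions after
 * up(); 2.6 added complete_and_exit() to close even that window.) */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <asm/semaphore.h>

static DECLARE_MUTEX_LOCKED(exit_sem);
static volatile int must_exit;

static int worker(void *unused)
{
        daemonize();
        strcpy(current->comm, "faq_worker");
        while (!must_exit) {
                /* the real work goes here */
                set_current_state(TASK_INTERRUPTIBLE);
                schedule_timeout(HZ);
        }
        up(&exit_sem);            /* tell cleanup we are done */
        return 0;
}

static int my_init(void)
{
        if (kernel_thread(worker, NULL, 0) < 0)
                return -EAGAIN;
        return 0;
}

static void my_cleanup(void)
{
        must_exit = 1;
        down(&exit_sem);          /* wait for the thread to stop */
}

module_init(my_init);
module_exit(my_cleanup);
```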

Can I read and write files from a kernel module?
Yes, but it is a little tricky. You should not use the usual user
space functions open, read, write, and close. Instead you should use
filp_open and the methods in the returned struct. Always remember to
cleanup after yourself. Here is an example: (FIXME: Put the updated version here)
kcp.c. If you need more
examples I suggest you take a look at the source code of the khttpd
or tux webserver. Always remember that file access is only possible in
a process context. When doing file access from within the kernel,
think carefully about your design. In many cases it is a bad idea.
Configuration files should never be read by kernel code, instead pass
arguments to the kernel or the module loader. If it is something
complicated write a userspace utility to read and parse the
configuration file, and then pass it to the kernel in some appropriate
way. Possibly by building up a kernel structure through a sequence of
calls. One example doing this is the iptables-restore command.
Firmware which a driver needs to transfer to a device is often stored
as an array of chars in a header file. This is usually init data, so
the memory is freed right before /sbin/init is started. If you want to
load firmware from a file, that can also be done by a user mode
utility. If you still want to access a file from within the kernel, at
least don't hardcode the path. The filename can be passed as an
argument to the kernel or the module loader.
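If you do go this route, here is an untested sketch of the filp_open approach, assuming the 2.4-era API (must be called from process context):

```
/* Untested sketch, 2.4-era API assumed: read up to len bytes from
 * the start of a file into a kernel buffer. */
#include <linux/fs.h>
#include <asm/uaccess.h>

static int read_kernel_file(const char *name, char *buf, int len)
{
        struct file *f;
        mm_segment_t old_fs;
        int n = -EINVAL;

        f = filp_open(name, O_RDONLY, 0);
        if (IS_ERR(f))
                return PTR_ERR(f);
        if (f->f_op && f->f_op->read) {
                old_fs = get_fs();
                set_fs(KERNEL_DS);      /* buf is in kernel space */
                n = f->f_op->read(f, buf, len, &f->f_pos);
                set_fs(old_fs);
        }
        filp_close(f, NULL);            /* always clean up */
        return n;
}
```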

Longer answer: In some kernel versions it is possible to add or
modify a systemcall by changing the sys_call_table. But
because this table is not intended to be changed at runtime, it has no
protection. Changing the table will lead to race conditions. Even
without the race conditions, there are problems related to removing
the modules while they are in use or stacked. Because of the problems
with changing sys_call_table from modules, the symbol is no
longer exported in new kernels. In other words, if you get
"unresolved symbol sys_call_table" when trying to load a
module, it means there is a bug in the module, and the kernel no
longer accepts such buggy modules.

Does Linux have CPU affinity?
The answer below is mostly outdated. In kernel version 2.6 user mode
processes can use the sched_setaffinity and
sched_getaffinity system calls. There is a
patch which
backports them to 2.4.

All recent Linux versions will try to keep processes as long as
possible on the same CPU. But processes will be moved if the CPUs are
not equally loaded. True CPU affinity was introduced in 2.4.0, but is
only available in kernel mode. It can be used from kernel modules, but
was not used by the kernel itself. Starting in 2.4.7-pre5 it is used by
ksoftirqd
to start one instance for each CPU. In 2.5.8-pre3 a
systemcall to set CPU affinity was introduced.
I have written a module implementing a userspace interface for the CPU
affinity in 2.4.x kernels. WARNING: this is untested code: cpus_allowed.tgz

How do I use files larger than 2GB?
To use files of 2GB or more on a 32bit architecture, you need to pass
the O_LARGEFILE flag to the open system call. If anybody knows which
header file to include to get this macro please tell me about it.
Meanwhile you can use these few lines in your code:

#ifndef O_LARGEFILE
#define O_LARGEFILE 0100000
#endif

When calling open you can do like in this example:

fd=open(filename,O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE,0666);

If you need to use large files through stdio you need a version where
fopen uses the O_LARGEFILE flag when calling open. Otherwise a
possible workaround is to avoid fopen, you can instead build your own
fopen implementation using open followed by fdopen.

A helpful person told me a little more about using large files and
devices:

Hello.

I needed to essentially treat an entire hard drive as one large file, and
discovered I couldn't get past 2 GB into it until I defined the following
macros in my source:
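Based on the macro names discussed below, the defines were presumably these; they must come before any #include:

```c
/* These must appear before any #include (or be passed with -D on
 * the compiler command line). */
#define _LARGEFILE_SOURCE
#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <sys/types.h>
```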

The _LARGEFILE_SOURCE macro permits the use of fseeko() and ftello(), among
other things. _LARGEFILE64_SOURCE actually permits the use of 64-bit
functions like fseeko64() and ftello64() and, finally, _FILE_OFFSET_BITS
determines which interface will be used by default. If the macro isn't
defined or is defined as 32 then the default interface is 32 bits. If it is
defined as 64 then the default interface is the 64-bit interface, and a call
to fseeko() will use this 64 bit interface.

I cribbed from the file NOTES that accompanies glibc v2.2.5 on my Slackware
8.1 box to write this E-mail but I know these extensions exist in glibc
v2.2.3 as well.

HTH

How do I use more than 2GB of virtual memory in one process?
The only real solution to your problem is using a
64bit architecture, but there are workarounds that will help you.
Normally the 4GB address space is split into 4 equal sized sections
for different purposes. The first is used for the executable and brk/sbrk
allocations. The second is used for mmaps and shared libraries; malloc
may also use mmap. The third is used for the stack, the fourth for the
kernel itself. Since mmaps grow bottom up and the stack grows top down,
the part not used by one may be used by the other. You may want
to change the split between brk and mmap by changing the
TASK_UNMAPPED_BASE define in linux/include/asm/processor.h.
Or you may try
this patch
for Linux version 2.4.17. The patch still needs more testing. You can
also change the split between userspace and kernelspace, that is the
define PAGE_OFFSET_RAW in linux/include/asm/page_offset.h.
Notice that incorrect settings could make the kernel unusable. You
should always leave at least 8MB plus 3% of your physical memory for
the kernel. The size for the kernel probably needs to be divisible by
4MB.

A completely different approach has been suggested on the kernel
mailing list (linux-kernel@vger.kernel.org). You can allocate a number
of shared memory segments using SysV or POSIX shared memory. Then map
and unmap segments as needed. (This is a little similar to EMM known
from DOS.) The segments will be limited in size, but you can have a
lot of segments with a total size even going beyond 4GB if you have
enough physical ram or swap space. It will also be possible to mmap
files or parts of files in your process. Memory mapped files behave
in most cases like shared memory.

Why does Red Hat Linux 7.3, 8.0, and 9 load glibc on address 42000000?
On Red Hat Linux 7.3, 8.0, and 9 the C library is compiled for a fixed address
rather than a dynamic address. The constant RedHat has chosen is a
little above the default TASK_UNMAPPED_BASE, so even though a few
memory mappings have been made before libc is mapped, it can get the
address it wants. In most cases this fixed address for libc is a good
idea, but it does have a single disadvantage. If you change
TASK_UNMAPPED_BASE to get more contiguous address space, the choice of
42000000 is very bad. You can install the source rpm and change this
address in the spec file to something else like 07000000. You should
also add your initials to the release field so it is always clear that
this is a version you changed. Then you can
build a new rpm file and install it. On
RedHat 7.3 an easier, but not as good solution is to install the i386
version of glibc. (Does this work on RH8.0 as well?)

How do I use POSIX shared memory in Linux?
You might find the shm_open(3)
man page helpful. What it does not mention is that you also need to
include <sys/fcntl.h>. And remember you must compile
with -lrt, which is easy to miss in the man page. Basically
the Linux implementation of shm_open just prepends
/dev/shm to the name and calls open(2),
the /dev/shm directory must exist and should be the mountpoint of a
tmpfs filesystem (This filesystem used to be called shm, but that name
is now obsolete). Notice that with Fedora Core 1 running on an i586,
programs compiled with -lrt will not work reliably.
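A small sketch of the usual sequence; the function name is mine, and on older glibc you must link with -lrt:

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/fcntl.h>
#include <unistd.h>

/* Create (or open) a POSIX shared memory object, size it, and map
 * it.  Returns the mapped address or MAP_FAILED on error. */
void *map_shared(const char *name, size_t len)
{
        int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
        void *p;

        if (fd < 0)
                return MAP_FAILED;
        if (ftruncate(fd, len) < 0) {
                close(fd);
                return MAP_FAILED;
        }
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);        /* the mapping stays valid after close */
        return p;
}
```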

How do I prevent zombie processes?
Zombie processes are dead children ignored by their parents. If you
want to create a child process and don't want to care about it
anymore, you can use the double fork trick. In the simple case, where
you just want to know success or failure, it can be done like this
(untested code):
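A sketch of the double fork trick; the helper name is mine and the grandchild's actual work is elided:

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Double fork: the work happens in a grandchild which is
 * reparented to init, so no zombie is left behind.  Returns 0 if
 * the grandchild was started, -1 on failure. */
int spawn_detached(void)
{
        pid_t pid = fork();
        int status;

        if (pid < 0)
                return -1;
        if (pid == 0) {            /* first child */
                pid_t pid2 = fork();
                if (pid2 < 0)
                        _exit(1);  /* report failure to the parent */
                if (pid2 == 0) {
                        /* ... the real work goes here ... */
                        _exit(0);
                }
                _exit(0);          /* first child dies immediately */
        }
        /* Reap the first child; init will reap the grandchild. */
        if (waitpid(pid, &status, 0) != pid)
                return -1;
        return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```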

How can I wait for termination of a process not my child?
It would often be convenient to have a way to get notified about
termination of an arbitrary process not necessarily your own child.
Unfortunately there is no standard way to do that. There are a few
different possible tricks that may help:

If the process you want to wait for is a relative of yours, you
might be able to get away with a pipe trick. This trick requires that
a pipe was created by a common ancestor through the use of the
pipe() system call. (In some cases that could be one of the
involved processes, because the one is an ancestor of the other.) The
process to be waited for has to be the writer, and the process that is
to wait shall be the reader. The trick is that reading a pipe gives
you an EOF when there are no more writers. For this trick to work, it
is important to close the write side of the pipe in all other
processes than the one to be waited for. There are a few nice ways to
use the pipe for more than just waiting for a single process to
terminate:

You can wait for all processes in some group to terminate by
having all of them be writers of the pipe.

Multiple processes can be waiting for the same process to
terminate because multiple readers will work as expected.

You can make use of the close on exec flag on the write end of the
pipe to indicate if you want to be notified if the process calls
execve and not only if it terminates.

You can even use select on the read end of the pipe.
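A minimal sketch of the trick for the simplest case, a parent waiting for its own child (the function name is mine):

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Demonstrate the pipe trick: the reader gets EOF once the last
 * writer (here, the child) is gone.  Returns 0 when the child has
 * terminated, -1 on error. */
int wait_via_pipe(void)
{
        int fds[2];
        char c;
        pid_t pid;

        if (pipe(fds) < 0)
                return -1;
        pid = fork();
        if (pid < 0)
                return -1;
        if (pid == 0) {           /* the process to be waited for */
                close(fds[0]);    /* keep only the write end */
                /* ... do some work; the write end is closed
                 * automatically when the process exits ... */
                _exit(0);
        }
        close(fds[1]);            /* crucial: drop our write end */
        while (read(fds[0], &c, 1) > 0)
                ;                 /* EOF means no writers remain */
        close(fds[0]);
        waitpid(pid, NULL, 0);    /* still reap the zombie */
        return 0;
}
```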

You can create a loop where you check once every half second if
the process in question still exists. This is not a good solution, but
it is very portable and probably the best that can be done if the
scenario does not allow you to use the pipe trick.

Finally you can make use of ptrace(), but that is rarely
a good idea because:

It will affect the process, which might not behave exactly as it
is intended to.

It can slow down the process being ptraced.

A process can have only one tracer.

You can only trace your own processes.

How do I detect if a pid exists?
You can do that by using the kill() system call. If you use 0
as the signal, no signal will be sent, but error checking is
still done. A return value of 0 indicates the process exists, and you
could send a signal to it. A return value of -1 indicates that it does
not exist or is a process you cannot send a signal to; in that case,
use errno to find out which is the case. A value of
ESRCH indicates the process does not exist. A value of
EPERM indicates the process does exist, but you are not
allowed to send a signal to it. See the kill(2)
man page for more information.
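Sketched as a helper function (the name and return convention are mine):

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

/* Returns 1 if the pid exists and we may signal it, 0 if it does
 * not exist, and -1 if it exists but we may not signal it (or
 * another error occurred); check errno to tell which. */
int pid_exists(pid_t pid)
{
        if (kill(pid, 0) == 0)
                return 1;
        return (errno == ESRCH) ? 0 : -1;
}
```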

The core would have been larger than the current limit. Check whether
there is a limit and try to disable it. In bash, sh and similar shells
type ulimit -a to see the current limits and type ulimit
-S -c unlimited to disable the limit, finally verify that it has
been disabled. In tcsh, csh and similar shells use limit and
limit coredumpsize unlimited.

You don't have the necessary permissions to dump core. Verify that
you have write permissions to the directory and that you also have
write permissions to the old core file if any such exists. Notice that
core dumps are placed in the dumping process' current directory, which
could be different from that of the parent process.

Verify that the filesystem is writable and has sufficient free
space.

If a subdirectory named core exists in the working directory, no
core will be dumped.

If a file named core already exists but has multiple hard links, the
kernel will not dump core. Either remove this link to the file or
delete all other links to the file.

Verify the permissions on the executable; if the executable has the
suid or sgid bit enabled, core dumps will by default be disabled. The
same will be the case if you have execute permission but no read
permission on the file.

Verify that the process has not changed working directory, core
size limit, or dumpable flag.

Some kernel versions cannot dump processes with shared address
space (AKA threads). Newer kernel versions can dump such processes but
will append the pid to the filename.

The executable could be in a nonstandard format not supporting
core dumps. Each executable format must implement a core dump routine.

The segmentation fault could actually be a kernel Oops, check the
system logs for any Oops messages.

How do I change the name and location of the core file?
The two pseudo files /proc/sys/kernel/core_uses_pid and
/proc/sys/kernel/core_pattern control the naming of core
dumps in Linux 2.4 and later. If core_uses_pid is set to 1,
the pid will be appended to the end of any core dump. By default it is
only appended for multithreaded programs.

You can get more control over the name by changing
core_pattern. You can use it to just change the name
of the file, which will still be dumped in the current directory, or
you can specify a full path to have it dumped in a fixed location
rather than the current directory. That can be handy because some file
systems are not nice to core dump to. (The root file system, tmpfs,
and nfs are examples of places where a core dump can hurt.) You can
use certain escape sequences in the name specification:

%%

A % character

%p

The pid (if you use this, it will no longer be appended).

%u

The user id as a number

%g

The group id as a number

%s

The signal causing the dump

%t

The time when the dump started

%h

The hostname

%e

The name of the executable

For example you could set the path as
/mnt/bigfs/coredumps/%u/%e which would put core dumps from
different users in different directories and name the core dumps
according to the program name. If used in this way you would have to
create the directories beforehand and chown them to the appropriate
user. If you wanted to access them by user name rather than uid, you
could create symlinks.

How do I get a core dump from a running program?
If you just want a program to terminate now and dump core, you can use
the SIGABRT signal. This signal can be sent from another process using
kill. Or it can be sent by the process itself using kill, raise, or
abort. If you want a core dump without killing the process, things
start getting more tricky. You can create a child process by using the
fork system call, and let the child dump core. The init program
actually does this in its signal handlers. From the outside, the
kernel offers no simple way to get a core dump from a process without
killing it. But gdb has a gcore command that will do the hard work.
On Fedora Core you can also call gcore from your shell (in which case
it is just a script that calls gdb).

#include <sys/prctl.h>
...
/* The last three arguments are just padding, because the
* system call requires five arguments.
*/
prctl(PR_SET_DUMPABLE,1,42,42,42);

If anybody knows a clean and secure way to get core dumps from suid
executables without a recompile, please tell me about it. Meanwhile I
hacked myself a module to use as
a last resort when you really need a core dump from a suid executable
and don't want to go through the entire compile process. When loaded
with the pid symbol set to a number, the dumpable flag of that process
will be enabled. The module will always return an error code to get
unloaded. (FIXME: Find out what the purpose of
/proc/sys/kernel/core_setuid_ok is)

Why can't I delete /proc/kcore?
Files in /proc are not really files, they are pseudo files. Kcore can
be used to debug the running kernel. The format is similar to a usual
core file. You don't have to worry about this file, it will always
exist and doesn't indicate any problem. If you use ls, the file will
appear to have a size close to the amount of kernel memory in the
system. But it does not take up any disk space or memory.

How do I find a user's home directory?
There are different ways depending on the situation. If you just want
to find the current user's home directory, you can use the
HOME environment variable. Simply use getenv("HOME")
in your program. You should verify that the returned pointer is not
null and that the string it points to is not empty. If the HOME
environment variable is not properly set up, print an error and abort
the program. In emacs and most shells, the current user's home is also
called "~". Here is a code example:
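A sketch of such a check (the helper name is mine):

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the current user's home directory from $HOME, or NULL if
 * the variable is unset or empty (the caller should then print an
 * error and abort). */
const char *get_home(void)
{
        const char *home = getenv("HOME");

        if (home == NULL || home[0] == '\0')
                return NULL;
        return home;
}
```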

Finding a named user's home directory is a different matter. To do that
you need to use the getpwnam function. Given a username it returns a
struct which, among other things, contains a field with the home directory.
You have to verify the return code to know if the user exists. But if
you get a non-NULL pointer, you can safely assume the returned
structure is indeed valid. In emacs and most shells, this is called
"~username". Here is a code example:
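A sketch of the lookup (the helper name is mine):

```c
#include <pwd.h>
#include <sys/types.h>
#include <stddef.h>

/* Look up a named user's home directory; returns NULL when the
 * user does not exist. */
const char *home_of(const char *username)
{
        struct passwd *pw = getpwnam(username);

        if (pw == NULL)
                return NULL;        /* no such user */
        return pw->pw_dir;          /* valid since pw is non-NULL */
}
```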

In the rare case of a suid executable needing the current user's home
directory, you should not use the HOME environment variable. In
general a suid executable should not use anything from the
environment. In this case you can use the getpwuid function.
Here is a code example:
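A sketch of the suid-safe variant (the helper name is mine):

```c
#include <pwd.h>
#include <unistd.h>
#include <sys/types.h>
#include <stddef.h>

/* Home directory of the real user, suitable for suid programs
 * because the environment is never consulted. */
const char *my_home(void)
{
        struct passwd *pw = getpwuid(getuid());

        return pw ? pw->pw_dir : NULL;
}
```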

If you are not writing a suid executable, you should not worry about
the user changing the HOME environment variable. In fact you
should respect the user's wish to use another home directory.
Respecting the changed home is not a bug, it is a feature. Not
respecting HOME would be a bug.

crypt takes two strings as arguments and outputs a pointer to one
string. The same function can be used in two different ways to
generate and verify a password. The first argument string is the
password typed by the user. The second argument string is the salt.
You are allowed to pass a string with something appended to the salt;
crypt knows how long the salt is, and will only use the start of the
salt string if it is too long. The output is the salt concatenated
with the "encoded" password. This output string is what is actually
stored in the password file.

To verify a password give the password typed by the user, and the
entry from the password file as arguments to crypt. Compare the output
against the entry from the password file. Because the salt is a prefix
of the password file entry, crypt will find the salt in the string.

To generate an entry for a password file, your program must choose a
random salt. Actually there are different formats for the salt
depending on the type of encoding. Conventionally UNIX systems have
used a variant of DES. That is not very secure, so Linux has another,
more secure version based on MD5, which is unfortunately less
portable. Since crypt looks at the salt to find out which type of
password is being used, the above verification code works without
modifications for both types of passwords. When generating the
password, however, you have to make the decision as you generate a
salt. The salt must contain some random chars. To generate random
chars, using /dev/random is the recommended approach. The random chars
are taken from the set of upper and lower case letters, digits,
period, and slash. The DES salt is just two random chars. The MD5 salt
is "$1$", eight random chars, and a final "$". Here is some example
code.

Notice that crypt returns a pointer to a static buffer. That means you
don't need to call free on the pointer returned by crypt. OTOH the
string will be overwritten by the next crypt call. For that reason you
should not expect crypt to be thread safe. A program using crypt needs
to be linked with -lcrypt.

How do I find the amount of memory on the system?
On Linux you can use sysconf(_SC_PHYS_PAGES) to get the
number of physical pages available on the system. This number does
exclude some reserved pages, but it basically gives you the right
number to use in your applications. You can use
sysconf(_SC_PAGE_SIZE) to find out how large each page is.
Notice that you cannot just multiply the two numbers to find the
number of bytes of physical memory, as that multiplication might cause
an overflow. The number of free pages can be found with
sysconf(_SC_AVPHYS_PAGES).

Take care before using these values. Most programs shouldn't care how
much memory is in the system. Just allocate as much as you need and
use it. The system will take care of the rest. Never forget to verify
the return value from malloc.

Some algorithms need to know how much physical memory is available to
perform optimally. In such cases keep in mind, that your program is
not the only one running. Allocating a little less is usually a good
idea, as a little unused memory is better than thrashing the swap
partition because you tried to allocate too much. Put an upper limit
on the number of pages you are going to allocate, as in some cases
like for example 32-bit architectures with more than 4GB of physical
memory, a single process cannot allocate all of it because of limited
address space. You could use a code snippet similar to this.
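A possible sketch; the 1GB cap and the half-of-memory fraction are arbitrary policy choices of mine, not recommendations from this page:

```c
#include <unistd.h>
#include <stddef.h>

/* Decide how much memory to use: cap the page count before
 * multiplying, so the computation cannot overflow, and never plan
 * to use more than about 1GB or half of physical memory. */
size_t memory_budget(void)
{
        long pages = sysconf(_SC_PHYS_PAGES);
        long psize = sysconf(_SC_PAGE_SIZE);
        long cap;

        if (pages <= 0 || psize <= 0)
                return 64UL << 20;            /* fallback: 64MB */
        cap = (1024L << 20) / psize;          /* at most ~1GB */
        if (pages > cap)
                pages = cap;
        return (size_t)(pages / 2) * (size_t)psize;
}
```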

Rather than trying to use as much physical memory as possible you
could try to get the best possible performance from what memory you
get. If there is a good cache oblivious algorithm for the problem you
are trying to solve, you should use it.

A lot of kernel versions still overcommit memory, which means they will
allow you to allocate more than possible and start killing processes
as problems appear. To avoid processes getting killed unexpectedly,
you must make sure there is enough available swap space. My
recommendation is to make the swap space three times the size of the
physical memory.

First of all, you could take a look at the source code of the simplest
of the existing filesystems. The simplest of all Linux filesystems
is ramfs.
There are two types of filesystems: those using a block device and
those not using a block device. Ramfs is one of those not using a
block device; this group also contains all the network filesystems and
pseudo filesystems like procfs.

The filesystems you actually store on disks are those using
block devices; they can be divided into two groups. The simplest are
those that can be implemented using a get_block/bmap function; this
includes ext2, minix, and fat. The simplest from this group, and yet
fully posix compliant, is
minix. If the data
on disk is stored in some more compact way, the simple solution will
not be possible. At this point it obviously starts getting a
little complicated; there does however still exist a quite simple
readonly filesystem which has its own readpage implementation, that
is romfs.

Before implementing the filesystem as a kernel module, you should get
a feeling for the data structures you will be using. If you are using
a block device, you should write usermode tools to manipulate the
data structures on the disk. You are going to need these pieces of code
anyway, because you will eventually need three tools for your
filesystem: mkfs, fsck, and debugfs. Once you have working usermode
code for the basic tasks, you can start doing it in the kernel. If you
are writing another kind of filesystem, there are still a few things to
do before writing kernel code. If you are writing a networking
filesystem, you want to know the protocols you are going to use, and
you want to test them in usermode first.

Does my kernel leak memory?
If you didn't modify the kernel, it probably doesn't leak memory. Of
course eventually kernels do get released with a bug that could cause
a leak. Before you conclude that you have found a bug, you will need
to know how to find and interpret information about the memory usage.
The kernel will attempt to use most of the otherwise free memory for
caching disk contents. This means that there will normally be very
little memory that is actually free. Take a look at this output from
the free command:

The first line tells me that only 5784KB of my 448MB of memory are
free. But in the second line, where the 332MB of memory used for buffer
and cache memory is subtracted from the used memory, I see that only
100MB of the memory is used for other allocations. Some of those 100MB
will be used for slabs. You can use slabtop to see information about
this memory usage. Another possibility is to use
my
script which will list slab allocations (requires kernel version
2.4). Look at a few selected
lines from the output:

Here we see that most of the 33MB used for slabs is actually also
cached disk contents and management for the buffers. This is quite
normal, and those slabs will get freed if memory gets tight. The rest
of the memory in my system is mostly used by applications. Even if we
sum up all the allocations I have talked about, there will still be a
few KB of memory that is not accounted for. That is also normal; there
are different types of allocations that use get_free_pages() directly.
Those pages are only listed as used, and do not figure anywhere
else, but that doesn't mean they have leaked. Also notice that there
is a difference between the 448MB I have and the 438MB free shows.
That is because the mem_map will use 1.7% of my physical memory, and
the kernel image itself also uses a few MB. Those are not part of the
total memory as reported by the kernel.

So now that you know a little about the many different ways memory is
allocated under Linux, how do you tell if there is a leak? First of
all, if you suspect some action causes a leak, you should repeat it
over and over again. If memory is allocated only the first time, it is
probably not a leak. If there really is a leak, it will allocate more
memory each time. Look at the slabs; if one type of slab keeps
growing, you might have found a leak. Ignore the three slabs I told
you about earlier. If the leaked memory is not allocated through
slabs, it must be allocated at least one page at a time, in which case
it will quickly grow large enough to be easily noticed. Eventually
the system will die when it really runs out of memory.

In addition to just monitoring the memory usage, you can also try to
force the system to free memory. Let a program allocate a lot of
virtual memory and touch each page. That will force the kernel to
free anything that can be freed. An easy way to do this is by letting
tail read from /dev/zero.

tail -c400m /dev/zero

This will allocate 400MB and access it over and over again.

I get SIGSEGV in malloc, calloc, realloc, or free. Are they buggy?
No, they are probably not buggy. These four functions are some of the
most often used code in the entire system. If there were any major
bugs, they would have been found and fixed long ago. When one of these
functions produces a SIGSEGV, in more than 99% of the cases it will be a
bug in your code. Imagine one of the following situations:

You allocate too little memory and write beyond the end of the
allocated memory.

You write to memory after you have already freed it.

You free some memory twice.

In all these cases, you damage the internal data structures used by
malloc and friends. This is the real reason for those functions to
give a SIGSEGV.

There are tools for debugging these kinds of bugs. First of all, you can
try setting MALLOC_CHECK_=2 in the environment. That may give you a
core dump to debug at the first occurrence of a memory-management
related problem. (Notice that with some library versions suid
executables will ignore MALLOC_CHECK_ unless a file named
/etc/suid-debug exists.)
The second simplest tool to
use is Electric Fence; just add -lefence to the command line while
compiling. Your program will still give you a SIGSEGV, but it no longer
happens in malloc; now it happens in code closer to the real bug.
Notice there are a few environment variables you must try in
different combinations to spot all memory access bugs. Don't use
Electric Fence in your final program; it will make the program slower
and increase the memory usage.

My program doesn't work, will you look at my source?
No, we will not look at your source, and you probably don't want us to
see your source anyway. Take a copy of your entire source and start
stripping it down, remove all parts unrelated to your problem. When
you have an absolutely minimal program still demonstrating your
problem, then consider posting it. But before posting it read it over
a few times, you might actually find the bug without our help, and you
might actually find that it can be stripped down even more.

When you have a small piece of code that doesn't work, and you don't
understand why, then you can ask us. Use cut'n'paste when writing your
posting. Typing it all in again is a waste of your time, and it is a
waste of our time too, because we will just see the typos you made and
never find the real problem.

You should provide us with source that we can compile without warnings
and test. (Unless of course your question is about compilation
problems.)

One of the simplest pieces of example code you can find is the
script program from the util-linux package. script.c
is only around 8KB. It chooses between two different approaches
depending on the HAVE_openpty define. Using openpty
is the best approach and should work on any recent Linux distribution.
(If you have problems compiling script.c try removing the
localization stuff and add "#define _".)

Where is the itoa
function?
There is no itoa function in ANSI C. If you know a standard
specifying itoa please tell
me about it. Here are a few
suggestions on how to use sprintf when you want to convert an integer
to a string:

How many shared objects can a program dlopen?
Many of them. The only limitation is the process address space
(which grows slowly on each dlopen, which mmaps its text and data
segments). In practice, on Linux x86 with libc6, a program can dlopen
30,000 (thirty thousand) different shared objects (and probably many
more). See
http://starynkevitch.net/Basile/manydl.c
for an example.

Does dlopen keep the file open?
Not permanently; the file is closed once the dlopen succeeds.
Internally, dlopen needs to open the shared object, but it closes it
again after having mmap-ed it.

What are vdso and linux-gate.so?
It is a virtual dynamically-linked shared object, which implements the
gate used to perform system calls. Depending on the architecture and
kernel version there are different ways to do it. The kernel provides
a page with a version suitable for the actual setup, which can then be
used by all processes. On this page you can read more about
linux-gate.so

What is a misc device?
There are two types of device special files representing the two
different types of devices: block devices and character devices. A
device identifier is split into two numbers, the major and the minor
number. The major number is in the range 1-255 and the minor number is
in the range 0-255. This allows for a total of 130560 devices, 65280 of
each type.

Though the number of identifiers may seem large, we can still run
short of device numbers. This is due to the way they are allocated.
Usually a driver will register a major number, and thus this single
driver will own all 256 minor numbers under this major.

Because many drivers only need a single device number, a special
character device, the misc device, has been allocated one major number,
under which the minors can be allocated individually.

In /proc/devices you can see the list of character and
block drivers currently loaded; the misc device will always be on the
list of character devices. In /proc/misc you can see the list
of misc drivers currently loaded. The assigned numbers are listed in
the file linux/Documentation/devices.txt.

Where can I find the sourcecode?
That depends on what distribution you are using, and what component of
the system you want the source for. Most components of the system are
available from independent websites. The kernel sources can be found
on kernel.org, and other components
can be found on other websites.

Often the sources used in a distribution are not the original sources,
but rather a patched version. If you want those sources, you will need
to know how to find them for your particular distribution.
Availability of sources differs between distributions, but of course
all distributors must respect the license. Many components are
licensed under GPL,
and for those components the modified sources must be available.

Sometimes you will get the sources with the distribution, sometimes
you can order them separately, and sometimes you can download them
from the net.

RedHat You can use the rpm command to find out
which package a file comes from. Type rpm -qif filename to
get a brief description of the package. You can type rpm -qil
packagename to get information about a package and a list of the
files in it. Read the documentation about rpm for
more options. In the info you will find the name of the .src.rpm file.
That file you can find on one of the source rpm CDs that came with
your distribution, or you can download it from
redhat.com. For example you can find
the Red Hat Linux 9 sources in
http://ftp.redhat.com/pub/redhat/linux/9/en/os/i386/SRPMS/
and sources for updated packages in
http://updates.redhat.com/9/en/os/SRPMS/.
When you have the source
you can install it with the command rpm -i filename.src.rpm.
By default it will install sources in /usr/src/redhat; if you
are not root, you will need to change that default. To change the
default, create a file named ~/.rpmmacros and write a single
line with %_topdir followed by a tab character and the full path
to the directory. The directory must contain subdirs with the names
BUILD, RPMS, SOURCES, SPECS, and SRPMS just like
/usr/src/redhat does. Example session follows: