Tuesday Jun 14, 2005

Solaris originated from BSD and SVR4 UNIX. Over the years, many
enhancements have been made to address business needs. One area
of major change is the I/O framework and device name management.
A traditional UNIX kernel configures all devices at boot time.
Device access is supported via two indexed arrays, bdevsw[] and
cdevsw[], for block and character devices, respectively. The array
elements contain references to driver entry points compiled into
the kernel. Applications access a device by opening a device special
file, created via the mknod(2) system call. A device special file has
a type (block or char) and a device number (dev_t). The type
informs the kernel whether to use bdevsw[] or cdevsw[].
The device number contains two parts, major and minor. The major
number is used to index into the arrays, and the minor number is used
only by the driver, typically to determine which device instance to access.
Solaris modified and extended the model in many ways.
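
The traditional dispatch path can be sketched in a few lines of C.
This is a toy model, not real kernel code: the 16-bit device number
layout, the toy_ names, and the two sample drivers are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 16-bit device number: high 8 bits major, low 8 bits minor. */
typedef unsigned short toy_dev_t;
#define TOY_MAJOR(d)  ((((unsigned)(d)) >> 8) & 0xffu)
#define TOY_MINOR(d)  (((unsigned)(d)) & 0xffu)
#define TOY_MAKEDEV(maj, min) \
    ((toy_dev_t)(((((unsigned)(maj)) & 0xffu) << 8) | (((unsigned)(min)) & 0xffu)))

/* One character-device switch entry: entry points compiled into the kernel. */
struct toy_cdevsw {
    int (*d_open)(unsigned minor);
};

/* Two made-up drivers; the return values just make the dispatch visible. */
static int tty_open(unsigned minor) { return 100 + (int)minor; }
static int mem_open(unsigned minor) { return 200 + (int)minor; }

static const struct toy_cdevsw toy_cdevsw_tab[] = {
    { tty_open },   /* major 0 */
    { mem_open },   /* major 1 */
};
#define TOY_NCHRDEV (sizeof(toy_cdevsw_tab) / sizeof(toy_cdevsw_tab[0]))

/* Open a character device: the major number selects the driver,
 * the minor number is handed to the driver untouched. */
int toy_cdev_open(toy_dev_t dev)
{
    unsigned maj = TOY_MAJOR(dev);

    if (maj >= TOY_NCHRDEV)
        return -1;                      /* no driver at this major number */
    assert(toy_cdevsw_tab[maj].d_open != NULL);
    return toy_cdevsw_tab[maj].d_open(TOY_MINOR(dev));
}
```

Calling toy_cdev_open(TOY_MAKEDEV(1, 3)) lands in mem_open with
minor 3. A block device would go through a second, separate table
in the same way.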

bdevsw[] and cdevsw[] are merged into a single array,
devopsp[], indexed by a major number common to both block
and character drivers. Each element of devopsp[] references a
driver's dev_ops structure, which contains driver entry points
for device autoconfiguration (probe, attach, detach),
bus-nexus-oriented operations (bus_ops), and block/char
operations (cb_ops).
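
As a rough illustration of the merged table, here is a simplified
model in C. The structures are heavily cut down and carry a toy_
prefix to make clear they are not the real Solaris definitions; the
major number assignments and the sample sd and pci entries are
assumptions for the sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for cb_ops and bus_ops (one entry point each). */
struct toy_cb_ops  { int (*cb_open)(unsigned minor); };
struct toy_bus_ops { int (*bus_ctl)(int request); };

/* Cut-down stand-in for dev_ops: autoconfiguration plus the two op vectors. */
struct toy_dev_ops {
    int (*devo_attach)(void);
    const struct toy_cb_ops  *devo_cb_ops;   /* NULL here for a pure nexus driver */
    const struct toy_bus_ops *devo_bus_ops;  /* NULL here for a leaf driver */
};

static int sd_attach(void)         { return 0; }
static int sd_open(unsigned minor) { return (int)minor; }
static int pci_attach(void)        { return 0; }
static int pci_ctl(int request)    { return request; }

static const struct toy_cb_ops  sd_cb_ops   = { sd_open };
static const struct toy_bus_ops pci_bus_ops = { pci_ctl };
static const struct toy_dev_ops sd_ops  = { sd_attach,  &sd_cb_ops, NULL };
static const struct toy_dev_ops pci_ops = { pci_attach, NULL, &pci_bus_ops };

/* One table for every driver, indexed by major number.
 * Majors 0 and 1 are made up; major 2 has no driver loaded. */
static const struct toy_dev_ops *toy_devopsp[] = { &sd_ops, &pci_ops, NULL };
#define TOY_DEVCNT (sizeof(toy_devopsp) / sizeof(toy_devopsp[0]))

const struct toy_dev_ops *toy_ops_for(unsigned major)
{
    return major < TOY_DEVCNT ? toy_devopsp[major] : NULL;
}

/* A driver counts as a nexus driver in this model if it supplies bus_ops. */
int toy_is_nexus(unsigned major)
{
    const struct toy_dev_ops *ops = toy_ops_for(major);
    return ops != NULL && ops->devo_bus_ops != NULL;
}

/* Block and character opens share the same lookup path. */
int toy_open(unsigned major, unsigned minor)
{
    const struct toy_dev_ops *ops = toy_ops_for(major);

    if (ops == NULL || ops->devo_cb_ops == NULL)
        return -1;                      /* no driver, or driver has no cb_ops */
    assert(ops->devo_cb_ops->cb_open != NULL);
    return ops->devo_cb_ops->cb_open(minor);
}
```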

All drivers are loadable kernel modules. The modules are
loaded on demand. During system startup, the kernel loads only
those driver modules required to boot the system. As a result,
a normal system boot may not initialize all hardware attached
to the system. To initialize all hardware at boot time, a
reconfiguration boot (boot -r) is required.

Devices are represented in the kernel by a tree of device
information (struct dev_info) nodes. Inner (nexus) nodes represent
bus controllers and adaptors while leaf nodes represent
devices. Leaf nodes bind to "leaf" drivers, which implement
cb_ops to handle I/O requests. Inner nodes bind to bus nexus
drivers, which implement bus_ops to satisfy leaf driver
requests or pass the requests to the parent bus or the hardware
platform. This design allows generic leaf drivers to be
written without knowing the details of the transport. For
example, the SCSI disk driver (sd) can be used to control
many types of storage devices, such as SCSI-2 disks, USB storage,
Fibre Channel disks, and ATAPI CD-ROM drives.
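
The way a leaf request travels up the tree can be sketched as
follows. This is only a toy model: the node structure, the request
types, and the choice of which nexus handles which request are
assumptions, and the real bus_ops interface is much richer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request kinds a leaf driver might need help with. */
enum toy_req { TOY_REQ_TRANSPORT, TOY_REQ_DMA, TOY_REQ_INTR };

/* A toy device tree node: leaves have no bus_op, nexus nodes do. */
struct toy_node {
    const char *name;
    const struct toy_node *parent;      /* NULL for the root */
    int (*bus_op)(enum toy_req req);    /* returns 1 if handled, 0 to pass up */
};

/* Each nexus handles what it knows about and declines the rest. */
static int ide_bus_op(enum toy_req req)  { return req == TOY_REQ_TRANSPORT; }
static int pci_bus_op(enum toy_req req)  { return req == TOY_REQ_DMA; }
static int root_bus_op(enum toy_req req) { (void)req; return 1; /* platform: last resort */ }

const struct toy_node toy_root = { "root", NULL,      root_bus_op };
const struct toy_node toy_pci  = { "pci",  &toy_root, pci_bus_op  };
const struct toy_node toy_ide  = { "ide",  &toy_pci,  ide_bus_op  };
const struct toy_node toy_sd   = { "sd",   &toy_ide,  NULL        };

/* Walk from the leaf's parent toward the root until a nexus
 * satisfies the request; return that nexus, or NULL if none does. */
const struct toy_node *toy_bus_request(const struct toy_node *leaf, enum toy_req req)
{
    const struct toy_node *n;

    assert(leaf != NULL);
    for (n = leaf->parent; n != NULL; n = n->parent) {
        if (n->bus_op != NULL && n->bus_op(req))
            return n;
    }
    return NULL;
}
```

The leaf (sd here) never needs to know which ancestor answered,
which is what lets one leaf driver sit under different transports.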

A private namespace, /devices, is introduced to mirror the device
names used by the OpenBoot PROM (OBP), as defined in the IEEE 1275
standard. The namespace reflects the physical topology of
I/O devices and bus interconnects. This namespace is controlled
by a filesystem named "devfs", first introduced in Solaris 10.
A key feature of devfs is that a filesystem lookup operation
actually drives configuration of the specific device instance
corresponding to the pathname. For example,
# ls /devices/pci@0,0/pci-ide@11,1/ide@1/sd@0,0:a
would cause the ATAPI CD-ROM drive to attach even if it is
not currently configured in the kernel.
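
The idea of a lookup that configures the device as a side effect
can be modeled in a few lines of C. This is a user-level toy, not
how devfs is implemented: the flat node table, the toy_ names, and
the second sample path are assumptions. Paths are given relative to
/devices, and anything after the ':' is treated as the minor name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy device node, identified by its path under /devices. */
struct toy_node {
    const char *path;
    int attached;       /* 1 once the driver instance is attached */
    int attach_count;   /* how many times attach actually ran */
};

static struct toy_node toy_nodes[] = {
    { "/pci@0,0/pci-ide@11,1/ide@1/sd@0,0", 0, 0 },  /* not yet configured */
    { "/pci@0,0/pci-ide@11,1/ide@0/sd@0,0", 1, 1 },  /* hypothetical, attached at boot */
};
#define TOY_NNODES (sizeof(toy_nodes) / sizeof(toy_nodes[0]))

/* Look up a pathname; if the matching node is not attached yet,
 * attach it before returning. Returns NULL if no node matches. */
struct toy_node *toy_lookup(const char *pathname)
{
    const char *colon;
    size_t len, i;

    assert(pathname != NULL);
    colon = strchr(pathname, ':');      /* strip ":a" style minor name */
    len = colon ? (size_t)(colon - pathname) : strlen(pathname);

    for (i = 0; i < TOY_NNODES; i++) {
        struct toy_node *n = &toy_nodes[i];

        if (strlen(n->path) == len && strncmp(n->path, pathname, len) == 0) {
            if (!n->attached) {         /* the lookup drives configuration */
                n->attached = 1;
                n->attach_count++;
            }
            return n;
        }
    }
    return NULL;
}
```

A second lookup of the same path finds the node already attached
and does no further work.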

The public names in /dev are symbolic links to pathnames
in /devices. The /dev names are created at Solaris install time
by devfsadm(1M). When new devices are added, the /dev name space
is updated via devfsadmd(1M), the daemon version of devfsadm.

The current Solaris I/O framework is flexible and scales well
from single-CPU systems to high-end servers with 100+ CPUs and
1000+ devices. In addition, I/O devices can be reconfigured
dynamically without rebooting the system. This functionality
is also referred to as Dynamic Reconfiguration or hotplugging.
In a future blog entry, I hope to explain in more detail the
inner workings of devfs and the kernel device tree.