Oracle Blog

The Lumberjack

Tuesday Jul 17, 2007

slog blog

I've been slogging for a while on support for separate intent logs (slogs) for ZFS.
Without slogs, an intent log is allocated dynamically from the main pool. It
consists of a chain of varying-size blocks anchored in fixed objects.
Specifying a separate log device allows the use of limited-capacity but fast
block devices, such as NVRAM or Solid State Drives (SSDs).

Using chained logs (clogs?) can also lead to pool fragmentation: log
blocks are allocated and then freed as soon as the pool transaction
group has committed, so we get a swiss-cheesing effect.

Interface

zpool create <pool> <pool devices> log <log devices>

Creates a pool with separate intent log device(s). If
more than one log device is specified, writes are load-balanced
between them. It's also possible to mirror log devices. For example,
a log consisting of two two-way mirrors could be created thus:

zpool create whirl <pool devices> \
    log mirror c1t8d0 c1t9d0 \
    mirror c1t10d0 c1t11d0

zpool add <pool> log <log devices>

Creates a log device if it doesn't exist, or adds extra log devices if it does.
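For example, a sketch assuming the whirl pool from above and a hypothetical spare disk c1t12d0:

zpool add whirl log c1t12d0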

zpool replace <pool> <old device> <new device>

Replaces the old log device with the new log device.
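For example, swapping the hypothetical log device c1t12d0 added above for another made-up disk, c1t13d0:

zpool replace whirl c1t12d0 c1t13d0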

zpool attach <pool> <log device> <new log device>

Attaches a new log device to an existing log device. If the existing
device is not a mirror, a two-way mirror is created. If the device is
already part of a two-way log mirror, attaching the new device creates
a three-way log mirror, and so on.
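For example, a sketch that promotes the hypothetical single log device c1t13d0 to a two-way mirror with the made-up disk c1t14d0:

zpool attach whirl c1t13d0 c1t14d0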

zpool detach <pool> <log device>

Detaches a log device from a mirror.
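For example, undoing the hypothetical attach above, leaving c1t13d0 as a single log device:

zpool detach whirl c1t14d0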

zpool status

Additionally displays the log devices.

zpool iostat

Additionally shows IO statistics for log devices.
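For example, watching log device activity on the hypothetical whirl pool; -v breaks the statistics out per device, and the trailing 5 reprints them every 5 seconds:

zpool status whirl
zpool iostat -v whirl 5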

When a slog is full, or if a non-mirrored log device fails, ZFS will
revert to using chained logs within the main pool.

Performance

The performance of databases and NFS is dictated by the latency of
making data stable. They need to be assured that their transactions are
not lost on power or system failure. So they are heavily dependent on
the speed of the intent log devices.
Here are some database performance test results:

The test program creates 32 threads, each of which issues 8K O_DSYNC
writes at random offsets in a 400MB file.

Test hardware was a Sun X4500 (aka thumper) with 48 x 500GB disks.

The NVRAM is a battery-backed Micro Memory PCI card (pci1332,5425).

Table values are MB/s

                   Main pool disks
                 1     2     4     8    16    32
0 slogs         11    14    17    15    16    13
1 slog          12    12    12    12    12    11
2 slogs         17    17    17    19    19    16
4 slogs         17    16    15    15    16    16
8 slogs         18    19    20    18    16    18
NVRAM          221   221   218   217   215   217

I also ran the same test with disk write cache flushing disabled
(echo zfs_nocacheflush/W 1 | mdb -kw).
Note, this should not be done
on a real system unless the device cache is non-volatile.
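The setting can be reverted the same way, again assuming a build with the zfs_nocacheflush tunable:

echo zfs_nocacheflush/W 0 | mdb -kw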

                   Main pool disks
                 1     2     4     8    16    32
0 slogs         33    83   123   136   142   143
1 slog          45    46    44    45    45    46
2 slogs         97    99    90    94    94    95
4 slogs        124   125   127   124   127   127
8 slogs        135   137   134   138   138   138
NVRAM          225   220   226   226   226   227

Note, these tables can be a bit misleading. If you had 2 disks
you'd have a choice of 2 main pool devices, or 1 slog and 1 main
pool device. So looking at the table you should compare the
following entries:

2 main pool: 83MB/s

1 slog, 1 main pool: 45MB/s

The first table highlights some scaling issues which will be
investigated further.

Perf summary

For this micro-benchmark, and from limited other perf testing, it makes
sense to use only fast devices for the slog. However, there may be some
cases where using regular disks as slog disks is faster than putting the same
disks in the main pool.

Status/Bugs

This support was recently putback into Solaris Nevada build snv_68.
Here's a list of slog bugs, fixed and to be fixed:

6574298 "slog still uses main pool for dmu_sync()" - now fixed in snv_69
6574286 "removing a slog doesn't work" - now fixed
6575965 "panic/thread=2a1016b5ca0: BAD TRAP: type=9 ...:" - panic when no main pool devices present - now fixed in snv_83