Serendipity was on my side. I was fortunate enough to catch some of Dan Roberts' live Q&A about the future of OpenSolaris. And apparently, so were members of the press. There's a good summary on Datamation today - OpenSolaris Alive and Well at Oracle.

Thursday Jan 14, 2010

snv_132 has a nice fix to avoid a scenario I've seen plenty of times in the lab (and not so much in the field, thankfully :). FMD is a service within SMF and as a service it will be restarted if it crashes. And all things being equal, that's a good thing. FMD also produces a core dump in /var/fm/fmd upon crashing, so we can go figure out what's wrong.

SMF's restarter algorithm includes protection against restarting a service too rapidly. But the protection parameters are slanted toward a service that starts very quickly, and FMD can take tens of seconds to start, sometimes longer. If a nasty bug in FMD or one of its modules causes FMD to crash on start, SMF's restarter algorithm doesn't detect this and keeps restarting the service. (The same holds true for any service that takes "a while" to start.) Each restart leaves another core file in /var/fm/fmd, which is typically part of the / filesystem. From there, it's only a matter of time before the filesystem fills up.
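To see why a slow-starting service can slip past this kind of protection, here is a minimal sketch of a generic "too many starts within a time window" guard. It is not svc.startd's actual code: the `RestartLimiter` class, the `crash_loop` helper, and the values `max_starts=5` and `window_s=10.0` are all hypothetical, chosen only to illustrate the idea. The simulated service crashes as soon as its startup completes, so every start corresponds to one core file.

```python
from collections import deque


class RestartLimiter:
    """Hypothetical restart guard (not SMF's real logic or parameters):
    refuse a start if the previous `max_starts` starts all happened
    within the last `window_s` seconds."""

    def __init__(self, max_starts: int, window_s: float) -> None:
        self.max_starts = max_starts
        self.window_s = window_s
        # Only the most recent `max_starts` start times are kept.
        self.starts: deque[float] = deque(maxlen=max_starts)

    def allow_start(self, now: float) -> bool:
        full = len(self.starts) == self.max_starts
        if full and now - self.starts[0] < self.window_s:
            return False  # restarting too quickly: give up on the service
        self.starts.append(now)
        return True


def crash_loop(startup_s: float, limiter: RestartLimiter,
               max_attempts: int = 1000) -> int:
    """Simulate a service that crashes right after a `startup_s`-second
    startup. Returns how many starts (i.e. core dumps) happen before the
    limiter gives up, capped at `max_attempts`."""
    now = 0.0
    starts = 0
    while starts < max_attempts and limiter.allow_start(now):
        starts += 1          # each start ends in a crash and a core file
        now += startup_s     # time spent starting before the crash
    return starts


fast = crash_loop(0.5, RestartLimiter(max_starts=5, window_s=10.0))
slow = crash_loop(30.0, RestartLimiter(max_starts=5, window_s=10.0))
print(fast, slow)  # 5 1000
```

With these made-up parameters, a service that crashes half a second into startup is stopped after five starts, while one that takes 30 seconds to crash never trips the check; only the `max_attempts` cap in the simulation ends the loop. Widening the window (for example `window_s=200.0`) would catch the slow case too, which is the sense in which the parameters are slanted toward fast-starting services.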