Re: Using SMARTd to monitor drives

Thane Sherrington wrote:

> I've read through the manpage on SMARTD, but clearly I'm too dense
> to grasp it.
>
> I understand that I can do a devicescan in the smartd.conf file to
> scan for all devices, so I have the following in the smartd.conf
>
> DEVICESCAN -l error -l xerror -s L/../.././(00|06|12|18)
>
> Now as I understand it, this should log errors and extended errors, and
> run a long test at 12AM, 6AM, 12PM and 6PM.

Yes, but this does not actually 'log errors'. It issues LOG_CRIT
messages if the maximum of both error counts has increased. Pending and
offline uncorrectable sectors (attributes 197 and 198) are also checked
by default unless -C 0 -U 0 is specified.
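Here is a minimal Python sketch of that rule. It assumes smartd remembers the previous check's counts from the summary ('-l error') and extended ('-l xerror') error logs, and warns only when the larger of the two has grown. The function name is hypothetical and this is not smartd's code.

```python
def error_count_increased(prev_counts, cur_counts):
    """Hypothetical model: warn only if max(error, xerror) count has grown.

    Each argument is a (summary_log_count, extended_log_count) tuple taken
    from two consecutive checks of the same device.
    """
    return max(cur_counts) > max(prev_counts)


if __name__ == "__main__":
    print(error_count_increased((12, 12), (12, 12)))  # False: no new errors
    print(error_count_increased((12, 12), (12, 13)))  # True: xerror count rose
```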

Note that you could print the test schedules with 'smartd -q showtests'
(see smartd man page).
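smartd compares the '-s' regular expression against a string of the form T/MM/DD/d/HH. The fields are test type, month, day of month, day of week (1 = Monday) and hour. The sketch below approximates that check in Python for a single day. It assumes the expression must match the whole string, and Python's 're' only approximates POSIX extended regex syntax, so 'smartd -q showtests' remains the authoritative check. For the schedule above, it selects four long tests per day.

```python
import re

SCHEDULE = r"L/../.././(00|06|12|18)"  # the -s expression from the config above


def matching_hours(regex, test_type="L", month=1, day=30, weekday=1):
    """Return the hours of one day whose T/MM/DD/d/HH string the regex matches."""
    pattern = re.compile(regex)
    hours = []
    for hour in range(24):
        candidate = f"{test_type}/{month:02d}/{day:02d}/{weekday}/{hour:02d}"
        if pattern.fullmatch(candidate):  # assumption: whole-string match
            hours.append(hour)
    return hours


if __name__ == "__main__":
    print(matching_hours(SCHEDULE))                 # [0, 6, 12, 18]
    print(matching_hours(SCHEDULE, test_type="S"))  # []
```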

> Am I doing this correctly?

Yes and no. It is probably a bad idea to run 4 long(!) tests a day.

Here is the default setting I use on one machine ('console' and ',ns'
are Windows specific). One long test on Saturday, short tests on all
other days.

It really would be nice if one could control the timing of reports
Christian. As above, I've found that it doesn't matter how often I
test, the running of smartd-notifier remains random. Here's my little log:

... and I'm not sure the 'error count increased' really tells me
anything other than that I still have 77 pending sectors, as I have had
for such a long time. I suspect I'm going to try to kill the 'error
count' messages. I just want a report of changes in the actual health
of the disk and I'd like to be able to have those changes reported every
time I do a check of the disk. Why check four times a day when there
might or might not be a message, and that at some unknown time? "smartd
-q onecheck" is verbose about the fact that it might, or might not be
sending a message, but how about a guaranteed message? It's like I can
ask my secretary four times per day if I've missed any calls, but she
might or might not answer me and if she does, it will be at random times
that I can't predict. I'm beginning to think that my disk is actually
perfectly healthy apart from the 77 damaged sectors due to a bump while
it was running.

Re: Using SMARTd to monitor drives

Ray Andrews wrote:
> ...
> It really would be nice if one could control the timing of reports
> Christian.

The current possible settings -M once (default if state persistence is
disabled) and -M daily (default if state persistence is enabled) are
IMO sufficient. The warning emails (or scripts) are only intended to
alert the admin about serious problems immediately after detection -
with optional daily reminders if the problem persists.
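To make the difference concrete, here is a hypothetical Python model of when a repeat warning is sent under '-M once' and '-M daily' for the same problem, assuming the problem is still present. It is a simplification: it ignores state persistence and the other -M options, and it is not smartd's actual code.

```python
from datetime import datetime, timedelta


def should_send(mode, last_sent, now):
    """Hypothetical -M policy model for one device and one problem type.

    last_sent is None if no warning has been sent yet for this problem.
    """
    if last_sent is None:
        return True  # the first detection always triggers a warning
    if mode == "once":
        return False  # never repeat
    if mode == "daily":
        return now - last_sent >= timedelta(hours=24)  # about one reminder a day
    raise ValueError(f"mode not modelled here: {mode}")


if __name__ == "__main__":
    t0 = datetime(2017, 1, 31, 1, 21)
    print(should_send("daily", t0, t0 + timedelta(hours=23)))  # False
    print(should_send("daily", t0, t0 + timedelta(hours=24)))  # True
    print(should_send("once", t0, t0 + timedelta(days=3)))     # False
```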

> As above, I've found that it doesn't matter how often I
> test, the running of smartd-notifier remains random.

No, if smartd is running 24x7, a reminder email is sent every ~24 hours
for each device and each type of disk problem detected. If smartd is not
running 24x7, it depends on the state persistence setting (enabled by
default on Debian/Ubuntu).

It is logged to syslog as: "Sending warning via MAILER to ADDRESS ..."
immediately after the actual warning message.
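When reading the log by hand, it helps to pair each 'Sending warning via ...' line with the smartd message logged just before it. Below is a small Python sketch of that pairing. It assumes the two are consecutive smartd entries, as described above, and the sample lines are made-up placeholders rather than real smartd output.

```python
import re

SMARTD = re.compile(r"\bsmartd\b")


def warnings_with_cause(lines):
    """Return (warning_message, sending_line) pairs from smartd log lines."""
    pairs = []
    previous = None
    for line in lines:
        if not SMARTD.search(line):
            continue  # skip messages from other programs
        if "Sending warning via" in line:
            pairs.append((previous, line))
        previous = line
    return pairs


if __name__ == "__main__":
    sample = [  # hypothetical placeholder lines, not real smartd output
        "host smartd[812]: <actual warning message>",
        "host smartd[812]: Sending warning via MAILER to ADDRESS ...",
        "host kernel: unrelated message",
    ]
    for cause, sent in warnings_with_cause(sample):
        print(cause, "->", sent)
```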

Re: Using SMARTd to monitor drives

On 31/01/17 01:21 AM, Christian Franke wrote:

> It is logged to syslog as: "Sending warning via MAILER to ADDRESS ..."
> immediately after the actual warning message.

That's interesting, how do I enable that?
>
> This means that another SMART self test has failed. Did you check the
> self-test log(s) in output of 'smartctl -x /dev/sdb' ? A failed
> self-test typically reports the LBA of the first bad sector.

It would sure be nice if there was a way of just getting the errors or
important changes. The above is so hard to use as far as seeing
important things.
> error no matter what I do :-/
> In a previous mail from 2017-01-19, I already explained how to disable
> the pending sectors alert with the -C directive.

Yes, BUT, I want to see if the count ever increases. That is, I don't
want to see '77' over and over again, but I do want to see if it goes to 78.

You know Christian, you have a very powerful program, but for a
non-expert it can be difficult to figure out how to make it 'just work'
in a simple, easy and normal way. I had not expected to devote dozens
of hours to figuring this out, I just want timely messages of *new*
problems. The complexity gives one great power, but one is lost trying
to do simple things. A basic tutorial or howto would be very nice!

Re: [smartmontools-support] Using SMARTd to monitor drives

RA> You know Christian, you have a very powerful program, but for a
RA> non-expert it can be difficult to figure out how to make it 'just work'
RA> in a simple, easy and normal way. I had not expected to devote dozens
RA> of hours to figuring this out, I just want timely messages of *new*
RA> problems. The complexity gives one great power, but one is lost trying
RA> to do simple things. A basic tutorial or howto would be very nice!

This certainly isn't an attempt at snark - but smartmontools is, I think, intended for sysadmin-type people. There are tools for end-users that are less technical and more idiot-light style. I think, in general, that I'd prefer smartmontools to keep the detailed technical focus and not have the more friendly interface, if doing so meant that the technical prowess of smt had to be diminished. [All projects are resource limited - and spending more time on one thing usually means less somewhere else.]

And note that smt is provided to users at no cost. Those "easy" programs aren't free and, IMO, provide less - so the "cost" for them is dual; you get less technical detail and programmatic control, and a closed eco-system, with no access to source code, etc., along with the license fees.

>A basic tutorial or howto would be very nice!

And here's where you can "pay-it-forward" so to speak. Christian has done all this work for you [and me] for no charge. How about you spend some time giving back to the community and do a well-written FAQ? Again, this isn't meant as an attack, just a gentle reminder that this is a community, and a community's success is measured by how well the whole community steps up to take on their responsibilities and offer their time/money/resources.

Re: Using SMARTd to monitor drives

> >A basic tutorial or howto would be very nice!
>
> And here's where you can "pay-it-forward" so to speak. Christian has done all this work for you [and me] for no charge. How about you spend some time giving back to the community and do a well-written FAQ? Again, this isn't meant as an attack, just a gentle reminder that this is a community, and a community's success is measured by how well the whole community steps up to take on their responsibilities and offer their time/money/resources.

Thanks for your impassioned plea!

I would be delighted to see tutorials on recent smartmontools versions,
especially ones covering the handling of notifications,
and will link to them on our homepage:

Re: Using SMARTd to monitor drives

> This certainly isn't an attempt at snark - but smartmontools is, I think,
> intended for sysadmin-type people. There are tools for end-users that are
> less technical and more idiot-light style.

I'm not an idiot but at the same time there are folks who want
utility without devoting years of study to get it.

> I think, in general, that I'd prefer smartmontools to keep the detailed
> technical focus and not have the more friendly interface, if doing so
> meant that the technical prowess of smt had to be diminished.

Does it really have to be one or the other?

> [All projects are resource limited - and spending more time on one
> thing usually means less somewhere else.]

> And note that smt is provided to users at no cost. Those "easy" programs
> aren't free and, IMO, provide less - so the "cost" for them is dual; you
> get less technical detail and programmatic control, and a closed
> eco-system, with no access to source code, etc., along with the license
> fees.

>> A basic tutorial or howto would be very nice!

> And here's where you can "pay-it-forward" so to speak. Christian has
> done all this work for you [and me] for no charge. How about you spend
> some time giving back to the community and do a well-written FAQ? Again,
> this isn't meant as an attack, just a gentle reminder that this is a
> community, and a community's success is measured by how well the whole
> community steps up to take on their responsibilities and offer their
> time/money/resources.

No, that's fine, a very valid request. You know, I believe that the
only ones who can write a useful how-to are the people who have just
now learned how to do something because once you are an expert, you
forget what it was like to not be an expert. We 'forget what we
know', that is, we know things without being aware of the knowledge
and so we forget to write them down for the one who knows nothing --
things become too obvious to mention. For Christian it is now
second nature how SMT works so he has no empathy for the newbie,
that's understandable. If I myself could become competent I'd do
just as you say! Alas, I'm still trying to figure it out myself.

Re: Using SMARTd to monitor drives

On 03/02/17 01:28 PM, Gabriele Pohl wrote:
> https://www.smartmontools.org/wiki/TocDoc#Tutorials
> where you can find several about older versions already,
> which are still helpful as Christian pays close attention
> to backward compatibility ~
>
> fyi and cheers!
>
> Gabriele
Many thanks!

Re: Using SMARTd to monitor drives

>> This certainly isn't an attempt at snark - but smartmontools is, I think, intended for sysadmin-type people. There are tools for end-users that are less technical and more idiot-light style.

> I'm not an idiot but at the same time there are folks who want utility without devoting years of study to get it.

I think you're putting us on a bit. Years of study? :)
But yes, it is rather voluminous, the options and such. *nix tools are like that, often. Go read the man page for find sometime. But to do so much in a command-line tool, that's usually the way of it.

I wish it were different, when I'm trying to do something complex [for example in find] and I have to spend 45 minutes trying to make it work, and can't. I wish it were easy. I wish a google search would give me something I could mostly cut/paste in. [As long as I'm wishing, I'd like a pony too!]

But alas, sometimes it's just not that easy.

>> I think, in general, that I'd prefer smartmontools to keep the detailed technical focus and not have the more friendly interface, if doing so meant that the technical prowess of smt had to be diminished.

> Does it really have to be one or the other?

I'm sure it wouldn't, if resources were unlimited. But Christian only has so much time he can work on this. Heck, I wish he'd write a native Windows port with a built-in mailer and GUI. But to do that, he'd have to either get someone to pay for it, or cut somewhere else. Since I'm sure I can't pay for his dev time, I make do with what I get: A great tool that does a vast amount of stuff, and supports a vast number of devices. I've written my own code to handle routine checks, emailing when non-success events occur, etc. I did it in PowerShell. No, it's not at all perfect - but it works and Christian didn't have to forgo some other development work that's probably more important and certainly demands more technical skill, which he has and I don't.

> If I myself could become competent I'd do just as you say! Alas, I'm still trying to figure it out myself.

I hope you'll pause and take the time, once you've become more competent, to write up what you can. At worst, no one will use it. At best, it could be a distinct help to many.

Re: Using SMARTd to monitor drives

> But yes, it is rather voluminous, the options and such. *nix
> tools are like that, often. Go read the man page for find
> sometime. But to do so much in a command-line tool, that's
> usually the way of it.

God, I know. I sometimes make cut-down man pages that get rid of
the settings for the Coptic calendar and base 7 time reporting --
one will only ever use 10% of what's available in some commands.
And the text is usually written by experts for experts, that is, you
have to already be an expert to even understand what's being said.

> But alas, sometimes it's just not that easy.

Sure, that's the culture. But I rebel! I like docs that tell real
people what they need to know for real situations in language that
they can understand.

>>> I think, in general, that I'd prefer smartmontools to keep the
>>> detailed technical focus and not have the more friendly interface,
>>> if doing so meant that the technical prowess of smt had to be
>>> diminished.

>> Does it really have to be one or the other?

> I'm sure it wouldn't, if resources were unlimited. But Christian only has
> so much time he can work on this. Heck, I wish he'd write a native Windows
> port with a built-in mailer and GUI. But to do that, he'd have to either
> get someone to pay for it, or cut somewhere else. Since I'm sure I can't
> pay for his dev time, I make do with what I get: A great tool that does a
> vast amount of stuff, and supports a vast number of devices. I've written
> my own code to handle routine checks, emailing when non-success events
> occur, etc. I did it in PowerShell. No, it's not at all perfect - but it
> works and Christian didn't have to forgo some other development work
> that's probably more important and certainly demands more technical
> skill, which he has and I don't.

Sure! We don't forget this is a volunteer effort.

>> If I myself could become competent I'd do just as you say! Alas, I'm
>> still trying to figure it out myself.

> I hope you'll pause and take the time, once you've become more
> competent, to write up what you can. At worst, no one will use
> it. At best, it could be a distinct help to many.

Maybe! I'm a pretty famous document writer in my other life.

Cheers!
-Greg


>> In a previous mail from 2017-01-19, I already explained how to disable
>> the pending sectors alert with the -C directive.
> Yes, BUT, I want to see if the count ever increases. That is, I don't
> want to see '77' over and over again, but I do want to see if it goes to 78.

That is very useful, thanks. Honestly I thought you could only examine
syslog under systemd using the official tools like 'journalctl', since they
are always saying that it's binary now. I had no idea you could just
grep. The entire subject of access to logs is another thing that has
become very difficult to understand. Until now, I've been using
"journalctl -p err -b -1", and not seeing very much :(
>
>> Yes, BUT, I want to see if the count ever increases. That is, I don't
>> want to see '77' over and over again, but I do want to see if it goes to 78.
> This is exactly the effect of the example from the mentioned mail.
I didn't realize that. The docs seem to say it kills the entire test.
As I replied last night, all of this is obvious to you, but not to
others. But sorry to test your patience.
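smartd.conf(5) documents a '+' suffix for the -C (and -U) directive. With '-C 197+', a report is printed only when the number of pending sectors has increased since the last check, so a steady 77 stays quiet while 78 is reported. Whether that is exactly the example from the 2017-01-19 mail is an assumption here. Below is a hypothetical Python model of the two variants, looking only at two consecutive checks; it is not smartd's code.

```python
def should_report(prev_count, cur_count, plus_suffix):
    """Model '-C 197' (plus_suffix=False) vs '-C 197+' (plus_suffix=True)."""
    if cur_count == 0:
        return False  # nothing pending, nothing to report
    if plus_suffix:
        return cur_count > prev_count  # only report growth
    return True  # without '+', any nonzero count is reported on every check


if __name__ == "__main__":
    print(should_report(77, 77, plus_suffix=False))  # True: the repeated '77'
    print(should_report(77, 77, plus_suffix=True))   # False
    print(should_report(77, 78, plus_suffix=True))   # True
```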

Re: Using SMARTd to monitor drives

Ray Andrews wrote:

> On 04/02/17 03:24 AM, Christian Franke wrote:
>> Ray Andrews wrote:
>>> On 31/01/17 01:21 AM, Christian Franke wrote:
>>>
>>>
>>> Your question suggests that you never checked the SYSLOG output of
>>> smartd. Please do this first before asking further questions here.
>>>
>>> On Debian, try:
>>> grep -w smartd /var/log/daemon.log
>>> or for older logs:
>>> zgrep -w smartd /var/log/daemon.log.*
>>>
> That is very useful thanks. Honestly I thought you could only examine
> syslog under systemd using the official tools like 'journalctl' , they
> are always saying that it's binary now. I had no idea you could just
> grep.

Fortunately, the traditional log files created by some variant of the
syslog daemon (on Debian: rsyslogd) still exist even on systems that
have moved to systemd.

> The entire subject of access to logs is another thing that has
> become very difficult to understand. Until now, I've been using
> "journalctl -p err -b -1", and not seeing very much :(

Of course, as you restricted the messages to LOG_ERR or worse (-p err)
and to previous boot (-b -1). Most of smartd's messages are invisible
then because these use LOG_INFO level.

Note that systemd journals are often not persistent by default (at least
on Debian). Using 'journalctl -b -1' always fails then.
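Syslog priorities are numbered from 0 (emerg) to 7 (debug). 'journalctl -p err' shows messages at level 3 (err) or anything more important. The short Python sketch below performs that comparison and shows why smartd's LOG_INFO messages drop out, whereas 'journalctl -p info' would include them.

```python
LEVELS = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
          "warning": 4, "notice": 5, "info": 6, "debug": 7}


def shown_by_priority_filter(message_level, filter_level="err"):
    """True if a message passes 'journalctl -p <filter_level>'."""
    # Lower number = more important; the filter keeps that level and above.
    return LEVELS[message_level] <= LEVELS[filter_level]


if __name__ == "__main__":
    print(shown_by_priority_filter("crit"))                       # True (LOG_CRIT)
    print(shown_by_priority_filter("info"))                       # False (LOG_INFO)
    print(shown_by_priority_filter("info", filter_level="info"))  # True
```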

Re: Using SMARTd to monitor drives

On 05/02/17 07:30 AM, Christian Franke wrote:
>
> Of course, as you restricted the messages to LOG_ERR or worse (-p err)
> and to previous boot (-b -1). Most of smartd's messages are invisible
> then because these use LOG_INFO level.

Live and learn. Smartd is the first program that has required me to
look any further into these logs, so before now I've only learned to
list the errors and I thought all these messages would be considered
errors. There is so much to know, and very little help for the
beginner. Things can become so complex that even the masters forget
how they work. IMHO there should always be a simple introduction to any
program and a simple way to use it for simple things. Anyway, after
what you showed me I wrote an alias:

... and now I can try to figure out how to filter out the pointless
messages. BTW interesting that grep considers the file binary, but you
can do text searches.
> Note that systemd journals are often not persistent by default (at least
> on Debian). Using 'journalctl -b -1' always fails then.

Re: Using SMARTd to monitor drives

> On 04/02/17 03:24 AM, Christian Franke wrote:
>> Ray Andrews wrote:
>>> On 31/01/17 01:21 AM, Christian Franke wrote:
>>>
>>>
>>> Your question suggests that you never checked the SYSLOG output of
>>> smartd. Please do this first before asking further questions here.
>>>
>>> On Debian, try:
>>> grep -w smartd /var/log/daemon.log
>>> or for older logs:
>>> zgrep -w smartd /var/log/daemon.log.*
>>>
> That is very useful thanks. Honestly I thought you could only examine
> syslog under systemd using the official tools like 'journalctl' , they
> are always saying that it's binary now. I had no idea you could just
> grep. The entire subject of access to logs is another thing that has
> become very difficult to understand. Until now, I've been using
> "journalctl -p err -b -1", and not seeing very much :(

With the systemd journal you could do:

journalctl | grep -w smartd

with the same effect as the commands above. Internally, the journal is
binary, but you can inspect the text and work on it as always - albeit
more slowly.
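'grep -w smartd' keeps lines in which 'smartd' occurs as a whole word, meaning it is not directly preceded or followed by a letter, digit or underscore. Below is a Python approximation, restricted to ASCII word characters. It may be handy if you want to post-process the 'journalctl' output in a script.

```python
import re

# "smartd" not glued to other word characters, as with grep -w (ASCII only).
WORD_SMARTD = re.compile(r"(?<![A-Za-z0-9_])smartd(?![A-Za-z0-9_])")


def grep_w_smartd(lines):
    """Return the lines that 'grep -w smartd' would print (approximately)."""
    return [line for line in lines if WORD_SMARTD.search(line)]


if __name__ == "__main__":
    sample = [
        "host smartd[812]: Device: /dev/sdb ...",  # match: '[' is not a word char
        "host smartdrive: something",             # no match: 'r' follows
        "host kernel: something else",            # no match
    ]
    print(grep_w_smartd(sample))
```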

I don't know about your distro, but in openSUSE both journal and syslog
can coexist, although the journal is the boss (and the only one by
default). A syslog daemon had to be installed; I used rsyslog. I don't
have permanent journal files, and I limited its size.

Re: Using SMARTd to monitor drives

On 06/02/17 11:16 AM, Carlos E. R. wrote:

> On 2017-02-06 19:49, Ray Andrews wrote:
>> On 06/02/17 03:41 AM, Carlos E. R. wrote:
>>> journalctl | grep -w smartd
>> Thanks, that's better. The previous thing stopped giving me any
>> information. One day I'm going to have to study the entire logging
>> mechanism.
> I don't know about your distro, but in openSUSE both journal and syslog
> can coexist, although the journal is the boss (and the only one by
> default). A syslog daemon had to be installed; I used rsyslog. I don't
> have permanent journal files, and I limited its size.

God knows, I've just not given it much study. As I said, before smartd,
all I ever cared about was errors.

BTW, just now I was wondering what the relationship is between

# default is every 30 minutes:
#smartd_opts="--interval=1800"

in '/etc/default/smartmontools'

and the '-i' switch used in '/etc/smartd.conf'. The latter seems to
have a different unit but in 'man smartd.conf' the switch is mentioned
only in the text, it does not have a section devoted to it. It seems
legal to use but I'm not sure what's going on there and the value can't
be above 255, it seems.

And:

I've tried '-n standby' but it seems to be ignored, all my disks spin up
after '$ smartd -q onecheck' even if I had just spun them down:

Re: Using SMARTd to monitor drives

On 2017-02-06 20:46, Ray Andrews wrote:

> BTW, just now I was wondering what the relationship is between
>
>
> # default is every 30 minutes:
> #smartd_opts="--interval=1800"
>
> in '/etc/default/smartmontools'
>
> and the '-i' switch used in '/etc/smartd.conf'.

Well, I don't even have '/etc/default/smartmontools', it appears to be
an addition of your distribution. In a previous email, Christian Franke
said (9 Dec 2016 07:44:44 +0100, Re: '#start_smartd=yes' in
/etc/default/smartmontools is ignored):

«Please note that /etc/default/smartmontools and its evaluation in the
/etc/init.d/smartmontools is specific to the Ubuntu (and Debian)
packages. It is not part of upstream smartmontools code.»

«The settings in this file may no longer be effective for distributions
using systemd.»

Thus I don't know what those settings do; my documentation does not apply
to your setup.
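For what it's worth, the two options are unrelated. smartd's command-line '-i N' / '--interval=N' sets the check interval in seconds, with a default of 1800, i.e. 30 minutes. Inside smartd.conf, '-i ID' instead tells smartd to ignore SMART attribute ID (1 to 255) when checking usage attributes for failure, which explains the 255 limit. Below is a hypothetical Python helper that reads the interval out of a smartd_opts string such as the Debian one quoted above.

```python
import shlex


def interval_minutes(smartd_opts, default_seconds=1800):
    """Return smartd's check interval in minutes from an options string."""
    tokens = shlex.split(smartd_opts)
    seconds = default_seconds  # smartd's default: 1800 s
    for i, token in enumerate(tokens):
        if token.startswith("--interval="):
            seconds = int(token.split("=", 1)[1])
        elif token in ("-i", "--interval") and i + 1 < len(tokens):
            seconds = int(tokens[i + 1])
    return seconds / 60


if __name__ == "__main__":
    print(interval_minutes("--interval=1800"))  # 30.0
    print(interval_minutes("-q never -i 600"))  # 10.0
```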

> And:
>
> I've tried '-n standby' but it seems to be ignored, all my disks spin up
> after '$ smartd -q onecheck' even if I had just spun them down:
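For reference, smartd.conf(5) describes '-n POWERMODE' as skipping the check when the drive is in the named low-power state or a deeper one:

- 'sleep' skips SLEEP.
- 'standby' skips SLEEP and STANDBY.
- 'idle' skips SLEEP, STANDBY and IDLE.
- 'never' (the default) always checks.

Below is a hypothetical Python model of that rule, assuming an ATA drive. It only restates the documented decision and does not reproduce how smartd queries the power state.

```python
SKIPPED_STATES = {
    "never": set(),  # always check (default)
    "sleep": {"SLEEP"},
    "standby": {"SLEEP", "STANDBY"},
    "idle": {"SLEEP", "STANDBY", "IDLE"},
}


def skip_check(powermode, drive_state):
    """True if a '-n <powermode>' device would be skipped in this state."""
    return drive_state in SKIPPED_STATES[powermode]


if __name__ == "__main__":
    print(skip_check("standby", "STANDBY"))  # True: check skipped
    print(skip_check("standby", "IDLE"))     # False: check proceeds
```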