I've got a second hard drive, identical to the main one, onto which I
(nightly) duplicate the main hard drive, thus ensuring that I always
have a complete backup that is less than a day old.

I wrote a script to do this, and cron fires it off at 4am. I log the
results of that script, complete with a running account of how far it's
got [1] so I can see when the slowdown (if there is one) happens. A
sample is at [url]http://royalty.no-ip.org:81/backup-throughput-2006-11-24.txt[/url] .
Columns are time, GB_copied, MB/s. You can see that between 4:31:13 and
4:31:43 throughput dropped from ~40 MB/s to ~34 MB/s. Apparently
something started then that used up the IDE or PCI bandwidth (dd isn't
very CPU-bound), probably something invoked by cron, as nobody should be
on the computer at 4:30. I'm guessing the mystery process runs after
something else finishes (which might happen at a different time each
day), because the slowdown happens at a slightly different time each
day. Some days the backup doesn't happen at all, as if dd started then
immediately exited; I haven't figured that one out yet.

[1] Inside the loop that does "kill -USR1 ; sleep" until dd exits, I also
have it dump a process list to disk, sorted as top(1) does, but I see
nothing peculiar at that time.

From two successive such readings I can compute the throughput in the
interval.
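
As an illustration of that last step, here is a minimal sketch of the
arithmetic, assuming each reading is a (time, GB_copied) pair like the
first two columns of the log. The function name and the 1024 MB per GB
factor are assumptions made for the example, not taken from the actual
script.

```python
from datetime import datetime, timedelta


def interval_throughput(prev, curr, mb_per_gb=1024.0):
    """Return MB/s between two (time_str, gb_copied) readings.

    time_str looks like "4:31:13". mb_per_gb=1024 is an assumption
    about the units used in the log; pass 1000.0 if it counts
    decimal gigabytes.
    """
    fmt = "%H:%M:%S"
    t0 = datetime.strptime(prev[0], fmt)
    t1 = datetime.strptime(curr[0], fmt)
    if t1 < t0:
        # The second reading was taken after midnight.
        t1 += timedelta(days=1)
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        raise ValueError("readings must be taken at different times")
    return (curr[1] - prev[1]) * mb_per_gb / seconds
```

With made-up numbers: two readings 30 seconds apart, during which 1 GB
was copied, work out to roughly 34 MB/s.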

--
I firmly believed we should not march into Baghdad ...To occupy Iraq
would instantly shatter our coalition, turning the whole Arab world
against us and make ... a latter-day Arab hero assigning young soldiers
to a fruitless hunt for a securely entrenched dictator[.] -- GHWB

Hactar wrote:
[color=blue]
>
> I've got a second hard drive, identical to the main one, onto which I
> (nightly) duplicate the main hard drive, thus ensuring that I always
> have a complete backup that is less than a day old.
>
> I wrote a script to do this, and cron fires it off at 4am. I log the
> results of that script, complete with a running account of how far it's
> got [1] so I can see when the slowdown (if there is one) happens. A
> sample is at [url]http://royalty.no-ip.org:81/backup-throughput-2006-11-24.txt[/url] .
> Columns are time, GB_copied, MB/s. You can see that between 4:31:13 and
> 4:31:43 throughput dropped from ~40 MB/s to ~34 MB/s. Apparently
> something started then that used up the IDE or PCI bandwidth (dd isn't
> very CPU-bound), probably something invoked by cron, as nobody should be
> on the computer at 4:30. I'm guessing the mystery process runs after
> something else finishes (which might happen at a different time each
> day), because the slowdown happens at a slightly different time each
> day. Some days the backup doesn't happen at all, as if dd started then
> immediately exited; I haven't figured that one out yet.[/color]

You would be well-advised to treat your backup process with a little more
care; start by running the backup live, and watch what it actually does.
I would also turn on whatever verbose or debug reporting is available when
running dd, so that you have a chance to see what happens when it suddenly
dies.
Consider using something a little less "atomic" than dd; either rsync or an
actual backup program will be faster and more reliable.
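
For the nights when dd seems to start and exit straight away, one way
to get more information is to record its exit status and anything it
wrote to stderr. The following is only a sketch of that idea; the
function name is made up, and the device names in the comment are
placeholders, not the real ones.

```python
import subprocess
import sys
from datetime import datetime


def run_and_report(argv, log=sys.stderr):
    """Run argv, then log its exit status and its stderr output.

    Note that stderr is collected and logged only after the command
    has exited, so this does not replace a live progress display.
    """
    started = datetime.now()
    result = subprocess.run(argv, stderr=subprocess.PIPE, text=True)
    elapsed = (datetime.now() - started).total_seconds()
    # On POSIX a negative status -N means the child was killed by signal N.
    log.write(
        f"{started:%Y-%m-%d %H:%M:%S} {argv[0]} exited with status "
        f"{result.returncode} after {elapsed:.1f} s\n"
    )
    if result.stderr:
        log.write(result.stderr)
    return result.returncode, result.stderr


# Hypothetical use from the backup script (placeholder device names):
# run_and_report(["dd", "if=/dev/SOURCE", "of=/dev/TARGET", "bs=1M"])
```

A non-zero status together with an error message in the log would at
least narrow down why the copy did not run.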

As to what could be started around 4:30 every night - it is most likely one
of the other cron processes, like updatedb or logrotate.
Both of these will consume significant disk resources for a short period.

Also, if your system partitions are on the drives used for the backup, of
course you'll never get a constant throughput - on IDE, this is fantasy in
any case.

--
All your bits are belong to us.

09-30-2007, 01:04 PM

unix

Re: backup throughput

In comp.os.linux.hardware Hactar <ebenZEROONE@verizon.net>:
[color=blue]
> I've got a second hard drive, identical to the main one, onto which I
> (nightly) duplicate the main hard drive, thus ensuring that I always
> have a complete backup that is less than a day old.[/color]
[..]
[color=blue]
> 4:31:43 throughput dropped from ~40 MB/s to ~34 MB/s. Apparently
> something started then that used up the IDE or PCI bandwidth (dd isn't
> very CPU-bound), probably something invoked by cron, as nobody should be
> on the computer at 4:30. I'm guessing the mystery process runs after[/color]

A remote guess: updatedb fired up from cron.daily. Check when and what
is launched from cron, and look at the cron logfiles as well.
There is no need to guess.
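
As a rough sketch of the "check when and what is launched" step, the
following lists crontab entries whose fixed start time falls inside a
given window. It assumes the usual five time fields at the start of
each line and only understands plain numeric minute and hour values;
the function name is made up for the example.

```python
def jobs_in_window(crontab_text, start, end):
    """Yield (hour, minute, rest_of_line) for entries timed in [start, end].

    start and end are (hour, minute) tuples. Entries whose minute or
    hour field is not a plain number (*, ranges, lists, steps) are
    skipped, which is a simplification. For a system crontab the rest
    of the line starts with the user name.
    """
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 5)
        if len(fields) < 6:
            # Too short to be a job entry, e.g. SHELL=/bin/sh.
            continue
        minute, hour = fields[0], fields[1]
        if not (minute.isdigit() and hour.isdigit()):
            continue
        when = (int(hour), int(minute))
        if start <= when <= end:
            yield when[0], when[1], fields[5]
```

On many distributions the system-wide table is /etc/crontab, so
something like jobs_in_window(open("/etc/crontab").read(), (4, 0), (5, 0))
would show what is scheduled between 4:00 and 5:00. Jobs started from
cron.daily run one after another, so the entry that launches cron.daily
only gives the earliest time any of them could start.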

Good luck

BTW
I'd suggest looking into rsync/unison to mirror to the second
disk; this should be much more efficient after the first run.