Re: [Loc-xferutils-mail] bag update?

Timestamp would definitely be more efficient in some circumstances. If you (or anyone else) wanted to write a Completer implementation that relied on a timestamp, I'd be happy to merge it into the code base.
I believe you can limit by a list of files or filepaths -- that leaves the programmer the flexibility to determine how that list is compiled.
Best.
--Justin
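
For illustration, limiting the work to a caller-supplied list of file paths could look roughly like the sketch below. It is written in Python with the standard library only and is not the Java library's API: the function name `refresh_entries`, its parameters, and the choice of an MD5 manifest named `manifest-md5.txt` are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large payload files are never held in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_entries(bag_dir, rel_paths, manifest_name="manifest-md5.txt"):
    """Re-checksum only rel_paths (hypothetical helper, not a real bagit call).

    rel_paths are bag-relative, e.g. "data/report.pdf". Every other manifest
    entry is carried over unchanged without reading its file.
    """
    bag = Path(bag_dir)
    manifest_path = bag / manifest_name

    # Load the existing "checksum  path" lines, if there is a manifest yet.
    entries = {}
    if manifest_path.exists():
        for line in manifest_path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                checksum, rel_path = line.split(None, 1)
                entries[rel_path] = checksum

    # Only the files the caller listed are actually read and hashed.
    for rel_path in rel_paths:
        entries[rel_path] = md5_of(bag / rel_path)

    lines = [f"{checksum}  {rel_path}" for rel_path, checksum in sorted(entries.items())]
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return entries
```

How the list is compiled (by timestamp, from an ingest log, by hand) is left entirely to the caller, which is the flexibility described above.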
________________________________________
From: Illtud Daniel [illtud.daniel@...]
Sent: Tuesday, August 30, 2011 5:25 AM
To: Littman, Justin
Cc: loc-xferutils-mail@...
Subject: Re: [Loc-xferutils-mail] bag update?
Hi Justin, thanks for the reply,
> When executed from the commandline, the update command will check
> each file checksum. (So, yes, it may take a long time.)
Wouldn't a timestamp approach be better (more efficient at least)?
I understand that in some cases you'd want to ensure that the
existing files matched their checksums, and I guess it's
possible for somebody to get a file on there with a timestamp
earlier than the manifest timestamp, but for my use case the
option to do a timestamp-based update would be a big blessing.
> If working directly with the java code, there are options for
> limiting the files that are checksummed.
Is this by passing a file list, or can it do its own selection
based on some properties?
Since
--
Illtud Daniel illtud.daniel@...
Prif Swyddog Technegol Chief Technical Officer
Llyfrgell Genedlaethol Cymru National Library of Wales

Thread view

Hi all,
I'm trying to make sense of how the bag update works - it seems to
be taking a looong time on the update we've initiated. Is there
documentation on this command?
I've looked in the code, but my java isn't great, so I'm having
difficulty in following where it's going when invoked with 'update'.
How does it find files to add to the manifest? My (possibly naïve)
approach (and one I've considered adding to the python version of
the libs, although I'm no python programmer either) is to find
all files in data/ with a timestamp newer than the manifest file
(although you'd miss files that were added during the manifest
creation, but I'd just say Don't Do That).
Is the current code checking each file checksum against the manifest
to see if it needs updating? On my 2TB bagit on a USB disk, that's
going to take a while...
--
Illtud Daniel illtud.daniel@...
Prif Swyddog Technegol Chief Technical Officer
Llyfrgell Genedlaethol Cymru National Library of Wales
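
A minimal sketch of the timestamp idea from the message above, in Python with the standard library only. It is not part of either bagit library: the function name `files_newer_than_manifest` is hypothetical, and the manifest is assumed to be `manifest-md5.txt` at the top of the bag.

```python
import os
from pathlib import Path


def files_newer_than_manifest(bag_dir, manifest_name="manifest-md5.txt"):
    """Return bag-relative paths under data/ modified after the manifest.

    Hypothetical helper. Only modification times are compared; no file
    contents are read, so this is cheap even on a slow disk.
    """
    bag = Path(bag_dir)
    manifest_mtime = (bag / manifest_name).stat().st_mtime

    newer = []
    for root, _dirs, files in os.walk(bag / "data"):
        for name in files:
            path = Path(root) / name
            if path.stat().st_mtime > manifest_mtime:
                newer.append(path.relative_to(bag).as_posix())
    return sorted(newer)
```

The caveat from the message still applies: a file written while the manifest was being created, or one copied in with an older preserved timestamp, would not be picked up by this comparison.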

Hello-
When executed from the commandline, the update command will check each file checksum. (So, yes, it may take a long time.)
If working directly with the java code, there are options for limiting the files that are checksummed.
Best.
--Justin
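
To show why checking every checksum is slow, here is a rough Python sketch of a full comparison against the manifest. It is an illustration only, not the command-line tool's actual code: the names `md5_of` and `check_bag` are hypothetical, and an MD5 manifest named `manifest-md5.txt` is assumed.

```python
import hashlib
import os
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks; every byte of the file has to be read."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_bag(bag_dir, manifest_name="manifest-md5.txt"):
    """Compare every payload file with the manifest (hypothetical helper)."""
    bag = Path(bag_dir)

    # Manifest lines have the form "checksum  path".
    recorded = {}
    for line in (bag / manifest_name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel_path = line.split(None, 1)
            recorded[rel_path] = checksum.lower()

    report = {"changed": [], "missing": [], "unlisted": []}

    # Listed files: hashing each one is what dominates the run time.
    for rel_path, checksum in sorted(recorded.items()):
        target = bag / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)
        elif md5_of(target) != checksum:
            report["changed"].append(rel_path)

    # Payload files that are on disk but absent from the manifest.
    for root, _dirs, files in os.walk(bag / "data"):
        for name in files:
            rel_path = (Path(root) / name).relative_to(bag).as_posix()
            if rel_path not in recorded:
                report["unlisted"].append(rel_path)
    report["unlisted"].sort()
    return report
```

Because each listed file is read in full, the cost grows with the total size of the payload rather than with the number of changed files.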
________________________________________
From: Illtud Daniel [illtud.daniel@...]
Sent: Friday, August 26, 2011 7:07 AM
To: loc-xferutils-mail@...
Subject: [Loc-xferutils-mail] bag update?

Hi Justin, thanks for the reply,
> When executed from the commandline, the update command will check
> each file checksum. (So, yes, it may take a long time.)
Wouldn't a timestamp approach be better (more efficient at least)?
I understand that in some cases you'd want to ensure that the
existing files matched their checksums, and I guess it's
possible for somebody to get a file on there with a timestamp
earlier than the manifest timestamp, but for my use case the
option to do a timestamp-based update would be a big blessing.
> If working directly with the java code, there are options for
> limiting the files that are checksummed.
Is this by passing a file list, or can it do its own selection
based on some properties?
Since
--
Illtud Daniel illtud.daniel@...
Prif Swyddog Technegol Chief Technical Officer
Llyfrgell Genedlaethol Cymru National Library of Wales

On 30/08/11 11:06, Littman, Justin wrote:
> Timestamp would definitely be more efficient in some circumstances.
> If you (or anyone else) wanted to write a Completer implementation
> that relied on a timestamp, I'd be happy to merge into the code
> base.
Thanks. I'll see if I can get one of our java developers to
look at this.
--
Illtud Daniel illtud.daniel@...
Prif Swyddog Technegol Chief Technical Officer
Llyfrgell Genedlaethol Cymru National Library of Wales