Re: [PATCH] Add wipename option to shred

On 06/13/2013 05:13 PM, Joseph D. Wagner wrote:
> On 06/13/2013 8:35 am, Pádraig Brady wrote:
>
>> On 06/13/2013 12:51 AM, Joseph D. Wagner wrote:
>>
>>> ## perchar ##
>>> real 678m33.468s
>>> user 0m9.450s
>>> sys 3m20.001s
>>>
>>> ## once ##
>>> real 151m54.655s
>>> user 0m3.336s
>>> sys 0m32.357s
>>>
>>> ## none ##
>>> real 107m34.307s
>>> user 0m2.637s
>>> sys 0m21.825s
>>>
>>> perchar: 11 hours 18 minutes 33.468 seconds
>>> once: 2 hours 31 minutes 54.655 seconds
>>> * a 346% improvement over perchar
>>> none: 1 hour 47 minutes 34.307 seconds
>>> * a 530% improvement over perchar
>>> * a 41% improvement over once
>>
>> Whoa, so this creates 23s CPU work
>> but waits for 1 hour 47 mins on the sync!
>> What file system and backing device are you using here
>> as a matter of interest?
>
> ext4 data=ordered (default) + 7200 SATA
>
> Just to be clear, the times also include shredding the data part of the files.
>
> For my test I used 16 character file names and 100,000 files each 4k in size,
> which comes to:
> perchar: (1 data fsync + 16 name fsync) * 100,000 files = 1,700,000 fsync
> once: (1 data fsync + 1 name fsync) * 100,000 files = 200,000 fsync
> none: (1 data fsync + 0 name fsync) * 100,000 files = 100,000 fsync
>
> I included the exact script I used to generate those statistics in a previous
> email. Feel free to replicate my experiment on your own equipment, using my
> patched version of shred of course.
>
> Alternatively, if you still have reservations about adopting my patch,
> would you be more open to a --no-wipename option? This would be the
> equivalent of my proposed --wipename=none. It would not imply any
> additional security; to the contrary, it implies less security. Yet, it
> would still give me the optional performance boost I am trying to
> achieve.
Yes, these sync latencies really add up.
I timed this simple test script on ext4, on both an SSD
and a traditional disk in my laptop:
import os
d = os.open(".", os.O_DIRECTORY | os.O_RDONLY)
for i in range(1000):
    os.fdatasync(d)
That gave 2ms and 12ms per sync operation respectively.
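For reference, the measurement can be reproduced end to end with a shell one-liner (a sketch; it assumes python3 is on PATH, and the numbers will of course vary with the hardware and file system):

```shell
# Time 1000 fdatasync() calls on the current directory's fd and
# report the mean latency per sync (elapsed_s * 1000 ms / 1000 calls).
python3 - <<'EOF'
import os, time
d = os.open(".", os.O_DIRECTORY | os.O_RDONLY)
t0 = time.monotonic()
for _ in range(1000):
    os.fdatasync(d)
os.close(d)
print("%.3f ms per sync" % (time.monotonic() - t0))
EOF
```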
This seems to be independent of the directory size and of whether
any changes were made to the directory, which is a bit surprising.
It seems like a sync-only-on-change optimization should be possible.
Anyway...
So with the extra 1.6M syncs above on spinning rust,
that would add an extra 5.3 hours by my calculation.
Your latencies seemed to be nearly double that,
but fair enough, same ballpark.
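For the record, the arithmetic behind that estimate, using the 12ms spinning-disk figure measured above:

```shell
# perchar does 17 fsyncs per file vs 1 for none: 16 extra per file
# over 100,000 files, at ~12ms per sync on the spinning disk.
echo "$(( 16 * 100000 )) extra syncs"
awk 'BEGIN { printf "%.1f hours\n", 16 * 100000 * 0.012 / 3600 }'
```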
Now we could handle this outside of shred, if we only
wanted to choose between wiping names and simple delete.
Given the above latencies, the overhead of a couple
of microseconds to start a process per file is insignificant:
find /files -type f -exec sh -c 'shred "$1" && rm "$1"' sh {} \;
But yes this is a bit awkward.
Also if you did want to select wipe, but avoid the explicit syncs,
because you knew your file system had synchronous metadata updates
then we couldn't support that operation with this scheme.
So I'm leaning a bit towards adding control through shred options.
How about this interface:
-u, --remove[=HOW]
    truncate and remove file after overwriting.
    HOW indicates how to remove the directory entry:
      unlink   => just call unlink
      wipe     => also first obfuscate the name
      wipesync => also sync each obfuscated character to disk (the default)
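If adopted, usage would look something like this (a hypothetical sketch of the proposed option, valid only once the patch lands; the filenames are examples):

```shell
# Each line creates its own throwaway file, then shreds and removes it.
printf 'secret' > a.dat && shred -n1 --remove=unlink a.dat  # just unlink the name
printf 'secret' > b.dat && shred -n1 --remove=wipe   b.dat  # obfuscate the name first
printf 'secret' > c.dat && shred -n1 -u              c.dat  # --remove=wipesync, the default
```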
thanks,
Pádraig.