Hi,

I'm looking into writing a patch for HDFS which will provide a new method within HDFS which can securely delete the contents of a block on all the nodes upon which it exists. By securely delete I mean, overwrite with 1's/0's/random data cyclically such that the data could not be recovered forensically.

I'm not currently aware of any existing code / methods which provide this, so was going to implement this myself.

I figured the DataNode.java was probably the place to start looking into how this could be done, so I've read the source for this, but it's not really enlightened me a massive amount.

I'm assuming I need to tell the NameNode that a particular block ID must be deleted on every DataNode that holds it; then, as each DataNode calls home, it would be instructed to securely delete the relevant block, and it would oblige.

Unfortunately I have no idea where to begin and was looking for some pointers?

I guess specifically I'd like to know:

1. Where the hdfs CLI commands are implemented
2. How a DataNode identifies a block / how a NameNode could inform a DataNode to delete a block
3. Where the existing "delete" is implemented, so I can make sure my secure delete makes use of it after successfully blanking the block contents
4. If I've got the right idea about this at all?

Kind regards,
Matt Fellows

Hi Matt,

Here are some code pointers:

- When doing a file deletion, the NameNode turns the file into a set of blocks that need to be deleted.
- When datanodes heartbeat in to the NN (see BPServiceActor#offerService), the NN replies with blocks to be invalidated (see BlockCommand and DatanodeProtocol.DNA_INVALIDATE).
- The DN processes these invalidates in BPServiceActor#processCommandFromActive (look for DNA_INVALIDATE again).
- The magic lines you're looking for are probably in FsDatasetAsyncDiskService#run, since we delete blocks in the background.

Best,
Andrew
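To make the shape of that exchange concrete, here is a toy model of the heartbeat/invalidate flow. Every name in it (ToyNameNode, ToyDataNode, heartbeatTo) is hypothetical and exists only for illustration; it does not use or mirror the real HDFS classes or their signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the NameNode: maps files to block IDs and
// queues per-DataNode invalidation work when a file is deleted.
class ToyNameNode {
    private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    private final Map<Long, Set<String>> blockToNodes = new HashMap<>();
    private final Map<String, List<Long>> pendingInvalidates = new HashMap<>();

    void addFile(String path, List<Long> blockIds, Set<String> replicaNodes) {
        fileToBlocks.put(path, new ArrayList<>(blockIds));
        for (long id : blockIds) {
            blockToNodes.put(id, new HashSet<>(replicaNodes));
        }
    }

    // Deleting a file turns it into a set of blocks to invalidate on each node.
    void delete(String path) {
        List<Long> blocks = fileToBlocks.remove(path);
        if (blocks == null) {
            return;
        }
        for (long id : blocks) {
            for (String node : blockToNodes.remove(id)) {
                pendingInvalidates.computeIfAbsent(node, n -> new ArrayList<>()).add(id);
            }
        }
    }

    // The reply to a heartbeat carries the blocks that node should invalidate.
    List<Long> heartbeat(String nodeId) {
        List<Long> work = pendingInvalidates.remove(nodeId);
        return work == null ? new ArrayList<>() : work;
    }
}

// Hypothetical stand-in for a DataNode that only tracks which block IDs it holds.
class ToyDataNode {
    final String id;
    final Set<Long> storedBlocks = new HashSet<>();

    ToyDataNode(String id) {
        this.id = id;
    }

    void heartbeatTo(ToyNameNode nn) {
        for (long blockId : nn.heartbeat(id)) {
            // A secure-delete hook would overwrite the block file here,
            // before the block is dropped.
            storedBlocks.remove(blockId);
        }
    }
}
```

The point of the model is that nothing is pushed to a DataNode: each node only learns about deletions in the reply to its own heartbeat, so a secure delete would complete on each replica at a different time.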


I'd also recommend implementing this in a somewhat pluggable way, e.g. a configuration for a Deleter class. The default Deleter can be the one we use today, which just removes the file, and you could plug in a SecureDeleter. I can also see some use cases for a Deleter implementation which doesn't actually delete the block, but instead moves it to a local trash directory which is deleted a day or two later. This sort of policy can help recover data as a last-ditch effort if there is some kind of accidental deletion and there aren't snapshots in place.
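A minimal sketch of what such a plug-in point could look like. The Deleter and SecureDeleter names come from the suggestion above; the interface shape, DefaultDeleter, TrashDeleter, and the Deleters.fromClassName helper are assumptions made for illustration, not existing HDFS code or configuration keys.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical plug-in point: one method that disposes of a block file.
interface Deleter {
    void delete(Path blockFile) throws IOException;
}

// Mirrors the behaviour described above as today's default: just remove the file.
class DefaultDeleter implements Deleter {
    @Override
    public void delete(Path blockFile) throws IOException {
        Files.deleteIfExists(blockFile);
    }
}

// Moves the block into a local trash directory instead of deleting it.
// A separate job (not shown) would purge that directory a day or two later.
class TrashDeleter implements Deleter {
    private final Path trashDir;

    TrashDeleter(Path trashDir) {
        this.trashDir = trashDir;
    }

    @Override
    public void delete(Path blockFile) throws IOException {
        Files.createDirectories(trashDir);
        Files.move(blockFile, trashDir.resolve(blockFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }
}

class Deleters {
    // Instantiate the implementation named by a configuration value.
    // Only works for implementations with a no-argument constructor.
    static Deleter fromClassName(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(Deleter.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

A SecureDeleter would then be one more implementation of the same interface that overwrites the file before removing it, selected by changing the configured class name rather than the calling code.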

> On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows wrote:
> > I'm looking into writing a patch for HDFS which will provide a new method within HDFS which can securely delete the contents of a block on all the nodes upon which it exists.

> If I've got the right idea about this at all?

From the man page for wipe(1):

"Journaling filesystems (such as Ext3 or ReiserFS) are now being used by default by most Linux distributions. No secure deletion program that does filesystem-level calls can sanitize files on such filesystems, because sensitive data and metadata can be written to the journal, which cannot be readily accessed. Per-file secure deletion is better implemented in the operating system."

You might be able to work around this by turning off the journal on these filesystems. But even then, you've got issues like the drive remapping bad sectors (and leaving around the old ones), flash firmware that is unable to erase less than an erase block, etc.

The simplest solution is probably just to use full-disk encryption. Then you don't need any code changes at all.

Doing something like invoking shred on the block files could improve security somewhat, but it's not going to work all the time.

Colin
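For reference, invoking shred on a block file from Java could look roughly like the sketch below. The ShredInvoker class and its method names are hypothetical; the flags are those of GNU coreutils shred (-n for the number of overwrite passes, -z for a final pass of zeros, -u to remove the file afterwards). The caveats above about journals, remapped sectors and flash still apply.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that shells out to GNU shred for one block file.
class ShredInvoker {
    // Build the argument list: -n = overwrite passes, -z = final pass of
    // zeros, -u = remove the file once it has been overwritten.
    static List<String> command(Path blockFile, int passes) {
        return Arrays.asList("shred", "-n", Integer.toString(passes), "-z", "-u",
                blockFile.toString());
    }

    // Run shred and fail loudly if it reports an error.
    static void shred(Path blockFile, int passes) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(blockFile, passes)).inheritIO().start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("shred exited with status " + exit);
        }
    }
}
```

This assumes shred is installed and on the PATH of the DataNode process, which is not guaranteed on every platform.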



Thanks for the heads up, but I think I've managed to implement it crudely by overwriting sequentially with 1s, 0s and random bytes and tested it successfully on an ext4 partition.
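A rough sketch of that kind of sequential overwrite is shown here. It is not the actual patch; the BlockOverwriter name, the chunk size and the fixed three-pass order are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical helper: overwrite a file in place with 1s, then 0s, then random bytes.
class BlockOverwriter {
    private static final int CHUNK = 64 * 1024;

    static void overwrite(Path blockFile) throws IOException {
        SecureRandom random = new SecureRandom();
        try (RandomAccessFile raf = new RandomAccessFile(blockFile.toFile(), "rw")) {
            long length = raf.length();
            for (int pass = 0; pass < 3; pass++) {
                // A fresh buffer is all zeros, which is what pass 1 writes.
                byte[] buf = new byte[CHUNK];
                if (pass == 0) {
                    Arrays.fill(buf, (byte) 0xFF); // pass 0: all 1 bits
                }
                raf.seek(0);
                long remaining = length;
                while (remaining > 0) {
                    if (pass == 2) {
                        random.nextBytes(buf); // pass 2: new random data per chunk
                    }
                    int n = (int) Math.min(buf.length, remaining);
                    raf.write(buf, 0, n);
                    remaining -= n;
                }
                // Ask the OS to flush this pass to the device before the next one.
                raf.getFD().sync();
            }
        }
    }
}
```

The file length is left unchanged, so the existing delete path can still remove the block file afterwards. Whether each pass actually lands on the same physical sectors depends on the filesystem and the device, as discussed earlier in the thread.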

I tested it by dd-ing the entire partition to a file and confirming with strings that a particular string was not present. I then uploaded a large file with the chosen string repeated in it many times, dd'd the partition again to confirm the string was present, issued a delete, repeated the test, and confirmed the string had been removed.
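The check on the dd image can also be automated. The sketch below is a stand-in for the strings-based check, not what was actually run: MarkerScanner is a hypothetical name, and it simply counts occurrences of an ASCII marker while streaming through the image file.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: count occurrences of a marker string in a disk image.
class MarkerScanner {
    // Streams the file one byte at a time using Knuth-Morris-Pratt matching,
    // so the image never has to fit in memory. Overlapping matches are counted.
    static long count(Path image, String marker) throws IOException {
        byte[] pat = marker.getBytes(StandardCharsets.US_ASCII);
        if (pat.length == 0) {
            throw new IllegalArgumentException("empty marker");
        }
        // fail[i] = length of the longest proper prefix of pat[0..i]
        // that is also a suffix of it.
        int[] fail = new int[pat.length];
        for (int i = 1, k = 0; i < pat.length; i++) {
            while (k > 0 && pat[i] != pat[k]) {
                k = fail[k - 1];
            }
            if (pat[i] == pat[k]) {
                k++;
            }
            fail[i] = k;
        }
        long hits = 0;
        int matched = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(image))) {
            int b;
            while ((b = in.read()) != -1) {
                while (matched > 0 && (byte) b != pat[matched]) {
                    matched = fail[matched - 1];
                }
                if ((byte) b == pat[matched]) {
                    matched++;
                }
                if (matched == pat.length) {
                    hits++;
                    matched = fail[matched - 1];
                }
            }
        }
        return hits;
    }
}
```

Running it on the image taken before the upload, after the upload, and after the delete should give zero, a large count, and zero again if the overwrite reached every copy on that partition.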

I'm sure some journal information may be leaked, but the entire block can't be reconstructed from the journal, or else your disk would be halved in usable size, right?


This is confidential, non-binding and not company endorsed - see full terms at www.fosolutions.co.uk/emailpolicy.html
First Option Software Ltd Registered No. 06340261
Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.


Matt Fellows 2013-08-20, 22:14
