I implemented AES and Triple DES with CompressionCodec in the Java Cryptography Architecture (JCA). The encryption is performed by a client node using the Hadoop API. Map tasks read blocks from HDFS, and each map task decrypts the blocks it reads. I tested my implementation against generic (unencrypted) HDFS. My cluster consists of 3 nodes (1 master node, 3 worker nodes), and each machine has a quad-core processor (i7-2600) and 4 GB of memory. The test input is 1 TB of text, split across 32 text files (each 32 GB).

I expected the encryption to take much more time than generic HDFS, but the performance does not differ significantly. The decryption step takes about 5-7% longer than generic HDFS. The encryption step takes about 20-30% longer than generic HDFS because it is implemented as a single thread and executed on one client node, so there is room to make the encryption faster.

Could there be an error in my test?

I know there are several implementations for encrypting files in HDFS. Are these implementations enough to secure HDFS?
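For reference, the client-side encrypt/decrypt path described above can be sketched with plain JCA stream wrappers. This is a minimal sketch with no Hadoop dependencies; the key generation, zero IV, and AES/CBC mode here are illustrative assumptions, not necessarily what the poster used:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class JcaStreamCrypto {
    public static void main(String[] args) throws Exception {
        // Generate a 128-bit AES key (in a real setup this would come
        // from whatever key-distribution mechanism is in place).
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        // Fixed zero IV only for the demo; use a random IV per file in practice.
        IvParameterSpec iv = new IvParameterSpec(new byte[16]);

        byte[] plaintext = "hello hdfs".getBytes(StandardCharsets.UTF_8);

        // Encrypt: wrap the output stream, as a client writing to HDFS might.
        Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, key, iv);
        ByteArrayOutputStream encrypted = new ByteArrayOutputStream();
        try (CipherOutputStream cos = new CipherOutputStream(encrypted, enc)) {
            cos.write(plaintext);
        }

        // Decrypt: wrap the input stream, as a map task reading a block might.
        Cipher dec = Cipher.getInstance("AES/CBC/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key, iv);
        ByteArrayOutputStream decrypted = new ByteArrayOutputStream();
        try (CipherInputStream cis = new CipherInputStream(
                new ByteArrayInputStream(encrypted.toByteArray()), dec)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = cis.read(buf)) != -1) decrypted.write(buf, 0, n);
        }
        System.out.println(new String(decrypted.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the real pipeline the `ByteArrayOutputStream` stands in for an HDFS output stream on the client node, and the wrapped input stream stands in for a map task's block reader.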

So far I have not found a 100% satisfactory solution to this hard problem. I've written OSS (Open Secret Server) partly to address this problem in Pig, i.e. accessing encrypted data without embedding key info in the job description file. Proper encrypted-data handling implies strict code review, though: in Pig, for example, databags are spillable, and you could end up with unencrypted data stored on disk without intending it.

On Tue, Feb 26, 2013 at 6:33 AM, Seonyeong Bak <[EMAIL PROTECTED]> wrote:
> I didn't handle the key distribution problem because I thought that it
> is more difficult. I simply hardcoded a key into the code.
>
> Challenges related to security are handled in HADOOP-9331, MAPREDUCE-5025,
> and so on.
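Rather than hardcoding the key as the quoted message describes, one minimal step up is to read the raw key bytes from a file kept outside the source tree. This is only a sketch under that assumption; a key file does not by itself solve the key-distribution problem the thread mentions:

```java
import javax.crypto.spec.SecretKeySpec;
import java.nio.file.Files;
import java.nio.file.Path;

public class KeyFromFile {
    // Load a raw 16-byte AES key from a local file instead of hardcoding it
    // in the job's source code.
    static SecretKeySpec loadKey(Path path) throws Exception {
        byte[] keyBytes = Files.readAllBytes(path);
        if (keyBytes.length != 16) {
            throw new IllegalArgumentException("expected a 16-byte AES key");
        }
        return new SecretKeySpec(keyBytes, "AES");
    }

    public static void main(String[] args) throws Exception {
        // Demo only: create a throwaway key file, then load it back.
        Path keyFile = Files.createTempFile("aes", ".key");
        Files.write(keyFile, new byte[16]);
        SecretKeySpec key = loadKey(keyFile);
        System.out.println(key.getAlgorithm() + " " + key.getEncoded().length);
        Files.delete(keyFile);
    }
}
```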

I am also interested in your research. Can you share some insight on the following questions?

1) When you use a CompressionCodec, can the encrypted file be split? As I understand it, no encryption scheme lets a file be decrypted block by block independently, right? For example, if I have a 1 GB file encrypted using AES, how do you (or can you) decrypt the file block by block, instead of using one mapper to decrypt the whole file?

2) In your CompressionCodec implementation, do you use DecompressorStream or BlockDecompressorStream? If BlockDecompressorStream, can you share some examples? Right now, I have some problems using BlockDecompressorStream to do exactly the same thing you did.

3) Do you have any plan to share your code, especially if you did use BlockDecompressorStream and made the encrypted file decryptable block by block in a Hadoop MapReduce job?

Thanks

Yong

From: [EMAIL PROTECTED]
Date: Tue, 26 Feb 2013 14:10:08 +0900
Subject: Encryption in HDFS
To: [EMAIL PROTECTED]

Hello, I'm a university student. I implemented AES and Triple DES with CompressionCodec in the Java Cryptography Architecture (JCA). The encryption is performed by a client node using the Hadoop API.

The test input is 1 TB of text, split across 32 text files (each 32 GB). I expected the encryption to take much more time than generic HDFS, but the performance does not differ significantly.

The decryption step takes about 5-7% longer than generic HDFS. The encryption step takes about 20-30% longer than generic HDFS because it is implemented as a single thread and executed on one client node.

So there is room to make the encryption faster. Could there be an error in my test? I know there are several implementations for encrypting files in HDFS. Are these implementations enough to secure HDFS?

best regards,

seonpark

* Sorry for my bad English

> I am also interested in your research. Can you share some insight on the following questions?
>
> 1) When you use a CompressionCodec, can the encrypted file be split? As I understand it, no encryption scheme lets a file be decrypted block by block independently, right? For example, if I have a 1 GB file encrypted using AES, how do you (or can you) decrypt the file block by block, instead of using one mapper to decrypt the whole file?
> 2) In your CompressionCodec implementation, do you use DecompressorStream or BlockDecompressorStream? If BlockDecompressorStream, can you share some examples? Right now, I have some problems using BlockDecompressorStream to do exactly the same thing you did.
> 3) Do you have any plan to share your code, especially if you did use BlockDecompressorStream and made the encrypted file decryptable block by block in a Hadoop MapReduce job?
>
> Thanks
>
> Yong

1) To my knowledge, there is no way to split an encrypted file in Apache Hadoop. In Hadoop 1.1.x, however, it is possible to decrypt an encrypted file block by block, using SplittableCompressionCodec and SplitCompressionInputStream.
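One way block-by-block decryption can work in principle is AES in CTR mode, where the keystream counter for any 16-byte block can be computed from its byte offset, so a reader can start decrypting at an arbitrary block boundary. This is a sketch of that property with plain JCA, not the SplittableCompressionCodec API itself, and the all-zero demo key and IV are placeholder assumptions:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.util.Arrays;

public class CtrRandomAccess {
    static final int AES_BLOCK = 16;

    // Advance the 128-bit counter IV by 'blocks' (big-endian addition),
    // which is what lets a reader begin at an arbitrary block boundary.
    static IvParameterSpec ivAtBlock(byte[] iv, long blocks) {
        BigInteger ctr = new BigInteger(1, iv).add(BigInteger.valueOf(blocks));
        byte[] raw = ctr.toByteArray();
        byte[] out = new byte[AES_BLOCK];
        int copy = Math.min(raw.length, AES_BLOCK);
        System.arraycopy(raw, raw.length - copy, out, AES_BLOCK - copy, copy);
        return new IvParameterSpec(out);
    }

    public static void main(String[] args) throws Exception {
        SecretKeySpec key = new SecretKeySpec(new byte[16], "AES"); // demo key only
        byte[] iv = new byte[16];                                    // demo IV only

        byte[] plain = new byte[64];
        for (int i = 0; i < plain.length; i++) plain[i] = (byte) i;

        // Encrypt the whole buffer once, from the start.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] cipherText = enc.doFinal(plain);

        // Decrypt only the third 16-byte block, without touching the first two.
        long blockIndex = 2;
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, ivAtBlock(iv, blockIndex));
        byte[] slice = dec.doFinal(cipherText, (int) (blockIndex * AES_BLOCK), AES_BLOCK);

        System.out.println(Arrays.equals(slice, Arrays.copyOfRange(plain, 32, 48)));
    }
}
```

Note that CTR mode provides confidentiality only; a real HDFS deployment would also need integrity protection and a proper IV-per-file scheme.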