hadoop-common-user mailing list archives

Hi ch huang,
It may seem strange, but the two numbers measure different things.
CorruptBlocks in JMX means "Number of blocks with corrupt replicas"; not all replicas of such
a block are necessarily corrupt. You can check this description through jconsole.
Corrupt blocks in fsck, on the other hand, means blocks whose replicas are all corrupt
(non-recoverable) or missing.
In your case, probably one replica of a block is corrupt, not all replicas of that block. The
corrupt replica will be deleted automatically once another datanode is available in your cluster
and the block has been replicated to it.
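To make the distinction concrete, here is a minimal Python sketch that reads the CorruptBlocks value out of the JSON returned by the NameNode /jmx page and interprets it next to the fsck count. It assumes the response has the form {"beans": [...]} and simply takes the first bean that carries a CorruptBlocks attribute; the function names are made up for illustration and are not part of Hadoop.

```python
import json


def corrupt_blocks_from_jmx(jmx_json_text):
    """Return the CorruptBlocks value from a /jmx JSON response, or None.

    Assumption: the response looks like {"beans": [{...}, {...}]} and one
    of the beans exposes a "CorruptBlocks" attribute.
    """
    payload = json.loads(jmx_json_text)
    for bean in payload.get("beans", []):
        if "CorruptBlocks" in bean:
            return bean["CorruptBlocks"]
    return None


def interpret(jmx_corrupt, fsck_corrupt):
    """Describe what the two counters mean together (hypothetical helper)."""
    if fsck_corrupt > 0:
        # fsck counts blocks whose replicas are all corrupt or missing.
        return "some blocks have no healthy replica left"
    if jmx_corrupt > 0:
        # JMX counts blocks with at least one corrupt replica.
        return "corrupt replicas exist, but each block still has a healthy replica"
    return "no corrupt replicas reported"
```

With the values from this thread, interpret(1, 0) lands in the second case: one block has a corrupt replica, but fsck still sees a healthy copy of it.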
Regarding the replication of 10: as Peter Marron said, some of the important files of a MapReduce
job are written with a replication of 10 so that they are accessible faster and map tasks launch
faster. Anyway, if the job succeeds, these files are deleted automatically. I think these files
remain in HDFS, showing under-replicated blocks, only in some cases where jobs are killed partway
through.
Thanks and Regards,
Vinayakumar B
From: Peter Marron [mailto:Peter.Marron@trilliumsoftware.com]
Sent: 10 December 2013 14:19
To: user@hadoop.apache.org
Subject: RE: how to handle the corrupt block in HDFS?
Hi,
I am sure that there are others who will answer this better, but anyway.
The default replication level for files in HDFS is 3 and so most files that you
see will have a replication level of 3. However when you run a Map/Reduce
job the system knows in advance that every node will need a copy of
certain files. Specifically the job.xml and the various jars containing
classes that will be needed to run the mappers and reducers. So the
system arranges that some of these files have a higher replication level. This increases
the chances that a copy will be found locally.
By default this higher replication level is 10.
This can seem a little odd on a cluster where you only have, say, 3 nodes,
because it means that you will almost always have some blocks that are marked
under-replicated. I think that there was some discussion a while back to change
this to make the replication level something like min(10, number of nodes).
However, as I recall, the general consensus was that this was extra
complexity that wasn't really worth it. If it ain't broke...
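As a rough illustration of the min(10, number of nodes) idea, the Python sketch below computes how many replicas a block can actually get and how many it stays short. It relies on HDFS placing at most one replica of a block on each datanode; the function names are hypothetical. In Hadoop 2 the requested value for job submission files is, as far as I know, controlled by mapreduce.client.submit.file.replication (mapred.submit.replication in older releases), so treat those property names as something to verify against your version.

```python
def achievable_replication(requested, live_datanodes):
    """Replicas a block can really have: at most one per datanode."""
    return min(requested, live_datanodes)


def missing_replicas_per_block(requested, live_datanodes):
    """How many replicas each block stays short of the requested level."""
    return requested - achievable_replication(requested, live_datanodes)


# A 3-node cluster with job files requested at replication 10:
# each block gets 3 replicas and is reported 7 short.
print(achievable_replication(10, 3), missing_replicas_per_block(10, 3))
```

Any block with a non-zero shortfall shows up as under-replicated, which is why small clusters almost always report a few such blocks while job files exist.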
Hope that this helps.
Peter Marron
Senior Developer, Research & Development
Office: +44 (0) 118-940-7609 peter.marron@trilliumsoftware.com
Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
From: ch huang [mailto:justlooks@gmail.com]
Sent: 10 December 2013 01:21
To: user@hadoop.apache.org
Subject: Re: how to handle the corrupt block in HDFS?
Stranger still: in my HDFS cluster every block should have three replicas, but I find that some
files have ten replicas. Why?
# sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
Found 5 items
-rw-r--r-- 3 helen hadoop 7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
-rw-r--r-- 10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
-rw-r--r-- 10 helen hadoop 3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
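To spot such files without reading the listing by eye, a small Python sketch like the one below can scan `hadoop fs -ls` output for entries whose replication column differs from the default. It assumes the eight-column layout shown above (permissions, replication, owner, group, size, date, time, path); the function name is made up for illustration.

```python
def files_with_nondefault_replication(ls_output, default_replication=3):
    """Return (path, replication) pairs whose replication is not the default.

    Assumption: each file line has the eight whitespace-separated columns
    seen in `hadoop fs -ls` output; anything else is skipped.
    """
    results = []
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        # Skips the "Found N items" header and directory entries, whose
        # replication column is "-" rather than a number.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        replication = int(fields[1])
        if replication != default_replication:
            results.append((fields[7], replication))
    return results
```

Fed the listing above, it would report job.jar and job.split with replication 10 and skip appTokens.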
On Tue, Dec 10, 2013 at 9:15 AM, ch huang <justlooks@gmail.com> wrote:
The strange thing is that when I use the following command, I find 1 corrupt block:
# curl -s http://ch11:50070/jmx |grep orrupt
"CorruptBlocks" : 1,
But when I run hdfs fsck /, I get none; everything seems fine:
# sudo -u hdfs hdfs fsck /
........
....................................Status: HEALTHY
Total size: 1479728140875 B (Total open files size: 1677721600 B)
Total dirs: 21298
Total files: 100636 (Files currently being written: 25)
Total blocks (validated): 119788 (avg. block size 12352891 B) (Total open file blocks (not validated): 37)
Minimally replicated blocks: 119788 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 166 (0.13857816 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0027633
Corrupt blocks: 0
Missing replicas: 831 (0.23049656 %)
Number of data-nodes: 5
Number of racks: 1
FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds
The filesystem under path '/' is HEALTHY
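For monitoring, the counters in an fsck summary like the one above can be pulled out with a few lines of Python. This is only a sketch under the assumption that the summary keeps the "Label: number" layout shown here; the function name and the choice of labels are illustrative.

```python
import re

# Summary labels worth watching; copied from the fsck output above.
WANTED_LABELS = (
    "Under-replicated blocks",
    "Corrupt blocks",
    "Missing replicas",
    "Default replication factor",
    "Number of data-nodes",
)


def parse_fsck_summary(fsck_output):
    """Return a dict mapping each wanted label to its integer value."""
    summary = {}
    for line in fsck_output.splitlines():
        label, sep, rest = line.strip().partition(":")
        if not sep or label not in WANTED_LABELS:
            continue
        match = re.match(r"\s*(\d+)", rest)
        if match:
            summary[label] = int(match.group(1))
    return summary
```

A check could then alert only when "Corrupt blocks" is non-zero and treat a small "Under-replicated blocks" count as informational.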
On Tue, Dec 10, 2013 at 8:32 AM, ch huang <justlooks@gmail.com> wrote:
Hi, mailing list:
My Nagios has been alerting me all day that there is a corrupt block in HDFS, but I do not
know how to remove it. Will HDFS handle this automatically? And will removing the corrupt
block cause any data loss? Thanks.