hadoop-hdfs-issues mailing list archives

[jira] [Updated] (HDFS-2093) 1073: Handle case where an entirely empty log is left during NN crash

Date: Tue, 21 Jun 2011 05:27:47 GMT

[ https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-2093:
------------------------------
Attachment: hdfs-2093.txt
The attached patch treats such logs as corrupt at startup time. In the situation described
in the issue, where the only log available is the corrupt one, the NN refuses to start and
prints a message explaining that the log starting at this txid is corrupt and contains no
txns. The operator can then check whether a different storage drive, possibly one that went
missing, has better logs before starting the NN.
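The startup behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in hdfs-2093.txt: the class `EmptyLogCheck`, the `LogInfo` type, the `selectLog` method, and the directory path are all hypothetical names invented for the example.

```java
import java.util.Arrays;
import java.util.List;

final class EmptyLogCheck {
    /** Hypothetical summary of one edit log found in a storage directory. */
    static final class LogInfo {
        final String dir;
        final long firstTxId;
        final long numTxns;

        LogInfo(String dir, long firstTxId, long numTxns) {
            this.dir = dir;
            this.firstTxId = firstTxId;
            this.numTxns = numTxns;
        }

        /** Assumption: a log holding zero transactions is treated as corrupt. */
        boolean isCorrupt() {
            return numTxns == 0;
        }
    }

    /**
     * Returns the first non-corrupt log starting at startTxId, or throws
     * if every candidate starting at that txid is corrupt.
     */
    static LogInfo selectLog(long startTxId, List<LogInfo> candidates) {
        for (LogInfo log : candidates) {
            if (log.firstTxId == startTxId && !log.isCorrupt()) {
                return log;
            }
        }
        throw new IllegalStateException(
            "Logs starting at txid " + startTxId
            + " are corrupt with no txns; check other storage directories");
    }

    public static void main(String[] args) {
        // Only one storage directory is available, and its log is empty.
        List<LogInfo> onlyEmpty =
            Arrays.asList(new LogInfo("/data/1/name", 5160285L, 0));
        try {
            selectLog(5160285L, onlyEmpty);
        } catch (IllegalStateException e) {
            System.out.println("Refusing to start: " + e.getMessage());
        }
    }
}
```

If a second directory holding a non-empty log with the same starting txid comes back, `selectLog` returns that log instead of failing.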
> 1073: Handle case where an entirely empty log is left during NN crash
> ---------------------------------------------------------------------
>
> Key: HDFS-2093
> URL: https://issues.apache.org/jira/browse/HDFS-2093
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: Edit log branch (HDFS-1073)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2093.txt
>
>
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll edit logs to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it finalizes it
>   to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed since there are two
>   logs starting with the same txid
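The nonsense name in the description is what falls out of the naming arithmetic when a log has no transactions. The sketch below assumes the last txid is computed as first txid plus transaction count minus one, which is an inference from the file names quoted above and not taken from the HDFS source. The class `EditLogNames` and its methods are hypothetical.

```java
final class EditLogNames {
    /** Name of a log that is still being written. */
    static String inProgressName(long firstTxId) {
        return "edits_inprogress_" + firstTxId;
    }

    /** Name of a finalized log covering firstTxId..lastTxId inclusive. */
    static String finalizedName(long firstTxId, long lastTxId) {
        return "edits_" + firstTxId + "-" + lastTxId;
    }

    /**
     * Assumed recovery rule: finalize an in-progress log using the number
     * of transactions actually read from it. With zero transactions the
     * last txid ends up below the first.
     */
    static String finalizeRecovered(long firstTxId, long numTxns) {
        long lastTxId = firstTxId + numTxns - 1;
        return finalizedName(firstTxId, lastTxId);
    }

    public static void main(String[] args) {
        System.out.println(finalizeRecovered(5160285L, 0));
        System.out.println(inProgressName(5160285L));
    }
}
```

Both resulting files claim 5160285 as their starting txid, which matches the conflict reported in the last bullet.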