4.
Hadoop Golden Era
One of the most exciting open-source projects on the planet
Becomes the standard for large-scale processing
Successfully deployed by hundreds/thousands of companies
Like a salutary virus
... but it has some little drawbacks

5.
HDFS Main Limitations
A single NameNode that
keeps all metadata in RAM
performs all metadata operations
becomes a single point of failure (SPOF)
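A rough back-of-the-envelope estimate shows why RAM becomes the bottleneck, assuming the commonly cited figure of roughly 150 bytes of NameNode heap per namespace object (file or block); the workload numbers are invented for illustration:

  100 million files x ~1.5 blocks each  ->  ~250 million namespace objects
  250 million objects x ~150 bytes      ->  ~37.5 GB of NameNode heap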

10.
HDFS DataNode (DN)
Stores and retrieves blocks
A block is stored as a regular file on the local disk
e.g. blk_-992391354910561645
A block itself does not know which file it belongs to
Sends a block report to the NN periodically
Sends heartbeat signals to the NN to say that it is alive
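On disk, such a block sits next to a checksum file; a listing might look like this (the block-pool directory and the generation stamp 1020 are hypothetical):

  $ ls /data/1/dfs/dn/current/BP-.../current/finalized/
  blk_-992391354910561645            # the block data itself
  blk_-992391354910561645_1020.meta  # checksums for the block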

15.
Multiple independent namespaces
NameNodes do not talk to each other
A DataNode can store blocks managed by any NameNode
A NameNode manages only a slice of the namespace
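A minimal federated configuration might look like the sketch below (nameservice IDs and hostnames are invented):

  <!-- hdfs-site.xml: two independent namespaces, one NameNode each -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>namenode2.example.com:8020</value>
  </property>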

16.
The client view of the cluster
There are multiple namespaces (and corresponding NNs)
A client can use any of them to compose its own view of HDFS
The idea is much like the Linux /etc/fstab file
A mapping between paths and NameNodes, as in the ViewFS sketch below
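Such a client-side mount table can be expressed with ViewFS; a sketch (the mount-table name, paths, and hosts are illustrative):

  <!-- core-site.xml on the client -->
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://myCluster</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.myCluster.link./user</name>
    <value>hdfs://namenode1.example.com:8020/user</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.myCluster.link./logs</name>
    <value>hdfs://namenode2.example.com:8020/logs</value>
  </property>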

19.
Deploying HDFS Federation
Useful for small (+isolation) and large (+scalability) clusters
However, not so many clusters use it
NameNodes can be added/removed at any time
The cluster does not have to be restarted
When adding Federation to an existing cluster:
Do not format the existing NN
A newly added NN will be "empty"
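Adding a NameNode to a running cluster might then look roughly like this (hostnames are invented; the existing cluster ID is left as a placeholder, and 50020 is the usual DataNode IPC port):

  # format only the new NameNode, reusing the existing cluster ID
  $ hdfs namenode -format -clusterId <existing-cluster-id>
  # make each DataNode pick up the new NameNode from the refreshed config
  $ hdfs dfsadmin -refreshNamenodes datanode-host:50020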

20.
Motivation for HDFS High Availability
A single NameNode is a single point of failure (SPOF)
The Secondary NN and HDFS Federation do not change that
Recovery from a failed NN may take even tens of minutes
Two types of downtime:
Unexpected failure (infrequent)
Planned maintenance (common)

21.
HDFS High Availability
Introduces a pair of redundant NameNodes
One Active and one Standby
If the Active NameNode crashes or is intentionally stopped,
then the Standby takes over quickly

22.
NameNodes' responsibilities
The Active NameNode is responsible for all client operations
The Standby NameNode is watchful
Maintains enough state to provide a fast failover
Also does checkpointing (so the Secondary NN disappears)

23.
Synchronizing the state of metadata
The Standby must keep its state as up-to-date as possible
Edit logs - two alternative ways of sharing them:
Shared storage using NFS
Quorum-based storage (recommended)
Block locations
DataNodes send block reports to both NameNodes
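With quorum-based storage, both NameNodes point their shared edits directory at a set of JournalNodes; a configuration sketch (JournalNode hosts and the nameservice ID are invented):

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
  </property>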

28.
The split-brain scenario
A potential scenario when two NameNodes
Both think they are active
Both make conflicting changes to the namespace
Example:
The ZKFC crashes, but its NN is still running
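For context, the ZKFCs come from the automatic-failover setup, which is enabled roughly like this (ZooKeeper hosts are invented):

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- core-site.xml -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>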

29.
Fencing
Ensure that the previous Active NameNode is no longer able to
make any changes to the system metadata
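Fencing methods are listed explicitly in the configuration; a common sketch (the private-key path is illustrative):

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hdfs/.ssh/id_rsa</value>
  </property>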

30.
Fencing with Quorum-based storage
Only a single NN can be a writer to the JournalNodes at a time
Whenever a NN becomes active, it generates an "epoch number"
Only the NN with the higher "epoch number" can be a writer
This prevents corruption of the file system metadata

31.
“Read” requests fencing
The previous active NN will be fenced when it tries to write to the JNs
Until that happens, it may still serve HDFS read requests
Try to fence this NN before that happens
This prevents clients from reading outdated metadata
Might be done automatically by the ZKFC or with hdfs haadmin
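For example, a manual failover (which fences the old active NN) and a state check look like this; nn1/nn2 are illustrative NameNode IDs:

  # initiate a failover; the old active NN is fenced as part of the transition
  $ hdfs haadmin -failover nn1 nn2
  # check which NameNode is currently active
  $ hdfs haadmin -getServiceState nn1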

32.
HDFS Highly-Available Federated Cluster
Any combination is possible
Federation without HA
HA without Federation
HA with Federation
e.g. NameNode HA for HBase, but not for the others
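In the combined case, each nameservice simply declares its own (optionally redundant) NameNodes; a fragment of such a configuration (nameservice IDs and hosts are invented):

  <property>
    <name>dfs.nameservices</name>
    <value>ns-hbase,ns-user</value>
  </property>
  <!-- ns-hbase runs in HA with an Active/Standby pair -->
  <property>
    <name>dfs.ha.namenodes.ns-hbase</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns-hbase.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns-hbase.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <!-- ns-user keeps a single, non-HA NameNode -->
  <property>
    <name>dfs.namenode.rpc-address.ns-user</name>
    <value>namenode3.example.com:8020</value>
  </property>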

33.
“Classical” MapReduce Limitations
Limited scalability
Poor resource utilization
Lack of support for alternative frameworks
Lack of wire-compatible protocols

48.
NodeManager Containers
The NodeManager creates a container for each task
Containers are granted variable resource sizes
e.g. 2 GB RAM, 1 CPU, 1 disk
The number of created containers is limited
by the total resources available on the NodeManager
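The NodeManager advertises its total resources (and the scheduler its allocation granularity) through configuration; a sketch with illustrative values:

  <!-- yarn-site.xml -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>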

51.
Uberization
Running small jobs in the same JVM as the ApplicationMaster
mapreduce.job.ubertask.enable      true (default: false)
mapreduce.job.ubertask.maxmaps     9 *
mapreduce.job.ubertask.maxreduces  1 *
mapreduce.job.ubertask.maxbytes    dfs.block.size *
* Users may override these values, but only downward
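Enabling uber mode for a job could then look like this (a sketch using the properties above; the lowered maxmaps value is illustrative):

  <!-- mapred-site.xml, or per-job via -D on the command line -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>4</value>  <!-- may only be lowered from the default of 9 -->
  </property>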

53.
Fault-Tolerance
Failure of running tasks or NodeManagers is handled similarly to MRv1
Applications can be retried several times
yarn.resourcemanager.am.max-retries  1
The ResourceManager can start the ApplicationMaster in a new container
yarn.app.mapreduce.am.job.recovery.enable  false
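To let a job survive an ApplicationMaster crash, both defaults can be changed; a sketch with illustrative values:

  <!-- yarn-site.xml -->
  <property>
    <name>yarn.resourcemanager.am.max-retries</name>
    <value>2</value>
  </property>
  <!-- mapred-site.xml -->
  <property>
    <name>yarn.app.mapreduce.am.job.recovery.enable</name>
    <value>true</value>
  </property>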

55.
Wire-Compatible Protocol
Client and cluster may use different versions
More manageable upgrades
Rolling upgrades without disrupting the service
Active and Standby NameNodes upgraded independently
Protocol buffers chosen for serialization (instead of Writables)

56.
YARN Maturity
Opinions still differ on whether it is production-ready
The code has already been promoted to the trunk
Yahoo! runs it on 2000- and 6000-node clusters
Using Apache Hadoop 2.0.2 (Alpha)
There is no production support for YARN yet
Anyway, it will replace MRv1 sooner rather than later

63.
It Might Be A Demo
But what about running
production applications
on a real cluster
consisting of hundreds of nodes?
Join Spotify!
jobs@spotify.com