20 Notable Differences Between Hadoop 2.x and Hadoop 3.x

1. Objective

The objective of this Hadoop tutorial is to give you a clearer understanding of how the Hadoop versions differ. This blog covers the top 20 differences between Hadoop 2.x and Hadoop 3.x, comparing the two versions feature by feature.

2. Differences Between Hadoop 2.x and Hadoop 3.x

Apache Hadoop is an open-source software framework for the distributed storage and processing of very large data sets.

Hadoop 3.x was introduced to overcome the limitations of Hadoop 2.x. It adds several new features while keeping the existing ones.

2.14. Support for Microsoft

2.15. Slots/container

Hadoop 2.x- Hadoop 1.x worked on the concept of slots, whereas Hadoop 2.x works on the concept of containers.

Hadoop 3.x- It also works on the concept of containers.
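
To make the slot versus container distinction concrete, here is a toy model in Python. It is only a sketch and not Hadoop code: the function names, slot counts, and memory sizes are assumptions chosen for illustration. It relies on the fact that Hadoop 1.x slots were typed (map or reduce), while a YARN container is simply a share of a node's resources.

```python
def slot_capacity(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Hadoop 1.x style: a task can only run in a slot of its own type."""
    return min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)


def container_capacity(node_memory_mb, container_mb, tasks):
    """YARN style: any task can run as long as the node has memory left."""
    return min(node_memory_mb // container_mb, tasks)


# Hypothetical node: 4 map slots + 4 reduce slots, or 8192 MB of memory.
# Workload: 8 map tasks and no reduce tasks.
print(slot_capacity(4, 4, 8, 0))          # 4 (the reduce slots sit idle)
print(container_capacity(8192, 1024, 8))  # 8 (all tasks fit in containers)
```

In this simplified model the slot-based node leaves half of its capacity unused because the idle reduce slots cannot run map tasks, which is the kind of rigidity containers were meant to remove.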

2.16. Single point of failure

Hadoop 2.x- It has features to overcome the single point of failure (SPOF). Whenever the active NameNode fails, a standby NameNode takes over automatically.

Hadoop 3.x- It also has features to overcome SPOF. Whenever the active NameNode fails, it recovers automatically, with no need for manual intervention.
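
The idea of automatic recovery can be sketched with a small simulation. This is a simplified illustration and not how Hadoop implements failover: the function `fail_over`, the node names, and the "promote the first healthy standby" rule are all assumptions made for the example (a real cluster coordinates the election through ZooKeeper).

```python
def fail_over(namenodes, failed):
    """Mark `failed` as down and, if it was active, promote a standby.

    namenodes: dict mapping NameNode name -> "active", "standby" or "down"
    Returns (name of the active NameNode or None, updated state dict).
    """
    state = dict(namenodes)  # copy so the caller's dict is not modified
    was_active = state.get(failed) == "active"
    state[failed] = "down"
    if was_active:
        # Toy rule: promote the first standby found (assumption, see above).
        for name, status in state.items():
            if status == "standby":
                state[name] = "active"
                break
    active = [name for name, status in state.items() if status == "active"]
    return (active[0] if active else None), state


# Hypothetical cluster with one active and two standby NameNodes.
cluster = {"nn1": "active", "nn2": "standby", "nn3": "standby"}
active, cluster = fail_over(cluster, "nn1")
print(active)  # nn2
```

No operator action appears anywhere in the sketch: the loss of the active node is enough to trigger the promotion, which is the behaviour the comparison above describes.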

2.17. HDFS Federation

Hadoop 2.x- In Hadoop 1.x, a single NameNode managed the entire namespace. Hadoop 2.x supports multiple NameNodes, each managing its own namespace.

Hadoop 3.x- It also supports multiple NameNodes for multiple namespaces.
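
One way to picture multiple NameNodes serving multiple namespaces is a lookup table that sends each path to the NameNode owning that part of the file system. The sketch below is only an illustration: the table `MOUNT_TABLE`, the host names, and the function `resolve_namenode` are hypothetical and are not part of any Hadoop API.

```python
# Hypothetical mapping: top-level directory -> NameNode owning that namespace.
MOUNT_TABLE = {
    "/user": "nn1.example.com",
    "/data": "nn2.example.com",
    "/logs": "nn3.example.com",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode responsible for `path`."""
    for prefix, namenode in mount_table.items():
        # Match the directory itself or anything below it, but not
        # look-alike names such as "/database" for the prefix "/data".
        if path == prefix or path.startswith(prefix + "/"):
            return namenode
    raise KeyError(f"no namespace is mounted for {path}")


print(resolve_namenode("/user/alice/report.csv"))  # nn1.example.com
print(resolve_namenode("/data/raw/2024"))          # nn2.example.com
```

Because each NameNode only tracks the metadata of its own namespace, adding a namespace in this model means adding a table entry and a NameNode, without increasing the load on the existing ones.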

2.18. Scalability

Hadoop 2.x- We can scale up to 10,000 nodes per cluster.

Hadoop 3.x- We can scale to more than 10,000 nodes per cluster.

2.19. HDFS Snapshot

Hadoop 2.x- It adds support for snapshots, which provide disaster recovery and protection against user error.

Hadoop 3.x- It also supports the snapshot feature.

2.20. Platform

Hadoop 2.x- It serves as a platform for a wide variety of data analytics. It is also possible to run event processing, streaming, and real-time operations.

Hadoop 3.x- It is also possible to run event processing, streaming, and real-time operations on top of YARN.

3. Conclusion

In conclusion, Hadoop 3.x has added new features such as erasure coding to handle fault tolerance, which reduces the storage overhead from 200% to 50%. It has also introduced a new command-line tool called Disk Balancer. Together, these changes make Hadoop 3.x an overall improvement over Hadoop 2.x.
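
The 200% and 50% storage overhead figures can be reproduced with simple arithmetic. The sketch below assumes the usual setups behind those numbers, which the text above does not spell out: 3-way replication for the 200% figure, and a Reed-Solomon erasure coding layout with 6 data blocks and 3 parity blocks for the 50% figure. The function names are made up for this example.

```python
def replication_overhead(replicas):
    """Extra storage, as a percentage of the original data, for N copies."""
    return (replicas - 1) * 100.0


def erasure_coding_overhead(data_blocks, parity_blocks):
    """Extra storage, as a percentage, when parity blocks protect data blocks."""
    return parity_blocks / data_blocks * 100.0


# Assumed configurations (see the note above).
print(replication_overhead(3))        # 200.0
print(erasure_coding_overhead(6, 3))  # 50.0
```

Put differently, under these assumptions 6 blocks of data occupy 18 blocks of disk with replication but only 9 blocks with erasure coding.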

If you know of any other differences between Hadoop 2.x and Hadoop 3.x, do let us know in the comment section.