HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which supports SQL as its primary access language.

• When we should think of using it •
HBase isn't suitable for every problem. It makes sense when there is a lot of data; for small data sets, an RDBMS is the better choice.

• Difference Between HDFS and HBase •
HDFS is a distributed file system that is well suited for the storage of large files. Its documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
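To make the difference concrete, here is a minimal sketch, not HBase's actual read path. It assumes a hypothetical list of records kept sorted by key and compares reading a flat file record by record with a lookup over a sorted key index. The names `records`, `scan_lookup`, and `indexed_lookup` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical data set: (row_key, value) pairs, kept sorted by key.
records = [(f"user{i:05d}", f"value-{i}") for i in range(10_000)]
keys = [k for k, _ in records]  # sorted key index


def scan_lookup(rows, key):
    """Flat-file style: read records in order until the key is found."""
    reads = 0
    for k, v in rows:
        reads += 1
        if k == key:
            return v, reads
    return None, reads


def indexed_lookup(rows, index, key):
    """Sorted-key style: binary search the index, then read one record."""
    i = bisect_left(index, key)
    if i < len(index) and index[i] == key:
        return rows[i][1], 1
    return None, 0


value, reads = scan_lookup(records, "user09999")
print(value, reads)   # the scan touches every record before the match
value, reads = indexed_lookup(records, keys, "user09999")
print(value, reads)   # the indexed lookup reads a single record
```

The point is only the access pattern: a sequential scan does work proportional to the file size, while a lookup over sorted keys needs a handful of comparisons and one record read.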

THINK ON THIS

• Facebook, for example, adds and processes more than 15 TB of data daily.
• Google adds and processes petabytes of data.
• Companies store logs, temperature readings, and many other kinds of data for processing. This data reaches petabyte scale, where conventional technologies would take days just to read it, let alone process it.

WHAT COLUMN-ORIENTED MEANS

• Data is grouped by columns.
• The reason to store values on a per-column basis, rather than per row, is the assumption that, for specific queries, not all of the values are needed.

• Reduced I/O
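
The reduced I/O can be illustrated with a small sketch. It assumes a hypothetical three-row table and counts how many stored values a query for the average of one column has to touch under each layout. The table contents and function names are invented for the example and say nothing about HBase's on-disk format.

```python
# Hypothetical table: three rows with three columns each.
rows = [
    {"id": 1, "name": "a", "temp": 21.0},
    {"id": 2, "name": "b", "temp": 19.0},
    {"id": 3, "name": "c", "temp": 23.0},
]

# Row-oriented layout: each row's values are stored together.
row_store = [list(r.values()) for r in rows]

# Column-oriented layout: each column's values are stored together.
column_store = {col: [r[col] for r in rows] for col in rows[0]}


def avg_temp_row_store(store, temp_index=2):
    """Every value of every row is read, even though one column is needed."""
    values_read = sum(len(r) for r in store)
    temps = [r[temp_index] for r in store]
    return sum(temps) / len(temps), values_read


def avg_temp_column_store(store):
    """Only the values of the 'temp' column are read."""
    temps = store["temp"]
    return sum(temps) / len(temps), len(temps)


print(avg_temp_row_store(row_store))        # (21.0, 9)
print(avg_temp_column_store(column_store))  # (21.0, 3)
```

Both layouts give the same answer, but the column layout reads 3 values instead of 9; the gap grows with the number of columns the query does not need.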

COMPONENTS

HMASTER
• The Master server is responsible for monitoring all RegionServer instances in the cluster and is the interface for all metadata changes. It typically runs on the server that hosts the NameNode.

• The Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a time without the Master, the Master should be restarted as soon as possible.

ZOOKEEPER

• ZooKeeper is open source software that provides a highly reliable, distributed coordination service.

• It is the entry point for an HBase system.
• Its duties include tracking the region servers and where the root region is hosted.

API
• Interface to HBase
• Using the API, we can access HBase and perform read/write and other operations on it.
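
As a rough picture of what such read/write operations look like, the sketch below models a table in memory. `ToyTable` and its methods are hypothetical stand-ins written for this example; they are not the HBase client API. The `family:qualifier` column names follow HBase's naming convention, and the scan returns rows in sorted key order.

```python
class ToyTable:
    """In-memory stand-in for a table (hypothetical, not the HBase client)."""

    def __init__(self):
        self._rows = {}  # row key -> {column name -> value}

    def put(self, row_key, column, value):
        """Write (or overwrite) one cell."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column=None):
        """Read one cell, or the whole row when no column is given."""
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def delete(self, row_key):
        """Remove a row if it exists."""
        self._rows.pop(row_key, None)

    def scan(self, start=None, stop=None):
        """Yield rows in sorted key order within [start, stop)."""
        for key in sorted(self._rows):
            if (start is None or key >= start) and (stop is None or key < stop):
                yield key, dict(self._rows[key])


table = ToyTable()
table.put("row1", "info:name", "alice")
table.put("row2", "info:name", "bob")
print(table.get("row1", "info:name"))
print(list(table.scan(start="row2")))
```

A real client adds connection handling, byte-level encoding of keys and values, and error handling, all of which this sketch leaves out.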