Create an HBaseConfiguration object to connect to an HBase server. You need to tell the configuration object where to read the HBase configuration from; to do this, add a resource to the HBaseConfiguration object.
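A minimal sketch of this step, assuming the pre-0.90 HBase client API (where HBaseConfiguration had a public constructor) and a hypothetical install path for hbase-site.xml:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseConnect {
    public static void main(String[] args) {
        // Create the configuration object for the HBase client.
        HBaseConfiguration conf = new HBaseConfiguration();
        // Tell it where to read the HBase configuration from.
        // "/opt/hbase" is a hypothetical install location.
        conf.addResource(new Path("/opt/hbase/conf/hbase-site.xml"));
    }
}
```

Note that this sketch only loads the configuration; it requires a running HBase cluster before any table operations will succeed.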

// The other approach is to use a foreach loop. Scanners are iterable!
for (RowResult result : scanner) {
    // print out the row we found and the columns we were looking for
    System.out.println("Found row: " + Bytes.toString(result.getRow())
            + " with value: " + result.get(Bytes.toBytes("columnfamily1:column1")));
}
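For context, the scanner used above can be obtained roughly as follows. This is a sketch assuming the pre-0.20 HTable API (exact getScanner signatures varied across early HBase releases); the table name "testtable" is hypothetical, while the column name matches the snippet above:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        // "testtable" is a hypothetical table name.
        HTable table = new HTable(conf, "testtable");
        // Open a scanner over the column we are interested in.
        Scanner scanner = table.getScanner(
                new byte[][] { Bytes.toBytes("columnfamily1:column1") });
        try {
            // Scanners implement Iterable, so a foreach loop works.
            for (RowResult result : scanner) {
                System.out.println("Found row: " + Bytes.toString(result.getRow()));
            }
        } finally {
            scanner.close(); // always release scanner resources
        }
    }
}
```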

If you do not assign the configurations to the conf object (using the Hadoop XML files), your HDFS operations will be performed on the local file system and not on HDFS.

2. Adding a file to HDFS: Create a FileSystem object and use a file stream to add the file.
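A sketch of both points, assuming the Hadoop API of that era and hypothetical configuration and file paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAddFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical install location. Without adding this resource,
        // FileSystem.get() would resolve to the local file system
        // instead of HDFS.
        conf.addResource(new Path("/opt/hadoop/conf/hadoop-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        // Write a small file into HDFS via an output stream.
        FSDataOutputStream out = fs.create(new Path("/user/demo/sample.txt"));
        out.writeBytes("hello hdfs\n");
        out.close();
        fs.close();
    }
}
```

Running this sketch requires a live namenode reachable at the fs.default.name configured in the resource file.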

Wednesday, June 17, 2009

Before you start configuring HBase, you need a running Hadoop cluster, which will serve as the storage layer for HBase. Please refer to the Hadoop cluster setup document before continuing.

On the HBaseMaster (master) machine:

1. In the file /etc/hosts, define the IP addresses of the namenode machine and all the datanode machines. Make sure you define the actual IP (e.g. 192.168.1.9) and not the localhost IP (e.g. 127.0.0.1) for all the machines, including the namenode; otherwise the datanodes will not be able to connect to the namenode machine.
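For example, with hypothetical host names and addresses, /etc/hosts on each machine might look like this (the 192.168.1.9 address is taken from the text; the other entries are illustrative):

```text
# actual LAN addresses, not 127.0.0.1
192.168.1.9    hadoop-namenode
192.168.1.10   hadoop-datanode1
192.168.1.11   hadoop-datanode2
```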

<configuration>
  <property>
    <name>hbase.master</name>
    <value>hbase-masterserver:60000</value>
    <description>The host and port that the HBase master runs at. A value of 'local' runs the master and a regionserver in a single process.</description>
  </property>

  <property>
    <name>hbase.regionserver.class</name>
    <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
    <description>This configuration is required to enable indexing on HBase and to be able to create secondary indexes.</description>
  </property>

  <property>
    <name>hbase.regionserver.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
    <description>This configuration is required to enable indexing on HBase and to be able to create secondary indexes.</description>
  </property>
</configuration>

Start and stop HBase daemons:

You need to start/stop the daemons only on the master server machine; it will start/stop the daemons on all the regionserver machines. Execute the following command to start/stop HBase.

$HBASE_INSTALL_DIR/bin/start-hbase.sh or $HBASE_INSTALL_DIR/bin/stop-hbase.sh

1. In the file /etc/hosts, define the IP addresses of the namenode machine and all the datanode machines. Make sure you define the actual IP (e.g. 192.168.1.9) and not the localhost IP (e.g. 127.0.0.1) for all the machines, including the namenode; otherwise the datanodes will not be able to connect to the namenode machine.

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hdfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
</configuration>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-namenode:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
