Extract hbase cell command reference guide

I am trying to get more information on the extractHBaseCells command. However, I am unable to find it in the morphline reference guide. Can someone please let me know where I can find the documentation on this? The following is the reference guide I am looking at.

Re: Extract hbase cell command reference guide

I understood that part. But let us say I extract an XML document from the HBase cell with the following elements (name, city, country) and I want to index it into Solr. My Solr schema also has the fields name, city and country. Now I need to parse the XML, get these fields and index them into Solr.

This would have been possible if I were able to retrieve the data from HBase in this format. But what extractHBaseCells gives me is an XML file. I am looking for a way to parse this using xquery and then assign the values to Solr fields.

Re: Extract hbase cell command reference guide

You can just specify an extractHBaseCells command followed by an xquery command in the same morphline config file. Each command pipes its output into the subsequent command, and you can specify as many commands as you like. The links I mentioned contain a (commented out) example of extractHBaseCells followed by readAvroContainer. Just uncomment that and replace readAvroContainer with xquery.

Re: Extract hbase cell command reference guide

Yes, you can write a custom morphline command in Java [1] and add the corresponding custom jar to the classpath, e.g. via the HBASE_INDEXER_CLASSPATH environment variable in the "Service-Wide/Advanced/Safety Valve" menu in Cloudera Manager (for Near Real Time Indexing) or via the --libjars CLI option on HBaseMapReduceIndexerTool (for Batch Indexing).

Alternatively, you can also write a mini script in Java and paste it into the body of the "java" morphline command [2].
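To give a rough idea of the parsing logic such a custom command or "java" script might contain, here is a minimal sketch that uses only the JDK's built-in XML parser. It assumes the cell value is a small XML document with name, city and country elements, as described in the question. The class name XmlFieldExtractor, the method extractFields and the sample XML are made up for illustration and are not part of the morphline API; inside a real command you would copy the resulting values into the record's fields rather than into a Map.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the morphline API.
final class XmlFieldExtractor {

    // Element names assumed to match the Solr schema fields from the question.
    private static final String[] FIELDS = {"name", "city", "country"};

    // Parses the raw cell bytes as XML and returns the text of the first
    // occurrence of each expected element. Missing elements are skipped.
    static Map<String, String> extractFields(byte[] xmlBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so untrusted XML cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));

        Map<String, String> fields = new LinkedHashMap<>();
        for (String field : FIELDS) {
            NodeList nodes = doc.getElementsByTagName(field);
            if (nodes.getLength() > 0) {
                fields.put(field, nodes.item(0).getTextContent().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Sample payload standing in for the bytes read from an HBase cell.
        String xml = "<person><name>Ann</name><city>Oslo</city><country>Norway</country></person>";
        System.out.println(extractFields(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the xquery command already covers your case, it is probably the simpler route; something like the above is mainly useful when you need logic that is awkward to express in a query.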

Re: Extract hbase cell command reference guide

Thanks a lot. I have created the Solr cloud and was able to index some sample data (extract the message and put it into one Solr field), just to check that my configuration is correct and that it works.

However, when I try to extract data and assign it to the Solr schema elements, it does not work. My extractHBaseCells command looks like this. Do I need to have an "_attachment_body" field or an "_attachment_mimetype" field defined in my schema?