Hadoop

The application uses a mini HDFS cluster and a mini MR cluster for its test cases.
To run against an external HDFS location instead, change the relevant configuration accordingly.

Flume

FlumeAgentService controls how search events are sent to both HDFS and ElasticSearch, based on the multiplexing selector approach.
The application uses the built-in rolling file sink for the EmbeddedAgent. You can also set up and start an external Flume agent and point the embedded agent to it.
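
As an illustration of the multiplexing selector approach, an external Flume agent could route events to channels based on an event header. This is only a sketch of the source and selector part of such a configuration: the agent name `searchagent`, the component names, the port, the header name `eventtype` and its value `click` are all assumptions, and the channel and sink definitions are left out.

```properties
# Hypothetical agent and component names; adjust them to your own setup.
searchagent.sources = avrosource
searchagent.channels = hdfschannel eschannel

# Avro source that the embedded agent can point to (port is an assumption).
searchagent.sources.avrosource.type = avro
searchagent.sources.avrosource.bind = 0.0.0.0
searchagent.sources.avrosource.port = 44444
searchagent.sources.avrosource.channels = hdfschannel eschannel

# Multiplexing selector: choose channels by the value of an event header.
# "eventtype" and "click" are assumed names, not taken from the application.
searchagent.sources.avrosource.selector.type = multiplexing
searchagent.sources.avrosource.selector.header = eventtype
searchagent.sources.avrosource.selector.mapping.click = hdfschannel eschannel
searchagent.sources.avrosource.selector.default = hdfschannel
```

With a mapping like this, events whose header matches go to both channels (and so to both the HDFS and the ES sink), while all other events go only to the default channel.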

JSONSerDe

A custom SerDe is used to map the JSON data so that it can be queried from Hive. If you use an external Flume source as configured above, create the jar and add it to your own Hive environment to query the data.
To create the JSON SerDe jar:

$ jar cf jaihivejsonserde-1.0.jar org/jai/hive/serde/JSONSerDe.class
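
Once the jar is built, it might be registered in Hive and used roughly as follows. This is a sketch under assumptions: the jar path, table name, column names and HDFS location are hypothetical, and the SerDe class name is derived from the class file path in the command above.

```sql
-- Register the SerDe jar in the Hive session (path is an assumption).
ADD JAR /path/to/jaihivejsonserde-1.0.jar;

-- Hypothetical table over JSON events; columns and location are examples only.
CREATE EXTERNAL TABLE search_events (
  eventid STRING,
  querystring STRING
)
ROW FORMAT SERDE 'org.jai.hive.serde.JSONSerDe'
LOCATION '/searchevents/';
```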

ElasticSearch

ElasticSearchJsonBodyEventSerializer

A custom ES serializer is used to put data from Hadoop into ElasticSearch using Hive.
To create the ES JSON serializer jar:
$ cd target/classes

Product Search Functionality

ElasticSearch is used to index the product data and to filter on the products.
SearchCriteria stores the user's selections: the query string, sorting information, pagination information, facet/filter selections, etc.
SearchQueryInstruction generates the JSON data for customer clicks.
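
To make the idea of a search criteria object concrete, here is a minimal Java sketch. It is not the application's actual SearchCriteria class: the class name `SearchCriteriaSketch`, its fields, defaults and methods are all hypothetical and only illustrate how a query string, sorting, pagination and facet selections could be held together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; NOT the project's real SearchCriteria class. */
public class SearchCriteriaSketch {
    private final String queryString;
    private String sortField = "_score";   // assumed default: sort by relevance
    private boolean sortAscending = false;
    private int pageNumber = 0;            // zero-based page index
    private int pageSize = 10;
    // Selected facet values, keyed by facet name, in selection order.
    private final Map<String, List<String>> facetSelections = new LinkedHashMap<>();

    public SearchCriteriaSketch(String queryString) {
        this.queryString = queryString;
    }

    public SearchCriteriaSketch sortBy(String field, boolean ascending) {
        this.sortField = field;
        this.sortAscending = ascending;
        return this;
    }

    public SearchCriteriaSketch page(int number, int size) {
        this.pageNumber = number;
        this.pageSize = size;
        return this;
    }

    public SearchCriteriaSketch selectFacet(String facet, String value) {
        facetSelections.computeIfAbsent(facet, k -> new ArrayList<>()).add(value);
        return this;
    }

    /** Offset of the first result on the requested page. */
    public int getFrom() {
        return pageNumber * pageSize;
    }

    public String getQueryString() { return queryString; }
    public String getSortField() { return sortField; }
    public boolean isSortAscending() { return sortAscending; }
    public int getPageSize() { return pageSize; }
    public Map<String, List<String>> getFacetSelections() { return facetSelections; }

    public static void main(String[] args) {
        SearchCriteriaSketch criteria = new SearchCriteriaSketch("laptop")
                .sortBy("price", true)
                .page(2, 10)
                .selectFacet("brand", "acme");
        System.out.println(criteria.getFrom());            // prints 20
        System.out.println(criteria.getFacetSelections()); // prints {brand=[acme]}
    }
}
```

The fluent setters are a design choice of this sketch; a plain bean with getters and setters would carry the same information.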

Oozie

Coordinator jobs run hourly to create Hive partitions based on the Hadoop data.
A bundle job queries the top query strings and indexes them to ElasticSearch on a daily basis.
LocalOozie is used to start the Oozie server for testing purposes.
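
For reference, an hourly coordinator definition generally looks like the sketch below. The application name and the `${startTime}`, `${endTime}` and `${workflowAppPath}` properties are assumptions for illustration, not values taken from this project.

```xml
<!-- Hypothetical hourly coordinator; names and properties are examples only. -->
<coordinator-app name="add-hive-partition-coord"
                 frequency="${coord:hours(1)}"
                 start="${startTime}" end="${endTime}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <!-- HDFS path of the workflow that adds the Hive partition (assumed). -->
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```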

Spring Data Hadoop

Spring Data is used for Hive server management. Its bean and context loading support is used to manage the dependent startup and shutdown of the different servers/services.