
We need to access Elasticsearch in a namespace within minikube, but the other Pods can’t connect to port 9200. It turns out that, out of the box, Elasticsearch only binds to localhost, so the network.host property needs updating.

Setting network.host in the elasticsearch.yml configuration file on a Docker container puts the instance into “production” mode, which triggers a series of bootstrap checks including, but not limited to, the number of threads a user is allowed to create.

To my knowledge, setting ulimits in Docker isn’t trivial, so another way to expose Elasticsearch to other Pods is required.

The answer appears to be to set http.host: 0.0.0.0 so that it listens on all interfaces. This lets the instance stay in development mode, without the ulimit checks stopping startup, while still being reachable from outside the Pod.
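As a rough sketch of applying that setting, the snippet below appends http.host to a configuration file only if it is not already present. The helper name set_http_host and the file path in the usage note are assumptions for illustration; where elasticsearch.yml actually lives depends on your image.

```shell
# Hypothetical helper: add "http.host: 0.0.0.0" to the given elasticsearch.yml
# unless an http.host line already exists, so repeated runs do not duplicate it.
set_http_host() {
  config_file="$1"
  if ! grep -q '^http\.host:' "$config_file" 2>/dev/null; then
    printf 'http.host: 0.0.0.0\n' >> "$config_file"
  fi
}
```

For example, set_http_host config/elasticsearch.yml before starting the node (the path here is an assumption). An existing http.host line is left untouched rather than overwritten.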

When writing a shell script, I regularly find that I want to be able to pass an argument into the script, but only sometimes. For example, I might want the script to write its output to the /tmp folder most of the time, while keeping the option of choosing the output location myself.

Default arguments can be supplied in scripts using the shell’s simple ${parameter:-default} parameter expansion syntax.
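A minimal sketch of the /tmp example, assuming the output location is passed as the first argument. The names resolve_output_dir and OUTPUT_DIR are made up for illustration.

```shell
#!/bin/sh
# Print the first argument if it is set and non-empty, otherwise fall back to /tmp.
resolve_output_dir() {
  printf '%s\n' "${1:-/tmp}"
}

# Use the script's own first argument, defaulting to /tmp when none is given.
OUTPUT_DIR="$(resolve_output_dir "$@")"
echo "Writing output to ${OUTPUT_DIR}"
```

Note that ${1:-/tmp} also applies the default when the argument is passed as an empty string. Use ${1-/tmp} (without the colon) if an explicitly empty value should be kept.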

Creating the SparkContext

Creating the newAPIHadoopRDD

We now have an HBaseConfiguration and a SparkContext, so we can create the newAPIHadoopRDD. The newAPIHadoopRDD needs the configuration containing the table name and namespace, and needs to be told to use TableInputFormat as the InputFormat. We expect the keys to be of class ImmutableBytesWritable and the values to be of class Result.