I have been working on getting various frameworks working on my MapR cluster that is also running Mesos. Basically, while I know that there is a package from MapR (for Drill), I am trying to find a way to better separate the storage layer from the compute layer.

This isn't a dig on MapR, or any of the Hadoop distributions; it's only that I want the flexibility to try things, to have an R&D team working with the data in an environment that can try out new frameworks, etc. This combination has been very good to me (maybe not to MapR support, who received lots of quirky questions from me. They have been helpful in furthering my understanding of this space!)

My next project I wanted to play with was Drill. I found https://github.com/mhausenblas/dromedar (Thanks Michael!) as a basic start to a Drill-on-Mesos approach. I read through the code and I understand it, but I wanted to see it at a more basic level.

So I just figured out how to run drillbits in Marathon (manually for now). Basically, for anyone wanting to play along at home, this actually works VERY well. I used MapR FS to host my Drill package, and I set up a conf directory (multiple conf directories, actually; I set it up so I could launch different "sized" drillbits). I have been able to get things running, and be performant, on my small test cluster.

For those who may be interested here are some of my notes.

- I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some compile issues that Jacques was able to help me through. Basically, Java 1.8 isn't supported for building yet (it fails some tests), but there is a workaround for that.

- I took the built package and placed it in MapR FS. Now, I have every node mounting MapR FS to the same NFS location. I could be using an hdfs (maprfs) based tarball, but I haven't done that yet. I am just playing around, and the NFS mounting of MapR FS sure is handy in this regard.

- At first I created a single-sized drillbit; the Marathon JSON is like this:
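(The JSON itself didn't survive in this post; below is a hedged sketch of what a Marathon app definition along these lines might look like. The command path, app id, conf directory, and resource numbers are all illustrative, not the author's actual values.)

```json
{
  "id": "drillpot",
  "cmd": "/mapr/demo.cluster/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit",
  "cpus": 1.0,
  "mem": 9216,
  "instances": 1,
  "constraints": [["hostname", "UNIQUE"]],
  "env": {
    "DRILL_CONF_DIR": "/mapr/demo.cluster/drill/conf-large"
  }
}
```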

So I can walk you through this. The first field is the command, obviously. I use runbit instead of drillbit.sh start because I want this process to stay running (from Marathon's perspective). If I used drillbit.sh, it uses nohup and backgrounds the process, so Mesos/Marathon thinks it died and tries to start another.

cpus: obvious, maybe a bit small, but I have a small cluster.

mem: I set mem to 6144 (6GB). In my drill-env.sh, I set max direct memory to 6GB and max heap to 3GB. I wasn't sure if I needed to set my Marathon memory to 9GB, or if the heap is used inside the direct memory. I could use some pointers here.
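(For concreteness, that drill-env.sh setup looks roughly like the sketch below; the variable names match Drill's stock drill-env.sh, but the values are illustrative. If the two pools are additive, Marathon's mem needs to cover both.)

```shell
# drill-env.sh in the conf directory the drillbit is launched with
# Max direct (off-heap) memory -- where Drill keeps record batches
export DRILL_MAX_DIRECT_MEMORY="6G"
# Max JVM heap
export DRILL_HEAP="3G"
# If additive, Marathon's "mem" should be at least 6G + 3G = 9216 MB
```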

id: This is the id of my cluster in drill-override.conf. I did this so HAProxy would let me connect to the cluster via drillpot.marathon.mesos, and it worked pretty well!
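(For anyone following along, the cluster id lives in drill-override.conf, which is HOCON format; something like the sketch below, with the cluster-id matching the Marathon app id. The ZooKeeper hostnames here are illustrative.)

```hocon
drill.exec: {
  cluster-id: "drillpot",
  zk.connect: "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
}
```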

instances: I started with one, but could scale up with Marathon.

constraints: I only wanted one drillbit per node because of port conflicts. If I want to be multi-tenant and have more than one drillbit per node, I would need to figure out how to abstract the ports. This is something that I could potentially do in a framework for Mesos. But at the same time, I wonder if, when a drillbit registers with a cluster, it could just "report" its ports in the ZooKeeper information. This is intriguing because, if it did this, we could let it pull random ports offered to it by Mesos, register the information, and away we go.

Once I posted this to Marathon, all was good, bits started, queries were had by all! It worked well. Some challenges:

1. Ports (as mentioned above): I am not managing those, so port conflicts could occur.

2. I should use a tarball for Marathon; this would allow Drill to work on Mesos without the MapR requirement.
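(A hedged sketch of what that could look like: Marathon can fetch a tarball via the "uris" field, and the Mesos fetcher extracts it into the task sandbox before the command runs. The URL and paths here are illustrative.)

```json
{
  "id": "drillpot",
  "cmd": "cd apache-drill-1.2.0-SNAPSHOT && bin/runbit",
  "uris": ["http://repo.example.com/apache-drill-1.2.0-SNAPSHOT.tar.gz"],
  "cpus": 1.0,
  "mem": 9216,
  "instances": 1
}
```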

3. Logging. I have the default logback.xml in the conf directory, and I am getting file-not-found issues in my stderr on the Mesos tasks. This isn't killing Drill, and it still works, but I should organize my logging better.

Hopeful for the future:

1. It would be neat to have a framework that did the actual running of the bits, perhaps something that could scale up and down based on query usage. I played around with some smaller drillbits (similar to how Myriad defines profiles) so I could have a Drill cluster of 2 large bits and 2 small bits on my 5-node cluster. That worked, but it was lots of manual work. A framework would be handy for managing that.
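(As a rough illustration of what such a framework's scaling loop might do: Marathon exposes a REST API where PUT /v2/apps/{id} with a new instance count rescales an app. The scaling rule, thresholds, and names below are all made up for illustration.)

```python
import json
import urllib.request


def desired_instances(active_queries, queries_per_bit=5, min_bits=1, max_bits=5):
    """Toy scaling rule: one drillbit per N concurrent queries, clamped."""
    need = -(-active_queries // queries_per_bit)  # ceiling division
    return max(min_bits, min(max_bits, need))


def scale_app(marathon_url, app_id, instances):
    """Ask Marathon to rescale the app to the given instance count."""
    body = json.dumps({"instances": instances}).encode()
    req = urllib.request.Request(
        f"{marathon_url}/v2/apps/{app_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)


# e.g. scale_app("http://marathon.mesos:8080", "drillpot", desired_instances(12))
```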

2. Other?

I know this isn't a production thing, but I could see being able to go from this to something a subset of production users could use in MapR/Mesos (or just Mesos). I just wanted to share some of my thought processes and show a way that various tools can integrate. Always happy to talk shop with folks on this stuff if anyone has any questions.

John

Great write up and information! Will be interesting to see how this evolves.

A quick note, memory allocation is additive so you have to allocate for direct plus heap memory. Drill uses direct memory for data structures/operations and this is the one that will grow with larger data sets, etc.

I played with that, and the performance I was getting in Docker was about half what I was getting natively. I think that, for me, that was occurring because if I ran it in Docker, I needed to install the MapR client in the container too, whereas when I run it in Marathon, it's using the node's access to the disk. In places where performance hits like this occur, I am comfortable not dockerizing all the things and allowing for the tarball method instead. Perhaps Mesos could find a way to cache it locally? (Note: putting it in MapR FS still has it load pretty quickly.)

The nice thing about the approach you are taking, combined with a Docker deployment of something like Drill, is that you really don't care where those Docker instances land in your cluster: you can build your configuration into your Docker image, and you are off and running, with no problem dynamically spinning up a few more instances whenever you want. That should hopefully simplify administration.
