mesos-user mailing list archives

That approach sounds similar to Smartstack (http://nerds.airbnb.com/smartstack-service-discovery-cloud/).
From: Adam Shannon [mailto:adam.shannon@banno.com]
Sent: Wednesday, April 01, 2015 10:58 AM
To: mesos-users
Subject: Re: Current State of Service Discovery
I figured I would comment on how Banno is setting up service discovery with mesos. We've built
everything around docker containers and then a wrapper around that which we call "sidecar"
that handles service discovery, basic process supervision, and hot reloads of the underlying
app config.
Basically sidecar wraps an existing docker image (with FROM) and runs the underlying command
but monitors it. From there sidecar also has a concept of being able to format templates which
are written to the filesystem (in the container). When writing the template sidecar tracks
the ports and configs allocated and used. It uses that information to add watches into zookeeper
(where we store overrides from the default config options per app).
The nicest thing we've found is that we're able to use a large range of ports per mesos slave.
Further, because our port information is stored in zookeeper, any other app (and therefore
sidecar) we write can look up host/port info for a service it needs.
From the host/port information in zookeeper we've created a sidecar for haproxy which can
have backends created on it which are tied to app names from services registered in zookeeper.
This allows haproxy to query (and watch) for all instances of apps and proxy them from the
known host/port of haproxy. When changes occur (and thus watches fired from zookeeper) the
haproxy-sidecar instances are able to reload with the updates.
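
As a rough illustration of the watch-and-reload flow described above (this is not Banno's sidecar code, which has not been released), the reload decision might look something like the sketch below. The class name, the `reload` hook, and the idea that each registered instance appears as a "host:port" string are all assumptions made for the example; a real version would be driven by a ZooKeeper client's child watch.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class BackendWatcher:
    """Tracks registered instances per app and reloads only on real changes."""

    def __init__(self, reload: Callable[[Dict[str, FrozenSet[str]]], None]):
        # Hypothetical hook: e.g. rewrite the haproxy config and reload it.
        self._reload = reload
        self._instances: Dict[str, FrozenSet[str]] = {}

    def on_children_changed(self, app: str, children: Iterable[str]) -> bool:
        """Call when a watch fires; children are assumed 'host:port' strings."""
        new = frozenset(children)
        if self._instances.get(app) == new:
            return False  # the watch fired but the instance set is unchanged
        self._instances[app] = new
        self._reload(dict(self._instances))
        return True
```

Comparing sets rather than lists means a watch that fires with the same instances in a different order does not trigger a needless reload.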
We're still working to get this all fully deployed to production (and then open sourced),
but it seems to combine some of the best features of other public options.
On Wed, Apr 1, 2015 at 9:05 AM, John Omernik <john@omernik.com>
wrote:
I have been researching service discovery on Mesos quite a bit lately, and due to my background,
may be making assumptions that don't apply to a Mesos Datacenter. I've read through docs,
and I have come up with two main approaches to service discovery, and both appear to have
strengths and weaknesses, and I wanted to describe what I've seen here, as well as the challenges
as I understand them, so that any misconceptions I may have can be corrected.
Basically, I see two main approaches to the service discovery on Mesos. You have the mesos-dns
(https://github.com/mesosphere/mesos-dns) package, which is a DNS based service discovery, and
then you have HAProxy based discovery (which can be represented by both the haproxy-marathon-bridge
(https://github.com/mesosphere/marathon/blob/master/bin/haproxy-marathon-bridge) script and
the Bamboo project (https://github.com/QubitProducts/bamboo)).
HAProxy
With the HAProxy method, as I see it, you basically install HAProxy on every node. The two
above mentioned projects query marathon to determine where the services are running, and then
rewrite the haproxy config on every node to allow basically every node to listen on a specific
port, and from there, that port will be forwarded, via round robin to the actual node/port
combinations where the services are running.
So, let's use the example of a Hive Thrift server running in a Docker container on port 10000.
Let's say you have a 5 node cluster, node1, node2, etc. You spin that container up with instances
= 3 in marathon, and Marathon/docker run the container on node2, node3 and another on node2.
There is a bridged port to 10000 inside the container, that is tied to an available port
on the physical node. Perhaps one instance on node2 gets 30000 and the other instance gets
30001. node3's instance is tied to port 30000. So now you have 3 instances that are exposed
at
node2:30000 -> dockercontainer:10000
node2:30001 -> dockercontainer:10000
node3:30000 -> dockercontainer:10000
With the Haproxy setup, each node would get this in its local haproxy config:
listen hivethrift-10000
bind 0.0.0.0:10000
mode tcp
option tcplog
balance leastconn
server hivethrift-3 node2:30000 check
server hivethrift-2 node2:30001 check
server hivethrift-1 node3:30000 check
This would allow you to connect to any node in your cluster, on port 10000 and be served one
of the three containers running your hive thrift server.
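
To make the rewrite step concrete, here is a minimal sketch of how a tool might generate the listen section shown above from a list of running tasks. It is not the haproxy-marathon-bridge script or Bamboo. The function name and the (host, port) input shape are assumptions for illustration, and querying Marathon for the task list is left out.

```python
from typing import List, Tuple


def render_listen_block(app: str, service_port: int,
                        tasks: List[Tuple[str, int]]) -> str:
    """Build one haproxy 'listen' section for an app.

    tasks is a list of (host, host_port) pairs, one per running instance.
    """
    lines = [
        f"listen {app}-{service_port}",
        f"  bind 0.0.0.0:{service_port}",
        "  mode tcp",
        "  option tcplog",
        "  balance leastconn",
    ]
    # Number servers downward so the output matches the example config.
    for i, (host, port) in enumerate(tasks):
        lines.append(f"  server {app}-{len(tasks) - i} {host}:{port} check")
    return "\n".join(lines) + "\n"
```

Calling `render_listen_block("hivethrift", 10000, [("node2", 30000), ("node2", 30001), ("node3", 30000)])` produces the same eight lines as the hand-written section, indented by two spaces.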
Pretty neat? However, there are some challenges here:
1. You now have a total of 65536 ports for your data center. This method is port only, basically
your whole cluster listens on a port and it's dedicated to one service. This actually makes
sense in some ways because if you think of Mesos as a cluster operating system, the limitations
of TCP/UDP are such that each kernel has that many ports. There isn't a cluster TCP or UDP,
just TCP and UDP. That still is a lot of ports, however, you do need to be aware of the limitation
and manage your ports. Especially since that number isn't really the total number of available
ports. There are ports in that 65536 that are reserved for cluster operations, and/or stuff
like hdfs.
2. You are now essentially adding a hop to your traffic that could affect sensitive applications.
At least with the haproxy-marathon-bridge script, the settings for each application are static
from the script (an update here would be to allow timeout settings, and other haproxy options
to be set per application and managed somewhere, and I think that may be what bamboo offers,
I just haven't dug in yet). So the glaring issue I found was specifically with the hive thrift
service. You connect, you run some queries, all is well. However, if you submit a query,
and it's a long query (longer than the default 50000 ms timeout), there may not be any packets
actually transferred in that time. The client is ok with this, the server is ok with this,
however, haproxy sees no packets in its timeout period, and decides the connection is dead,
closes it, and then you get problems. I would imagine Thrift isn't the only service that
may have situations like this occur. I need to do more research on how to get around this,
there may be some hope in hive 1.1.0 with thrift keep alives, however, not every application
service will have the option in the pipeline.
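
One way the per-application timeout idea mentioned above could be sketched: keep a small table of overrides per app and emit haproxy's `timeout client` and `timeout server` lines into that app's listen section. The function name and the override format are hypothetical; only the two haproxy directives and the 50000 ms default come from the setup described here.

```python
from typing import Dict, List

# Defaults in milliseconds, mirroring the 50000 ms timeout mentioned above.
DEFAULT_TIMEOUTS_MS = {"client": 50000, "server": 50000}


def render_timeouts(overrides: Dict[str, int]) -> List[str]:
    """Return 'timeout' lines for one listen section, applying overrides."""
    unknown = set(overrides) - set(DEFAULT_TIMEOUTS_MS)
    if unknown:
        raise ValueError(f"unsupported timeout keys: {sorted(unknown)}")
    merged = {**DEFAULT_TIMEOUTS_MS, **overrides}
    # Sort by name so the generated config is stable between rewrites.
    return [f"  timeout {name} {ms}ms" for name, ms in sorted(merged.items())]
```

For a long-running hive thrift query, something like `render_timeouts({"client": 3600000, "server": 3600000})` would raise both idle timeouts to an hour for that one service while leaving every other app on the defaults. Whether an hour is the right value depends on the workload.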
Mesos-DNS
This project came to my attention this week, and I am looking to get it installed today to
have hands on time with it. Basically, it's a binary that queries the mesos-master and develops
A records that are hostnames, based on the framework names, and SRV records based on the assigned
ports.
This is where I get confused. I can see the A records being useful, however, you would have
to have your entire network be able to use the mesos-dns (including non-mesos systems).
Otherwise how would a client know to connect to a .mesos domain name? Perhaps there should
be a way to integrate mesos-dns as the authoritative zone for .mesos in your standard enterprise
DNS servers. This also saves the configuration issues of having to add DNS services to all
the nodes. I need to research DNS a bit more, but couldn't you set up, say in bind, that any
requests in .mesos are forwarded to the mesos-dns service, and then sent through your standard
dns back to the client? Wouldn't this be preferable to setting the .mesos name services as
the first DNS server and then THAT forwards off to your standard enterprise DNS servers?
Another issue I see with DNS is it works well for hostnames, but what about ports? Yes, I see
there are SRV records that will return the ports, but how would that even be used? Consider
the hive thrift service example above. We could assume hive thrift would run on port 10000
on all nodes in the cluster, and use the port, but then you run into the same issues as
haproxy. You can't really specify a port via DNS in a jdbc connection URL, can you? How do you
get applications that want to connect to an integer port to do a DNS lookup to resolve a port?
Or are we back to you have one cluster, and you get 65536 ports for all the services you could
want on that cluster? Basically hard coding the ports? This then loses flexibility from a
docker port bridging perspective too, in that in my above haproxy example, all the docker
containers would have to expose port 10000 which would have caused a conflict on node2.
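
For what it's worth, the "how would an SRV record even be used" question could be answered by a small wrapper that resolves the record first and then hands the application an ordinary host:port. The sketch below covers only the selection step, in a simplified form of the usual SRV rule (lowest priority wins, then weighted random). The DNS query itself is left out because Python's standard library has no SRV resolver, and the record values, the `SrvRecord` type, and the use of a `jdbc:hive2://` URL are assumptions for the example.

```python
import random
from typing import List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def pick_srv(records: List[SrvRecord],
             rng: Optional[random.Random] = None) -> SrvRecord:
    """Pick one record: lowest priority first, then weighted random."""
    if not records:
        raise LookupError("no SRV records returned")
    rng = rng or random.Random()
    lowest = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == lowest]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights zero: treat the candidates as equally likely.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


def jdbc_url(record: SrvRecord) -> str:
    """Build a connection URL from the chosen record (trailing dot removed)."""
    return f"jdbc:hive2://{record.target.rstrip('.')}:{record.port}"
```

This only moves the problem into a launcher or client library, of course. An application that takes a fixed integer port from its own config still needs something in front of it to do the lookup.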
Summary
So while I have a nice long email here, it seems I am either missing something critical in
how service discovery could work with a mesos cluster, or there are still some pretty big
difficulties that we need to overcome for an enterprise. Haproxy seems cool, and to work
well except for those "long running TCP connections" like thrift. I am at a loss how to handle
that. Mesos DNS is neat too, except for the port conflicts etc that would occur if you used
native ports on nodes, and if you didn't use native ports, (mesos random ports) how do your
applications know which port to connect to (yes it's in the SRV record, however, how do you
make apps aware to look up a DNS record for a port?)
Am I missing something? How are others handling these issues?
--
Adam Shannon | Software Engineer | Banno | Jack Henry
206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337