Monday, September 8, 2014

Apache Mesos - Open Source Datacenter Computing

Apache Mesos is a cluster manager that provides efficient resource
isolation and sharing across distributed applications or frameworks. Mesos is a open-source software originally developed at the University of California at Berkeley. It sits between the application layer and the operating system and makes it easier to deploy and manage applications in large-scale clustered environments more efficiently. It can run many applications on a dynamically shared pool of nodes. Prominent users of Mesos include Twitter, Airbnb, MediaCrossing, Xogito and Categorize.Mesos leverages features of the modern kernel – “cgroups” in Linux, “zones” in Solaris – to provide isolation for CPU, memory, I/O, file system, rack locality, etc. The big idea is to make a large collection of heterogeneous resources. Mesos introduces a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to run on them. It is a thin resource sharing layer that enables fine-grained sharing across diverse cluster computing frameworks, by giving frameworks a common interface for accessing cluster resources.The idea is to deploy multiple distributed systems to a shared pool of nodes in order to increase resource utilization. A lot of modern workloads and frameworks can run on Mesos, including Hadoop, Memecached, Ruby on Rails, Storm, JBoss Data Grid,MPI, Spark and Node.js, as well as various Web servers, databases and application servers.

In a similar way that a PC OS manages access to the resources on a
desktop computer, Mesos ensures applications have access to the
resources they need in a cluster. Instead of setting up numerous server clusters for different parts of an
application, Mesos allows you to share a pool of servers that can all
run different parts of your application without them interfering with each other and with the ability to
dynamically allocate resources across the cluster as needed. That means , it could easily switch resources away from framework1 [ big-data analysis ] and
allocate them to framework2 [web server ] , if there is a heavy network. It also reduces a lot of the manual
steps in deploying applications and can shift workloads around
automatically to provide fault tolerance and keep utilization rates high.

Mesos is essentially data center kernel - which means it’s the
software that actually isolates the running workloads from each other . It still needs additional tooling to let
engineers get their workloads running on the system and to manage when
those jobs actually run. Otherwise, some workloads might consume all the
resources, or important workloads might get bumped by less-important
workloads that happen to require more resources.Hence Mesos needs more than just a kernel - Chronos scheduler, a cron replacement for automatically starting and stopping services (and handling failures) that runs on top of Mesos. The other part of the Mesos is Marathon that provides API for starting,
stopping and scaling services (and Chronos could be one of those
services).

Mesos consists of a master process that manages slave daemons running on each cluster node, and frameworks that run tasks on these slaves. The master implements fine-grained sharing across frameworks using resource offers. Each resource offer is a list of free resources on multiple slaves. The master decides how many resources to offer to each framework according to an organizational policy, such as fair sharing or priority. To support a diverse set of inter-framework allocation policies, Mesos lets organizations define their own policies via a pluggable allocation module.

Each framework running on Mesos consists of two components: a scheduler
that registers with the master to be offered resources, and an executor
process that is launched on slave nodes to run the framework’s tasks.
While the master determines how many resources to offer to each
framework, the frameworks’ schedulers select which of the offered
resources to use. When a framework accepts offered resources, it passes
Mesos a description of the tasks it wants to launch on them.

Figure shows an example of how a framework gets scheduled to run tasks. In step (1), slave 1 reports to the master that it has 4 CPUs and 4 GB of memory free. The master then invokes the allocation module, which tells it that framework 1 should be offered all available resources. In step (2), the master sends a resource offer describing these resources to framework 1. In step (3), the framework’s scheduler replies to the master with information about two tasks to run on the slave, using 2 CPUs; 1 GB RAM for the first task, and 1 CPUs; 2 GB RAM for the second task. Finally, in step (4), the master sends the tasks to the slave, which allocates appropriate resources to the framework’s executor, which in turn launches the two tasks (depicted with dotted borders). Because 1 CPU and 1 GB of RAM are still free, the allocation module may now offer them to framework 2. In addition, this resource offer process repeats when tasks finish and new resources become free.

While the thin interface provided by Mesos allows it to scale and allows the frameworks to evolve independently. A framework will reject the offers that do not satisfy its constraints and accept the ones that do. In particular, we have found that a simple policy called delay scheduling, in which frameworks wait for a limited time to acquire nodes storing the input data, yields nearly optimal data locality.Mesos Scheduler APIs :API cosists of two primitives: calls and events which are low-vel,unreliable and one way message passing.

Offer will be made to the scheduler with specific requests (optional).

Master allocates some resources for scheduler shows up as a OFFER

Scheduler can use offered resources to run tasks, once it decides what tasks it might want to run.

Then Master launches task on specified slaves. It might receive OFFER for mor resource allocation. Later , it may get update i.e state of task ( asynchronous in nature).

Task/Executor isolation:To get more control over task management, executors are used. The executor would decide how it actually wants to run the task. Executor can run one or more tasks (like threads of distributed systems). Advantage here is that you could assign multiple tasks to the executor. Interesting point to note here is isolation (i.e allocation of containers for Executor tree with multiple tasks and for individual task) as shown below. The executor's resources change overtime and dynamically adjust the resources per container. In summary , Mesos gives an elasticity on per node basis or across the cluster for containers to grow or shrink dynamically. Applications like Spark would show great performance for the same reason.

Long Running Services: 1)Aurora is a service scheduler that runs on top of Mesos, enabling you to run long-running services that take advantage of Mesos' scalability, fault-tolerance, and resource isolation.

2)Marathon is a private PaaS built on Mesos. It automatically handles hardware or software failures and ensures that an app is “always on”.

6)MPI is a message-passing system designed to function on a wide variety of parallel computers.

7)Spark is a fast and general-purpose cluster computing system which makes parallel jobs easy to write.

8)Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.

Batch Scheduling:1)Chronos is a distributed job scheduler that supports complex job topologies. It can be used as a more fault-tolerant replacement for Cron.

2)Jenkins is a continuous integration server. The mesos-jenkins plugin allows it to dynamically launch workers on a Mesos cluster depending on the workload.

3)JobServer is a distributed job scheduler and processor which allows developers to build custom batch processing Tasklets using point and click web UI.

4)Torque is a distributed resource manager providing control over batch jobs and distributed compute nodes.

Data Storage:1)Cassandra is a highly available distributed database. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.

2)ElasticSearch is a distributed search engine. Mesos makes it easy to run and scale.

3)Hypertable is a high performance, scalable, distributed storage and processing system for structured and unstructured data.

Trends such as cloud computing and big data are moving organizations
away from consolidation and into situations where they might have
multiple distributed systems dedicated to specific tasks. With the help of Docker executor for Mesos, Mesos can run and manage Docker containers in conjunction with Chronos and Marathon frameworks. Docker containers provide a consistent, compact and flexible means of packaging application builds. Delivering applications with Docker on Mesos promises a truly elastic, efficient and consistent platform for delivering a range of applications on premises or in the cloud.