Archive for the ‘Uncategorized’ Category

Over the last few months I worked on a cloud-based cluster manager using Docker Swarm to set up new customer instances for my project, and the experience was wonderful: it gave me a deep understanding of how cloud-based cluster solutions work compared to the old-school application deployment/setup model. Our solution mainly replaces another popular cluster-management solution, DMM – Docker, Marathon & Mesos.

In DMM, Marathon-lb + Marathon + Mesos take care of connecting customer requests to their respective services running on the cluster, but that is a more involved effort; Docker Swarm provides these capabilities out of the box, with simple commands. Some of the technologies we are using …

Docker Swarm => Swarm mode in the Docker engine lets us natively manage the cluster.
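To give a feel for how simple those commands are, here is a sketch of the basic Swarm-mode workflow (addresses, tokens and the service/image names are placeholders, not our actual setup):

```shell
# Initialize swarm mode on the first manager node
docker swarm init --advertise-addr <MANAGER-IP>

# On each worker node, join the cluster using the token printed by "swarm init"
docker swarm join --token <TOKEN> <MANAGER-IP>:2377

# Create a replicated service on the cluster ("web"/nginx are placeholders)
docker service create --name web --replicas 3 -p 80:80 nginx

# Inspect cluster and service state
docker node ls
docker service ls
docker service ps web
```

Swarm schedules the replicas across nodes and publishes the port on every node via the routing mesh; the equivalent in DMM would involve Marathon app definitions plus Marathon-lb configuration.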

To build the new architecture I had to learn various commands related to Docker/Docker Swarm, HDFS, Spark & Linux (thanks to our great Chief Architect for his vision/inputs). We built a Python-based provisioning service to create customer-specific instances, which involves setting up many of the Swarm services …

Docker Swarm is a new technology, and we ran into lots of issues depending on the Docker version and the Linux OS/version we were using. Our journey in debugging/fixing issues …

Started on CentOS 7.0 + Docker 1.23 => ran into a lot of Docker Swarm service-connectivity and other weird issues

Upgraded CentOS + Docker to the latest versions => still had the issues

Upgraded the OS to Ubuntu 14.04 + Docker to 17.03 => still had service-connectivity issues in Docker Swarm

Upgraded Docker to 17.05 => issues came down, but we still noticed a few connectivity issues. Reported the issue to the Docker team – https://github.com/moby/moby/issues/32830. We later learned that there was a race condition, which has since been fixed.

Upgraded the OS to the latest Ubuntu 16.04 with the latest kernel => yet to apply the 17.06 release to see whether the connectivity issues are completely gone. For now we check connectivity health using scripts.
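As a sketch of the kind of health check such a script can do (this is a hypothetical helper, not our actual script): a plain TCP connect against each service's published port tells you whether the Swarm routing mesh is delivering traffic.

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_services(services):
    """Map each (name, host, port) entry to its reachability status."""
    return {name: is_reachable(host, port) for name, host, port in services}
```

Run periodically (e.g. from cron) against each Swarm service's published port; a service that is "running" per `docker service ls` but unreachable here points at the overlay-network issues described above.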

I have been working on SaaS solutions for many years now, and it is interesting to see how technologies are evolving and how applications/web APIs connect to each other to get seamless, quick end-to-end integration between systems & organizations.

Webhook – a lightweight HTTP pattern providing a simple pub/sub model for wiring together web APIs and SaaS services. It has been a year since this concept was introduced in my product (thanks to our great Architect), and I see a lot of benefits from it.

Now our product can send notifications to subscribed customers for the predefined events that happen in the product.

It is also used to show the changes between two major statuses in our workflow system.

We also considered using it to build our reporting data warehouse, instead of building the warehouse with the typical approach of ETL/batch jobs.

There may be many such use cases where one can check whether this pattern fits. The first use case, sending a notification with data when an event occurs, is a very natural one that many products can use. More examples …
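A minimal sketch of that first use case (the names and payload shape here are my own illustration, not the product's API): a registry of subscriber URLs per event type, and a dispatcher that POSTs the event payload to each.

```python
import json
from urllib import request

# event_type -> list of subscribed webhook endpoint URLs
subscriptions = {}

def subscribe(event_type, url):
    """Register a customer's webhook endpoint for a predefined event."""
    subscriptions.setdefault(event_type, []).append(url)

def notify(event_type, payload, send=None):
    """POST the event payload to every subscriber of event_type.

    `send` can be overridden (e.g. for testing); by default it performs a
    real HTTP POST with a JSON body.
    """
    if send is None:
        def send(url, body):
            req = request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"})
            request.urlopen(req)
    body = json.dumps({"event": event_type, "data": payload}).encode()
    for url in subscriptions.get(event_type, []):
        send(url, body)
```

A real implementation would add retries, delivery signing, and per-subscriber failure isolation, but the core pub/sub shape is this small.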

We are using ZooKeeper as a central repository for our data/configuration, and here are a few interesting options/utilities (Docker/Java versions) that one may need for browsing, copying, backing up & migrating it.

There are many UI / API / DB-level design patterns to follow, but when and what to choose needs to be decided carefully, and the following 5 important principles helped with some of our recent decisions … (Thanks to our Chief Architect for providing these guidelines …)

Here are examples of how the above principles helped us decide between PUSH vs PULL, and between filtering out data or keeping the original msg …

Our team recently created multiple microservices to achieve a big org-level initiative. One of the end-to-end flows, creating a new app instance for a trial user, involves 3 microservices and follows principles #1 and #4 from above.

External service – follows the event-lifecycle model: when a new user request comes in, it triggers the subscribed REST endpoint with the user-request msg. The event lifecycle can be synchronous or asynchronous, and the service retries multiple times if the REST endpoint is down. The service also expects the REST endpoint to complete the event lifecycle.

REST interface service – gets the msg from the external service and pushes it into ZK.

It follows #1: the "original message" is "PUSHed" to ZK as-is, which avoids unnecessary parsing/transformations of the msg, keeps the operation quick, and lets future changes to the msg format be synced directly instead of parsing/editing the ZK msg.

Here #4 is not followed for the inbound side, since the service is triggered by the external service: it pushes the original message to ZK, in turn waits for the core service to complete the provisioning, and the core service updates the status in ZK. Once it gets the provisioning status by following #4, i.e. the PULL model, it triggers the external service to complete the event lifecycle.

Core service (which I own) – creates application instances in Docker Swarm for trial customers based on the data provided in ZooKeeper (ZK).

ZooKeeper provides watch capability out of the box: we can watch a node, an event fires when it changes, and the core service can then take care of provisioning – a PUSH model. But following #4 we chose the PULL model instead, i.e. pulling ZK data at a periodic interval to pick up provisioning requests, and it really saved us from a single point of failure with ZK (i.e. a ZK connection failure, or missing some events due to exceptions).
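The PULL loop can be sketched like this (the fetch/provision/mark-done callables are stand-ins for the real ZK reads and Swarm calls, not our actual code). The key property is that a missed event or dropped ZK connection only delays a request until the next cycle instead of losing it:

```python
import time

def run_pull_loop(fetch_pending, provision, mark_done, interval=30.0, cycles=None):
    """Periodically PULL pending provisioning requests and process them.

    fetch_pending() -> list of request dicts still awaiting provisioning
    provision(req)  -> create the app instance (e.g. Docker Swarm services)
    mark_done(req)  -> update the request status (e.g. back into ZK)
    cycles          -> stop after N iterations (None = run forever)
    """
    done = 0
    while cycles is None or done < cycles:
        for req in fetch_pending():
            try:
                provision(req)
                mark_done(req)
            except Exception:
                # Leave the request pending; it is retried on the next cycle,
                # which is exactly what makes PULL resilient to transient failures.
                pass
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval)
```

With a watch (PUSH), the same resilience would require re-registering watches after every session loss and reconciling anything missed in between; the periodic pull gets that reconciliation for free.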

We recently faced a lot of issues with our Spark-based app running in Docker Swarm due to heavy minor/major GC pauses (stop-the-world), and the following configuration helped minimize them. The configuration is specific to the application and can't be reused as-is, but it can serve as a basis to try different values and arrive at good numbers for your app. We tried at least 10 to 15 different combinations before arriving at the entries below. In our case, Spark runs in standalone mode with 4 workers and 4 cores per worker, 16 GB driver memory, 16 GB executor memory, a max of 16 cores, and parallelism of 16.
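To show the shape such entries take (these values are purely illustrative placeholders, not our tuned numbers – the flag names are standard Spark/HotSpot options), the G1 settings go into the driver/executor Java options in spark-defaults.conf:

```
# spark-defaults.conf (illustrative values only – tune per application)
spark.driver.extraJavaOptions    -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45 -XX:+PrintGCDetails -XX:+PrintGCDateStamps
spark.executor.extraJavaOptions  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45 -XX:+PrintGCDetails -XX:+PrintGCDateStamps
```

The GC-logging flags are what let you compare combinations objectively run over run.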

Btw, the above entries can also be set in the HDFS configuration via the HADOOP_JOBTRACKER_OPTS, SHARED_HADOOP_NAMENODE_OPTS & HADOOP_DATANODE_OPTS options, so that even HDFS uses G1 GC instead of the default -XX:+UseConcMarkSweepGC.

Heap region => the region size is decided based on the heap size; the JVM plans for around 2000 regions, each between 1 MB and 32 MB (e.g., a 16 GB heap typically gets 8 MB regions).

10% – default reserve (-XX:G1ReservePercent) kept free for safety, to avoid promotion failures

Tenuring threshold is used by the JVM to decide when an object can be promoted from the young generation to the old generation (-XX:MaxTenuringThreshold=n, default 15; -XX:+PrintTenuringDistribution prints the age distribution)

Live objects are evacuated (i.e., copied or moved) to one or more survivor regions. If the aging threshold is met, some of the objects are promoted to old-generation regions.

6 collection phases for the old generation =>

Initial mark => marks the survivor regions (root regions) (STW)

Root region scanning => scans survivor regions for old-gen references – must complete before a young GC can occur

Concurrent marking (finds live objects across the heap) – runs in parallel with the application, but can be interrupted by a young GC

Remark (STW) – completes the marking of live objects

Cleanup (STW and concurrent) – performs accounting on live objects and frees completely empty regions

Copying (STW) – copies live objects to new, unused regions. This can happen with young-generation regions only, logged as [GC pause (young)], or with both young- and old-generation regions, logged as [GC pause (mixed)]

Do not explicitly set the young-generation size (-Xmn): an explicit value interferes with the G1 collector's pause-time goal, because G1 can no longer auto-adjust the young-generation size as needed.

Evacuation failure => when the JVM runs out of heap regions during GC for either survivor or promoted objects

Tuning =>

-XX:NewSize and -XX:MaxNewSize => set a lower and upper bound for the size of the young generation. The young generation can't be bigger than the old generation, since at some point all young-gen objects may need to be moved to the old gen.

-XX:NewRatio => allows us to specify the factor by which the old generation should be larger than the young generation. For example, with -XX:NewRatio=3 the old generation will be three times as large as the young generation

-XX:MaxGCPauseMillis=200 => Sets a target for the maximum GC pause time. This is a soft goal, and the JVM will make its best effort to achieve it. Therefore, the pause time goal will sometimes not be met. The default value is 200 milliseconds

-XX:SurvivorRatio specifies how large “Eden” should be sized relative to one of the two survivor spaces. For example, with -XX:SurvivorRatio=10 we dimension “Eden” ten times as large as “To” (and at the same time ten times as large as “From”). As a result, “Eden” occupies 10/12 of the young generation while “To” and “From” each occupy 1/12. Note that the two survivor spaces are always equal in size.

-XX:InitialTenuringThreshold and -XX:MaxTenuringThreshold => set the initial and maximum value of the tenuring threshold, respectively; -XX:+PrintTenuringDistribution prints the age distribution, and -XX:TargetSurvivorRatio specifies the target utilization (in percent) of "To" at the end of a young-generation GC

-XX:+NeverTenure and -XX:+AlwaysTenure => with NeverTenure, objects are never promoted to the old generation (i.e. the old generation is not needed); with AlwaysTenure, no survivor spaces are used, so all young objects are immediately promoted to the old generation on their first GC
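Pulling the flags above together, a hypothetical launch line might look like this (the values are placeholders to tune, not recommendations):

```
java -Xms16g -Xmx16g \
     -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
     -XX:G1ReservePercent=10 \
     -XX:InitialTenuringThreshold=5 -XX:MaxTenuringThreshold=15 \
     -XX:+PrintTenuringDistribution \
     -jar app.jar
```

Note that -Xmn, -XX:NewRatio and -XX:SurvivorRatio are deliberately absent: per the advice above, with G1 it is better to let the collector size the young generation adaptively around the pause-time goal.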