Category Archives: Programming

As Kubernetes becomes the de facto solution for container orchestration, more and more people expect it to become the orchestrator of the data center. For example, ZDNet predicted that Kubernetes would rule the hyperscale data center in 2018. In a little over four years, the project born at Google seems poised to change everything. Tracing its roots back to Google's Borg, Kubernetes is nicely designed to run web services. With StatefulSets becoming stable in 1.9, it can also manage stateful applications such as databases and message queues. To conquer enterprise data centers, however, several pieces are still missing.

In the data centers of large corporations (e.g. banks, pharmaceutical firms, energy companies), there is a variety of workloads such as HPC (high-performance computing), HPA (high-performance analytics), and batch jobs. Compared to these, web services use only a small portion of compute resources. Unfortunately, Kubernetes has so far been weak at orchestrating such workloads.

HPC

There are many kinds of HPC workloads. For simplicity, let's consider only Monte Carlo simulation here. It is a simple use case but consumes a lot of compute time in many enterprise data centers. A typical Monte Carlo simulation involves millions of tasks with complicated dependencies. The scheduling algorithm is generally task driven. Since each task runs only briefly (seconds), low scheduling latency is critical. In contrast, the median Kubernetes pod start-up latency on a large cluster can be as long as 25 seconds, 80% of which is spent deploying container images. Although one may argue that a local cache of Docker images should help, frequent releases of new versions/images are the norm with agile development. Hiccups, or even choking, therefore happen frequently.
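To make the workload concrete, here is a minimal Python sketch of a task-driven Monte Carlo job (estimating π); the task size and task count are illustrative assumptions, not taken from any real cluster:

```python
import random

def mc_task(n_samples, seed):
    """One short Monte Carlo task: count random points inside the unit circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

# A real job is millions of such short tasks; each runs for seconds at most,
# so a 25-second pod start-up would dwarf the useful work per task.
n_tasks, samples_per_task = 100, 10_000
total_hits = sum(mc_task(samples_per_task, seed) for seed in range(n_tasks))
pi_estimate = 4.0 * total_hits / (n_tasks * samples_per_task)
```

Because each task is independent and tiny, the scheduler's per-task overhead, not the arithmetic, dominates the wall-clock time of the whole job.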

Even worse, thousands of machines may simultaneously request the Docker image from the registry server when an HPC job starts. The central registry is not only a bottleneck but may not even survive the heavy volume of requests. A distributed registry is a better approach. For example, in NERSC's Shifter project, Docker images are converted to tgz files and transferred to the Lustre parallel distributed file system.

HPA

Since Spark 2.3, we can submit Spark jobs to Kubernetes. However, the current integration takes a static resource allocation approach. When submitting a job, the user configures the number of executors, which book resources from Kubernetes for the whole lifetime of the job. Note that a Spark application often runs several or many Spark jobs, which are decomposed into stages and tasks for scheduling. Each job and stage generally has a different number of tasks and requires a different amount of resources, but the user has to allocate the maximum number of executors up front. This static resource allocation certainly wastes a lot of CPU time and RAM.
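The waste is easy to quantify with a back-of-the-envelope sketch in Python; the stage names, executor counts, and durations below are hypothetical, chosen only to show how booking for the widest stage plays out:

```python
# Hypothetical Spark application: three stages with different parallelism.
# With static allocation, executors for the widest stage are booked for the
# whole run; idle executor-minutes in the narrower stages are pure waste.
stages = [
    {"name": "load",      "executors_needed": 20,  "minutes": 10},
    {"name": "shuffle",   "executors_needed": 100, "minutes": 5},
    {"name": "aggregate", "executors_needed": 10,  "minutes": 15},
]

booked = max(s["executors_needed"] for s in stages)   # widest stage, up front
total_minutes = sum(s["minutes"] for s in stages)

booked_exec_minutes = booked * total_minutes
used_exec_minutes = sum(s["executors_needed"] * s["minutes"] for s in stages)
utilization = used_exec_minutes / booked_exec_minutes
```

With these made-up numbers, 3,000 executor-minutes are booked but only 850 are used, i.e. under 30% utilization, which is the kind of waste dynamic allocation is meant to recover.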

Batch Jobs

Kubernetes's batch job support is extremely simple: basically run-to-completion. However, enterprise batch jobs are far more complicated than that. For example, a batch job may execute in parallel across hundreds or even thousands of nodes, using a message passing library to synchronize state. It may also require specialized resources like GPUs, or access to limited software licenses. Organizations may enforce policies around which resources can be used by whom, to ensure projects are adequately resourced and deadlines are met. Therefore, capabilities such as array jobs, configurable priority and preemption, user-, group-, or service-based quotas, and a variety of other features are mandatory. There is a SIG project, kube-batch, working on a batch scheduler for Kubernetes, but its road map and expected GA date are not yet available.
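As a rough illustration of priority and preemption (this is not the kube-batch design, and the job names and slot count are made up), a toy scheduler can be sketched in a few lines of Python:

```python
import heapq

class Scheduler:
    """Toy priority scheduler with preemption over a fixed pool of slots."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []  # min-heap of (priority, job): lowest priority on top

    def submit(self, job, priority):
        if len(self.running) < self.slots:
            heapq.heappush(self.running, (priority, job))
            return None                    # admitted, nothing preempted
        lowest_prio, lowest_job = self.running[0]
        if priority > lowest_prio:
            heapq.heapreplace(self.running, (priority, job))
            return lowest_job              # preempted victim, to be requeued
        return job                         # rejected, stays in the queue

sched = Scheduler(slots=2)
sched.submit("batch-report", priority=1)
sched.submit("nightly-etl", priority=2)
victim = sched.submit("urgent-risk-calc", priority=5)  # preempts lowest job
```

A production scheduler layers quotas, fairness, and array-job fan-out on top of this basic admit-or-preempt decision.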

The ever-growing user base and its demands have pushed the development community to embrace asynchronous programming. As the microservices architecture gains popularity, we can expect asynchronous programming to become the new normal. By avoiding blocking, asynchronous programming provides extremely high throughput. However, programmers often scratch their heads when starting asynchronous programming, even in languages with built-in asynchronous features such as Node.js and Go. That is understandable, as most of us started programming with plain old synchronous processing. But it is easy to understand how asynchronous processing works if we go to buy a coffee from Starbucks! Continue reading →
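The coffee-shop idea maps naturally onto an event loop: the barista takes the next order instead of standing idle while one drink brews. Here is a minimal Python asyncio sketch of the same pattern (the names and delays are illustrative):

```python
import asyncio

async def order_coffee(name, brew_seconds):
    # Place the order, then wait without blocking the event loop:
    # other orders make progress while this one "brews".
    await asyncio.sleep(brew_seconds)
    return f"{name}'s coffee"

async def main():
    # Three orders brew concurrently; total wall time is roughly the longest
    # brew, not the sum, which is where the throughput gain comes from.
    orders = [order_coffee("Ann", 0.03),
              order_coffee("Bob", 0.02),
              order_coffee("Eve", 0.01)]
    return await asyncio.gather(*orders)

results = asyncio.run(main())
```

`asyncio.gather` returns the results in submission order, so callers get deterministic output even though the waits overlap.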

I have been developing a comprehensive machine learning library of advanced algorithms, called SMILE (Statistical Machine Intelligence and Learning Engine), in my spare time for several years. Today I am very pleased to announce that SMILE is now available on GitHub under the Apache 2.0 license. SMILE is self-contained and requires only the standard Java library. With advanced data structures and learning algorithms, SMILE achieves state-of-the-art performance.

Sorry, Taylor Swift fans. No beautiful pictures for you. We are talking about Apple's new programming language for iOS and OS X. Revealed last month, Swift has already generated a lot of good buzz. After finishing “The Swift Programming Language” book, here is some of what I learned. Continue reading →

In all previous posts, we tried out OCaml features in the toplevel. Real-world applications, of course, are divided into multiple source files (.ml files) that can be compiled and linked into bytecode or native executables. To compile a file, one can use ocamlc (which generates .cmo bytecode object files). Continue reading →

We have learned many interesting features of OCaml in our journey. Today we will do some exercises with them. As you will notice, many examples are recursive, which is very common in functional programming. Computer scientists love recursion because a lot of data structures and algorithms exhibit recursive behavior. In particular, divide and conquer is an important algorithm design paradigm based on multi-branched recursion, which solves a large problem by breaking it up into smaller and smaller pieces until the small trivial cases can be solved immediately, and then combines the results. So if a problem has the following properties: Continue reading →
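The post's exercises are in OCaml, but the divide-and-conquer shape itself is language-independent; as a quick sketch, here is merge sort, the classic instance, in Python:

```python
def merge_sort(xs):
    """Divide and conquer: split, recurse on halves, merge the results."""
    if len(xs) <= 1:               # trivial case, solved immediately
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid])    # divide into smaller pieces ...
    right = merge_sort(xs[mid:])
    merged, i, j = [], 0, 0        # ... then combine the sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

sorted_xs = merge_sort([5, 2, 9, 1, 5, 6])  # → [1, 2, 5, 5, 6, 9]
```

Each recursive call works on a strictly smaller input, so the recursion is guaranteed to bottom out at the one-element base case.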

So far, we have focused on the functional features of OCaml, which do not allow destructive operations (e.g. assignment) on entities in a program. Accordingly, variables, i.e. identifiers referring to immutable values, are used in the mathematical sense. Although some languages such as Haskell promote purely functional programming, OCaml does allow imperative programming, which we will discuss today. Continue reading →

Object-oriented programmers are familiar with polymorphism. It is one of the major capabilities of object-oriented languages, besides encapsulation and inheritance. For most C++/Java programmers, polymorphism means dynamic dispatch: when a method is invoked on an object, the object itself determines what code gets executed by looking up the method at run time in a table associated with the object. Actually, there are several fundamentally different kinds of polymorphism, all of which are supported in C++. Continue reading →