In our last post, we created a two node Spark cluster using Kubernetes. Once the cluster is defined and created, we can easily scale it up or down using Kubernetes. This elastic nature of Kubernetes makes it easy to scale the infrastructure as and when demand increases, rather than setting up everything upfront.

In this seventh blog of the series, we will discuss how to scale a Spark cluster on Kubernetes.
You can access all the posts in the series here.

Dynamic Scaling

When we discussed the deployment abstraction in our previous blog, we talked about the replica factor. In a deployment configuration, we can specify the number of replicas we need for a given pod. This number is set to 1 in our current spark-worker deployment.
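
For reference, the replica count is the replicas field under spec in the deployment manifest. The fragment below is a minimal sketch; the surrounding fields are elided, and the exact layout depends on the manifest from the earlier posts.

spec:
  replicas: 1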

One of the nice things about the deployment abstraction is that we can change the replica count dynamically, without changing the configuration. This
allows us to scale our Spark cluster on the fly.

Scale Up

Run the below command to scale up the workers from 1 to 2.

kubectl scale deployment spark-worker --replicas 2

The above command takes the deployment name and the desired number of replicas as parameters.
You can check the result using

kubectl get po
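
The output should look similar to the below. The pod name suffixes are generated by Kubernetes, so they will differ in your environment.

NAME                            READY     STATUS    RESTARTS   AGE
spark-master-862504084-bwtrk    1/1       Running   0          12m
spark-worker-1887160345-3vj5n   1/1       Running   0          12m
spark-worker-1887160345-8fz2p   1/1       Running   0          1m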

When you run the above command, Kubernetes creates additional pods using the template specified in the spark-worker deployment. Whenever these
pods come up, they automatically connect to spark-master and scale the cluster.

Scale Down

We can not only increase the number of workers, we can also scale down by setting a lower replica count.

kubectl scale deployment spark-worker --replicas 1

When the above command executes, Kubernetes kills one of the workers to bring the replica count down to 1.

Kubernetes automatically manages all the service related changes. So whenever we scale the workers, Spark scales automatically.
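
We can also verify the change at the deployment level. The below command shows the desired and current replica counts of the spark-worker deployment.

kubectl get deployment spark-worker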

Multiple Clusters

Till now, we have run a single cluster. But sometimes we may want to run multiple Spark clusters on the same Kubernetes cluster. Let's see what happens if we try to run the
same configurations twice, like below.
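
For example, assuming the master deployment was created from a manifest file named spark-master.yaml (the actual file name from the earlier posts may differ), running the create command a second time fails with an error like the one below.

kubectl create -f spark-master.yaml

Error from server (AlreadyExists): error when creating "spark-master.yaml": deployments "spark-master" already exists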

Kubernetes rejects the request because a deployment named spark-master already exists. One way to solve this issue is to
duplicate the configurations under different names. But that is tedious and difficult to maintain.

A better way to solve this issue is to use the namespace abstraction of Kubernetes.

Namespace Abstraction

Kubernetes allows users to create multiple virtual clusters on a single physical cluster. These virtual clusters are called namespaces.

The namespace abstraction allows multiple users to share the same physical cluster. It gives a scope to names, which lets us have services with the same name in different namespaces.

By default, our cluster is running in a namespace called default. In the next section, we will create another namespace in which we can run one more single node cluster.

Creating Namespace

In order to create the new cluster, first we need to create a new namespace. Run the below command to create a namespace called cluster2.
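
kubectl create namespace cluster2

Once the namespace is created, we can bring up one more Spark cluster inside it by passing the namespace to the same create commands. The manifest file names below are assumptions based on the earlier posts; use whatever file names you used there.

kubectl create -f spark-master.yaml --namespace cluster2
kubectl create -f spark-worker.yaml --namespace cluster2

We can now list the pods across all namespaces using the below command.

kubectl get po --all-namespaces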

As you can observe from the result, there are multiple spark-master pods running in different namespaces.

So, using the namespace abstraction of Kubernetes, we can create multiple Spark clusters on the same Kubernetes cluster.
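
When the second cluster is no longer needed, we can delete its namespace, which tears down everything inside it in one go.

kubectl delete namespace cluster2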

Conclusion

In this blog, we discussed how to scale our clusters using the Kubernetes deployment abstraction. We also discussed how to use
the namespace abstraction to create multiple clusters.

What’s Next?

Whenever we run services on Kubernetes, we may want to restrict their resource usage. This allows for better infrastructure planning
and monitoring. In the next blog, we will discuss resource management on Kubernetes.