While I rail against the use of the too vague and cringe-inducing descriptor “workload” with respect to scalability and cloud computing , it is perhaps at least bringing to the fore an important distinction that needs to be made: that of the impact of different compute resource utilization patterns on scalability.

What categorizing workloads has done is to separate “types” of processing and resource needs: some applications require more I/O, some less. Others are CPU hogs while others chew up memory at an alarming rate. Applications have different resource utilization needs across the network, storage and compute spectrum that have a profound impact on their scalability. This leads to models in which some applications scale better horizontally and others vertically. Unfortunately, there are very few “pure” applications that can be dissected down to a simple model in which it is simply a case of providing more “X” as a means to scale. It is more often the case that some portions of the application are more network intensive while others require more compute. Functional partitioning is certainly a better option for scaling out such applications, but is an impractical design methodology during development as the overhead resulting from separation of duties at the functional level requires a more service-oriented approach, one that is not currently aligned with modern web application development practices.

Yet we see the need on a daily basis for hyper-scalability of applications. Applications are being pushed to their resource limits with more users, more devices, more environments in which they must perform without delay. The “one size fits all” scalability model offered by cloud computing providers today is inadequate as a means to scale our rapidly and nearly infinitely, without overrunning budgets. This is because along with resource consumption patterns comes constraints on concurrency. Cloud computing offers an easy button to this problem – auto-scalability. Concurrency demands are easily met, just spin up another instance. While certainly one answer, it can be an expensive one and it’s absolutely an inefficient model of scalability.

HYPER-LOCAL SCALE

The lessons we should learn from cloud computing and hyper-scalability demands is that different functional processing scales, well, differently. The resource consumption patterns of one functional process may differ dramatically from another, and both need to be addressed in order to efficiently scale the application.

If it is impractical to functionally separate “workloads” in the design and development process, then it is necessary to do so during the deployment phase leveraging those identifying contextual clues indicating the type of workload being invoked, i.e. hyper-local scale.

Hyperlocal scalability requires leveraging scalability domains, those functional workload divisions that are similar in nature and require similar resources to scale. Scalability domains allow functional partitioning as a scalability pattern to be applied to an application without requiring function level separation (and all the management, maintenance and deployment headaches that go along with it). Scalability domains are discrete pools of similar processing workloads, partitioned as part of the architecture, that allow specific configuration and architectural techniques to be applied to the underlying network and platform that specifically increase the performance and scalability of those workloads.

This is the notion of hyperlocal scalability: an architectural scaling strategy that leverages scalability domains to isolate similar functional partitions requiring hyper-scale from those partitions that do not. In doing so, highly efficient scalability domains can be used to scale up or out those functional partitions requiring it while allowing other functional partitions to scale at a more nominal rate, incurring therefore less costs. Consider the notion a form of offload, where high resource impact processing is offloaded to another instance, thereby increasing the available resources on the first instance which results in higher concurrency. The offloaded processing can hyper-scale as necessary in a purpose-configured instance at higher efficiency, resulting in better concurrency. Where a traditional scalability pattern – effectively replication – may have required ten instances to meet demand, a hyper-localized scalability pattern may require only six or seven, with the plurality of those serving the high resource consuming processing. Fewer instances results in lower costs whilst simultaneously improving performance and efficiency.

INFRASTRUCTURE REQUIRED

Hyper-localized scalability architectures can leverage a variety of infrastructure scalability patterns, but the fact that they are dependent upon infrastructure and its ability to perform application layer routing and switching is paramount to success.

Today’s demanding business and operational requirements cannot be met with the simple scalability strategies of yesterday. Not only are legacy strategies based on infinite resources and budgets, they are inherently based on legacy application design in which functional partitioning was not only difficult, it was nearly impossible without the aid of methodologies like SOA.

Web applications are uniquely positioned such that they are perfectly suited to partitioning strategies whether at the functional or type or session layers. The contextual data shared by web applications with infrastructure capable of intercepting, inspecting and acting upon that data means more modern, architectural-based scaling strategies can be put into play. Doing so affords organizations the means to achieve higher efficiency and utilization rates, while in turn improving the performance, resiliency and availability of those applications.

These strategies require infrastructure services and an understanding of the resource needs and usage patterns of the application as well as the ability to implement that architecture using infrastructure services and platform optimization.