Hello, Service Fabric!

In this chapter from Programming Microsoft Azure Service Fabric, Haishi Bai presents what makes a great platform as a service (PaaS) and shares some of the key concepts of Service Fabric in preparation for service development.

This book is about Microsoft Azure Service Fabric, a distributed systems platform that makes it easy to build large-scale, highly available, low-latency, and easily manageable services. Service Fabric brings you the same technology that empowers cloud-scale applications such as Cortana, Skype for Business, and Azure SQL Database so that you can easily design, implement, scale, and manage your own services leveraging the power of next-generation distributed computing.

Before you embark on the journey with Service Fabric, let’s reflect on what makes a great platform as a service (PaaS) and why you need a new PaaS to build the next generation of cloud-based services.

A modern PaaS

A PaaS is designed with agility, scalability, availability, and performance in mind. Microsoft Azure Service Fabric is a PaaS that is built from the ground up to support large-scale, highly available cloud applications.

Designed for agility

The software industry is all about agility. Developers have the privilege of working in a virtual world, with no physical constraints to drag us down. Innovation can happen at a speed that is unimaginable in other fields. And as a group, we’ve been in a relentless pursuit of speed: from software frameworks to automation tools, from incremental development to Heroku’s 12-factor methodology (Wiggins 2012), from minimum viable product (MVP) to continuous delivery. Agility is the primary goal so that developers can innovate and improve continuously.

Microservices

The essence of Microservices is to decompose complex applications into independent services. Each service is a self-contained, complete functional unit that can be evolved or even reconstructed without necessarily impacting other services.

All software can be abstracted as components and communication routes between them. Monolithic applications are hard to maintain or revise. An overly decomposed system, in contrast, is hard to understand and often comes with unnecessary overhead due to the complex interaction paths across different components. To a great extent, the art of a software architect is to strike a balance between the number of components and the number of communication paths.

A PaaS designed for Microservices encourages separation of concerns, emphasizes loose coupling, and facilitates flexible inter-component communication. While allowing other architectural choices, Service Fabric is designed for and recommends Microservices. A Service Fabric application is made up of a number of services. Each service can be revised, scaled, and managed as an independent component, yet you can still manage the entire application as a single logical unit. Service Fabric application design is discussed in Chapter 2, “Stateless services,” Chapter 3, “Stateful services,” and Chapter 4, “Actor pattern.” We’ll review several application patterns and scenarios in Part III.

NOTE: Architecture choices

Microservices is strongly recommended but not mandatory. You can choose to use other architectures such as n-tiered architecture, data-centric architecture, and single-tiered web applications or APIs.

Simplicity

A PaaS platform is not just about scheduling resources and hosting applications. It needs to provide practical support that helps developers complete the tasks at hand without jumping through hoops.

At a basic level, a PaaS platform helps developers deal with cross-cutting concerns such as logging, monitoring, and transaction processing. Taking it a step further, a PaaS platform provides advanced nonfunctional features such as service discovery, failover, replication, and load balancing. All these nonfunctional requirements are essential to a scalable and available system, and providing built-in constructs that satisfy them leads to a significant productivity boost. Because the PaaS takes care of these concerns, developers can focus on building core business logic. To achieve this, the nonfunctional features should be available without getting in the way. As you progress through this chapter and the book, you’ll see how Service Fabric enables you to focus on business logic and to incorporate these features whenever you need them.
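To make one of these features concrete, here is a minimal, framework-agnostic sketch of service discovery. The `NamingService` class and the endpoint addresses are invented for illustration and are not Service Fabric APIs; only the `fabric:/` naming scheme mirrors the platform's convention. The idea is that a naming service maps a logical service name to whatever endpoints are currently alive, so callers never hard-code addresses:

```python
# Minimal sketch of service discovery: a naming service maps a logical
# service name to its current endpoints, so clients resolve an address
# at call time instead of hard-coding it. All names are illustrative.

class NamingService:
    def __init__(self):
        self._endpoints = {}  # service name -> list of live endpoints

    def register(self, name, endpoint):
        self._endpoints.setdefault(name, []).append(endpoint)

    def unregister(self, name, endpoint):
        self._endpoints[name].remove(endpoint)

    def resolve(self, name):
        # Return a live endpoint; a real platform would also
        # load-balance and react to endpoint failures.
        endpoints = self._endpoints.get(name, [])
        if not endpoints:
            raise LookupError(f"no live endpoint for {name}")
        return endpoints[0]

naming = NamingService()
naming.register("fabric:/Shop/Orders", "http://10.0.0.4:8080")
naming.register("fabric:/Shop/Orders", "http://10.0.0.5:8080")
print(naming.resolve("fabric:/Shop/Orders"))  # a currently live instance
```

If the instance at `10.0.0.4` goes away, unregistering it makes subsequent `resolve` calls return the surviving endpoint, which is the essence of discovery-based failover.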

Can you go a step further? What if a PaaS platform provides easy programming models that help you tackle complex problems? And what if the PaaS platform also provides guidance and patterns for typical scenarios? We’ll come back to this in the discussion of different service types in Chapter 2, Chapter 3, and Chapter 4.

Comprehensive application lifecycle management

Continuous improvement is at the core of the agile software movement and the Lean movement in various industries. The faster you can iterate through revision cycles, the quicker you can innovate, reduce waste, and create additional value. A mature PaaS platform has to offer comprehensive application lifecycle management (ALM) functionalities to keep the innovation engine running without friction.

Because more companies are adopting continuous delivery, software is being released at a faster pace than in the past. Some companies claim they do hundreds of deployments on a daily basis. This calls for automated testing, continuous integration, rapid deployments, robust version management, and fast rollbacks. Only when a PaaS platform provides all these features can developers and independent software vendors (ISVs) realize such continuous delivery scenarios.

A comprehensive ALM strategy is critical to DevOps. If you look carefully, you’ll see that a lot of so-called friction between development and operations is rooted in discrepancies among different environments. PaaS platforms such as Service Fabric allow applications to be placed in self-contained packages that can be deployed consistently to different environments—such as development, test, QA, and production.

Part II of this book is dedicated to ALM.

Designed for QoS

A successful cloud service is based on a healthy partnership between the service developer and the cloud platform. The service developer brings business know-how and innovation, and the cloud platform brings Quality of Service (QoS) opportunities such as scalability, availability, and reliability.

Scalability

Through innovation, you can do unprecedented things. However, the increasing complexity of the problems being solved constantly challenges developers to improve their methods just to maintain momentum. A PaaS platform should be designed with scalability in mind so that applications can be scaled out naturally without much effort from developers.

Increasing complexity and scale

Increasing complexity can be demonstrated easily with some examples. According to the “NASA Study on Flight Software Complexity” (NASA Office of Chief Engineer, 2009), flight software complexity has been growing exponentially, by roughly a factor of 10 every 10 years. Apollo 8 flew with about 8,500 lines of code in 1968. In contrast, the International Space Station (ISS) was launched with 1.5 million lines of code in 1998.

Besides software complexity, the sheer volume of data presents a new set of problems. According to Twitter statistics (Company Facts at https://about.twitter.com/company), Twitter handles 500 million tweets every day. Data ingress, transformation, storage, and analysis at such a scale pose an unprecedented challenge. Modern services also need to deal with the potential for rapid growth. Over the past five years or so, Azure Storage has grown into a service that handles 777 trillion transactions per day (Charles Babcock, “Microsoft Azure: More Mature Cloud Platform,” InformationWeek, Sept 30, 2015, http://aka.ms/asf/maturecloud).

On a cloud platform, scaling up, which means increasing the processing power of a single host, is the less preferable approach. Typically, virtual machines are offered in preconfigured sizes, so to scale up you must migrate your workload to a larger virtual machine. This is a long and disruptive process: services need to be brought down, migrated, and relaunched on the new machine, causing service interruptions. Furthermore, because there are only a finite number of machine sizes, scaling options run out quickly. Although Azure provides a large catalog of virtual machine sizes, including some of the largest virtual machines in the cloud, large-scale workloads still can exceed the processing power of a single machine.

In contrast, scaling out dynamically adjusts system capacity by adding more service instances to share the workload. This kind of scaling is not disruptive because it doesn’t need to shut down existing services. And theoretically, there’s no limit to how much you can scale because you can add as many instances as you need.

When scaling out, there are two fundamental ways to distribute workloads. One way is to distribute the workloads evenly across all available instances. The other way is to partition the workloads among service instances. Service Fabric supports both options, which we’ll discuss in detail in Chapter 7, “Scalability and performance.”
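The difference between the two approaches can be sketched in a few lines. Instance names and keys below are made up for illustration: even distribution can be as simple as round-robin, whereas partitioning routes each key to the instance that owns its slice of the key space, so that instance can keep the key's state locally.

```python
# Two ways to distribute work when scaling out (illustrative sketch):
# 1) spread requests evenly across identical instances (round-robin);
# 2) partition the key space so each instance owns a slice of the data.

import hashlib
from itertools import cycle

instances = ["instance-0", "instance-1", "instance-2"]

# Even distribution: each request goes to the next instance in turn.
round_robin = cycle(instances)
def route_evenly():
    return next(round_robin)

# Partitioning: the same key always lands on the same instance.
def route_by_partition(key):
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return instances[digest % len(instances)]

# A given key is routed consistently across calls.
assert route_by_partition("order-42") == route_by_partition("order-42")
```

Even distribution suits stateless services, where any instance can serve any request; partitioning suits stateful services, where a request must reach the instance holding the relevant state.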

Availability

Availability commonly is achieved by redundancy: when a service fails, a backup service takes over to maintain business continuity. Although the idea sounds simple, the difficulty is in the details. For example, when a service fails, what happens to the state it has been maintaining locally? How do you ensure that the replacement service can restore that state and pick up where the failed one left off? In a different scenario, when you apply updates, how do you perform a zero-downtime upgrade? And how do you safely roll back to the previous version if the new one turns out to be broken? The solution involves many moving parts, such as health monitoring, fault detection, failover, version management, and state replication. Only a carefully designed PaaS can orchestrate these features into a complete and intuitive availability solution. Availability and reliability are the topic of Chapter 6, “Availability and reliability.”
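The failover part of this story can be illustrated with a minimal sketch. This is not Service Fabric's actual replication protocol, and the `ReplicaSet` class and keys are invented for the example; the point is only that writes copied to secondary replicas let a promoted secondary carry on with no lost state when the primary fails.

```python
# Sketch of availability through redundancy: writes go to a primary and
# are copied to secondary replicas; when the primary fails, a secondary
# that already holds the state is promoted, so no data is lost.

class ReplicaSet:
    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]
        self.primary = 0  # index of the current primary replica

    def write(self, key, value):
        # Apply to the primary and replicate to every secondary.
        for state in self.replicas:
            state[key] = value

    def read(self, key):
        return self.replicas[self.primary][key]

    def fail_primary(self):
        # Drop the failed primary and promote a surviving secondary.
        del self.replicas[self.primary]
        self.primary = 0

rs = ReplicaSet(replica_count=3)
rs.write("cart:1001", ["book", "pen"])
rs.fail_primary()            # the primary is lost...
print(rs.read("cart:1001"))  # ...but a promoted secondary has the state
```

A real platform must also handle the harder cases this sketch ignores, such as writes that were in flight during the failure, which is why quorum-based replication and health monitoring matter.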

Reliability

Reliability is compromised by system faults. However, in a large-scale, distributed system, monitoring, tracing, and diagnosing problems often are challenging. If a PaaS doesn’t have a robust health subsystem that can monitor, report, and react to possible system-level and application-level problems, detecting and fixing system defects becomes incredibly difficult.

We’ll examine what Service Fabric has to offer in terms of reliability in Chapter 6.

Separation of workload and infrastructure

The cloud era brings new opportunities and new challenges. One advantage of cloud infrastructure as a service (IaaS) is that it shields you from the complexity of physical or virtualized hardware management—and that’s only the starting point. To enjoy the benefits of the cloud fully, you need PaaS to help you forget about infrastructure altogether. After all, for a program to run, all you need are some compute and storage resources such as CPU, memory, and disk space. Do you really need to control which host is providing these resources? Does it really matter if your program stays on the same host throughout its lifetime? Should it make a difference if the program is running on a local server or in the cloud? A modern PaaS such as Service Fabric provides a clear separation of workload and infrastructure. It automatically manages the pool of resources, and it finds and assigns resources required by your applications as needed.

Placement constraints

Sometimes, you do care how components in your application are laid out on a PaaS cluster. For example, if your cluster comprises multiple node types with different capacities, you might want to put certain components on specific nodes. In this case, your application can dictate where PaaS places different components by defining placement constraints. In addition, if you want to minimize the latency between two components that frequently interact with each other, you can suggest that PaaS keep them in close proximity. In some other cases, you might want to distribute the components far apart so that a failing host won’t bring down all the components. We’ll discuss placement constraints later in this book.
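Conceptually, a placement constraint is a predicate over node properties that the platform evaluates when choosing where to place a component. The following sketch uses invented node names and properties and is not Service Fabric's constraint syntax; it only illustrates the filtering idea:

```python
# Sketch of placement constraints: the platform picks a node for a
# component by filtering the cluster against required node properties.
# Node names and properties are made up for illustration.

nodes = [
    {"name": "node-0", "NodeType": "FrontEnd", "HasSsd": False},
    {"name": "node-1", "NodeType": "BackEnd",  "HasSsd": True},
    {"name": "node-2", "NodeType": "BackEnd",  "HasSsd": False},
]

def eligible_nodes(constraint):
    """Return the nodes whose properties match every required value."""
    return [n for n in nodes
            if all(n.get(prop) == want for prop, want in constraint.items())]

# A component that needs fast local disks on back-end nodes:
candidates = eligible_nodes({"NodeType": "BackEnd", "HasSsd": True})
print([n["name"] for n in candidates])  # ['node-1']
```

Affinity (keeping two components close) and anti-affinity (spreading them apart) can be seen as further predicates over candidate placements, evaluated the same way.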

Such clear separation of concerns brings several significant benefits. First, it enables workloads to be transferred from host to host as needed. When a host fails, the workloads on the failing host can be migrated quickly to another healthy host, providing fast failovers. Second, it allows higher compute density because independent workloads can be packed into the same host without interfering with one another. Third, as launching and destroying application instances usually is much faster than booting up and shutting down machines, system capacity can be scaled dynamically to adapt to workload changes. Fourth, such separation also allows applications to be architected, developed, and operated without platform lock-in. You can run the same application on-premises or in the cloud, as long as these environments provide the same mechanism to schedule CPU, memory, and disk resources.