At the AWS re:Invent 2017 conference, held in Las Vegas, USA, several new compute and storage features were announced, including: Amazon EKS, a fully managed Kubernetes service; AWS Fargate, a service to run containers without managing servers; Amazon Aurora Multi-Master; Amazon Aurora Serverless; DynamoDB Global Tables and On-Demand Backup; Amazon Neptune, a fully managed graph database; and Amazon S3 Select and Glacier Select, which allow SQL-like queries to retrieve only a subset of the data stored within objects.

Andy Jassy, CEO of Amazon Web Services (AWS), began the Wednesday AWS re:Invent keynote by stating that the goal of the AWS platform is to provide everything modern builders require, and to enable services, platforms and tooling that can be utilised effectively and securely within an enterprise context.

Continuing on the theme of containers, the next announcement was AWS Fargate - a service to run containers without managing servers or clusters. Fargate is comparable to EC2, but with the instance primitives being containers rather than VMs. An engineer can build a container image, specify the CPU and memory requirements, define networking and IAM policies, and launch the container within a managed cluster. ECS clusters are heterogeneous, and they can contain tasks running within Fargate and on EC2. The service is currently available in US East (Northern Virginia). Jassy stated that AWS plan to support launching containers on Fargate using Amazon EKS in 2018.
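To make the launch workflow described above more concrete, the following is a minimal sketch of the request parameters an engineer might pass to the boto3 ECS client's `register_task_definition` and `run_task` calls. It only builds dictionaries and does not call AWS. The helper names, family name, image and subnet ID are hypothetical, and the CPU/memory table is an assumption based on the combinations documented for Fargate around launch, which may have changed since.

```python
# Sketch only: builds request parameters for boto3's ECS client; no AWS call is made.
# Assumption: CPU units -> allowed memory sizes (MiB) as documented around the Fargate launch.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def fargate_task_definition(family, image, cpu, memory):
    """Return keyword arguments for ecs.register_task_definition(**kwargs)."""
    if memory not in FARGATE_SIZES.get(cpu, []):
        raise ValueError(f"unsupported Fargate size: cpu={cpu}, memory={memory}")
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # each task gets its own network interface
        "cpu": str(cpu),          # the API expects strings for task-level sizes
        "memory": str(memory),
        "containerDefinitions": [
            {"name": family, "image": image, "essential": True},
        ],
    }


def fargate_run_task(cluster, family, subnets):
    """Return keyword arguments for ecs.run_task(**kwargs)."""
    return {
        "cluster": cluster,
        "taskDefinition": family,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {"subnets": subnets, "assignPublicIp": "ENABLED"},
        },
    }


# Hypothetical values, for illustration only.
task_def = fargate_task_definition("web", "nginx:latest", cpu=256, memory=512)
run_args = fargate_run_task("default", "web", ["subnet-EXAMPLE"])
```

With real credentials, these dictionaries would be passed as `ecs.register_task_definition(**task_def)` and `ecs.run_task(**run_args)`, where `ecs = boto3.client("ecs")`. An IAM task role, if needed, would be supplied through the `taskRoleArn` parameter.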

The Amazon Aurora RDBMS-as-a-service received two considerable updates, both available in preview only. Multi-Master Aurora allows an engineer to create multiple read/write master instances across multiple Availability Zones. This enables applications to read and write data to multiple database instances within a cluster, in a similar fashion to the way they can already read from the pre-existing Read Replicas. With this new configuration it is now possible to exceed the current limit of 200,000 write operations per second.

Amazon Aurora Serverless is designed for workloads that are highly variable and subject to rapid change, and allows customers to pay for the database resources they use on a second-by-second basis. Engineers create a database endpoint (and set the desired minimum and maximum capacity if required), and billing is based on Aurora Capacity Units (ACUs), each representing a combination of compute power and memory. The AWS blog states that the current plan is to make Aurora Serverless generally available (GA) with MySQL compatibility in the first half of 2018, and to follow up with PostgreSQL compatibility later in the year.
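As a rough illustration of second-by-second billing in ACUs, the sketch below totals ACU-seconds over a series of usage intervals and converts them to a cost. The function name, the usage figures and the price per ACU-hour are all hypothetical; actual pricing was not part of the announcement described here.

```python
def aurora_serverless_cost(usage, price_per_acu_hour):
    """Estimate cost for a list of (acus, seconds) intervals.

    Billing is modelled per second: each interval contributes
    acus * seconds ACU-seconds, which are then converted to ACU-hours.
    """
    acu_seconds = sum(acus * seconds for acus, seconds in usage)
    return acu_seconds / 3600 * price_per_acu_hour


# Hypothetical workload: a quiet half hour, a ten-minute spike, then quiet again.
usage = [(2, 1800), (8, 600), (2, 1200)]  # 10,800 ACU-seconds = 3 ACU-hours
HYPOTHETICAL_PRICE = 0.06                 # assumed USD per ACU-hour, not a real price

print(f"${aurora_serverless_cost(usage, HYPOTHETICAL_PRICE):.2f}")
```

The point of the model is that the ten-minute spike at 8 ACUs is paid for only while it lasts, rather than requiring an instance sized for the peak to run for the whole hour.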

The fully-managed NoSQL Amazon DynamoDB service also received two major upgrades. The first upgrade, Global Tables, allows the creation of tables that are automatically replicated across two or more AWS Regions, with full support for multi-master writes. This service is GA from today. The second upgrade, On-Demand Backup, enables the creation of full backups of DynamoDB tables "with a single click", and with zero impact on performance or availability (provided that the read and write capacity units are configured correctly). The backup service is also GA today, with point-in-time restore expected to be available in early 2018.
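The two DynamoDB features can also be driven programmatically. The following is a minimal sketch that assumes a boto3 DynamoDB client is passed in and uses its `create_global_table` and `create_backup` operations. The function name, table name, Regions and backup name are hypothetical, and prerequisites for joining tables into a global table are omitted.

```python
def replicate_and_back_up(dynamodb, table_name, regions, backup_name):
    """Create a global table across `regions`, then take an on-demand backup.

    `dynamodb` is expected to behave like boto3.client("dynamodb").
    Returns the ARN of the new backup.
    """
    if len(regions) < 2:
        raise ValueError("a global table needs at least two Regions")

    # Replicate the table across the given Regions.
    dynamodb.create_global_table(
        GlobalTableName=table_name,
        ReplicationGroup=[{"RegionName": region} for region in regions],
    )

    # Take a full on-demand backup of the table.
    response = dynamodb.create_backup(TableName=table_name, BackupName=backup_name)
    return response["BackupDetails"]["BackupArn"]
```

A call might look like `replicate_and_back_up(boto3.client("dynamodb", region_name="us-east-1"), "orders", ["us-east-1", "eu-west-1"], "orders-snapshot")`, with all of those values being placeholders.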

Continuing in the domain of data storage, Jassy also announced the launch of a limited preview of Amazon Neptune, a fully-managed graph database service that "makes it easy to gain insights from relationships among your highly connected datasets". The service supports fast failover, point-in-time recovery and Multi-AZ deployments, and allows up to 15 read replicas in order to scale query throughput to hundreds of thousands of queries per second. Amazon Neptune supports two open standards for describing and querying graphs: Apache TinkerPop3, queried with Gremlin; and Resource Description Framework (RDF), queried with SPARQL.

The launch of Amazon S3 Select and Glacier Select enables engineers and applications to retrieve only a subset of data from an object by using simple SQL expressions. This supports use cases such as querying the contents of a zipped CSV stored within S3 or Glacier, without having to download and decompress the file. The updated S3 SDK uses a binary wire protocol to achieve this functionality, and accordingly requires the addition of a deserialisation library. Glacier Select works just like any other Glacier retrieval job, with the exception that an additional set of parameters can be passed to the initiate job request. S3 Select is available in preview, and Glacier Select is generally available in all commercial regions that offer Glacier.
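As an illustration of the zipped-CSV use case, the sketch below wraps the boto3 S3 client's `select_object_content` operation and reassembles the streamed result. The function name, bucket, key and column names are hypothetical, and the example assumes the object is a gzip-compressed CSV with a header row.

```python
def select_rows(s3, bucket, key, expression):
    """Run an SQL expression against a gzip-compressed CSV object and return text.

    `s3` is expected to behave like boto3.client("s3").
    """
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is gzipped CSV whose first line names the columns.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
        OutputSerialization={"CSV": {}},
    )
    # The result arrives as a stream of events; only "Records" events carry data.
    chunks = [
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    ]
    return b"".join(chunks).decode("utf-8")
```

A call might look like `select_rows(boto3.client("s3"), "example-bucket", "logs/users.csv.gz", "SELECT s.name FROM S3Object s WHERE s.city = 'Seattle'")`. Only the matching rows cross the network, which is the saving the feature is aimed at.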

Several additional announcements within the compute and storage domain were made in the run-up to the keynote:

Extension of the VPC PrivateLink model, allowing a customer to set up and use VPC Endpoints to access their own services and those made available by others.

Inter-Region VPC Peering now allows peering relationships to be established between Virtual Private Clouds (VPCs) across different AWS regions without requiring gateways, VPN connections or separate network appliances.

Public preview of the i3.metal instance, the first in a series of EC2 instances that allows the operating system to run directly on the underlying "bare metal" hardware, while still providing access to all other AWS services.

Launch of the new H1 instance that is designed specifically for use cases that depend on high-speed, sequential access to multiple terabytes of data, e.g. large MapReduce clusters or using Apache Kafka to process voluminous log files.

A new M5 instance, which delivers 14% better price/performance than the M4 instances on a per-core basis. The instances support Enhanced Networking (delivering up to 25 Gbps when used within a Placement Group) and access to EBS storage is enhanced by the use of NVMe (with some caveats).

Launch of the new T2 Unlimited, which provides the ability to sustain high CPU performance over any desired time frame.

Amazon Time Sync Service, a time synchronization service delivered over Network Time Protocol (NTP), which uses a fleet of redundant satellite-connected and atomic clocks in each region to deliver a highly accurate reference clock.

Additional information on the AWS re:Invent product launches and service upgrades can be found on the AWS News Blog.