Today I’m back with an update on the Pi Plug I made a while back. This prototype is still in the works, and is much more modular than the previous version. https://N-O-D-E.net/piplug2.html https://github.com/N-O-D-E/piplug —————- Shop: http://N-O-D-E.net/shop/ Patreon: http://patreon.com/N_O_D_E_ BTC: 17HqC7ZzmpE7E8Liuyb5WRbpwswBUgKRGZ Newsletter: http://eepurl.com/ceA-nL Music: https://archive.org/details/Fwawn-FromManToGod

The Pi Zero Power Case

In a video early last year, YouTuber N-O-D-E revealed his Pi Zero Power Case, an all-in-one always-on networked computer that fits snugly against a wall power socket.

The project uses an official Raspberry Pi power supply, a Zero4U USB hub, and a Raspberry Pi Zero W, and it allows completely wireless connection to a network. N-O-D-E cut the power cord and soldered its wires directly to the power input of the USB hub. The hub powers the Zero via pogo pins that connect directly to the test pads beneath.

The Power Case is a neat project, but it may be a little daunting for anyone not keen on cutting and soldering the power supply wires.

Pi Plug 2

In his overhaul of the design, N-O-D-E has created a modular reimagining of the previous always-on networked computer that fits more streamlined to the wall socket and requires absolutely no soldering or hacking of physical hardware.

The Pi Plug 2 uses a USB power supply alongside two custom PCBs and a Zero W. While one PCB houses a USB connector that slots directly into the power supply, two blobs of solder on the second PCB press against the test pads beneath the Zero W. When connected, the PCBs run power directly from the wall socket to the Raspberry Pi Zero W. Neat!

While N-O-D-E isn’t currently selling these PCBs in his online store, all files are available on GitHub, so have a look if you want to recreate the Pi Plug.

Besides simply SSH’ing into the Pi, you could also easily install a remote desktop client and use the GUI. You can share your computer’s internet connection with the Pi and use it just like you would normally, but now without the need for a monitor, chargers, adapters, cables, or peripherals.

We’re keen to see how our community is hacking their Zeros and Zero Ws in order to take full advantage of the small footprint of the computer, so be sure to share your projects and ideas with us, either in the comments below or via social media.

One of the challenges faced by our customers—especially those in highly regulated industries—is balancing the need for security with flexibility. In this post, we cover how to enable multi-tenancy and increase security by using EMRFS (EMR File System) authorization, the Amazon S3 storage-level authorization on Amazon EMR.

Amazon EMR is an easy, fast, and scalable analytics platform enabling large-scale data processing. EMRFS authorization provides Amazon S3 storage-level authorization by configuring EMRFS with multiple IAM roles. With this functionality enabled, different users and groups can share the same cluster and assume their own IAM roles respectively.

Simply put, on Amazon EMR, we can now have an Amazon EC2 role per user assumed at run time instead of one general EC2 role at the cluster level. When the user is trying to access Amazon S3 resources, Amazon EMR evaluates against a predefined mappings list in EMRFS configurations and picks up the right role for the user.

In this post, we will discuss what EMRFS authorization is (Amazon S3 storage-level access control) and show how to configure the role mappings with detailed examples. You will then have the desired permissions in a multi-tenant environment. We also demo Amazon S3 access from HDFS command line, Apache Hive on Hue, and Apache Spark.

EMRFS authorization for Amazon S3

There are two prerequisites for using this feature:

Users must be authenticated, because EMRFS needs to map the current user/group/prefix to a predefined user/group/prefix. There are several authentication options. In this post, we launch a Kerberos-enabled cluster that manages the Key Distribution Center (KDC) on the master node, and enable a one-way trust from the KDC to a Microsoft Active Directory domain.

The application must support accessing Amazon S3 via Applications that have their own S3FileSystem APIs (for example, Presto) are not supported at this time.

EMRFS supports three types of mapping entries: user, group, and Amazon S3 prefix. Let’s use an example to show how this works.

Assume that you have the following three identities in your organization, and they are defined in the Active Directory:

To enable all these groups and users to share the EMR cluster, you need to define the following IAM roles:

In this case, you create a separate Amazon EC2 role that doesn’t give any permission to Amazon S3. Let’s call the role the base role (the EC2 role attached to the EMR cluster), which in this example is named EMR_EC2_RestrictedRole. Then, you define all the Amazon S3 permissions for each specific user or group in their own roles. The restricted role serves as the fallback role when the user doesn’t belong to any user/group, nor does the user try to access any listed Amazon S3 prefixes defined on the list.

Important: For all other roles, like emrfs_auth_group_role_data_eng, you need to add the base role (EMR_EC2_RestrictedRole) as the trusted entity so that it can assume other roles. See the following example:

This role grants all Amazon S3 permissions to the emrfs-auth-data-science-bucket-demo bucket and all the objects in it. Similarly, the policy for the role emrfs_auth_group_role_data_eng is shown below:

Example role mappings configuration

To configure EMRFS authorization, you use EMR security configuration. Here is the configuration we use in this post

Consider the following scenario.

First, the admin user admin1 tries to log in and run a command to access Amazon S3 data through EMRFS. The first role emrfs_auth_user_role_admin_user on the mapping list, which is a user role, is mapped and picked up. Then admin1 has access to the Amazon S3 locations that are defined in this role.

Then a user from the data engineer group (grp_data_engineering) tries to access a data bucket to run some jobs. When EMRFS sees that the user is a member of the grp_data_engineering group, the group role emrfs_auth_group_role_data_eng is assumed, and the user has proper access to Amazon S3 that is defined in the emrfs_auth_group_role_data_eng role.

Next, the third user comes, who is not an admin and doesn’t belong to any of the groups. After failing evaluation of the top three entries, EMRFS evaluates whether the user is trying to access a certain Amazon S3 prefix defined in the last mapping entry. This type of mapping entry is called the prefix type. If the user is trying to access s3://emrfs-auth-default-bucket-demo/, then the prefix mapping is in effect, and the prefix role emrfs_auth_prefix_role_default_s3_prefix is assumed.

If the user is not trying to access any of the Amazon S3 paths that are defined on the list—which means it failed the evaluation of all the entries—it only has the permissions defined in the EMR_EC2RestrictedRole. This role is assumed by the EC2 instances in the cluster.

In this process, all the mappings defined are evaluated in the defined order, and the first role that is mapped is assumed, and the rest of the list is skipped.

Setting up an EMR cluster and mapping Active Directory users and groups

Now that we know how EMRFS authorization role mapping works, the next thing we need to think about is how we can use this feature in an easy and manageable way.

Active Directory setup

Many customers manage their users and groups using Microsoft Active Directory or other tools like OpenLDAP. In this post, we create the Active Directory on an Amazon EC2 instance running Windows Server and create the users and groups we will be using in the example below. After setting up Active Directory, we use the Amazon EMR Kerberos auto-join capability to establish a one-way trust from the KDC running on the EMR master node to the Active Directory domain on the EC2 instance. You can use your own directory services as long as it talks to the LDAP (Lightweight Directory Access Protocol).

After configuring Active Directory, you can create all the users and groups using the Active Directory tools and add users to appropriate groups. In this example, we created users like admin1, dataeng1, datascientist1, grp_data_engineering, and grp_data_science, and then add the users to the right groups.

Join the EMR cluster to an Active Directory domain

For clusters with Kerberos, Amazon EMR now supports automated Active Directory domain joins. You can use the security configuration to configure the one-way trust from the KDC to the Active Directory domain. You also configure the EMRFS role mappings in the same security configuration.

The following is an example of the EMR security configuration with a trusted Active Directory domain EMRKRB.TEST.COM and the EMRFS role mappings as we discussed earlier:

The EMRFS role mapping configuration is shown in this example:

We will also provide an example AWS CLI command that you can run.

Launching the EMR cluster and running the tests

Now you have configured Kerberos and EMRFS authorization for Amazon S3.

Additionally, you need to configure Hue with Active Directory using the Amazon EMR configuration API in order to log in using the AD users created before. The following is an example of Hue AD configuration.

Note: In the preceding configuration JSON file, change the values as required before pasting it into the software setting section in the Amazon EMR console.

Now let’s use this configuration and the security configuration you created before to launch the cluster.

In the Amazon EMR console, choose Create cluster. Then choose Go to advanced options. On the Step1: Software and Steps page, under Edit software settings (optional), paste the configuration in the box.

The rest of the setup is the same as an ordinary cluster setup, except in the Security Options section. In Step 4: Security, under Permissions, choose Custom, and then choose the RestrictedRole that you created before.

Choose the appropriate subnets (these should meet the base requirement in order for a successful Active Directory join—see the Amazon EMR Management Guide for more details), and choose the appropriate security groups to make sure it talks to the Active Directory. Choose a key so that you can log in and configure the cluster.

Most importantly, choose the security configuration that you created earlier to enable Kerberos and EMRFS authorization for Amazon S3.

Quickly run two commands to show that the Active Directory join is successful:

id [user name] shows the mapped AD users and groups in Linux.

hdfs groups [user name] shows the mapped group in Hadoop.

Both should return the current Active Directory user and group information if the setup is correct.

Now, you can test the user mapping first. Log in with the admin1 user, and run a Hadoop list directory command:

hadoop fs -ls s3://emrfs-auth-data-science-bucket-demo/

Now switch to a user from the data engineer group.

Retry the previous command to access the admin’s bucket. It should throw an Amazon S3 Access Denied exception.

When you try listing the Amazon S3 bucket that a data engineer group member has accessed, it triggers the group mapping.

hadoop fs -ls s3://emrfs-auth-data-engineering-bucket-demo/

It successfully returns the listing results. Next we will test Apache Hive and then Apache Spark.

To run jobs successfully, you need to create a home directory for every user in HDFS for staging data under /user/<username>. Users can configure a step to create a home directory at cluster launch time for every user who has access to the cluster. In this example, you use Hue since Hue will create the home directory in HDFS for the user at the first login. Here Hue also needs to be integrated with the same Active Directory as explained in the example configuration described earlier.

First, log in to Hue as a data engineer user, and open a Hive Notebook in Hue. Then run a query to create a new table pointing to the data engineer bucket, s3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/.

You can see that the table was created successfully. Now try to create another table pointing to the data science group’s bucket, where the data engineer group doesn’t have access.

It failed and threw an Amazon S3 Access Denied error.

Now insert one line of data into the successfully create table.

Next, log out, switch to a data science group user, and create another table, test2_datasci_tb.

The creation is successful.

The last task is to test Spark (it requires the user directory, but Hue created one in the previous step).

Now let’s come back to the command line and run some Spark commands.

Login to the master node using the datascientist1 user:

Start the SparkSQL interactive shell by typing spark-sql, and run the show tables command. It should list the tables that you created using Hive.

As a data science group user, try select on both tables. You will find that you can only select the table defined in the location that your group has access to.

Conclusion

EMRFS authorization for Amazon S3 enables you to have multiple roles on the same cluster, providing flexibility to configure a shared cluster for different teams to achieve better efficiency. The Active Directory integration and group mapping make it much easier for you to manage your users and groups, and provides better auditability in a multi-tenant environment.

Late 2017, Boing Boing co-editor Xena Jardin posted an article in which he linked to an archive containing every Playboy centerfold image to date.

“Kind of amazing to see how our standards of hotness, and the art of commercial erotic photography, have changed over time,” Jardin noted.

While Boing Boing had nothing to do with the compilation, uploading, or storing of the Imgur-based archive, Playboy took exception to the popular blog linking to the album.

Noting that Jardin had referred to the archive uploader as a “wonderful person”, the adult publication responded with a lawsuit (pdf), claiming that Boing Boing had commercially exploited its copyrighted images.

Last week, with assistance from the Electronic Frontier Foundation, Boing Boing parent company Happy Mutants filed a motion to dismiss in which it defended its right to comment on and link to copyrighted content without that constituting infringement.

“This lawsuit is frankly mystifying. Playboy’s theory of liability seems to be that it is illegal to link to material posted by others on the web — an act performed daily by hundreds of millions of users of Facebook and Twitter, and by journalists like the ones in Playboy’s crosshairs here,” the company wrote.

EFF Senior Staff Attorney Daniel Nazer weighed in too, arguing that since Boing Boing’s reporting and commenting is protected by copyright’s fair use doctrine, the “deeply flawed” lawsuit should be dismissed.

Now, just a week later, Playboy has fired back. Opposing Happy Mutants’ request for the Court to dismiss the case, the company cites the now-famous Perfect 10 v. Amazon/Google case from 2007, which tried to prevent Google from facilitating access to infringing images.

Playboy highlights the court’s finding that Google could have been held contributorily liable – if it had knowledge that Perfect 10 images were available using its search engine, could have taken simple measures to prevent further damage, but failed to do so.

Turning to Boing Boing’s conduct, Playboy says that the company knew it was linking to infringing content, could have taken steps to prevent that, but failed to do so. It then launches an attack on the site itself, offering disparaging comments concerning its activities and business model.

“This is an important case. At issue is whether clickbait sites like Happy Mutants’ Boing Boing weblog — a site designed to attract viewers and encourage them to click on links in order to generate advertising revenue — can knowingly find, promote, and profit from infringing content with impunity,” Playboy writes.

“Clickbait sites like Boing Boing are not known for creating original content. Rather, their business model is based on ‘collecting’ interesting content created by others. As such, they effectively profit off the work of others without actually creating anything original themselves.”

Playboy notes that while sites like Boing Boing are within their rights to leverage works created by others, courts in the US and overseas have ruled that knowingly linking to infringing content is unacceptable.

Even given these conditions, Playboy argues, Happy Mutants and the EFF now want the Court to dismiss the case so that sites are free to “not only encourage, facilitate, and induce infringement, but to profit from those harmful activities.”

Claiming that Boing Boing’s only reason for linking to the infringing album was to “monetize the web traffic that over fifty years of Playboy photographs would generate”, Playboy insists that the site and parent company Happy Mutants was properly charged with copyright infringement.

Playboy also dismisses Boing Boing’s argument that a link to infringing content cannot result in liability due to the link having both infringing and substantial non-infringing uses.

First citing the Betamax case, which found that maker Sony could not be held liable for infringement because its video recorders had substantial non-infringing uses, Playboy counters with the Grokster decision, which held that a distributor of a product could be liable for infringement, if there was an intent to encourage or support infringement.

“In this case, Happy Mutants’ offending link — which does nothing more than support infringing content — is good for nothing but promoting infringement and there is no legitimate public interest in its unlicensed availability,” Playboy notes.

In its motion to dismiss, Happy Mutants also argued that unless Playboy could identify users who “in fact downloaded — rather than simply viewing — the material in question,” the case should be dismissed. However, Playboy rejects the argument, claiming it is based on an erroneous interpretation of the law.

Citing the Grokster decision once more, the adult publisher notes that the Supreme Court found that someone infringes contributorily when they intentionally induce or encourage direct infringement.

“The argument that contributory infringement only lies where the defendant’s actions result in further infringement ignores the ‘or’ and collapses ‘inducing’ and ‘encouraging’ into one thing when they are two distinct things,” Playboy writes.

As for Boing Boing’s four classic fair use arguments, the publisher describes these as “extremely weak” and proceeds to hit them one by one.

In respect of the purpose and character of the use, Playboy discounts Boing Boing’s position that the aim of its post was to show “how our standards of hotness, and the art of commercial erotic photography, have changed over time.” The publisher argues that is the exact same purpose of Playboy magazine, while highliting its publication Playboy: The Compete Centerfolds, 1953-2016.

Moving on to the second factor of fair use – the nature of the copyrighted work – Playboy notes that an entire album of artwork is involved, rather than just a single image.

On the third factor, concerning the amount and substantiality of the original work used, Playboy argues that in order to publish an opinion on how “standards of hotness” had developed over time, there was no need to link to all of the pictures in the archive.

“Had only representative images from each decade, or perhaps even each year, been taken, this would be a very different case — but Happy Mutants cannot dispute that it knew it was linking to an illegal library of ‘Every Playboy Playmate Centerfold Ever’ since that is what it titled its blog post,” Playboy notes.

Finally, when considering the effect of the use upon the potential market for or value of the copyrighted work, Playbody says its archive of images continues to be monetized and Boing Boing’s use of infringing images jeopardizes that.

“Given that people are generally not going to pay for what is freely available, it is disingenuous of Happy Mutants to claim that promoting the free availability of infringing archives of Playboy’s work for viewing and downloading is not going to have an adverse effect on the value or market of that work,” the publisher adds.

While it appears the parties agree on very little, there is agreement on one key aspect of the case – its wider importance.

On the one hand, Playboy insists that a finding in its favor will ensure that people can’t commercially exploit infringing content with impunity. On the other, Boing Boing believes that the health of the entire Internet is at stake.

“The world can’t afford a judgment against us in this case — it would end the web as we know it, threatening everyone who publishes online, from us five weirdos in our basements to multimillion-dollar, globe-spanning publishing empires like Playboy,” the company concludes.

Playboy’s opposition to Happy Mutants’ motion to dismiss can be found here (pdf)

So, what’s Amazon Elastic Container Service (ECS)? ECS is a managed service for running containers on AWS, designed to make it easy to run applications in the cloud without worrying about configuring the environment for your code to run in. Using ECS, you can easily deploy containers to host a simple website or run complex distributed microservices using thousands of containers.

Getting started with ECS isn’t too difficult. To fully understand how it works and how you can use it, it helps to understand the basic building blocks of ECS and how they fit together!

Let’s begin with an analogy

Imagine you’re in a virtual reality game with blocks and portals, in which your task is to build kingdoms.

In your spaceship, you pull up a holographic map of your upcoming destination: Nozama, a golden-orange planet. Looking at its various regions, you see that the nearest one is za-southwest-1 (SWNozama). You set your destination, and use your jump drive to jump to the outer atmosphere of za-southwest-1.

As you approach SW Nozama, you see three portals, 1a, 1b, and 1c. Each portal lets you transport directly to an isolated zone (Availability Zone), where you can start construction on your new kingdom (cluster), Royaume.

With your supply of blocks, you take the portal to 1b, and erect the surrounding walls of your first territory (instance)*.

Before you get ahead of yourself, there are some rules to keep in mind. For your territory to be a part of Royaume, the land ordinance requires construction of a building (container), specifically a castle, from which your territory’s lord (agent)* rules.

You can then create architectural plans (task definitions) to build your developments (tasks), consisting of up to 10 buildings per plan. A development can be built now within this or any territory, or multiple territories.

If you do decide to create more territories, you can either stay here in 1b or take a portal to another location in SW Nozama and start building there.

Amazon EC2 building blocks

We currently provide two launch types: EC2 and Fargate. With Fargate, the Amazon EC2 instances are abstracted away and managed for you. Instead of worrying about ECS container instances, you can just worry about tasks. In this post, the infrastructure components used by ECS that are handled by Fargate are marked with a *.

Instance*

EC2 instances are good ol’ virtual machines (VMs). And yes, don’t worry, you can connect to them (via SSH). Because customers have varying needs in memory, storage, and computing power, many different instance types are offered. Just want to run a small application or try a free trial? Try t2.micro. Want to run memory-optimized workloads? R3 and X1 instances are a couple options. There are many more instance types as well, which cater to various use cases.

AMI*

Sorry if you wanted to immediately march forward, but before you create your instance, you need to choose an AMI. An AMI stands for Amazon Machine Image. What does that mean? Basically, an AMI provides the information required to launch an instance: root volume, launch permissions, and volume-attachment specifications. You can find and choose a Linux or Windows AMI provided by AWS, the user community, the AWS Marketplace (for example, the Amazon ECS-Optimized AMI), or you can create your own.

Region

AWS is divided into regions that are geographic areas around the world (for now it’s just Earth, but maybe someday…). These regions have semi-evocative names such as us-east-1 (N. Virginia), us-west-2 (Oregon), eu-central-1 (Frankfurt), ap-northeast-1 (Tokyo), etc.

Each region is designed to be completely isolated from the others, and consists of multiple, distinct data centers. This creates a “blast radius” for failure so that even if an entire region goes down, the others aren’t affected. Like many AWS services, to start using ECS, you first need to decide the region in which to operate. Typically, this is the region nearest to you or your users.

Availability Zone

AWS regions are subdivided into Availability Zones. A region has at minimum two zones, and up to a handful. Zones are physically isolated from each other, spanning one or more different data centers, but are connected through low-latency, fiber-optic networking, and share some common facilities. EC2 is designed so that the most common failures only affect a single zone to prevent region-wide outages. This means you can achieve high availability in a region by spanning your services across multiple zones and distributing across hosts.

Amazon ECS building blocks

Container

Are containers virtual machines? Nope! Virtual machines virtualize the hardware (benefits), while containers virtualize the operating system (even more benefits!). If you look inside a container, you would see that it is made by processes running on the host, and tied together by kernel constructs like namespaces, cgroups, etc. But you don’t need to bother about that level of detail, at least not in this post!

Why containers? Containers give you the ability to build, ship, and run your code anywhere!

Before the cloud, you needed to self-host and therefore had to buy machines in addition to setting up and configuring the operating system (OS), and running your code. In the cloud, with virtualization, you can just skip to setting up the OS and running your code. Containers make the process even easier—you can just run your code.

Additionally, all of the dependencies travel in a package with the code, which is called an image. This allows containers to be deployed on any host machine. From the outside, it looks like a host is just holding a bunch of containers. They all look the same, in the sense that they are generic enough to be deployed on any host.

With ECS, you can easily run your containerized code and applications across a managed cluster of EC2 instances.

Are containers a fairly new technology? The concept of containerization is not new. Its origins date back to 1979 with the creation of chroot. However, it wasn’t until the early 2000s that containers became a major technology. The most significant milestone to date was the release of Docker in 2013, which led to the popularization and widespread adoption of containers.

What does ECS use? While other container technologies exist (LXC, rkt, etc.), because of its massive adoption and use by our customers, ECS was designed first to work natively with Docker containers.

And as you probably guessed, in these instances, you are running containers.

AMI*

These container instances can use any AMI as long as it has the following specifications: a modern Linux distribution with the agent and the Docker Daemon with any Docker runtime dependencies running on it.

Want it more simplified? Well, AWS created the Amazon ECS-Optimized AMI for just that. Not only does that AMI come preconfigured with all of the previously mentioned specifications, it’s tested and includes the recommended ecs-init upstart process to run and monitor the agent.

Cluster

An ECS cluster is a grouping of (container) instances* (or tasks in Fargate) that lie within a single region, but can span multiple Availability Zones – it’s even a good idea for redundancy. When launching an instance (or tasks in Fargate), unless specified, it registers with the cluster named “default”. If “default” doesn’t exist, it is created. You can also scale and delete your clusters.

Agent*

The Amazon ECS container agent is a Go program that runs in its own container within each EC2 instance that you use with ECS. (It’s also available open source on GitHub!) The agent is the intermediary component that takes care of the communication between the scheduler and your instances. Want to register your instance into a cluster? (Why wouldn’t you? A cluster is both a logical boundary and provider of pool of resources!) Then you need to run the agent on it.

Task

When you want to start a container, it has to be part of a task. Therefore, you have to create a task first. Succinctly, tasks are a logical grouping of 1 to N containers that run together on the same instance, with N defined by you, up to 10. Let’s say you want to run a custom blog engine. You could put together a web server, an application server, and an in-memory cache, each in their own container. Together, they form a basic frontend unit.

Task definition

Ah, but you cannot create a task directly. You have to create a task definition that tells ECS that “task definition X is composed of this container (and maybe that other container and that other container too!).” It’s kind of like an architectural plan for a city. Some other details it can include are how the containers interact, container CPU and memory constraints, and task permissions using IAM roles.

Then you can tell ECS, “start one task using task definition X.” It might sound like unnecessary planning at first. As soon as you start to deal with multiple tasks, scaling, upgrades, and other “real life” scenarios, you’ll be glad that you have task definitions to keep track of things!

Scheduler*

So, the scheduler schedules… sorry, this should be more helpful, huh? The scheduler is part of the “hosted orchestration layer” provided by ECS. Wait a minute, what do I mean by “hosted orchestration”? Simply put, hosted means that it’s operated by ECS on your behalf, without you having to care about it. Your applications are deployed in containers running on your instances, but the managing of tasks is taken care of by ECS. One less thing to worry about!

Also, the scheduler is the component that decides what (which containers) gets to run where (on which instances), according to a number of constraints. Say that you have a custom blog engine to scale for high availability. You could create a service, which by default, spreads tasks across all zones in the chosen region. And if you want each task to be on a different instance, you can use the distinctInstancetask placement constraint. ECS makes sure that not only this happens, but if a task fails, it starts again.

Service

To ensure that you always have your task running without managing it yourself, you can create a service based on the task that you defined and ECS ensures that it stays running. A service is a special construct that says, “at any given time, I want to make sure that N tasks using task definition X1 are running.” If N=1, it just means “make sure that this task is running, and restart it if needed!” And with N>1, you’re basically scaling your application until you hit N, while also ensuring each task is running.

So, what now?

Hopefully you, at the very least, learned a tiny something. All comments are very welcome!

Want to discuss ECS with others? Join the amazon-ecs slack group, which members of the community created and manage.

Also, if you’re interested in learning more about the core concepts of ECS and its relation to EC2, here are some resources:

After getting in contact with us to share their latest build with us, we invited Matvey Fridman of Germany-based production company Spiegelbilder Studio to write a guest blog post about their CRT video walls created for the band STRANDKØNZERT.

CRT video wall

About a year ago, we had the idea of building a huge video wall out of old TVs to use in a music video. It took some time, but half a year later we found ourselves in a studio actually building this thing using 30 connected computers, 24 of which were Raspberry Pis.

How we did it

After weeks and months of preproduction and testing, we decided on two consecutive days to build the wall, create the underlying IP network, run a few tests, and then film the artists’ performance in front of it. We actually had 32 Pis (a mixed bag of first, second, and third generation models) and even more TVs ready to go, since we didn’t know what the final build would actually look like. We ended up using 29 separate screens of various sizes hooked up to 24 separate Pis — the remaining five TVs got a daisy-chained video signal out of other monitors for a cool effect. Each Pi had to run a free software called PiWall.

Since the TVs only had analogue video inputs, we had to get special composite breakout cables and then adapt the RCA connectors to either SCART, S-Video, or BNC.

As soon as we had all of that running, we connected every Pi to a 48-port network switch that we’d hooked up to a Windows PC acting as a DHCP server to automatically assign IP addresses and handle the multicast addressing. To make remote control of the Raspberry Pis easier, a separate master Linux PC and two MacBook laptops, each with SSH enabled and a Samba server running, joined the network as well.

The MacBook laptops were used to drop two files containing the settings on each Pi. The .pitile file was unique to every Pi and contained their respective IDs. The .piwall file contained the same info for all Pis: the measurements and positions of every single screen to help the software split up the video signal coming in through the network. After every Pi got the command to start the PiWall software, which specifies the UDP multicast address and settings to be used to receive the video stream, the master Linux PC was tasked with streaming the video file to these UDP addresses. Now every TV was showing its section of the video, and we could begin filming.

The whole process and the contents of the files and commands are summarised in the infographic below. A lot of trial and error was involved in the making of this project, but it all worked out well in the end. We hope you enjoy the craft behind the music video even though the music is not for everybody 😉

Late last year Playboy sued the popular blog Boing Boing for publishing an article that linked to an archive of every playmate centerfold till then.

“Kind of amazing to see how our standards of hotness, and the art of commercial erotic photography, have changed over time,” Boing Boing’s Xena Jardin commented.

Playboy, instead, was amazed that infringing copies of their work were being shared in public. While Boing Boing didn’t upload or store the images in question, the publisher took the case to court.

The blog’s parent company Happy Mutants was accused of various counts of copyright infringement, with Playboy claiming that it exploited their playmates’ images for commercial purposes.

Boing Boing sees things differently. With help from the Electronic Frontier Foundation, it has filed a motion to dismiss the case, arguing that hyperlinking is not copyright infringement.

“This lawsuit is frankly mystifying. Playboy’s theory of liability seems to be that it is illegal to link to material posted by others on the web — an act performed daily by hundreds of millions of users of Facebook and Twitter, and by journalists like the ones in Playboy’s crosshairs here,” they write.

The article in question

The defense points out that Playboy’s complaint fails to state a claim for direct or contributory copyright infringement. In addition, it argues that this type of reporting should be seen as fair use.

“We’re asking the court to dismiss this deeply flawed lawsuit. Journalists, scientists, researchers, and everyday people on the web have the right to link to material, even copyrighted material, without having to worry about getting sued.”

The lawsuit shares a lot of similarities with the case between Dutch blog GeenStijl and local Playboy publisher Sanoma. That high-profile case went all the way to the European Court of Justice.

The highest European court eventually decided that hyperlinks to infringing works are to be considered a ‘communication to the public,’ and that a commercial publication can indeed be held liable for copyright infringement.

Boing Boing hopes that US Courts will see things differently, or it might be “the end of the web as we know it.”

“The world can’t afford a judgment against us in this case — it would end the web as we know it, threatening everyone who publishes online, from us five weirdos in our basements to multimillion-dollar, globe-spanning publishing empires like Playboy,” Boing Boing writes.

—

A copy of Boing Boing’s memorandum in support of the motion to dismiss is available here (pdf). The original Playboy complaint can be found here (pdf).

Jason Barnett used the pots feature of the Monzo banking API to create a simple e-paper display so that his kids can keep track of their pocket money.

Monzo

For those outside the UK: Monzo is a smartphone-based bank that allows costumers to manage their money and payment cards via an app, removing the bank clerk middleman.

In the Monzo banking app, users can set up pots, which allow them to organise their money into various, you guessed it, pots. You want to put aside holiday funds, budget your food shopping, or, like Jason, manage your kids’ pocket money? Using pots is an easy way to do it.

Jason’s Monzo Pot ePaper tracker

After failed attempts at keeping track of his sons’ pocket money via a scrap of paper stuck to the fridge, Jason decided to try a new approach.

He started his build by installing Stretch Lite to the SD card of his Raspberry Pi Zero W. “The Pi will be running headless (without screen, mouse or keyboard)”, he explains on his blog, “so there is no need for a full-fat Raspbian image.” While Stretch Lite was downloading, he set up the Waveshare ePaper HAT on his Zero W. He notes that Pimoroni’s “Inky pHAT would be easiest,” but his tutorial is specific to the Waveshare device.

Before ejecting the SD card, Jason updated the boot partition to allow him to access the Pi via SSH. He talks makers through that process here.

Among the libraries he installed for the project is pyMonzo, a Python wrapper for the Monzo API created by Paweł Adamczak. Monzo is still in its infancy, and the API is partly under construction. Until it’s completed, Paweł’s wrapper offers a more stable way to use it.

After installing the software, it was time to set up the e-paper screen for the tracker. Jason adjusted the code for the API so that the screen reloads information every 15 minutes, displaying the up-to-date amount of pocket money in both kids’ pots.

Here is how Jason describes going to the supermarket with his sons, now that he has completed the tracker:

“Daddy, I want (insert first thing picked up here), I’ve always wanted one of these my whole life!” […] Even though you have never seen that (insert thing here) before, I can quickly open my Monzo app, flick to Account, and say “You have £3.50 in your money box”. If my boy wants it, a 2-second withdrawal is made whilst queueing, and done — he walks away with a new (again, insert whatever he wanted his whole life here) and is happy!

Jason’s blog offers a full breakdown of his project, including all necessary code and the specs for the physical build. Be sure to head over and check it out.

Many customers ask for guidance on migrating end-to-end solutions running on virtual machines over to AWS. This post provides an overview of moving a common WordPress blog running on a virtualized platform to AWS, including re-pointing the DNS records associated to with the website.

AWS Server Migration Service (AWS SMS) is an agentless service that makes it easier and faster for you to migrate thousands of on-premises workloads to AWS. AWS SMS allows you to automate, schedule, and track incremental replications of live server volumes, making it easier for you to coordinate large-scale server migrations.

Walkthrough

The key elements of this migration process include the following steps:

Before you start, ensure that your source systems OS and vCenter version are supported by AWS. For more information, see the Server Migration Service FAQ.

Establish your AWS environment

For this walkthrough, your WordPress blog is currently running as a two-tier LAMP stack in a corporate data center. You have a frontend running Apache and PHP, plus a backend database running on MySQL. All systems are hosted on a virtualized platform.

First, establish your AWS environment. If your organization is new to AWS, this may include account or subaccount creation, a new virtual private cloud (VPC), and associated subnets, route tables, internet gateways, and so on. Think of this phase as setting up your software-defined data center. For more information, see Getting Started with Amazon EC2.

The blog is a two-tier stack, so go with two private subnets. Because you want it to be highly available, use multiple Availability Zones. A zone resides within an AWS Region. Each zone is isolated, but the zones within a region are connected through low-latency links. This allows architects and solution designers to build highly available solutions.

Replicate your database

WordPress uses a MySQL relational database. You could continue to manage MySQL and the associated EC2 instances associated with maintaining and scaling a database. For this walkthrough, use this opportunity to migrate to an RDS instance of Amazon Aurora, as it is a MySQL compliant database. Not only is Amazon Aurora a high-performant database engine but it frees you up to focus on application development by managing time-consuming database administration tasks, including backups, software patching, monitoring, scaling, and replication.

Use AWS Database Migration Service to migrate your MySQL database to Amazon Aurora easily and securely. After a database migration instance has been instantiated, configure the source and destination endpoints and create a replication task.

Install and configure the SMS Connector appliance

Launch a new VM based on the SMS Connector that you downloaded. To configure the connector, connect to it via HTTPS. You can obtain the SMS Connector IP address from your hypervisor.

Connect to the SMS Connector via HTTPS. In the example above, the connector IP address is 10.0.0.31. In your browser, enter https://10.0.0.31.

Configure the connector with the IAM and hypervisor credentials that you created earlier.

After it’s configured, and the associated connectivity and authentication checks have passed, return to the console and view your connector in AWS SMS.

Import your virtual machine inventory and create a replication job After validating that the SMS Connector is in a “HEALTHY” state, import your server catalog to AWS SMS. This process can take up to a minute.

Select the server to migrate and choose Create replication job. The console guides you through the process. The time that the initial replication task takes to complete is dependent on the available bandwidth and the size of your VM. After the initial seed replication, network bandwidth is minimized as AWS SMS replicates only incremental changes occurring on the VM.

Launch your EC2 instance

When your replication task is complete, the artifact created by AWS SMS is a custom AMI that you can use to deploy an EC2 instance. Follow the usual process to launch your EC2 instance, noting that you may need to replace any host-based firewalls with security groups and NACLs.

When you create an EC2 instance, ensure that you pick the most suitable EC2 instance type and size to match your performance requirements while optimizing for cost.

While your new EC2 instance is a replica of your on-premises VM, you should always validate that applications are functioning. How you do this differs on an application-by-application basis. You can use a combination of approaches, such as editing a local host file and testing your application, SSH, or Telnet.

From the RDS console, get your connection string details and update your WordPress configuration file to point to the Amazon Aurora database. As WordPress is expecting a MySQL database and Amazon Aurora is MySQL-compliant, this change of database engine is transparent to WordPress.

You have validated that your WordPress application is running correctly, as you are still receiving changes from your on-premises data center via AWS DMS into your Amazon Aurora database.

You can now update your DNS zone file using Amazon Route 53. Route 53 can be driven by multiple methods: console, SDK, or AWS CLI.

For this walkthrough, update your DNS zone file via the AWS CLI. The JSON example shows upserting the A record in your zone to resolve to your EC2 instance.

Use the AWS CLI to execute the request and update the record in your zone file. The cut-over period between the original off-cloud location and AWS is defined by the TTL in the SOA (statement of authority) in your DNS zone. During this period, any requests resolving to your off-cloud server that result in database writes are automatically replicated to your Amazon Aurora instance via AWS DMS.

You have now successfully migrated your WordPress blog to AWS. Based on the TTL of your DNS zone file, end users slowly resolve the WordPress blog to AWS.

Summary

In this post, you moved a WordPress blog to AWS, using AWS SMS and AWS DMS to re-point the associated DNS records.

Many architectures can be extended to use many of the inherent benefits of AWS, with little effort. For example, by using Amazon CloudWatch metrics to drive Auto Scaling policies, you can use an Application Load Balancer as your frontend. This removes the single point of failure for a single Amazon EC2 instance and ensures that your deployed capacity closely follows customer demand. Think big and get building!

Many enterprises use Microsoft Active Directory to manage users, groups, and computers in a network. And a question is asked frequently: How can Active Directory users access big data workloads running on Amazon EMR with the same single sign-on (SSO) experience they have when accessing resources in the Active Directory network?

This post walks you through the process of using AWS CloudFormation to set up a cross-realm trust and extend authentication from an Active Directory network into an Amazon EMR cluster with Kerberos enabled. By establishing a cross-realm trust, Active Directory users can use their Active Directory credentials to access an Amazon EMR cluster and run jobs as themselves.

Walkthrough overview

In this example, you build a solution that allows Active Directory users to seamlessly access Amazon EMR clusters and run big data jobs. Here’s what you need before setting up this solution:

A possible limit increase for your account (Note: Usually a limit increase will not be necessary. See the AWS Service Limits documentation if you encounter a limit error while building the solution.)

To make it easier for you to get started, I created AWS CloudFormation templates that automatically configure and deploy the solution for you. The following steps and resources are involved in setting up the solution:

Note: If you want to manually create and configure the components for this solution without using AWS CloudFormation, refer to the Amazon EMR cross-realm documentation.IMPORTANT: The AWS CloudFormation templates used in this post are designed to work only in the us-east-1 (N. Virginia) Region. They are not intended for production use without modification.

Single-step solution deployment

If you don’t want to set up each component individually, you can use the single-step AWS CloudFormation template. The single-step template is a master template that uses nested stacks (additional templates) to launch and configure all the resources for the solution in one go.

This takes you to the Create stack wizard in the AWS CloudFormation console. The template is launched in the US East (N. Virginia) Region by default. Do not change to a different Region because the template is designed to work only in us-east-1 (N. Virginia).

On the Select Template page, keep the default URL for the AWS CloudFormation template, and then choose Next.

On the Specify Details page, review the parameters for the template. Provide values for the parameters that require input (for more information, see the parameters table that follows).

The following parameters are available in this template.

Parameter

Default

Description

Domain Controller name

DC1

NetBIOS (hostname) name of the Active Directory server. This name can be up to 15 characters long.

NetBIOS name of the domain for users of earlier versions of Windows. This name can be up to 15 characters long.

Domain admin user

CrossRealmAdmin

User name for the account that is added as domain administrator. This account is separate from the default administrator account.

Domain admin password

Requires input

Password for the domain admin user. Must be at least eight characters including letters, numbers, and symbols.

Key pair name

Requires input

Name of an existing key pair, which enables you to connect securely to your instance after it launches.

Instance type

m4.xlarge

Instance type for the domain controller and the Amazon EMR cluster.

Allowed IP address

10.0.0.0/16

The client IP address can that can reach your cluster. Specify an IP address range in CIDR notation (for example, 203.0.113.5/32). By default, only the VPC CIDR (10.0.0.0/16) can reach the cluster. Be sure to add your client IP range so that you can connect to the cluster using SSH.

EMR Kerberos realm

EC2.INTERNAL

Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, EC2.INTERNAL is the default VPC domain name in the us-east-1 Region).

Trusted AD domain

EXAMPLE.COM

The Active Directory (AD) domain that you want to trust. This is the same as the “Active Directory domain.” However, it must use all uppercase letters (for example, EXAMPLE.COM).

Cross-realm trust password

Requires input

Password that you want to use for your cross-realm trust.

Instance count

2

The number of instances (core nodes) for the cluster.

EMR applications

Hadoop, Spark, Ganglia, Hive

Comma separated list of applications to install on the cluster.

After you specify the template details, choose Next. On the Options page, choose Next again. On the Review page, select the I acknowledge that AWS CloudFormation might create IAM resources with custom names check box, and then choose Create.

It takes approximately 45 minutes for the deployment to complete. When the stack launch is complete, it will return outputs with information about the resources that were created. Note the outputs and skip to the Managing and testing the solution section. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:

Launch and configure an Active Directory domain controller

In this step, you use an AWS CloudFormation template to automatically launch and configure a new Active Directory domain controller and cross-realm trust.

Note: There are various ways to install and configure an Active Directory domain controller. For details on manually launching and installing a domain controller without AWS CloudFormation, see Step 2: Launch and Install the AD Domain Controller in the Amazon EMR documentation.

In addition to launching and configuring an Active Directory domain controller and cross-realm trust, this AWS CloudFormation template also sets the domain controller as the DNS server (name server) for your Amazon VPC. In other words, the template creates a new DHCP option-set for the VPC where it’s being deployed to, and it sets the private IP of the domain controller as the name server for that new DHCP option set.

IMPORTANT: You should not use this template on a production VPC with existing resources like Amazon EC2 instances. When you launch this stack, make sure that you use the new environment and resources (Amazon VPC, subnet, and security group) that were created in the Create and configure an Amazon VPC step.

To launch this stack, choose Launch Stack:

The following table contains information about the parameters available in this template. Review the parameters for the template and provide values for those that require input.

Parameter

Default

Description

VPC ID

Requires input

Launch the domain controller on this VPC (for example, use the VPC created in the Create and configure an Amazon VPC step).

Subnet ID

Requires input

Subnet used for the domain controller (for example, use the subnet created in the Create and configure an Amazon VPC step).

Security group ID

Requires input

Security group (SG) for the domain controller (for example, use the SG created in the Create and configure an Amazon VPC step).

NetBIOS name of the domain for users of earlier versions of Windows. This name can be up to 15 characters long.

Domain admin user

CrossRealmAdmin

User name for the account that is added as domain administrator. This account is separate from the default administrator account.

Domain admin password

Requires input

Password for the domain admin user. Must be at least eight characters including letters, numbers, and symbols.

Key pair name

Requires input

Name of an existing EC2 key pair to enable access to the domain controller instance.

Instance type

m4.xlarge

Instance type for the domain controller.

EMR Kerberos realm

EC2.INTERNAL

Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, EC2.INTERNAL is the default VPC domain name in the us-east-1 Region).

Cross-realm trust password

Requires input

Password that you want to use for your cross-realm trust.

It takes 25–30 minutes for this stack to be created. When it’s complete, note the stack’s outputs, and then move to the next step: Launch an EMR cluster with Kerberos enabled.

Create a security configuration and launch an Amazon EMR cluster with Kerberos enabled

To launch a kerberized Amazon EMR cluster, you first need to create a security configuration containing the cross-realm trust configuration. You then specify cluster-specific Kerberos attributes when launching the cluster.

In this step, you use AWS CloudFormation to launch and configure a kerberized Amazon EMR cluster with a cross-realm trust. If you want to manually launch and configure a cluster with Kerberos enabled, see Step 6: Launch a Kerberized EMR Cluster in the Amazon EMR documentation.

Note: At the time of this writing, AWS CloudFormation does not yet support launching Amazon EMR clusters with Kerberos authentication enabled. To overcome this limitation, I created a template that uses an AWS Lambda-backed custom resource to launch and configure the Amazon EMR cluster with Kerberos enabled. If you use this template, there’s nothing else that you need to do. Just keep in mind that the template creates and invokes an AWS Lambda function (custom resource) to launch the cluster.

The following table lists and describes the template parameters for deploying a kerberized Amazon EMR cluster and configuring a cross-realm trust.

Parameter

Default

Description

Active Directory domain

example.com

The Active Directory domain that you want to establish the cross-realm trust with.

Domain admin user (joiner user)

CrossRealmAdmin

The user name of an Active Directory domain user with privileges to join domains/computers to the Active Directory domain (joiner user).

Domain admin password

Requires input

Password of the joiner user.

Cross-realm trust password

Requires input

Password of your cross-realm trust.

EC2 key pair name

Requires input

Name of an existing key pair, which enables you to connect securely to your cluster after it launches.

Subnet ID

Requires input

Subnet that you want to use for your Amazon EMR cluster (for example, choose the subnet created in the Create and configure an Amazon VPC step).

Security group ID

Requires input

Security group that you want to use for your Amazon EMR cluster (for example, choose the security group created in the Create and configure an Amazon VPC step).

Instance type

m4.xlarge

The instance type that you want to use for the cluster nodes.

Instance count

2

The number of instances (core nodes) for the cluster.

Allowed IP address

10.0.0.0/16

The client IP address can that can reach your cluster. Specify an IP address range in CIDR notation (for example, 203.0.113.5/32). By default, only the VPC CIDR (10.0.0.0/16) can reach the cluster. Be sure to add your client IP range so that you can connect to the cluster using SSH.

EMR applications

Hadoop, Spark, Ganglia, Hive

Comma separated list of the applications that you want installed on the cluster.

EMR Kerberos realm

EC2.INTERNAL

Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, EC2.INTERNAL is the default VPC domain name in the us-east-1 Region).

Trusted AD domain

EXAMPLE.COM

The Active Directory domain that you want to trust. This name is the same as the “AD domain name.” However, it must use all uppercase letters (for example, EXAMPLE.COM).

It takes 10–15 minutes for this stack to be created. When it’s complete, note the stack’s outputs, and then move to the next section: Managing and testing the solution.

Managing and testing the solution

Now that you’ve configured and built the solution, it’s time to test it by connecting to a cluster using Active Directory credentials.

SSH to a cluster using Active Directory credentials (single sign-on)

After you launch a kerberized Amazon EMR cluster, if you used the AWS CloudFormation templates and added your client IP range to the Allowed IP address parameter, you should be able to connect to the cluster using an SSH client and your Active Directory user credentials. If you have trouble connecting to the cluster using SSH, check the cluster’s security group to make sure that it allows inbound SSH connection (TCP port 22) from your client’s IP address (source).

The following steps assume that you’re using a client such as OpenSSH. If you’re using a different SSH application (for example, PuTTY), consult the application-specific documentation.

Note: Because the cluster was launched with a cross-realm trust configuration, you don’t need to use a private key (.pem file) when you connect to it as a domain user using SSH.

To connect to your Amazon EMR cluster as an Active Directory user using SSH, run the following command. Replace ad_user with the domain admin user that you created while setting up the domain controller and replace master_node_URL with the cluster’s URL (see the stack’s outputs to find this information):

$ ssh -l <ad_user> <master_node_URL>

If your SSH client is configured to use a key as the preferred authentication method, the login might fail. If that’s the case, you can add the following options to your SSH command to force the SSH connection to use password authentication:

After a domain user connects to the cluster using SSH, if this is the first that the user is connecting, a local home directory is created for that user. In addition to creating a local home directory, if you used the create-hdfs-home-ba.sh bootstrap action when launching the cluster (done by default if you used the AWS CloudFormation template to launch a kerberized cluster), an HDFS user home directory is also automatically created.

When you connect to the cluster using SSH for the first time (as a domain user), you should see the following messages if the HDFS home directory for your domain user was successfully created:

Running jobs on a kerberized Amazon EMR cluster

To run a job on a kerberized cluster, the user submitting the job must first be authenticated. If you followed the previous section to connect to your cluster as an Active Directory user using SSH, the user should be authenticated automatically.

If running the klist command returns a “No credentials cache found” message, it means that the user is not authenticated (the user doesn’t have a Kerberos ticket). You can re-authenticate a user at any time by running the following command (be sure to use all uppercase letters for the Active Directory domain):

$ kinit <username>@<AD_DOMAIN>

When the user is authenticated, they can submit jobs just like they would on a non-kerberized cluster.

Auditing jobs

Another advantage that Kerberos can provide is that you can easily tell which user ran a particular job. For example, connect (using SSH) to a kerberized cluster with an Active Directory user, and submit the SparkPi sample application:

$ spark-example SparkPi

After running the SparkPi application, go to the Amazon EMR console and choose your cluster. Then choose the Application history tab. There you can see information about the application, including the user that submitted the job:

Common issues

Although it would be hard to cover every possible Kerberos issue, this section covers some of the more common issues that might occur and ways to fix them.

Issue 1: You can successfully connect and get authenticated on a cluster. However, whenever you try running job, it fails with an error similar to the following:

org.apache.hadoop.security.AccessControlException: Permission denied

Solution: Make sure that an HDFS home directory for the user was created and that it has the right permissions.

Issue 2: You can successfully connect to the cluster, but you can’t run any Hadoop or HDFS commands.

Solution: Use the klist command to confirm whether the user is authenticated and has a valid Kerberos ticket. Use the kinit command to re-authenticate a user.

Issue 3: You can’t connect (using SSH) to the cluster using Active Directory user credentials, but you can manually authenticate the user with kinit.

Solution: Make sure that the Active Directory domain controller is the DNS server (name server) for the cluster nodes.

Cleaning up

After completing and testing this solution, remember to clean up the resources. If you used the AWS CloudFormation templates to create the resources, then use the AWS CloudFormation console or AWS CLI/SDK to delete the stacks. Deleting a stack also deletes the resources created by that stack.

If one of your stacks does not delete, make sure that there are no dependencies on the resources created by that stack. For example, if you deployed an Amazon VPC using AWS CloudFormation and then deployed a domain controller into that VPC using a different AWS CloudFormation stack, you must first delete the domain controller stack before the VPC stack can be deleted.

Summary

The ability to authenticate users and services with Kerberos not only allows you to secure your big data applications, but it also enables you to easily integrate Amazon EMR clusters with an Active Directory environment. This post showed how you can use Kerberos on Amazon EMR to create a single sign-on solution where Active Directory domain users can seamlessly access Amazon EMR clusters and run big data applications. We also showed how you can use AWS CloudFormation to automate the deployment of this solution.

Honolulu-based software developer bbtinkerer was tired of never being able to find the TV remote. So he made his own using a Raspberry Pi Zero, and connected it to a web app accessible on his smartphone.

Finding a remote alternative

“I needed one because the remote in my house tends to go missing a lot,” explains Bernard aka bbtinkerer on the Instructables page for his Raspberry Pi Zero Universal Remote.”If I want the controller, I have to hunt down three people and hope one of them remembers that they took it.”

For the build, Bernard used a Raspberry Pi Zero, an IR LED and corresponding receiver, Raspbian Lite, and a neat little 3D-printed housing.

First, he soldered a circuit for the LED and resistors on a small piece of perf board. Then he assembled the hardware components. Finally, all he needed to do was to write the code to control his devices (including a tower fan), and to set up the app.

Bernard employed the Linux Infrared Remote Control (LIRC) package to control the television with the Raspberry Pi Zero, accessing the Zero via SSH. He gives a complete rundown of the installation process on Instructables.

Setting up a remote’s buttons with LIRC is a simple case of pressing them and naming their functions one by one. You’ll need the remote to set up the system, but after that, feel free to lock it in a drawer and use your smartphone instead.

Finally, Bernard created the web interface using Node.js, and again, because he’s lovely, he published the code for anyone wanting to build their own. Thanks, Bernard!

We find that AWS customers often require that every query submitted to Presto running on Amazon EMR is logged. They want to track what query was submitted, when it was submitted and who submitted it.

In this blog post, we will demonstrate how to implement and install a Presto event listener for purposes of custom logging, debugging and performance analysis for queries executed on an EMR cluster. An event listener is a plugin to Presto that is invoked when an event such as query creation, query completion, or split completion occurs.

Presto also provides a system connector to access metrics and details about a running cluster. In particular, the system connector gets information about currently running and recently run queries by using the system.runtime.queries table. From the Presto command line interface (CLI), you get this data with the entries Select * from system.runtime.queries; and Select * from system.runtime.tasks;. This connector is available out of the box and exposes information and metrics through SQL.

In addition to providing custom logging and debugging, the Presto event listener allows you to enhance the information provided by the system connector by providing a mechanism to capture detailed query context, query statistics and failure information during query creation or completion, or a split completion.

We will begin by providing a detailed walkthrough of the implementation of the Presto event listener in Java followed by its deployment on the EMR cluster.

Implementation

We use the Eclipse IDE to create a Maven Project, as shown below:

Once you have created the Maven Project, modify the pom.xml file to add the dependency for Presto, as shown following:

After you add the Presto dependency to our pom.xml file, create a Java package under the src/main/java folder. In our project, we have named the package com.amazonaws.QueryEventListener. You can choose the naming convention that best fits your organization. Within this package, create three Java files for the EventListener, the EventListenerFactory, and the EventListenerPlugin.

As the Presto website says: “EventListenerFactory is responsible for creating an EventListener instance. It also defines an EventListener name, which is used by the administrator in a Presto configuration. Implementations of EventListener implement methods for the event types they are interested in handling. The implementation of EventListener and EventListenerFactory must be wrapped as a plugin and installed on the Presto cluster.”

In our project, we have named these Java files QueryEventListener, QueryEventListenerFactory, and QueryEventListenerPlugin:

Now we write our code for the Java files.

QueryEventListener – QueryEventListener implements the Presto EventListener interface. It has a constructor that creates five rotating log files of 524 MB each. After creating QueryEventListener, we implement the query creation, query completion, and split completion methods and log the events relevant to us. You can choose to include more events based on your needs.

QueryEventListenerFactory – The QueryEventListenerFactory class implements the Presto EventListenerFactory interface. We implement the method getName, which provides a registered EventListenerFactory name to Presto. We also implement the create method, which creates an EventListener instance.

QueryEventListenerPlugin – The QueryEventListenerPlugin class implements the Presto EventListenerPlugin interface. This class has a getEventListenerFactories method that returns an immutable list containing the EventListenerFactory. Basically, in this class we are wrapping QueryEventListener and QueryEventListenerFactory.

Finally, in our project we add the META-INF folder and a services subfolder within the META-INF folder. In the services subfolder, you create a file called com.facebook.presto.spi.Plugin:

As the Presto documentation describes: “Each plugin identifies an entry point: an implementation of the plugin interface. This class name is provided to Presto via the standard Java ServiceLoader interface: the classpath contains a resource file named com.facebook.presto.spi.Plugin in the META-INF/services directory”.

We add the name of our plugin class com.amazonaws.QueryEventListener.QueryEventListenerPlugin to the com.facebook.presto.spi.Plugin file, as shown below:

Next we will show you how to deploy the Presto plugin we created to Amazon EMR for custom logging.

Presto logging overview on Amazon EMR

Presto by default will produce three log files that capture the configurations properties and the overall operational events of the components that make up Presto, plus log end user access to the Presto UI.

On Amazon EMR, these log files are written into /var/log/presto. The log files in this directory are pushed into Amazon S3. This S3 location is the location of the new log file.

Steps to deploy Presto on Amazon EMR with custom logging

To deploy Presto on EMR with custom logging a bootstrap action will be used. The bootstrap is available in this Amazon Repository.

After updating the bootstrap with the S3 location for your JAR, upload that bootstrap to your own bucket.

The bootstrap action will copy the jar file with the custom EventListener implementation into all machines of the cluster. Moreover, the bootstrap action will create a file named event-listener.properties on the Amazon EMR Master node. This file will configure the coordinator to enable the custom logging plugin via property event-listener.name. The event-listener.name property is set to event-listener in the event-listener.properties file. As per Presto documentation, this property is used by Presto to find a registered EventListenerFactory based on the name returned by EventListenerFactory.getName().

Now that the bootstrap is ready, the following AWS CLI command can be used to create a new EMR cluster with the bootstrap:

Then, we access the presto command line by typing the following command on the terminal:

presto-cli --server localhost:8889 --catalog hive --schema default

Finally, we run a simple query:

Select * from wikistats limit 10;

We then go to the /var/log/presto directory and look at the contents of the log file queries-YYYY-MM-DDTHH\:MM\:SS.0.log. As depicted in the screenshot below, our QueryEventListener plugin captures the fields shown for the Query Created and Query Completed events. Moreover, if there are splits, the plugin will also capture split events.

Note: If you want to include the query text executed by the user for auditing and debugging purposes, add the field appropriately in the QueryEventListener class methods, as shown below:

Because this is custom logging, you can capture as many fields as are available for the particular events. To find out the fields available for each of the events, see the Java Classes provided by Presto at this GitHub location.

Summary

In this post, you learned how to add custom logging to Presto on EMR to enhance your organization’s auditing capabilities and provide insights into performance.

Additional Reading

About the Authors

Zafar Kapadia is a Cloud Application Architect for AWS. He works on Application Development and Optimization projects. He is also an avid cricketer and plays in various local leagues.

Francisco Oliveira is a Big Data Engineer with AWS Professional Services. He focuses on building big data solutions with open source technology and AWS. In his free time, he likes to try new sports, travel and explore national parks.

AWS Glue has a transform called Relationalize that simplifies the extract, transform, load (ETL) process by converting nested JSON into columns that you can easily import into relational databases. Relationalize transforms the nested JSON into key-value pairs at the outermost level of the JSON document. The transformed data maintains a list of the original keys from the nested JSON separated by periods.

Let’s look at how Relationalize can help you with a sample use case.

An example of Relationalize in action

Suppose that the developers of a video game want to use a data warehouse like Amazon Redshift to run reports on player behavior based on data that is stored in JSON. Sample 1 shows example user data from the game. The player named “user1” has characteristics such as race, class, and location in nested JSON data. Further down, the player’s arsenal information includes additional nested JSON data. If the developers want to ETL this data into their data warehouse, they might have to resort to nested loops or recursive functions in their code.

You can then write the data to a database or to a data warehouse. You can also write it to delimited text files, such as in comma-separated value (CSV) format, or columnar file formats such as Optimized Row Columnar (ORC) format. You can use either of these format types for long-term storage in Amazon S3. Storing the transformed files in S3 provides the additional benefit of being able to query this data using Amazon Athena or Amazon Redshift Spectrum. You can further extend the usefulness of the data by performing joins between data stored in S3 and the data stored in an Amazon Redshift data warehouse.

Before we get started…

In my example, I took two preparatory steps that save some time in your ETL code development:

I stored my data in an Amazon S3 bucket and used an AWS Glue crawler to make my data available in the AWS Glue data catalog. You can find instructions on how to do that in Cataloging Tables with a Crawler in the AWS Glue documentation. The AWS Glue database name I used was “blog,” and the table name was “players.” You can see these values in use in the sample code that follows.

I deployed a Zeppelin notebook using the automated deployment available within AWS Glue. If you already used an AWS Glue development endpoint to deploy a Zeppelin notebook, you can skip the deployment instructions. Otherwise, let’s quickly review how to deploy Zeppelin.

Deploying a Zeppelin notebook with AWS Glue

The following steps are outlined in the AWS Glue documentation, and I include a few screenshots here for clarity.

Specify a name for the endpoint and the AWS Glue IAM role that you created.

On the networking screen, choose Skip Networking because our code only communicates with S3.

Complete the development endpoint process by providing a Secure Shell (SSH) public key and confirming your settings.

When your new development endpoint’s Provisioning status changes from PROVISIONING to READY, choose your endpoint, and then for Actions choose Create notebook server.

Enter the notebook server details, including the role you previously created and a security group with inbound access allowed on TCP port 443.

Doing this automatically launches an AWS CloudFormation template. The output specifies the URL that you can use to access your Zeppelin notebook with the username and password you specified in the wizard.

How do we flatten nested JSON?

With my data loaded and my notebook server ready, I accessed Zeppelin, created a new note, and set my interpreter to spark. I used some Python code that AWS Glue previously generated for another job that outputs to ORC. Then I added the Relationalize transform. You can see the resulting Python code in Sample 3.­

Sample 3: Python code to transform the nested JSON and output it to ORC

What exactly is going on in this script?

After the import statements, we instantiate a GlueContext object, which allows us to work with the data in AWS Glue. Next, we create a DynamicFrame (datasource0) from the “players” table in the AWS Glue “blog” database. We use this DynamicFrame to perform any necessary operations on the data structure before it’s written to our desired output format. The source files remain unchanged.

We then run the Relationalize transform (Relationalize.apply()) with our datasource0 as one of the parameters. Another important parameter is the name parameter, which is a key that identifies our data after the transformation completes.

The Relationalize.apply() method returns a DynamicFrameCollection, and this is stored in the dfc variable. Before we can write our data to S3, we need to select the DynamicFrame from the DynamicFrameCollection object. We do this with the dfc.select() method. The correct DynamicFrame is stored in the blogdata variable.

You might be curious why a DynamicFrameCollection was returned when we started with a single DynamicFrame. This return value comes from the way Relationalize treats arrays in the JSON document: A DynamicFrame is created for each array. Together with the root data structure, each generated DynamicFrame is added to a DynamicFrameCollection when Relationalize completes its work. Although we didn’t have any arrays in our data, it’s good to keep this in mind. Finally, we output (blogdataoutput) the root DynamicFrame to ORC files in S3.

Using the transformed data

One of the use cases we discussed earlier was using Amazon Athena or Amazon Redshift Spectrum to query the ORC files.

I used the following SQL DDL statements to create external tables in both services to enable queries of my data stored in Amazon S3.

I even ran a query, shown in Sample 6, that joined my Redshift Spectrum table (spectrum.playerdata) with data in an Amazon Redshift table (public.raids) to generate advanced reports. In the where clause, I join the two tables based on the username values that are common to both data sources.

-- Get Total Raid Completions for the Hunter Class.
select spectrum.playerdata.characteristics_class as class, sum(public.raids."completions.val.raids.leviathan") as "Total Hunter Leviathan Raid Completions" from spectrum.playerdata, public.raids
where spectrum.playerdata.username = public.raids."completions.val.username"
and spectrum.playerdata.characteristics_class = 'Hunter'
group by spectrum.playerdata.characteristics_class;

Summary

This post demonstrated how simple it can be to flatten nested JSON data with AWS Glue, using the Relationalize transform to automate the conversion of nested JSON. AWS Glue also automates the deployment of Zeppelin notebooks that you can use to develop your Python automation script. Finally, AWS Glue can output the transformed data directly to a relational database, or to files in Amazon S3 for further analysis with tools such as Amazon Athena and Amazon Redshift Spectrum.

As great as Relationalize is, it’s not the only transform available with AWS Glue. You can see a complete list of available transforms in Built-In Transforms in the AWS Glue documentation. Try them out today!

Additional Reading

About the Author

Trevor Roberts Jr is a Solutions Architect with AWS. He provides architectural guidance to help customers achieve success in the cloud. In his spare time, Trevor enjoys traveling to new places and spending time with family.

Linux Foundation Director of IT infrastructure security, Konstantin Ryabitsev, has put together a lengthy guide to using Git and PGP to protect the integrity of source code. In a Google+ post, he called it “beta quality” and asked for help with corrections and fixes. “PGP incorporates a trust delegation mechanism known as the ‘Web of Trust.’ At its core, this is an attempt to replace the need for centralized Certification Authorities of the HTTPS/TLS world. Instead of various software makers dictating who should be your trusted certifying entity, PGP leaves this responsibility to each user.

Unfortunately, very few people understand how the Web of Trust works, and even fewer bother to keep it going. It remains an important aspect of the OpenPGP specification, but recent versions of GnuPG (2.2 and above) have implemented an alternative mechanism called ‘Trust on First Use’ (TOFU).

You can think of TOFU as ‘the SSH-like approach to trust.’ With SSH, the first time you connect to a remote system, its key fingerprint is recorded and remembered. If the key changes in the future, the SSH client will alert you and refuse to connect, forcing you to make a decision on whether you choose to trust the changed key or not.

Similarly, the first time you import someone’s PGP key, it is assumed to be trusted. If at any point in the future GnuPG comes across another key with the same identity, both the previously imported key and the new key will be marked as invalid and you will need to manually figure out which one to keep.

Introduced at AWS re:Invent 2017, Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. In an AWS Blog post, Jeff Barr shows you how to enable GuardDuty to monitor your AWS resources continuously. That blog post shows how to get started with a single GuardDuty account and provides an overview of the features of the service. Your security team, though, will probably want to use GuardDuty to monitor a group of AWS accounts continuously.

In this post, I demonstrate how to use GuardDuty to monitor a group of AWS accounts and have their findings routed to another AWS account—the master account—that is owned by a security team. The method I demonstrate in this post is especially useful if your security team is responsible for monitoring a group of AWS accounts over which it does not have direct access—known as member accounts. In this solution, I simplify the work needed to enable GuardDuty in member accounts and configure findings by simplifying the process, which I do by enabling GuardDuty in the master account and inviting member accounts.

Enable GuardDuty in a master account and invite member accounts

To get started, you must enable GuardDuty in the master account, which will receive GuardDuty findings. The master account should be managed by your security team, and it will display the findings from all member accounts. The master account can be reverted later by removing any member accounts you add to it. Adding member accounts is a two-way handshake mechanism to ensure that administrators from both the master and member accounts formally agree to establish the relationship.

To designate this account as the GuardDuty master account, start adding member accounts:

You can add individual accounts by choosing Add Account, or you can add a list of accounts by choosing Upload List (.csv).

Now, add the account ID and email address of the member account, and choose Add. (If you are uploading a list of accounts, choose Browse, choose the .csv file with the member accounts [one email address and account ID per line], and choose Add accounts.)

For security reasons, AWS checks to make sure each account ID is valid and that you’ve entered each member account’s email address that was used to create the account. If a member account’s account ID and email address do not match, GuardDuty does not send an invitation.

After you add all the member accounts you want to add, you will see them listed in the Member accounts table with a Status of Invite. You don’t have to individually invite each account—you can choose a group of accounts and when you choose to invite one account in the group, all accounts are invited.

When you choose Invite for each member account:

AWS checks to make sure the account ID is valid and the email address provided is the email address of the member account.

AWS sends an email to the member account email address with a link to the GuardDuty console, where the member account owner can accept the invitation. You can add a customized message from your security team. Account owners who receive the invitation must sign in to their AWS account to accept the invitation. The service also sends an invitation through the AWS Personal Health Dashboard in case the member email address is not monitored. This invitation appears in the member account under the AWS Personal Health Dashboard alert bell on the AWS Management Console.

A pending-invitation indicator is shown on the GuardDuty console of the member account, as shown in the following screenshot.

When the invitation is sent by email, it is sent to the account owner of the GuardDuty member account.

The account owner can click the link in the email invitation or the AWS Personal Health Dashboard message, or the account owner can sign in to their account and navigate to the GuardDuty console. In all cases, the member account displays the pending invitation in the member account’s GuardDuty console with instructions for accepting the invitation. The GuardDuty console walks the account owner through accepting the invitation, including enabling GuardDuty if it is not already enabled.

If you prefer to work in the AWS CLI, you can enable GuardDuty and accept the invitation. To do this, call CreateDetector to enable GuardDuty, and then call AcceptInvitation, which serves the same purpose as accepting the invitation in the GuardDuty console.

After the member account owner accepts the invitation, the Status in the master account is changed to Monitored. The status helps you track the status of each AWS account that you invite.

You have enabled GuardDuty on the member account, and all findings will be forwarded to the master account. You can now monitor the findings about GuardDuty member accounts from the GuardDuty console in the master account.

The member account owner can see GuardDuty findings by default and can control all aspects of the experience in the member account with AWS Identity and Access Management (IAM) permissions. Users with the appropriate permissions can end the multi-account relationship at any time by toggling the Accept button on the Accounts page. Note that ending the relationship changes the Status of the account to Resigned and also triggers a security finding on the side of the master account so that the security team knows the member account is no longer linked to the master account.

Working with GuardDuty findings

Most security teams have ticketing systems, chat operations, security information event management (SIEM) systems, or other security automation systems to which they would like to push GuardDuty findings. For this purpose, GuardDuty sends all findings as JSON-based messages through Amazon CloudWatch Events, a scalable service to which you can subscribe and to which AWS services can stream system events. To access these events, navigate to the CloudWatch Events console and create a rule that subscribes to the GuardDuty-related findings. You then can assign a target such as Amazon Kinesis Data Firehose that can place the findings in a number of services such as Amazon S3. The following screenshot is of the CloudWatch Events console, where I have a rule that pulls all events from GuardDuty and pushes them to a preconfigured AWS Lambda function.

The following example is a subset of GuardDuty findings that includes relevant context and information about the nature of a threat that was detected. In this example, the instanceId, i-00bb62b69b7004a4c, is performing Secure Shell (SSH) brute-force attacks against IP address 172.16.0.28. From a Lambda function, you can access any of the following fields such as the title of the finding and its description, and send those directly to your ticketing system.

You can use other AWS services to build custom analytics and visualizations of your security findings. For example, you can connect Kinesis Data Firehose to CloudWatch Events and write events to an S3 bucket in a standard format, which can be encrypted with AWS Key Management Service and then compressed. You also can use Amazon QuickSight to build ad hoc dashboards by using AWS Glue and Amazon Athena. Similarly, you can place the data from Kinesis Data Firehose in Amazon Elasticsearch Service, with which you can use tools such as Kibana to build your own visualizations and dashboards.

Like most other AWS services, GuardDuty is a regional service. This means that when you enable GuardDuty in an AWS Region, all findings are generated and delivered in that region. If you are regulated by a compliance regime, this is often an important requirement to ensure that security findings remain in a specific jurisdiction. Because customers have let us know they would prefer to be able to enable GuardDuty globally and have all findings aggregated in one place, we intend to give the choice of regional or global isolation as we evolve this new service.

Summary

In this blog post, I have demonstrated how to use GuardDuty to monitor a group of GuardDuty member accounts and aggregate security findings in a central master GuardDuty account. You can use this solution whether or not you have direct control over the member accounts.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about using GuardDuty, start a thread in the GuardDuty forum or contact AWS Support.

Tags

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.