Tag Archives: DNS filtering

Phishing attacks begin like any other visit to a site on the Internet. A user opens a suspicious link from an email, and their DNS resolver looks up the hostname, then connects the user to the origin.

Cloudflare Gateway’s secure DNS blocks threats like this by checking every hostname query against a constantly-evolving list of known threats on the Internet. Instead of sending the user to the malicious host, Gateway stops the site from resolving.. The user sees a “blocked domain” page instead of the malicious site itself.

As teams migrate to SaaS applications and zero-trust solutions, they rely more on the public Internet to do their jobs. Gateway’s security works like a bouncer, keeping users safe as they navigate the Internet. However, some organizations still need to send traffic to internal destinations for testing or as a way to make the migration more seamless.

Starting today, you can use Cloudflare Gateway to direct end user traffic to a different IP than the one they originally requested. Administrators can build rules to override the address that would be returned by a resolver and send traffic to a specified alternative.

Like the security features of Cloudflare Gateway, the redirect function is available in every one of Cloudflare’s data centers in 200 cities around the world, so you can block bad traffic and steer internal traffic without compromising performance.

As part of that platform, Cloudflare Gateway blocks threats on the public Internet from becoming incidents inside of your organization. Gateway’s first release added DNS security filtering and content blocking to the world’s fastest DNS resolver, Cloudflare’s 1.1.1.1.

Deployment takes less than 5 minutes. Teams can secure entire office networks and segment traffic reports by location. For distributed organizations, Gateway can be deployed via MDM on networks that support IPv6 or using a dedicated IPv4 as part of a Cloudflare enterprise account.

With secure DNS filtering, administrators can click a single button to block known threats, like sources of malware or phishing sites. Policies can be extended to block specific categories, like gambling sites or social media. When users request a filtered site, Gateway stops the DNS query from resolving and prevents the device from connecting to a malicious destination or hostname with blocked material.

Traffic bound for internal destinations

As users connect to SaaS applications, Cloudflare Gateway can keep those teams secure from threats on the public Internet.

In parallel, teams can move applications that previously lived on a private network to a zero-trust model with Cloudflare Access. Rather than trusting anyone on a private network, Access checks for identity any time someone attempts to reach the application.

Together, Cloudflare for Teams keeps users safe and makes internal applications just as easy to use as SaaS tools. Making it easier to migrate to that model also reduces user friction. Domain overrides can smooth that transition from internal networks to a fully cloud-delivered model.

With Gateway’s domain override feature, administrators can choose certain hostnames that still run on the private network and send traffic to the local IPs with the same resolver that secures Internet-bound traffic. End users can continue to connect to those resources without disruption. Once ready, those tools can be secured with Cloudflare Access to remove the reliance on a private network altogether.

Cloudflare Gateway can help reduce user confusion and IT overhead with split-horizon setups where some traffic routes to the Internet and other requests need to stay on the same network. Administrators can build policies to route traffic bound for hostnames, even ones that exist publicly, to internal IP addresses that a user can reach if they are on the same local network.

How does it work?

When administrators configure an override policy, Cloudflare Gateway pushes that information to the edge of our network. The rule becomes part of the Gateway enforcement flow for that organization’s account. Explicit override policies are enforced first, before allowed or blocked rules.

When a user makes a request to the original destination, that request arrives at a Gateway IP address where Cloudflare’s network checks the source IP to determine which policies to enforce. Gateway determines that the request has an override rule and returns the preconfigured IP address.

Gateway’s DNS override feature is supported in deployments that use Cloudflare’s IPv4 or IPv6 addresses, as well as DNS over HTTPS.

What’s next?

The domain override feature is available to all Cloudflare for Teams customers today at no additional cost. You can begin building override rules by navigating to the Policies section of the Gateway product and selecting the “Custom” tab. Administrators can configure up to 1,000 custom rules.

To help organizations in their transition to remote work, Cloudflare has made our Teams platform free for any organization through September 1. You can set up an account at dash.teams.cloudflare.com now.

Note from September 4, 2019: We’ve updated this blog post, initially published on January 26, 2016. Major changes include: support of Amazon Linux 2, no longer having to compile Squid 3.5, and a high availability version of the solution across two availability zones.

For security and compliance purposes, you might have to filter the requests initiated by these instances (also known as “egress filtering”). Using iptables rules, you could restrict outbound traffic with your NAT instance based on a predefined destination port or IP address. However, you might need to enforce more complex security policies, such as allowing requests to AWS endpoints only, or blocking fraudulent websites, which you can’t easily achieve by using iptables rules.

In this post, I discuss and give an example of how to use Squid, a leading open-source proxy, to implement a “transparent proxy” that can restrict both HTTP and HTTPS outbound traffic to a given set of Internet domains, while being fully transparent for instances in the private subnet.

The solution architecture

In this section, I present the architecture of the high availability NAT solution and explain how to configure Squid to filter traffic transparently. Later in this post, I’ll provide instructions about how to implement and test the solution.

The following diagram illustrates how the components in this process interact with each other. Squid Instance 1 intercepts HTTP/S requests sent by instances in Private Subnet 1, including the Testing Instance. Squid Instance 1 then initiates a connection with the destination host on behalf of the Testing Instance, which goes through the Internet gateway. This solution spans two Availability Zones, with Squid Instance 2 intercepting requests sent from the other Availability Zone. Note that you may adapt the solution to span additional Availability Zones.

Figure 1: The solution spans two Availability Zones

Intercepting and filtering traffic

In each availability zone, the route table associated to the private subnet sends the outbound traffic to the Squid instance (see Route Tables for a NAT Device). Squid intercepts the requested domain, then applies the following filtering policy:

For HTTP requests, Squid retrieves the host header field included in all HTTP/1.1 request messages. This specifies the Internet host being requested.

For HTTPS requests, the HTTP traffic is encapsulated in a TLS connection between the instance in the private subnet and the remote host. Squid cannot retrieve the host header field because the header is encrypted. A feature called SslBump would allow Squid to decrypt the traffic, but this would not be transparent for the client because the certificate would be considered invalid in most cases. The feature I use instead, called SslPeekAndSplice, retrieves the Server Name Indication (SNI) from the TLS initiation. The SNI contains the requested Internet host. As a result, Squid can make filtering decisions without decrypting the HTTPS traffic.

Note 1: Some older client-side software stacks do not support SNI. Here are the minimum versions of some important stacks and programming languages that support SNI: Python 2.7.9 and 3.2, Java 7 JSSE, wget 1.14, OpenSSL 0.9.8j, cURL 7.18.1

Note 2: TLS 1.3 introduced an optional extension that allows the client to encrypt the SNI, which may prevent Squid from intercepting the requested domain.

The SslPeekAndSplice feature was introduced in Squid 3.5 and is implemented in the same Squid module as SslBump. To enable this module, Squid requires that you provide a certificate, though it will not be used to decode HTTPS traffic. The solution creates a certificate using OpenSSL.

The following code shows the Squid configuration file. For HTTPS traffic, note the ssl_bump directives instructing Squid to “peek” (retrieve the SNI) and then “splice” (become a TCP tunnel without decoding) or “terminate” the connection depending on the requested host.

The text file located at /etc/squid/whitelist.txt contains the list of whitelisted domains, with one domain per line. In this blog post, I’ll show you how to configure Squid to allow requests to *.amazonaws.com, which corresponds to AWS endpoints. Note that you can restrict access to a specific set of AWS services that you’ve defined (see Regions and Endpoints for a detailed list of endpoints), or you can set your own list of domains.

Note: An alternate approach is to use VPC endpoints to privately connect your VPC to supported AWS services without requiring access over the Internet (see VPC Endpoints). Some supported AWS services allow you to create a policy that controls the use of the endpoint to access AWS resources (see VPC Endpoint Policies, and VPC Endpoints for a list of supported services).

You may have noticed that Squid listens on port 3129 for HTTP traffic and 3130 for HTTPS. Because Squid cannot directly listen to 80 and 443, you have to redirect the incoming requests from instances in the private subnets to the Squid ports using iptables. You do not have to enable IP forwarding or add any FORWARD rule, as you would do with a standard NAT instance.

The solution stores the files squid.conf and whitelist.txt in an Amazon Simple Storage Service (S3) bucket and runs the following script every minute on the Squid instances to download and update the Squid configuration from S3. This makes it easy to maintain the Squid configuration from a central location. Note that it first validates the files with squid -k parse and then reload the configuration with squid -k reconfigure if no error was found.

The solution then uses the CloudWatch Agent on the Squid instances to collect and store Squid logs in Amazon CloudWatch Logs. The log group /filtering-nat-instance/cache.log contains the error and debug messages that Squid generates and /filtering-nat-instance/access.log contains the access logs.

An access log record is a space-delimited string that has the following format:

Designing a high availability solution

The Squid instances introduce a single point of failure for the private subnets. If a Squid instance fails, the instances in its associated private subnet cannot send outbound traffic anymore. The following diagram illustrates the architecture that I propose to address this situation within an Availability Zone.

Figure 2: The architecture to address if a Squid instance fails within an Availability Zone

The solution uses the CloudWatch Agent and its procstat plugin to collect the CPU usage of the Squid process every 10 seconds. For each Squid instance, the solution creates a CloudWatch alarm that watches this custom metric and goes to an ALARM state when a data point is missing. This can happen, for example, when Squid crashes or the Squid instance fails. Note that for my use case, I consider watching the Squid process a sufficient approach to determining the health status of a Squid instance, although it cannot detect eventual cases of the Squid process being alive but unable to forward traffic. As a workaround, you can use an end-to-end monitoring approach, like using witness instances in the private subnets to send test requests at regular intervals and collect the custom metric.

When an alarm goes to ALARM state, CloudWatch sends a notification to an Amazon Simple Notification Service (SNS) topic which then triggers an AWS Lambda function. The Lambda function marks the Squid instance as unhealthy in its Auto Scaling group, retrieves the list of healthy Squid instances based on the state of other CloudWatch alarms, and updates the route tables that currently route traffic to the unhealthy Squid instance to instead route traffic to the first available healthy Squid instance. While the Auto Scaling group automatically replaces the unhealthy Squid instance, private instances can send outbound traffic through the Squid instance in the other Availability Zone.

When the CloudWatch agent starts collecting the custom metric again on the replacement Squid instance, the alarm reverts to OK state. Similarly, CloudWatch sends a notification to the SNS topic, which then triggers the Lambda function. The Lambda function completes the lifecycle action (see Amazon EC2 Auto Scaling Lifecycle Hooks) to indicate that the replacement instance is ready to serve traffic, and updates the route table associated to the private subnet in the same availability zone to route traffic to the replacement instance.

Implementing and testing the solution

Now that you understand the architecture behind this solution, you can follow the instructions in this section to implement and test the solution in your AWS account.

Implementing the solution

First, you’ll use AWS CloudFormation to provision the required resources. Select the Launch Stack button below to open the CloudFormation console and create a stack from the template. Then, follow the on-screen instructions.

Three route tables. The first route table is associated to the public subnets to make them publicly accessible. The other two route tables are associated to the private subnets.

An S3 bucket to store the Squid configuration files, and two Lambda-based custom resources to add the files squid.conf and whitelist.txt to this bucket.

An IAM role to grant the Squid instances permissions to read from the S3 bucket and use the CloudWatch agent.

A security group to allow HTTP and HTTPS traffic from instances in the private subnets.

A launch configuration to specify the template of Squid instances. That includes commands to run at startup for automating the initial configuration.

Two Auto Scaling groups that use this launch configuration to launch the Squid instances.

A Lambda function to redirect the outbound traffic and recover a Squid instance when it fails.

Two CloudWatch alarms to watch the custom metric sent by Squid instances and trigger the Lambda function when the health status of Squid instances changes.

An EC2 instance in the first private subnet to test the solution, and an IAM role to grant this instance permissions to use the SSM agent. Session Manager, which I introduce in the next paragraph, uses this SSM agent (see Working with SSM Agent)

Testing the solution

After the stack creation has completed (it can take up to 10 minutes), connect onto the Testing Instance using Session Manager, a capability of AWS Systems Manager that lets you manage instances through an interactive shell without the need to open an SSH port:

For Target instances, choose the option button to the left of Testing Instance.

Choose Start Session.

Note: Session Manager makes calls to several AWS endpoints (see Working with SSM Agent). If you prefer to restrict access to a defined set of AWS services, make sure to whitelist the associated domains.

After the connection is made, you can test the solution with the following commands. Only the last three requests should return a valid response, because Squid allows traffic to *.amazonaws.com only.

For Log Groups, choose the log group /filtering-nat-instance/access.log.

Choose Search Log Group to view and search log records.

To test how the solution behaves when a Squid instance fails, you can terminate one of the Squid instances manually in the Amazon EC2 console. Then, watch the CloudWatch alarm change its state in the Amazon CloudWatch console, or watch the solution change the default route of the impacted route table in the Amazon VPC console.

You can now delete the CloudFormation stack to clean up the resources that were just created.

Discussion: Transparent or forward proxy?

The solution that I describe in this blog is fully transparent for instances in the private subnets, which means that instances don’t need to be aware of the proxy and can make requests as if they were behind a standard NAT instance. An alternate solution is to deploy a forward proxy in your Amazon VPC and configure instances in private subnets to use it (see the blog post How to set up an outbound VPC proxy with domain whitelisting and content filtering for an example). In this section, I discuss some of the differences between the two solutions.

Supportability

A major drawback with forward proxies is that the proxy must be explicitly configured on every instance within the private subnets. For example, you can configure the HTTP_PROXY and HTTPS_PROXY environment variables on Linux instances, but some applications or services, like yum, require their own proxy configuration, or don’t support proxy usage. Note also that some AWS services and features, like Amazon EMR or Amazon SageMaker notebook instances, don’t support using a forward proxy at the time of this post. However, with TLS 1.3, a forward proxy is the only option to restrict outbound traffic if the SNI is encrypted.

Scalability

Deploying a forward proxy on AWS usually consists of a load balancer distributing traffic to a set of proxy instances launched in an Auto Scaling group. Proxy instances can be launched or terminated dynamically depending on the demand (also known as “horizontal scaling”). With forward proxies, each route table can route traffic to a single instance at a time, and changing the type of the instance is the only way to increase or decrease the capacity (also known as “vertical scaling”).

The solution I present in this post does not dynamically adapt the instance type of the Squid instances based on the demand. However, you might consider a mechanism in which the traffic from a private subnet is temporarily redirected through another Availability Zone while the Squid instance is being relaunched by Auto Scaling with a smaller or larger instance type.

Mutualization

Deploying a centralized proxy solution and using it across multiple VPCs is a way of reducing cost and operational complexity.

With a forward proxy, instances in private subnets send IP packets to the proxy load balancer. Therefore, sharing a forward proxy across multiple VPCs only requires connectivity between the “instance VPCs” and a proxy VPC that has VPC Peering or equivalent capabilities.

With a transparent proxy, instances in private subnets sends IP packets to the remote host. VPC Peering does not support transitive routing (see Unsupported VPC Peering Configurations) and cannot be used to share a transparent proxy across multiple VPCs. However, you can now use an AWS Transit Gateway that acts as a network transit hub to share a transparent proxy across multiple VPCs. I give an example in the next section.

Sharing the solution across multiple VPCs using AWS Transit Gateway

In this section, I give an example of how to share a transparent proxy across multiple VPCs using AWS Transit Gateway. The architecture is illustrated in the following diagram. For the sake of simplicity, the diagram does not include Availability Zones.

Figure 3: The architecture for a transparent proxy across multiple VPCs using AWS Transit Gateway

Here’s how instances in the private subnet of “VPC App” can make requests via the shared transparent proxy in “VPC Shared:”

When instances in VPC App make HTTP/S requests, the network packets they send have the public IP address of the remote host as the destination address. These packets are forwarded to the transit gateway, based on the route table associated to the private subnet.

The transit gateway receives the packets and forwards them to VPC Shared, based on the default route of the transit gateway route table.

Note that the transit gateway attachment resides in the transit gateway subnet. When the packets arrive in VPC Shared, they are forwarded to the Squid instance because the next destination has been determined based on the route table associated to the transit gateway subnet.

The Squid instance makes requests on behalf of the source instance (“Instances” in the schema). Then, it sends the response to the source instance. The packets that it emits have the IP address of the source instance as the destination address and are forwarded to the transit gateway according to the route table associated to the public subnet.

The transit gateway receives and forwards the response packets to VPC App.

Finally, the response reaches the source instance.

In a high availability deployment, you could have one transit gateway subnet per Availability Zone that sends traffic to the Squid instance that resides in the same Availability Zone, or to the Squid instance in another Availability Zone if the instance in the same Availability Zone fails.

You could also use AWS Transit Gateway to implement a transparent proxy solution that scales horizontally. This allows you to add or remove proxy instances based on the demand, instead of changing the instance type. With this approach, you must deploy a fleet of proxy instances – launched by an Auto Scaling group, for example – and mount a VPN connection between each instance and the transit gateway. The proxy instances need to support ECMP (“Equal Cost Multipath routing”; see Transit Gateways) to equally spread the outbound traffic between instances. I don’t describe this alternative architecture further in this blog post.

Conclusion

In this post, I’ve shown how you can use Squid to implement a high availability solution that filters outgoing traffic to the Internet and helps meet your security and compliance needs, while being fully transparent for the back-end instances in your VPC. I’ve also discussed the key differences between transparent proxies and forward proxies. Finally, I gave an example of how to share a transparent proxy solution across multiple VPCs using AWS Transit Gateway.

If you have any questions or suggestions, please leave a comment below or on the Amazon VPC forum.

If you have feedback about this blog post, submit comments in the Comments section below.

Nicolas Malaval

Nicolas is a Solution Architect for Amazon Web Services. He lives in Paris and helps our healthcare customers in France adopt cloud technology and innovate with AWS. Before that, he spent three years as a Consultant for AWS Professional Services, working with enterprise customers.

Tags

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.