Industry News & Blogs

To help keep you up-to-date with the latest news and ideas from the industry, we have compiled the latest articles from industry leaders and corporate blogs. New content is pulled hourly from each blog's RSS feed. The article links will take you directly to the related blog.

CloudFlare Blog

Cloudflare’s global network currently spans 193 cities across 90+ countries. With over 20 million Internet properties on our network, we increase the security, performance, and reliability of large portions of the Internet every time we add a location.

Expanding Network to New Cities

So far in 2019, we’ve added a score of new locations: Amman, Antananarivo*, Arica*, Asunción, Baku, Bengaluru, Buffalo, Casablanca, Córdoba*, Cork, Curitiba, Dakar*, Dar es Salaam, Fortaleza, Geneva, Göteborg, Guatemala City, Hyderabad, Kigali, Kolkata, Male*, Maputo, Nagpur, Neuquén*, Nicosia, Nouméa, Ottawa, Port-au-Prince, Porto Alegre, Querétaro, Ramallah, and Thessaloniki.

Our Humble Beginnings

When Cloudflare launched in 2010, we focused on putting servers at the Internet’s crossroads: large data centers with key connections, like the Amsterdam Internet Exchange and Equinix Ashburn. This not only provided the most value to the most people at once but was also easier to manage, because it kept our servers in the same buildings as all the local ISPs, server providers, and other parties they needed to talk to in order to streamline our services. This is a great approach for bootstrapping a global network, but we’re obsessed with speed in general. There are over five hundred cities in the world with over one million inhabitants, but only a handful of them have the kinds of major Internet exchanges that we targeted. Our goal as a company is to help make a better Internet for all, not just those lucky enough to live in areas with affordable and easily-accessible interconnection points. However, we ran up against two broad, nasty problems: a) we were running out of major Internet exchanges and b) latency still wasn’t as low as we wanted.
Clearly, we had to start scaling in new ways.

One of our first big steps was entering into partnerships around the world with local ISPs, who have many of the same problems we do: ISPs want to save money and provide fast Internet to their customers, but they often don’t have a major Internet exchange nearby to connect to. Adding Cloudflare equipment to their infrastructure effectively brought more of the Internet closer to them. We help them speed up millions of Internet properties while reducing costs by serving traffic locally. Additionally, since all of our servers are designed to support all our products, a relatively small physical footprint can also provide security, performance, reliability, and more.

Upgrading Capacity in Existing Cities

Though it may be obvious and easy to overlook, continuing to build out existing locations is also a key facet of building a global network. This year, we have significantly increased the computational capacity at the edge of our network. Additionally, by making it easier to interconnect with Cloudflare, we have increased the number of unique networks directly connected with us to over 8,000. This makes for a faster, more reliable Internet experience for the >1 billion IPs that we see daily.

To make these capacity upgrades possible for our customers, efficient infrastructure deployment has been one of our keys to success. We want our infrastructure deployment to be targeted and flexible.

Targeted Deployment

The next Cloudflare customer through our door could be a small restaurant owner on a Pro plan with thousands of monthly pageviews or a fast-growing global tech company like Discord. As a result, we need to always stay one step ahead and synthesize a lot of data all at once for our customers.

To accommodate this expansion, our Capacity Planning team is learning new ways to optimize our servers. One key strategy is targeting exactly where to send our servers.
However, staying on top of everything isn’t easy - we are a global anycast network, which introduces unpredictability as to where incoming traffic goes. To make things even more difficult, each city can contain as many as five distinct deployments. Planning isn’t just a question of what city to send servers to, it’s one of which address.

To make sense of it all, we tackle the problem with simulations. The variables we model include, among others, historical traffic growth rates, foreseeable anomalous spikes (e.g., Cyber Day in Chile), and consumption states from our live deal pipeline, as well as product costs, user growth, and end-customer adoption. We also add in site reliability, potential for expansion, and expected regional expansion and partnerships, as well as strategic priorities and, of course, feedback from our fantastic Systems Reliability Engineers.

Flexible Supply Chain

Knowing where to send a server is only the first challenge of many when it comes to a global network. Just like our user base, our supply chain must span the entire world while also staying flexible enough to quickly react to time constraints, pricing changes including taxes and tariffs, import/export restrictions and required certifications - not to mention local partnerships and many more dynamic location-specific variables. We have even more reason to stay quick on our feet: there will always be unforeseen roadblocks and detours, even in the most well-prepared plans.
For example, a planned expansion in our Prague location might warrant an expanded presence in Vienna for failover.

Once servers arrive at our data centers, our Data Center Deployment and Technical Operations teams work with our vendors and on-site data center personnel (our “Remote Hands” and “Smart Hands”) to install the physical server, manage the cabling, and handle other early-stage provisioning processes.

Our architecture, which is designed so that every server can support every service, makes it easier to withstand hardware failures and efficiently load balance workloads between equipment and between locations.

Join Our Team

If working at a rapidly expanding, globally diverse company interests you, we’re hiring for scores of positions, including in the Infrastructure group. If you want to help increase hardware efficiency, deploy and maintain servers, work on our supply chain, or strengthen ISP partnerships, get in touch.

*Represents cities where we have data centers with active Internet ports and where we are configuring our servers to handle traffic for more customers (at the time of publishing)

Today, we're open-sourcing an exciting project that showcases the strengths of our Cloudflare Workers platform: workers-graphql-server is a batteries-included Apollo GraphQL server, designed to get you up and running quickly with GraphQL.

Testing GraphQL queries in the GraphQL Playground

As a full-stack developer, I’m really excited about GraphQL. I love building user interfaces with React, but as a project gets more complex, it can become really difficult to keep track of how data is managed inside an application. GraphQL makes that really easy - instead of having to recall the REST URL structure of your backend API, or remember when your backend server doesn't quite follow REST conventions - you just tell GraphQL what data you want, and it takes care of the rest.

Cloudflare Workers is uniquely suited to hosting a GraphQL server. Because your code is running on Cloudflare's servers around the world, the average latency for your requests is extremely low, and by using Wrangler, our open-source command line tool for building and managing Workers projects, you can deploy new versions of your GraphQL server around the world within seconds.

If you'd like to try the GraphQL server, check out a demo GraphQL playground, deployed on Workers.dev. This optional add-on to the GraphQL server allows you to experiment with GraphQL queries and mutations, giving you a powerful way to understand how to interface with your data, without having to hop into a codebase.

If you're ready to get started building your own GraphQL server with our new open-source project, we've added a new tutorial to our Workers documentation to help you get up and running.

Finally, if you're interested in how the project works, or want to help contribute - it's open-source! We'd love to hear your feedback and see your contributions. Check out the project on GitHub.

Today, multiple Denial of Service (DoS) vulnerabilities were disclosed for a number of HTTP/2 server implementations. Cloudflare uses NGINX for HTTP/2. Customers using Cloudflare are already protected against these attacks.

The individual vulnerabilities, originally discovered by Netflix and included in this announcement, are:

CVE-2019-9511 HTTP/2 Data Dribble
CVE-2019-9512 HTTP/2 Ping Flood
CVE-2019-9513 HTTP/2 Resource Loop
CVE-2019-9514 HTTP/2 Reset Flood
CVE-2019-9515 HTTP/2 Settings Flood
CVE-2019-9516 HTTP/2 0-Length Headers Leak
CVE-2019-9518 HTTP/2 Request Data/Header Flood

As soon as we became aware of these vulnerabilities, Cloudflare’s Protocols team started working on fixing them. We first pushed a patch to detect any attack attempts and to see if any normal traffic would be affected by our mitigations. This was followed up with work to mitigate these vulnerabilities; we pushed the changes out a few weeks ago and continue to monitor similar attacks on our stack.

If any of our customers host web services over HTTP/2 on an alternative, publicly accessible path that is not behind Cloudflare, we recommend you apply the latest security updates to your origin servers in order to protect yourselves from these HTTP/2 vulnerabilities.

We will soon follow up with more details on these vulnerabilities and how we mitigated them.

Full credit for the discovery of these vulnerabilities goes to Jonathan Looney of Netflix and Piotr Sikora of Google and the Envoy Security Team.

Today we’re excited to announce Cloudflare Magic Transit. Magic Transit provides secure, performant, and reliable IP connectivity to the Internet. Out-of-the-box, Magic Transit deployed in front of your on-premise network protects it from DDoS attack and enables provisioning of a full suite of virtual network functions, including advanced packet filtering, load balancing, and traffic management tools. Magic Transit is built on the standards and networking primitives you are familiar with, but delivered from Cloudflare’s global edge network as a service. Traffic is ingested by the Cloudflare Network with anycast and BGP, announcing your company’s IP address space and extending your network presence globally. Today, our anycast edge network spans 193 cities in more than 90 countries around the world. Once packets hit our network, traffic is inspected for attacks, filtered, steered, accelerated, and sent onward to the origin. Magic Transit will connect back to your origin infrastructure over Generic Routing Encapsulation (GRE) tunnels, private network interconnects (PNI), or other forms of peering. Enterprises are often forced to pick between performance and security when deploying IP network services. Magic Transit is designed from the ground up to minimize these trade-offs: performance and security are better together. Magic Transit deploys IP security services across our entire global network. This means no more diverting traffic to small numbers of distant “scrubbing centers” or relying on on-premise hardware to mitigate attacks on your infrastructure.We’ve been laying the groundwork for Magic Transit for as long as Cloudflare has been in existence, since 2010. Scaling and securing the IP network Cloudflare is built on has required tooling that would have been impossible or exorbitantly expensive to buy. So we built the tools ourselves! 
We grew up in the age of software-defined networking and network function virtualization, and the principles behind these modern concepts run through everything we do.When we talk to our customers managing on-premise networks, we consistently hear a few things: building and managing their networks is expensive and painful, and those on-premise networks aren’t going away anytime soon. Traditionally, CIOs trying to connect their IP networks to the Internet do this in two steps:Source connectivity to the Internet from transit providers (ISPs).Purchase, operate, and maintain network function specific hardware appliances. Think hardware load balancers, firewalls, DDoS mitigation equipment, WAN optimization, and more.Each of these boxes costs time and money to maintain, not to mention the skilled, expensive people required to properly run them. Each additional link in the chain makes a network harder to manage.This all sounded familiar to us. We had an aha! moment: we had the same issues managing our datacenter networks that power all of our products, and we had spent significant time and effort building solutions to those problems. Now, nine years later, we had a robust set of tools we could turn into products for our own customers.Magic Transit aims to bring the traditional datacenter hardware model into the cloud, packaging transit with all the network “hardware” you might need to keep your network fast, reliable, and secure. Once deployed, Magic Transit allows seamless provisioning of virtualized network functions, including routing, DDoS mitigation, firewalling, load balancing, and traffic acceleration services.Magic Transit is your network’s on-ramp to the InternetMagic Transit delivers its connectivity, security, and performance benefits by serving as the “front door” to your IP network. 
This means it accepts IP packets destined for your network, processes them, and then outputs them to your origin infrastructure.Connecting to the Internet via Cloudflare offers numerous benefits. Starting with the most basic, Cloudflare is one of the most extensively connected networks on the Internet. We work with carriers, Internet exchanges, and peering partners around the world to ensure that a bit placed on our network will reach its destination quickly and reliably, no matter the destination.An example deployment: Acme CorpLet’s walk through how a customer might deploy Magic Transit. Customer Acme Corp. owns the IP prefix 203.0.113.0/24, which they use to address a rack of hardware they run in their own physical datacenter. Acme currently announces routes to the Internet from their customer-premise equipment (CPE, aka a router at the perimeter of their datacenter), telling the world 203.0.113.0/24 is reachable from their autonomous system number, AS64512. Acme has DDoS mitigation and firewall hardware appliances on-premise.Acme wants to connect to the Cloudflare Network to improve the security and performance of their own network. Specifically, they’ve been the target of distributed denial of service attacks, and want to sleep soundly at night without relying on on-premise hardware. This is where Cloudflare comes in.Deploying Magic Transit in front of their network is simple:Cloudflare uses Border Gateway Protocol (BGP) to announce Acme’s 203.0.113.0/24 prefix from Cloudflare’s edge, with Acme’s permission.Cloudflare begins ingesting packets destined for the Acme IP prefix.Magic Transit applies DDoS mitigation and firewall rules to the network traffic. 
After it is ingested by the Cloudflare network, traffic that would benefit from HTTPS caching and WAF inspection can be “upgraded” to our Layer 7 HTTPS pipeline without incurring additional network hops.Acme would like Cloudflare to use Generic Routing Encapsulation (GRE) to tunnel traffic back from the Cloudflare Network back to Acme’s datacenter. GRE tunnels are initiated from anycast endpoints back to Acme’s premise. Through the magic of anycast, the tunnels are constantly and simultaneously connected to hundreds of network locations, ensuring the tunnels are highly available and resilient to network failures that would bring down traditionally formed GRE tunnels.Cloudflare egresses packets bound for Acme over these GRE tunnels. Let’s dive deeper on how the DDoS mitigation included in Magic Transit works.Magic Transit protects networks from DDoS attackCustomers deploying Cloudflare Magic Transit instantly get access to the same IP-layer DDoS protection system that has protected the Cloudflare Network for the past 9 years. This is the same mitigation system that stopped a 942Gbps attack dead in its tracks, in seconds. This is the same mitigation system that knew how to stop memcached amplification attacks days before a 1.3Tbps attack took down Github, which did not have Cloudflare watching its back. This is the same mitigation we trust every day to protect Cloudflare, and now it protects your network.Cloudflare has historically protected Layer 7 HTTP and HTTPS applications from attacks at all layers of the OSI Layer model. The DDoS protection our customers have come to know and love relies on a blend of techniques, but can be broken into a few complementary defenses:Anycast and a network presence in 193 cities around the world allows our network to get close to users and attackers, allowing us to soak up traffic close to the source without introducing significant latency.30+Tbps of network capacity allows us to soak up a lot of traffic close to the source. 
Cloudflare's network has more capacity to stop DDoS attacks than that of Akamai Prolexic, Imperva, Neustar, and Radware — combined.Our HTTPS reverse proxy absorbs L3 (IP layer) and L4 (TCP layer) attacks by terminating connections and re-establishing them to the origin. This stops most spurious packet transmissions from ever getting close to a customer origin server.Layer 7 mitigations and rate limiting stop floods at the HTTPS application layer.Looking at the above description carefully, you might notice something: our reverse proxy servers protect our customers by terminating connections, but our network and servers still get slammed by the L3 and 4 attacks we stop on behalf of our customers. How do we protect our own infrastructure from these attacks?Enter Gatebot!Gatebot is a suite of software running on every one of our servers inside each of our datacenters in the 193 cities we operate, constantly analyzing and blocking attack traffic. Part of Gatebot’s beauty is its simple architecture; it sits silently, in wait, sampling packets as they pass from the network card into the kernel and onward into userspace. Gatebot does not have a learning or warm-up period. As soon as it detects an attack, it instructs the kernel of the machine it is running on to drop the packet, log its decision, and move on.Historically, if you wanted to protect your network from a DDoS attack, you might have purchased a specialized piece of hardware to sit at the perimeter of your network. This hardware box (let’s call it “The DDoS Protection Box”) would have been fantastically expensive, pretty to look at (as pretty as a 2U hardware box could get), and required a ton of recurring effort and money to stay on its feet, keep its licence up to date, and keep its attack detection system accurate and trained. For one thing, it would have to be carefully monitored to make sure it was stopping attacks but not stopping legitimate traffic. 
For another, if an attacker managed to generate enough traffic to saturate your datacenter’s transit links to the Internet, you were out of luck; no box sitting inside your datacenter can protect you from an attack generating enough traffic to congest the links running from the outside world to the datacenter itself.Early on, Cloudflare considered buying The DDoS Protection Box(es) to protect our various network locations, but ruled them out quickly. Buying hardware would have incurred substantial cost and complexity. In addition, buying, racking, and managing specialized pieces of hardware makes a network hard to scale. There had to be a better way. We set out to solve this problem ourselves, starting from first principles and modern technology.To make our modern approach to DDoS mitigation work, we had to invent a suite of tools and techniques to allow us to do ultra-high performance networking on a generic x86 server running Linux. At the core of our network data plane is the eXpress Data Path (XDP) and the extended Berkeley Packet Filter (eBPF), a set of APIs that allow us to build ultra-high performance networking applications in the Linux kernel. My colleagues have written extensively about how we use XDP and eBPF to stop DDoS attacks:L4Drop: XDP DDoS Mitigationsxdpcap: XDP Packet CaptureXDP based DoS mitigation presentationXDP in practice: integrating XDP into our DDoS mitigation pipeline (PDF)Cloudflare architecture and how BPF eats the worldAt the end of the day, we ended up with a DDoS mitigation system that:Is delivered by our entire network, spread across 193 cities around the world. To put this another way, our network doesn’t have the concept of “scrubbing centers” — every single one of our network locations is always mitigating attacks, all the time. 
This means faster attack mitigation and minimal latency impact for your users.Has exceptionally fast times to mitigate, with most attacks mitigated in 10s or less.Was built in-house, giving us deep visibility into its behavior and the ability to rapidly develop new mitigations as we see new attack types.Is deployed as a service, and is horizontally scalable. Adding x86 hardware running our DDoS mitigation software stack to a datacenter (or adding another network location) instantly brings more DDoS mitigation capacity online.Gatebot is designed to protect Cloudflare infrastructure from attack. And today, as part of Magic Transit, customers operating their own IP networks and infrastructure can rely on Gatebot to protect their own network.Magic Transit puts your network hardware in the cloudWe’ve covered how Cloudflare Magic Transit connects your network to the Internet, and how it protects you from DDoS attack. If you were running your network the old-fashioned way, this is where you’d stop to buy firewall hardware, and maybe another box to do load balancing. With Magic Transit, you don’t need those boxes. We have a long track record of delivering common network functions (firewalls, load balancers, etc.) as services. Up until this point, customers deploying our services have relied on DNS to bring traffic to our edge, after which our Layer 3 (IP), Layer 4 (TCP & UDP), and Layer 7 (HTTP, HTTPS, and DNS) stacks take over and deliver performance and security to our customers. Magic Transit is designed to handle your entire network, but does not enforce a one-size-fits-all approach to what services get applied to which portion of your traffic. To revisit Acme, our example customer from above, they have brought 203.0.113.0/24 to the Cloudflare Network. This represents 256 IPv4 addresses, some of which (eg 203.0.113.8/30) might front load balancers and HTTP servers, others mail servers, and others still custom UDP-based applications. 
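
To make the address arithmetic concrete, here is a small shell sketch. It is an illustration added to the walkthrough, not part of Magic Transit, and the helper names `prefix_size`, `ip_to_int`, and `in_prefix` are hypothetical. It only checks that a /24 holds 256 addresses and that 203.0.113.8/30 covers the four addresses 203.0.113.8 through 203.0.113.11.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) for IPv4 prefix arithmetic.

# Number of addresses in a prefix of the given length, e.g. 24 -> 256.
prefix_size() {
  echo $(( 1 << (32 - $1) ))
}

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_prefix ADDR NETWORK LEN: succeeds if ADDR falls inside NETWORK/LEN.
in_prefix() {
  local addr net mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  (( (addr & mask) == (net & mask) ))
}

echo "203.0.113.0/24 holds $(prefix_size 24) addresses"
echo "203.0.113.8/30 holds $(prefix_size 30) addresses"
in_prefix 203.0.113.9 203.0.113.8 30 && echo "203.0.113.9 is inside 203.0.113.8/30"
in_prefix 203.0.113.12 203.0.113.8 30 || echo "203.0.113.12 is outside 203.0.113.8/30"
```
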
Each of these sub-ranges may have different security and traffic management requirements. Magic Transit allows you to configure specific IP addresses with their own suite of services, or apply the same configuration to large portions (or all) of your block. Taking the above example, Acme may wish that the 203.0.113.8/30 block containing HTTP services fronted by a traditional hardware load balancer instead deploy the Cloudflare Load Balancer, and also wants HTTP traffic analyzed with Cloudflare’s WAF and content cached by our CDN. With Magic Transit, deploying these network functions is straight-forward — a few clicks in our dashboard or API calls will have your traffic handled at a higher layer of network abstraction, with all the attendant goodies applying application level load balancing, firewall, and caching logic bring.This is just one example of a deployment customers might pursue. We’ve worked with several who just want pure IP passthrough, with DDoS mitigation applied to specific IP addresses. Want that? We got you!Magic Transit runs on the entire Cloudflare Global Network. Or, no more scrubs!When you connect your network to Cloudflare Magic Transit, you get access to the entire Cloudflare network. This means all of our network locations become your network locations. Our network capacity becomes your network capacity, at your disposal to power your experiences, deliver your content, and mitigate attacks on your infrastructure. How expansive is the Cloudflare Network? We’re in 193 cities worldwide, with more than 30Tbps of network capacity spread across them. Cloudflare operates within 100 milliseconds of 98% of the Internet-connected population in the developed world, and 93% of the Internet-connected population globally (for context, the blink of an eye is 300-400 milliseconds).Areas of the globe within 100 milliseconds of a Cloudflare datacenter.Just as we built our own products in house, we also built our network in house. 
Every product runs in every datacenter, meaning our entire network delivers all of our services. This might not have been the case if we had assembled our product portfolio piecemeal through acquisition, or not had completeness of vision when we set out to build our current suite of services.

The end result for customers of Magic Transit: a network presence around the globe as soon as you come on board. Full access to a diverse set of services worldwide. All delivered with latency and performance in mind.

We'll be sharing a lot more technical detail on how we deliver Magic Transit in the coming weeks and months.

Magic Transit lowers total cost of ownership

Traditional network services don’t come cheap; they require high capital outlays up front, investment in staff to operate, and ongoing maintenance contracts to stay functional. Just as our product aims to be disruptive technically, we want to disrupt traditional network cost-structures as well. Magic Transit is delivered and billed as a service. You pay for what you use, and can add services at any time. Your team will thank you for its ease of management; your management will thank you for its ease of accounting. That sounds pretty good to us!

Magic Transit is available today

We’ve worked hard over the past nine years to get our network, management tools, and network functions as a service into the state they’re in today. We’re excited to get the tools we use every day in customers’ hands.

So that brings us to naming. When we showed this to customers the most common word they used was ‘whoa.’ When we pressed them on what they meant by that, they almost all said: ‘It’s so much better than any solution we’ve seen before. It’s, like, magic!’ So it seems only natural, if a bit cheesy, that we call this product what it is: Magic Transit.

We think this is all pretty magical, and think you will too. Contact our Enterprise Sales Team today.

Today we announced Cloudflare Magic Transit, which makes Cloudflare’s network available to any IP traffic on the Internet. Up until now, Cloudflare has primarily operated proxy services: our servers terminate HTTP, TCP, and UDP sessions with Internet users and pass that data through new sessions they create with origin servers. With Magic Transit, we are now also operating at the IP layer: in addition to terminating sessions, our servers are applying a suite of network functions (DoS mitigation, firewalling, routing, and so on) on a packet-by-packet basis.Over the past nine years, we’ve built a robust, scalable global network that currently spans 193 cities in over 90 countries and is ever growing. All Cloudflare customers benefit from this scale thanks to two important techniques. The first is anycast networking. Cloudflare was an early adopter of anycast, using this routing technique to distribute Internet traffic across our data centers. It means that any data center can handle any customer’s traffic, and we can spin up new data centers without needing to acquire and provision new IP addresses. The second technique is homogeneous server architecture. Every server in each of our edge data centers is capable of running every task. We build our servers on commodity hardware, making it easy to quickly increase our processing capacity by adding new servers to existing data centers. Having no specialty hardware to depend on has also led us to develop an expertise in pushing the limits of what’s possible in networking using modern Linux kernel techniques.Magic Transit is built on the same network using the same techniques, meaning our customers can now run their network functions at Cloudflare scale. Our fast, secure, reliable global edge becomes our customers’ edge. 
To explore how this works, let’s follow the journey of a packet from a user on the Internet to a Magic Transit customer’s network.Putting our DoS mitigation to work… for you!In the announcement blog post we describe an example deployment for Acme Corp. Let’s continue with this example here. When Acme brings their IP prefix 203.0.113.0/24 to Cloudflare, we start announcing that prefix to our transit providers, peers, and to Internet exchanges in each of our data centers around the globe. Additionally, Acme stops announcing the prefix to their own ISPs. This means that any IP packet on the Internet with a destination address within Acme’s prefix is delivered to a nearby Cloudflare data center, not to Acme’s router.Let’s say I want to access Acme’s FTP server on 203.0.113.100 from my computer in Cloudflare’s office in Champaign, IL. My computer generates a TCP SYN packet with destination address 203.0.113.100 and sends it out to the Internet. Thanks to anycast, that packet ends up at Cloudflare’s data center in Chicago, which is the closest data center (in terms of Internet routing distance) to Champaign. The packet arrives on the data center’s router, which uses ECMP (Equal Cost Multi-Path) routing to select which server should handle the packet and dispatches the packet to the selected server.Once at the server, the packet flows through our XDP- and iptables-based DoS detection and mitigation functions. If this TCP SYN packet were determined to be part of an attack, it would be dropped and that would be the end of it. Fortunately for me, the packet is permitted to pass.So far, this looks exactly like any other traffic on Cloudflare’s network. Because of our expertise in running a global anycast network we’re able to attract Magic Transit customer traffic to every data center and apply the same DoS mitigation solution that has been protecting Cloudflare for years. 
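
ECMP implementations typically hash fields of the packet header so that every packet of a flow lands on the same next hop. The shell sketch below is an illustration of that idea only. The server names are made up, the source address and port are placeholders, and `cksum` stands in for whatever hash function a real router uses.

```shell
#!/usr/bin/env bash
# Toy model of ECMP next-hop selection: hash the flow's 5-tuple and
# use the result to pick one of several equal-cost servers.
# Server names and the cksum-based hash are assumptions for illustration.

pick_server() {
  local flow="$1"; shift
  local servers=("$@")
  local crc
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  crc=$(printf '%s' "$flow" | cksum | cut -d' ' -f1)
  echo "${servers[crc % ${#servers[@]}]}"
}

# 5-tuple: protocol, source address, source port, destination address, destination port.
flow="tcp 198.51.100.7 51515 203.0.113.100 21"
pick_server "$flow" server-a server-b server-c server-d
```

Because the choice depends only on the 5-tuple, repeated calls with the same flow return the same server, while different flows spread across the available servers.
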
Our DoS solution has handled some of the largest attacks ever recorded, including a 942Gbps SYN flood in 2018. Below is a screenshot of a recent SYN flood of 300M packets per second. Our architecture lets us scale to stop the largest attacks.Network namespaces for isolation and controlThe above looked identical to how all other Cloudflare traffic is processed, but this is where the similarities end. For our other services, the TCP SYN packet would now be dispatched to a local proxy process (e.g. our nginx-based HTTP/S stack). For Magic Transit, we instead want to dynamically provision and apply customer-defined network functions like firewalls and routing. We needed a way to quickly spin up and configure these network functions while also providing inter-network isolation. For that, we turned to network namespaces.Namespaces are a collection of Linux kernel features for creating lightweight virtual instances of system resources that can be shared among a group of processes. Namespaces are a fundamental building block for containerization in Linux. Notably, Docker is built on Linux namespaces. A network namespace is an isolated instance of the Linux network stack, including its own network interfaces (with their own eBPF hooks), routing tables, netfilter configuration, and so on. Network namespaces give us a low-cost mechanism to rapidly apply customer-defined network configurations in isolation, all with built-in Linux kernel features so there’s no performance hit from userspace packet forwarding or proxying.When a new customer starts using Magic Transit, we create a brand new network namespace for that customer on every server across our edge network (did I mention that every server can run every task?). We built a daemon that runs on our servers and is responsible for managing these network namespaces and their configurations. 
This daemon is constantly reading configuration updates from Quicksilver, our globally distributed key-value store, and applying customer-defined configurations for firewalls, routing, etc., inside the customer’s namespace. For example, if Acme wants to provision a firewall rule to allow FTP traffic (TCP ports 20 and 21) to 203.0.113.100, that configuration is propagated globally through Quicksilver and the Magic Transit daemon applies the firewall rule by adding an nftables rule to the Acme customer namespace:

# Apply nftables rule inside Acme’s namespace
$ sudo ip netns exec acme_namespace nft add rule inet filter prerouting ip daddr 203.0.113.100 tcp dport 20-21 accept
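
To make the daemon's job a little more concrete, here is a minimal sketch of how a configuration record could be turned into the command above. The record layout (`dst`, `proto`, `ports`, `action`) and the function name `build_nft_command` are assumptions made for illustration; the post does not describe the daemon's real data model or how it invokes nftables.

```python
import shlex
from typing import Dict, List


def build_nft_command(namespace: str, rule: Dict) -> List[str]:
    """Turn a hypothetical firewall rule record into an argv list.

    The command is only built, never executed, so this is safe to run
    anywhere.
    """
    first_port, last_port = rule["ports"]
    # nft accepts either a single port or an inclusive range such as 20-21.
    if first_port == last_port:
        port_match = str(first_port)
    else:
        port_match = "%d-%d" % (first_port, last_port)
    return [
        "ip", "netns", "exec", namespace,
        "nft", "add", "rule", "inet", "filter", "prerouting",
        "ip", "daddr", rule["dst"],
        rule["proto"], "dport", port_match,
        rule["action"],
    ]


# Hypothetical record for Acme's FTP rule from the example above.
acme_rule = {
    "dst": "203.0.113.100",
    "proto": "tcp",
    "ports": (20, 21),
    "action": "accept",
}
command = build_nft_command("acme_namespace", acme_rule)
print(shlex.join(command))
```

Building the command as a list of arguments rather than a single string means a value taken from customer configuration is never interpreted by a shell.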
Getting the customer’s traffic to their network namespace requires a little routing configuration in the default network namespace. When a network namespace is created, a pair of virtual ethernet (veth) interfaces is also created: one in the default namespace and one in the newly created namespace. This interface pair creates a “virtual wire” for delivering network traffic into and out of the new network namespace. In the default network namespace, we maintain a routing table that forwards Magic Transit customer IP prefixes to the veths corresponding to those customers’ namespaces. We use iptables to mark the packets that are destined for Magic Transit customer prefixes, and we have a routing rule that specifies that these specially marked packets should use the Magic Transit routing table.

(Why go to the trouble of marking packets in iptables and maintaining a separate routing table? Isolation. By keeping Magic Transit routing configurations separate we reduce the risk of accidentally modifying the default routing table in a way that affects how non-Magic Transit traffic flows through our edge.)

Network namespaces provide a lightweight environment where a Magic Transit customer can run and manage network functions in isolation, letting us put full control in the customer’s hands.

GRE + anycast = magic

After passing through the edge network functions, the TCP SYN packet is finally ready to be delivered back to the customer’s network infrastructure. Because Acme Corp. does not have a network footprint in a colocation facility with Cloudflare, we need to deliver their network traffic over the public Internet.

This poses a problem. The destination address of the TCP SYN packet is 203.0.113.100, but the only network announcing the IP prefix 203.0.113.0/24 on the Internet is Cloudflare. This means that we can’t simply forward this packet out to the Internet: it will boomerang right back to us! 
In order to deliver this packet to Acme we need to use a technique called tunneling.

Tunneling is a method of carrying traffic from one network over another network. In our case, it involves encapsulating Acme’s IP packets inside of IP packets that can be delivered to Acme’s router over the Internet. There are a number of common tunneling protocols, but Generic Routing Encapsulation (GRE) is often used for its simplicity and widespread vendor support.

GRE tunnel endpoints are configured both on Cloudflare’s servers (inside of Acme’s network namespace) and on Acme’s router. Cloudflare servers then encapsulate IP packets destined for 203.0.113.0/24 inside of IP packets destined for a publicly-routable IP address for Acme’s router, which decapsulates the packets and emits them into Acme’s internal network.

Now, I’ve omitted an important detail in the diagram above: the IP address of Cloudflare’s side of the GRE tunnel. Configuring a GRE tunnel requires specifying an IP address for each side, and the outer IP header for packets sent over the tunnel must use these specific addresses. But Cloudflare has thousands of servers, each of which may need to deliver packets to the customer through a tunnel. So how many Cloudflare IP addresses (and GRE tunnels) does the customer need to talk to? The answer: just one, thanks to the magic of anycast.

Cloudflare uses anycast IP addresses for our GRE tunnel endpoints, meaning that any server in any data center is capable of encapsulating and decapsulating packets for the same GRE tunnel. How is this possible? Isn’t a tunnel a point-to-point link? The GRE protocol itself is stateless: each packet is processed independently and without requiring any negotiation or coordination between tunnel endpoints. While the tunnel is technically bound to an IP address it need not be bound to a specific device. Any device that can strip off the outer headers and then route the inner packet can handle any GRE packet sent over the tunnel. 
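
The stateless nature of GRE is easier to see in code. The following is a minimal sketch, not how Cloudflare's servers do it (the post describes encapsulation happening with built-in Linux kernel features): it wraps an inner packet in a basic 4-byte GRE header and a 20-byte outer IPv4 header, then unwraps it again using nothing but the bytes of the packet itself. The function names, the placeholder payload, and the tunnel addresses (taken from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are assumptions for illustration.

```python
import socket
import struct

GRE_IP_PROTOCOL = 47      # IP protocol number assigned to GRE
ETHERTYPE_IPV4 = 0x0800   # GRE protocol type for an IPv4 payload
IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header, no options


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def gre_encapsulate(inner_packet: bytes, tunnel_src: str, tunnel_dst: str) -> bytes:
    """Prefix inner_packet with an outer IPv4 header and a basic GRE header."""
    gre_header = struct.pack("!HH", 0, ETHERTYPE_IPV4)  # no flags, version 0
    total_length = 20 + len(gre_header) + len(inner_packet)
    src = socket.inet_aton(tunnel_src)
    dst = socket.inet_aton(tunnel_dst)
    # Pack once with a zero checksum, compute it, then pack again.
    draft = struct.pack(IPV4_HEADER_FORMAT, 0x45, 0, total_length,
                        0, 0, 64, GRE_IP_PROTOCOL, 0, src, dst)
    outer_header = struct.pack(IPV4_HEADER_FORMAT, 0x45, 0, total_length,
                               0, 0, 64, GRE_IP_PROTOCOL,
                               ipv4_checksum(draft), src, dst)
    return outer_header + gre_header + inner_packet


def gre_decapsulate(outer_packet: bytes) -> bytes:
    """Strip the outer headers. No per-tunnel state is consulted."""
    header_length = (outer_packet[0] & 0x0F) * 4
    if outer_packet[9] != GRE_IP_PROTOCOL:
        raise ValueError("not a GRE packet")
    flags_and_version, payload_type = struct.unpack(
        "!HH", outer_packet[header_length:header_length + 4])
    if flags_and_version != 0 or payload_type != ETHERTYPE_IPV4:
        raise ValueError("unsupported GRE header")
    return outer_packet[header_length + 4:]


# Placeholder bytes stand in for Acme's original IP packet.
inner = b"placeholder for the inner IP packet"
outer = gre_encapsulate(inner, "192.0.2.1", "198.51.100.1")
assert gre_decapsulate(outer) == inner
```

Because `gre_decapsulate` needs no session or handshake, any machine that receives the outer packet could run it, which is the property that lets an anycast address serve as a tunnel endpoint.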
Actually, in the context of anycast the term “tunnel” is misleading since it implies a link between two fixed points. With Cloudflare’s Anycast GRE, a single “tunnel” gives you a conduit to every server in every data center on Cloudflare’s global edge.

One very powerful consequence of Anycast GRE is that it eliminates single points of failure. Traditionally, GRE-over-Internet can be problematic because an Internet outage between the two GRE endpoints fully breaks the “tunnel”. This means reliable data delivery requires going through the headache of setting up and maintaining redundant GRE tunnels terminating at different physical sites and rerouting traffic when one of the tunnels breaks. But because Cloudflare is encapsulating and delivering customer traffic from every server in every data center, there is no single “tunnel” to break. This means Magic Transit customers can enjoy the redundancy and reliability of terminating tunnels at multiple physical sites while only setting up and maintaining a single GRE endpoint, making their jobs simpler.

Our scale is now your scale

Magic Transit is a powerful new way to deploy network functions at scale. We’re not just giving you a virtual instance, we’re giving you a global virtual edge. Magic Transit takes the hardware appliances you would typically rack in your on-prem network and distributes them across every server in every data center in Cloudflare’s network. This gives you access to our global anycast network, our fleet of servers capable of running your tasks, and our engineering expertise building fast, reliable, secure networks. Our scale is now your scale.

Today we’re launching Certificate Transparency Monitoring (my summer project as an intern!) to help customers spot malicious certificates. If you opt into CT Monitoring, we’ll send you an email whenever a certificate is issued for one of your domains. We crawl all public logs to find these certificates quickly. CT Monitoring is available now in public beta and can be enabled in the Crypto Tab of the Cloudflare dashboard.

Background

Most web browsers include a lock icon in the address bar. This icon is actually a button; if you’re a security advocate or a compulsive clicker (I’m both), you’ve probably clicked it before! Here’s what happens when you do just that in Google Chrome:

This seems like good news. The Cloudflare blog has presented a valid certificate, your data is private, and everything is secure. But what does this actually mean?

Certificates

Your browser is performing some behind-the-scenes work to keep you safe. When you request a website (say, cloudflare.com), the website should present a certificate that proves its identity. This certificate is like a stamp of approval: it says that your connection is secure. In other words, the certificate proves that content was not intercepted or modified while in transit to you. An altered Cloudflare site would be problematic, especially if it looked like the actual Cloudflare site. Certificates protect us by including information about websites and their owners.

We pass around these certificates because the honor system doesn’t work on the Internet. If you want a certificate for your own website, just request one from a Certificate Authority (CA), or sign up for Cloudflare and we’ll do it for you! CAs issue certificates just as real-life notaries stamp legal documents. They confirm your identity, look over some data, and use their special status to grant you a digital certificate. Popular CAs include DigiCert, Let’s Encrypt, and Sectigo. 
This system has served us well because it has not only kept imposters in check but also promoted trust between domain owners and their visitors.

Unfortunately, nothing is perfect.

It turns out that CAs make mistakes. In rare cases, they become reckless. When this happens, illegitimate certificates are issued (even though they appear to be authentic). If a CA accidentally issues a certificate for your website, but you did not request the certificate, you have a problem. Whoever received the certificate might be able to:

- Steal login credentials from your visitors.
- Interrupt your usual services by serving different content.

These attacks do happen, so there’s good reason to care about certificates. More often, domain owners lose track of their certificates and panic when they discover unexpected certificates. We need a way to prevent these situations from ruining the entire system.

Certificate Transparency

Ah, Certificate Transparency (CT). CT solves the problem I just described by making all certificates public and easy to audit. When CAs issue certificates, they must submit certificates to at least two “public logs.” This means that collectively, the logs carry important data about all trusted certificates on the Internet. Several companies offer CT logs; Google has launched a few of its own. We announced Cloudflare's Nimbus log last year.

Logs are really, really big, and often hold hundreds of millions of certificate records.

The log infrastructure helps browsers validate websites’ identities. When you request cloudflare.com in Safari or Google Chrome, the browser will actually require Cloudflare’s certificate to be registered in a CT log. If the certificate isn’t found in a log, you won’t see the lock icon next to the address bar. Instead, the browser will tell you that the website you’re trying to access is not secure. Are you going to visit a website marked “NOT SECURE”? Probably not.

There are systems that audit CT logs and report illegitimate certificates. 
Therefore, if your browser finds a valid certificate that is also trusted in a log, everything is secure.

What We're Announcing Today

Cloudflare has been an industry leader in CT. In addition to Nimbus, we launched a CT dashboard called Merkle Town and explained how we made it. Today, we’re releasing a public beta of Certificate Transparency Monitoring.

If you opt into CT Monitoring, we’ll send you an email whenever a certificate is issued for one of your domains. When you get an alert, don’t panic; we err on the side of caution by sending alerts whenever a possible domain match is found. Sometimes you may notice a suspicious certificate. Maybe you won’t recognize the issuer, or the subdomain is not one you offer (e.g. slowinternet.cloudflare.com). Alerts are sent quickly so you can contact a CA if something seems wrong.

This raises the question: if services already audit public logs, why are alerts necessary? Shouldn’t errors be found automatically? Well no, because auditing is not exhaustive. The best person to audit your certificates is you. You know your website. You know your personal information. Cloudflare will put relevant certificates right in front of you.

You can enable CT Monitoring on the Cloudflare dashboard. Just head over to the Crypto Tab and find the “Certificate Transparency Monitoring” card. You can always turn the feature off if you’re too popular in the CT world.

If you’re on a Business or Enterprise plan, you can tell us who to notify. Instead of emailing the zone owner (which we do for Free and Pro customers), we accept up to 10 email addresses as alert recipients. We do this to avoid overwhelming large teams. These emails do not have to be tied to a Cloudflare account and can be manually added or removed at any time.

How This Actually Works

Our Cryptography and SSL teams worked hard to make this happen; they built on the work of some clever tools mentioned earlier:

Merkle Town is a hub for CT data. 
We process all trusted certificates and present relevant statistics on our website. This means that every certificate issued on the Internet passes through Cloudflare, and all the data is public (so no privacy concerns here).

Cloudflare Nimbus is our very own CT log. It contains more than 400 million certificates.

Note: Cloudflare, Google, and DigiCert are not the only CT log providers.

So here’s the process... At some point in time, you (or an impostor) request a certificate for your website. A Certificate Authority approves the request and issues the certificate. Within 24 hours, the CA sends this certificate to a set of CT logs. This is where we come in: Cloudflare uses an internal process known as “The Crawler” to look through millions of certificate records. Merkle Town dispatches The Crawler to monitor CT logs and check for new certificates. When The Crawler finds a new certificate, it pulls the entire certificate through Merkle Town.

When we process the certificate in Merkle Town, we also check it against a list of monitored domains. If you have CT Monitoring enabled, we’ll send you an alert immediately. This is only possible because of Merkle Town’s existing infrastructure. Also, The Crawler is ridiculously fast.

I Got a Certificate Alert. What Now?

Good question. Most of the time, certificate alerts are routine. Certificates expire and renew on a regular basis, so it’s totally normal to get these emails. If everything looks correct (the issuer, your domain name, etc.), go ahead and toss that email in the trash.

In rare cases, you might get an email that looks suspicious. We provide a detailed support article that will help. The basic protocol is this:

1. Contact the CA (listed as “Issuer” in the email).
2. Explain why you think the certificate is suspicious.
3. The CA should revoke the certificate (if it really is malicious).

We also have a friendly support team that can be reached here. 
While Cloudflare is not a CA and cannot revoke certificates, our support team knows quite a bit about certificate management and is ready to help.

The Future

Certificate Transparency has started making regular appearances on the Cloudflare blog. Why? It’s required by Chrome and Safari, which dominate the browser market and set precedents for Internet security. But more importantly, CT can help us spot malicious certificates before they are used in attacks. This is why we will continue to refine and improve our certificate detection methods.

What are you waiting for? Go enable Certificate Transparency Monitoring!
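
As a footnote to the “How This Actually Works” section, here is a minimal sketch of what checking a certificate against a list of monitored domains could look like. It is a guess at the general idea, not Cloudflare's implementation: the function names are hypothetical, and the matching rule (alert when a name on the certificate equals a monitored domain or is a subdomain of it) is an assumption chosen to mirror the “err on the side of caution” behavior described above.

```python
from typing import Iterable, Set


def normalize(name: str) -> str:
    """Lowercase a DNS name and drop a trailing dot or leading wildcard."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name


def zones_to_alert(cert_names: Iterable[str],
                   monitored_zones: Iterable[str]) -> Set[str]:
    """Return every monitored zone that a certificate name falls under."""
    zones = {normalize(zone) for zone in monitored_zones}
    hits = set()
    for raw_name in cert_names:
        labels = normalize(raw_name).split(".")
        # Check the name itself and each parent domain, label by label,
        # so "notcloudflare.com" never matches "cloudflare.com".
        for start in range(len(labels)):
            candidate = ".".join(labels[start:])
            if candidate in zones:
                hits.add(candidate)
    return hits


print(zones_to_alert(["slowinternet.cloudflare.com"], ["cloudflare.com"]))
```

Comparing whole labels rather than raw string suffixes is the important detail here: a plain `endswith("cloudflare.com")` check would also fire for unrelated domains that merely end in the same characters.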

The mass shootings in El Paso, Texas and Dayton, Ohio are horrific tragedies. In the case of the El Paso shooting, the suspected terrorist gunman appears to have been inspired by the forum website known as 8chan. Based on evidence we've seen, it appears that he posted a screed to the site immediately before beginning his terrifying attack on the El Paso Walmart killing 20 people.

Unfortunately, this is not an isolated incident. Nearly the same thing happened on 8chan before the terror attack in Christchurch, New Zealand. The El Paso shooter specifically referenced the Christchurch incident and appears to have been inspired by the largely unmoderated discussions on 8chan which glorified the previous massacre. In a separate tragedy, the suspected killer in the Poway, California synagogue shooting also posted a hate-filled “open letter” on 8chan. 8chan has repeatedly proven itself to be a cesspool of hate.

8chan is among the more than 19 million Internet properties that use Cloudflare's service. We just sent notice that we are terminating 8chan as a customer effective at midnight tonight Pacific Time. The rationale is simple: they have proven themselves to be lawless and that lawlessness has caused multiple tragic deaths. Even if 8chan may not have violated the letter of the law in refusing to moderate their hate-filled community, they have created an environment that revels in violating its spirit.

We do not take this decision lightly. Cloudflare is a network provider. In pursuit of our goal of helping build a better internet, we’ve considered it important to provide our security services broadly to make sure as many users as possible are secure, and thereby making cyberattacks less attractive, regardless of the content of those websites. Many of our customers run platforms of their own on top of our network. If our policies are more conservative than theirs it effectively undercuts their ability to run their services and set their own policies. 
We reluctantly tolerate content that we find reprehensible, but we draw the line at platforms that have demonstrated they directly inspire tragic events and are lawless by design. 8chan has crossed that line. It will therefore no longer be allowed to use our services.

What Will Happen Next

Unfortunately, we have seen this situation before and so we have a good sense of what will play out. Almost exactly two years ago we made the determination to kick another disgusting site off Cloudflare's network: the Daily Stormer. That caused a brief interruption in the site's operations but they quickly came back online using a Cloudflare competitor. That competitor at the time promoted as a feature the fact that they didn't respond to legal process. Today, the Daily Stormer is still available and still disgusting. They have bragged that they have more readers than ever. They are no longer Cloudflare's problem, but they remain the Internet's problem.

I have little doubt we'll see the same happen with 8chan. While removing 8chan from our network takes heat off of us, it does nothing to address why hateful sites fester online. It does nothing to address why mass shootings occur. It does nothing to address why portions of the population feel so disenchanted they turn to hate. In taking this action we've solved our own problem, but we haven't solved the Internet's.

In the two years since the Daily Stormer what we have done to try and solve the Internet’s deeper problem is engage with law enforcement and civil society organizations to try and find solutions. Among other things, that resulted in us cooperating around monitoring potential hate sites on our network and notifying law enforcement when there was content that contained an indication of potential violence. We will continue to work within the legal process to share information when we can to hopefully prevent horrific acts of violence. 
We believe this is our responsibility and, given Cloudflare's scale and reach, we are hopeful we will continue to make progress toward solving the deeper problem.

Rule of Law

We continue to feel incredibly uncomfortable about playing the role of content arbiter and do not plan to exercise it often. Some have wrongly speculated this is due to some conception of the United States' First Amendment. That is incorrect. First, we are a private company and not bound by the First Amendment. Second, the vast majority of our customers, and more than 50% of our revenue, come from outside the United States where the First Amendment and similarly libertarian freedom of speech protections do not apply. The only relevance of the First Amendment in this case and others is that it allows us to choose who we do and do not do business with; it does not obligate us to do business with everyone.

Instead our concern has centered around another much more universal idea: the Rule of Law. The Rule of Law requires policies be transparent and consistent. While it has been articulated as a framework for how governments ensure their legitimacy, we have used it as a touchstone when we think about our own policies.

We have been successful because we have a very effective technological solution that provides security, performance, and reliability in an affordable and easy-to-use way. As a result of that, a huge portion of the Internet now sits behind our network. 10% of the top million, 17% of the top 100,000, and 19% of the top 10,000 Internet properties use us today. 10% of the Fortune 1,000 are paying Cloudflare customers.

Cloudflare is not a government. While we've been successful as a company, that does not give us the political legitimacy to make determinations on what content is good and bad. Nor should it. Questions around content are real societal issues that need politically legitimate solutions. 
We will continue to engage with lawmakers around the world as they set the boundaries of what is acceptable in their countries through due process of law. And we will comply with those boundaries when and where they are set.

Europe, for example, has taken a lead in this area. As we've seen governments there attempt to address hate and terror content online, there is recognition that different obligations should be placed on companies that organize and promote content, like Facebook and YouTube, rather than those that are mere conduits for that content. Conduits, like Cloudflare, are not visible to users and therefore cannot be transparent and consistent about their policies.

The unresolved question is how should the law deal with platforms that ignore or actively thwart the Rule of Law? That's closer to the situation we have seen with the Daily Stormer and 8chan. They are lawless platforms. In cases like these, where platforms have been designed to be lawless and unmoderated, and where the platforms have demonstrated their ability to cause real harm, the law may need additional remedies. We and other technology companies need to work with policy makers in order to help them understand the problem and define these remedies. And, in some cases, it may mean moving enforcement mechanisms further down the technical stack.

Our Obligation

Cloudflare's mission is to help build a better Internet. At some level firing 8chan as a customer is easy. They are uniquely lawless and that lawlessness has contributed to multiple horrific tragedies. Enough is enough.

What's hard is defining the policy that we can enforce transparently and consistently going forward. We, and other technology companies like us that enable the great parts of the Internet, have an obligation to help propose solutions to deal with the parts we're not proud of. That's our obligation and we're committed to it.

Unfortunately the action we take today won’t fix hate online. 
It will almost certainly not even remove 8chan from the Internet. But it is the right thing to do. Hate online is a real issue. Here are some organizations that have active work to help address it:

- Anti-Defamation League
- Gen Next Foundation
- Perspective API
- 7 Cups

Our whole Cloudflare team’s thoughts are with the families grieving in El Paso, Texas and Dayton, Ohio this evening.

I’ve recently joined Cloudflare as Head of Australia and New Zealand (A/NZ). This is an important time for the company as we continue to grow our presence locally to address the demand in A/NZ, recruit local talent, and build on the successes we’ve had in our other offices around the globe. In this new role, I’m eager to grow our brand recognition in A/NZ and optimise our reach to customers by building up my team and channel presence.

A little about me

I’m a Melburnian born and bred (most livable city in the world!) with more than 20 years of experience in our market. From guiding strategy and architecture of the region’s largest resources company, BHP, to building and running teams and channels, and helping customers solve the technical challenges of their time, I have been in, or led, businesses in the A/NZ Enterprise market, with a focus on network and security for the last six years.

Why Cloudflare?

I joined Cloudflare because I strongly believe in its mission to help build a better Internet, and believe this mission, paired with its massive global network, will enable the company to continue to deliver incredibly innovative solutions to customers of all segments. Four years ago, I was lucky to build and lead the VMware Network & Security business, working with some of Cloudflare’s biggest A/NZ customers. I was confronted with the full extent of the security challenges that A/NZ businesses face. I recognized that there must be a better way to help customers secure their local and multi-cloud environments. That's how I found Cloudflare. With Cloudflare's Global Cloud Platform, businesses have an integrated solution that offers the best in security, performance and reliability.

Second, something that’s personally important for me as the son of Italian migrants, and now a dad of two gorgeous daughters, is that Cloudflare is serious about culture and diversity. 
When I was considering joining Cloudflare, I watched videos from the Internet Summit, an annual event that Cloudflare hosts in its San Francisco office. One thing that really stood out to me was that the speakers came from so many different backgrounds. I’m extremely passionate about encouraging those from all walks of life to pursue opportunities in business and tech, so seeing the diversity of people giving insightful talks made me realise that this was a company I wanted to work for, and hopefully perhaps my girls as well (no pressure).

Cloudflare A/NZ

I strongly believe that Cloudflare’s mission, paired with its massive global network, will enable customers of all sizes and segments in Australia and New Zealand to leverage Cloudflare’s security, performance and reliability solutions. For example, VicRoads is 85 percent faster now that they are using Argo Smart Routing, Ansarada uses Cloudflare’s WAF to protect against malicious activity, and MyAffiliates harnesses Cloudflare’s global network, which spans more than 180 cities in 80 countries, to ensure an interruption-free service for its customers. Making security and speed, which are necessary for any strong business, available to anyone with an Internet property is truly a noble goal. That’s another one of the reasons I’m most excited to work at Cloudflare.

Australians and Kiwis alike have always been great innovators and users of technology. However, being so physically isolated (Perth is the most isolated city in the world and A/NZ are far from pretty much everywhere else in the world) has limited our ability to have the diversity of choice and competition. Our isolation from said choice and competition fueled innovation, but at the price of complexity, cost, and ease. This makes having local servers absolutely vital for good performance. With Cloudflare’s expansive network, 98 percent of the Internet-connected developed world is located within 100 milliseconds of our network. 
In fact, Cloudflare already has data centers in Auckland, Brisbane, Melbourne, Perth, and Sydney, ensuring that customers in A/NZ have access to a secure, fast, and reliable Internet.

Our opportunities in Australia, New Zealand and beyond...

I’m truly looking forward to helping Cloudflare grow its reach over the next five years. If you are a business in Australia and New Zealand and have a cyber-security, performance or reliability need, get in touch with us (1300 748 959). We’d love to explore how we can help.

If you’re interested in exploring careers at Cloudflare, we are hiring globally. Our team in Australia is small today, about a dozen, and we are growing quickly. We have open roles in Solutions Engineering and Business Development Representatives. Check out our careers page to learn more, or send me a note.

I rarely have to deal with the hassle of using a corporate VPN and I hope it remains this way. As a new member of the Cloudflare team, that seems possible. Coworkers who joined a few years ago did not have that same luck. They had to use a VPN to get any work done. What changed?

Cloudflare released Access, and now we’re able to do our work without ever needing a VPN again. Access is a way to control access to your internal applications and infrastructure. Today, we’re releasing a new feature to help you replace your VPN by deploying Access at an even greater scale.

Access in an instant

Access replaces a corporate VPN by evaluating every request made to a resource secured behind Access. Administrators can make web applications, remote desktops, and physical servers available at dedicated URLs, configured as DNS records in Cloudflare. These tools are protected via access policies, set by the account owner, so that only authenticated users can access those resources. These end users are able to be authenticated over both HTTPS and SSH requests. They’re prompted to log in with their SSO credentials and Access redirects them to the application or server.

For your team, Access makes your internal web applications and servers in your infrastructure feel as seamless to reach as your SaaS tools. Originally we built Access to replace our own corporate VPN. In practice, this became the fastest way to control who can reach different pieces of our own infrastructure. However, administrators configuring Access were required to create a discrete policy for each application/hostname. Now, administrators don’t have to create a dedicated policy for each new resource secured by Access; one policy will cover each URL protected.

When Access launched, the product’s primary use case was to secure internal web applications. Creating unique rules for each was tedious, but manageable. Access has since become a centralized way to secure infrastructure in many environments. 
Now that companies are using Access to secure hundreds of resources, that method of building policies no longer fits.

Starting today, Access users can build policies using a wildcard subdomain, replacing dozens or even hundreds of bespoke rules with a single policy. With a wildcard, the same ruleset will now automatically apply to any subdomain your team generates that is gated by Access.

How can teams deploy at scale with wildcard subdomains?

Administrators can secure their infrastructure with a wildcard policy in the Cloudflare dashboard. With Access enabled, Cloudflare adds identity-based evaluation to that traffic.

In the Access dashboard, you can now build a rule to secure any subdomain of the site you added to Cloudflare. Create a new policy and enter a wildcard tag (“*”) into the subdomain field. You can then configure rules, at a granular level, using your identity provider to control who can reach any subdomain of that apex domain.

This new policy will propagate to all 180 of Cloudflare’s data centers in seconds and any new subdomains created will be protected.

How are teams using it?

Since releasing this feature in a closed beta, we’ve seen teams use it to gate access to their infrastructure in several new ways. Many teams use Access to secure dev and staging environments of sites that are being developed before they hit production. Whether for QA or collaboration with partner agencies, Access helps make it possible to share sites quickly with a layer of authentication. 
With wildcard subdomains, teams are deploying dozens of versions of new sites at new URLs without needing to touch the Access dashboard.

For example, an administrator can create a policy for “*.example.com” and then developers can deploy iterations of sites at “dev-1.example.com” and “dev-2.example.com” and both inherit the global Access policy.

The feature is also helping teams lock down their entire hybrid, on-premise, or public cloud infrastructure with the Access SSH feature. Teams can assign dynamic subdomains to their entire fleet of servers, regardless of environment, and developers and engineers can reach them over an SSH connection without a VPN. Administrators can now bring infrastructure online, in an entirely new environment, without additional or custom security rules.

What about creating DNS records?

Cloudflare Access requires users to associate a resource with a domain or subdomain. While the wildcard policy will cover all subdomains, teams will still need to connect their servers to the Cloudflare network and generate DNS records for those services.

Argo Tunnel can reduce that burden significantly. Argo Tunnel lets you expose a server to the Internet without opening any inbound ports. The service runs a lightweight daemon on your server that initiates outbound tunnels to the Cloudflare network.

Instead of managing DNS, network, and firewall complexity, Argo Tunnel helps administrators serve traffic from their origin through Cloudflare with a single command. That single command will generate the DNS record in Cloudflare automatically, allowing you to focus your time on building and managing your infrastructure.

What’s next?

More teams are adopting a hybrid or multi-cloud model for deploying their infrastructure. In the past, these teams were left with just two options for securing those resources: peering a VPN with each provider or relying on custom IAM flows with each environment. 
In the end, both of these solutions were not only quite costly but also equally unmanageable.

While infrastructure benefits from becoming distributed, security is best controlled in a single place. Access can consolidate how a team controls who can reach their entire fleet of servers and services.
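The wildcard idea from this post ("*.example.com" covering "dev-1.example.com" and "dev-2.example.com") can be illustrated with a small sketch. This is not how Access evaluates policies internally; the function name `covered_by_wildcard` is hypothetical, and the choices to match multi-level subdomains and to exclude the apex domain itself are assumptions made only for illustration.

```python
def covered_by_wildcard(hostname: str, policy: str) -> bool:
    """Return True if hostname falls under a policy such as '*.example.com'.

    Hypothetical helper for illustration; real Access matching may differ.
    """
    hostname = hostname.lower().rstrip(".")
    policy = policy.lower()
    if not policy.startswith("*."):
        # Without a wildcard, only an exact hostname match counts.
        return hostname == policy
    apex = policy[2:]
    # Assumption: the host must sit strictly below the apex domain,
    # so 'example.com' itself and 'evilexample.com' are both rejected.
    return hostname.endswith("." + apex) and len(hostname) > len(apex) + 1


if __name__ == "__main__":
    for host in ("dev-1.example.com", "dev-2.example.com", "example.com"):
        print(host, covered_by_wildcard(host, "*.example.com"))
```

Checking for the leading dot before the apex is what keeps a lookalike such as "evilexample.com" from being treated as a subdomain.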

Securing access to your APT repositories is critical. At Cloudflare, like in most organizations, we used a legacy VPN to lock down who could reach our internal software repositories. However, a network perimeter model lacks a number of features that we consider critical to a team’s security.As a company, we’ve been moving our internal infrastructure to our own zero-trust platform, Cloudflare Access. Access added SaaS-like convenience to the on-premise tools we managed. We started with web applications and then moved resources we need to reach over SSH behind the Access gateway, for example Git or user-SSH access. However, we still needed to handle how services communicate with our internal APT repository.We recently open sourced a new APT transport which allows customers to protect their private APT repositories using Cloudflare Access. In this post, we’ll outline the history of APT tooling, APT transports and introduce our new APT transport for Cloudflare Access. A brief history of APTAdvanced Package Tool, or APT, simplifies the installation and removal of software on Debian and related Linux distributions. Originally released in 1998, APT was to Debian what the App Store was to modern smartphones - a decade ahead of its time!APT sits atop the lower-level dpkg tool, which is used to install, query, and remove .deb packages - the primary software packaging format in Debian and related Linux distributions such as Ubuntu. With dpkg, packaging and managing software installed on your system became easier - but it didn’t solve for problems around distribution of packages, such as via the Internet or local media; at the time of inception, it was commonplace to install packages from a CD-ROM.APT introduced the concept of repositories - a mechanism for storing and indexing a collection of .deb packages. APT supports connecting to multiple repositories for finding packages and automatically resolving package dependencies. 
The way APT connects to said repositories is via a “transport” - a mechanism for communicating between the APT client and its repository source (more on this later).

APT over the Internet

Prior to version 1.5, APT did not include support for HTTPS - if you wanted to install a package over the Internet, your connection was not encrypted. This reduces privacy - an attacker snooping traffic could determine the specific package version your system is installing. It also exposes you to man-in-the-middle attacks where an attacker could, for example, exploit a remote code execution vulnerability. Just 6 months ago, we saw an example of the latter with CVE-2019-3462.

Enter the APT HTTPS transport - an optional transport you can install to add support for connecting to repositories over HTTPS. Once installed, users need to configure their APT sources.list with repositories using HTTPS.

The challenge here, of course, is that the most common way to install this transport is via APT and HTTP - a classic bootstrapping problem! An alternative here is to download the .deb package via curl and install it via dpkg. You’ll find the links to apt-transport-https binaries for Stretch here - once you have the URL path for your system architecture, you can download it from the deb.debian.org mirror-redirector over HTTPS, e.g. for amd64 (a.k.a. x86_64):

curl -o apt-transport-https.deb -L https://deb.debian.org/debian/pool/main/a/apt/apt-transport-https_1.4.9_amd64.deb
HASH=c8c4366d1912ff8223615891397a78b44f313b0a2f15a970a82abe48460490cb && echo "$HASH  apt-transport-https.deb" | sha256sum -c
sudo dpkg -i apt-transport-https.deb

To confirm which APT transports are installed on your system, you can list each “method binary” that is installed:

ls /usr/lib/apt/methods

With apt-transport-https installed you should now see ‘https’ in that list.The state of APT & HTTPS on DebianYou may be wondering how relevant this APT HTTPS transport is today. Given the prevalence of HTTPS on the web today, I was surprised when I found out exactly how relevant it is.Up until a couple of weeks ago, Debian Stretch (9.x) was the current stable release; 9.0 was first released in June 2017 - and the latest version (9.9) includes apt 1.4.9 by default - meaning that securing your APT communication for Debian Stretch requires installing the optional apt-transport-https package.Thankfully, on July 6 of this year, Debian released the latest version - Buster - which currently includes apt 1.8.2 with HTTPS support built-in by default, negating the need for installing the apt-transport-https package - and removing the bootstrapping challenge of installing HTTPS support via HTTPS!BYO HTTPS APT RepositoryA powerful feature of APT is the ability to run your own repository. You can mirror a public repository to improve performance or protect against an outage. And if you’re producing your own software packages, you can run your own repository to simplify distribution and installation of your software for your users.If you have your own APT repository and you’re looking to secure it with HTTPS we’ve offered free Universal SSL since 2014 and last year introduced a way to require it site-wide automatically with one click. You’ll get the benefits of DDoS attack protection, a Global CDN with Caching, and Analytics.But what if you’re looking for more than just HTTPS for your APT repository? For companies operating private APT repositories, authentication of your APT repository may be a challenge. 
This is where our new, custom APT transport comes in.Building custom transportsThe system design of APT is powerful in that it supports extensibility via Transport executables, but how does this mechanism work?When APT attempts to connect to a repository, it finds the executable which matches the “scheme” from the repository URL (e.g. “https://” prefix on a repository results in the “https” executable being called). APT then uses the common Linux standard streams: stdin, stdout, and stderr. It communicates via stdin/stdout using a set of plain-text Messages, which follow IETF RFC #822 (the same format that .deb “Package” files use).Examples of input message include “600 URI Acquire”, and examples of output messages include “200 URI Start” and “201 URI Done”:If you’re interested in building your own transport, check out the APT method interface spec for more implementation details. APT meets AccessCloudflare prioritizes dogfooding our own products early and often. The Access product has given our internal DevTools team a chance to work closely with the product team as we build features that help solve use cases across our organization. We’ve deployed new features internally, gathered feedback, improved them, and then released them to our customers. For example, we’ve been able to iterate on tools for Access like the Atlassian SSO plugin and the SSH feature, as collaborative efforts between DevTools and the Access team.Our DevTools team wanted to take the same dogfooding approach to protect our internal APT repository with Access. We knew this would require a custom APT transport to support generating the required tokens and passing the correct headers in HTTPS requests to our internal APT repository server. 
We decided to build and test our own transport that both generated the necessary tokens and passed the correct headers to allow us to place our repository behind Access.

After months of internal use, we’re excited to announce that we have recently open-sourced our custom APT transport, so our customers can also secure their APT repositories by enabling authentication via Cloudflare Access. By protecting your APT repository with Cloudflare Access, you can authenticate users via single sign-on (SSO) providers, define comprehensive access-control policies, and monitor access and change logs.

Our APT transport leverages another open source tool we provide, cloudflared, which enables users to connect to your Cloudflare-protected domain securely.

Securing your APT Repository

To use our APT transport, you’ll need an APT repository that’s protected by Cloudflare Access. Our instructions (below) for using our transport will use apt.example.com as a hostname.

To use our APT transport with your own web-based APT repository, refer to our Setting Up Access guide.

APT Transport Installation

To install from source, both tools require Go - once you install Go, you can install `cloudflared` and our APT transport with four commands:

go get github.com/cloudflare/cloudflared/cmd/cloudflared
sudo cp ${GOPATH:-~/go}/bin/cloudflared /usr/local/bin/cloudflared
go get github.com/cloudflare/apt-transport-cloudflared/cmd/cfd
sudo cp ${GOPATH:-~/go}/bin/cfd /usr/lib/apt/methods/cfd

The above commands should place the cloudflared executable in /usr/local/bin (which should be on your PATH), and the APT transport binary in the required /usr/lib/apt/methods directory.

To confirm cloudflared is on your path, run:

which cloudflared

The above command should return /usr/local/bin/cloudflared

Now that the custom transport is installed, to start using it simply configure an APT source with the cfd:// scheme rather than https://, e.g.:

$ cat /etc/apt/sources.list.d/example.list
deb [arch=amd64] cfd://apt.example.com/v2/stretch stable common

Next time you run `apt-get update` and `apt-get install`, a browser window will open asking you to log in over Cloudflare Access, and your package will be retrieved using the token returned by `cloudflared`.

Fetching a GPG Key over Access

Usually, private APT repositories will use SecureApt and have their own GPG public key that users must install to verify the integrity of data retrieved from that repository.

Users can also leverage cloudflared for securely downloading and installing those keys, e.g.:

cloudflared access login https://apt.example.com
cloudflared access curl https://apt.example.com/public.gpg | sudo apt-key add -

The first command will open your web browser, allowing you to authenticate for your domain. The second command wraps curl to download the GPG key, and hands it off to `apt-key add`.

Cloudflare Access on "headless" servers

If you’re looking to deploy APT repositories protected by Cloudflare Access to non-user-facing machines (a.k.a. “headless” servers), opening a browser does not work. The good news is that, since February, Cloudflare Access has supported service tokens - and we’ve built support for them into our APT transport from day one.

If you’d like to use service tokens with our APT transport, it’s as simple as placing the token in a file in the correct path; because the machine already has a token, there is also no dependency on `cloudflared` for authentication. You can find details on how to set up a service token in the APT transport README.

What’s next?

As demonstrated, you can get started using our APT transport today - we’d love to hear your feedback on this! This work came out of an internal dogfooding effort, and we’re currently experimenting with additional packaging formats and tooling. If you’re interested in seeing support for another format or tool, please reach out.
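To make the message exchange from "Building custom transports" more concrete, here is a minimal sketch of parsing and formatting APT method messages: a status line such as "600 URI Acquire" followed by RFC 822-style "Key: value" fields and a blank line. It is not taken from the open-sourced transport. The helper names are hypothetical, the repository path in the sample is made up, and the `URI` and `Filename` field names reflect my reading of the APT method interface, so check the spec before relying on them.

```python
def parse_message(raw: str):
    """Parse one message into (code, description, fields)."""
    lines = raw.strip().splitlines()
    code_text, _, description = lines[0].partition(" ")
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return int(code_text), description, fields


def format_message(code: int, description: str, fields: dict) -> str:
    """Serialize a message; the trailing blank line terminates it."""
    body = "".join(f"{key}: {value}\n" for key, value in fields.items())
    return f"{code} {description}\n{body}\n"


if __name__ == "__main__":
    # Hypothetical request as APT might write it to the method's stdin.
    acquire = (
        "600 URI Acquire\n"
        "URI: cfd://apt.example.com/v2/stretch/Release\n"
        "Filename: /tmp/Release\n"
        "\n"
    )
    code, description, fields = parse_message(acquire)
    print(code, description, fields)
    # A transport would answer on stdout, e.g. when the fetch begins.
    print(format_message(200, "URI Start", {"URI": fields["URI"]}), end="")
```

A real transport would read such messages from stdin in a loop, perform the fetch (here, with Access credentials attached), and report progress with messages like "200 URI Start" and "201 URI Done".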

I was the 24th employee of Cloudflare and the first outside of San Francisco. Working out of my spare bedroom, I wrote a chunk of Cloudflare’s software before starting to recruit a team in London. Today, Cloudflare London, our EMEA headquarters, has more than 200 people working in the historic County Hall building opposite the Houses of Parliament. My spare bedroom is ancient history.CC BY-SA 2.0 image by Sridhar SarafAnd Cloudflare didn’t stop at London. We now have people in Munich, Singapore, Beijing, Austin, TX, Chicago and Champaign, IL, New York, Washington, DC, San Jose, CA, Miami, FL, and Sydney, Australia, as well as San Francisco and London. And today we’re announcing the establishment of a new technical hub in Lisbon, Portugal. As part of that office opening I will be relocating to Lisbon this summer along with a small number of technical folks from other Cloudflare offices.We’re recruiting in Lisbon starting today. Go here to see all the current opportunities. We’re looking for people to fill roles in Engineering, Security, Product, Product Strategy, Technology Research, and Customer Support.CC BY-SA 2.0 Image by Rustam AliyevMy first real idea of Lisbon dates to 30 years ago with the 1989 publication of John Le Carré’s The Russia House. As real, of course, as any Le Carré view of the world:[...] ten years ago on a whim Barley Blair, having inherited a stray couple of thousand from a remote aunt, bought himself a scruffy pied-a-terre in Lisbon, where he was accustomed to take periodic rests from the burden of his many-sided soul. It could have been Cornwall, it could have been Provence or Timbuktu. But Lisbon by an accident had got him [...]Cloudflare’s choice of Lisbon, however, came not by way of an accident but a careful search for a new continental European city in which to locate a technical office. I had been invited to Lisbon back in 2014 to speak at SAPO Codebits and been impressed by the size and range of technical talent present at the event. 
Subsequently, we looked at 45 cities across 29 countries, narrowing down to a final list of three.

Lisbon’s combination of a large and growing existing tech ecosystem, attractive immigration policy, political stability, high standard of living, as well as logistical factors like time zone (the same as the UK) and direct flights to San Francisco made it the clear winner.

Eu comecei a aprender Português há três meses (I started learning Portuguese three months ago)... and I’m looking forward to discovering a country and a culture, and building a new technical hub for Cloudflare. We have found a thriving local technology ecosystem, supported both by the government and a myriad of exciting startups, and we look forward to collaborating with them to continue to raise Lisbon's profile.

Almost nine years ago, Cloudflare was a tiny company and I was a customer, not an employee. Cloudflare had launched a month earlier and one day alerting told me that my little site, jgc.org, didn’t seem to have working DNS any more. Cloudflare had pushed out a change to its use of Protocol Buffers and it had broken DNS.

I wrote to Matthew Prince directly with an email titled “Where’s my dns?” and he replied with a long, detailed, technical response (you can read the full email exchange here) to which I replied:

From: John Graham-Cumming
Date: Thu, Oct 7, 2010 at 9:14 AM
Subject: Re: Where's my dns?
To: Matthew Prince
Awesome report, thanks. I'll make sure to call you if there's a
problem. At some point it would probably be good to write this up as
a blog post when you have all the technical details because I think
people really appreciate openness and honesty about these things.
Especially if you couple it with charts showing your post launch
traffic increase.
I have pretty robust monitoring of my sites so I get an SMS when
anything fails. Monitoring shows I was down from 13:03:07 to
14:04:12. Tests are made every five minutes.
It was a blip that I'm sure you'll get past. But are you sure you
don't need someone in Europe? :-)

To which he replied:

From: Matthew Prince
Date: Thu, Oct 7, 2010 at 9:57 AM
Subject: Re: Where's my dns?
To: John Graham-Cumming
Thanks. We've written back to everyone who wrote in. I'm headed in to
the office now and we'll put something on the blog or pin an official
post to the top of our bulletin board system. I agree 100%
transparency is best.
And so, today, as an employee of a much, much larger Cloudflare I get to be the one who writes, transparently about a mistake we made, its impact and what we are doing about it.The events of July 2On July 2, we deployed a new rule in our WAF Managed Rules that caused CPUs to become exhausted on every CPU core that handles HTTP/HTTPS traffic on the Cloudflare network worldwide. We are constantly improving WAF Managed Rules to respond to new vulnerabilities and threats. In May, for example, we used the speed with which we can update the WAF to push a rule to protect against a serious SharePoint vulnerability. Being able to deploy rules quickly and globally is a critical feature of our WAF.Unfortunately, last Tuesday’s update contained a regular expression that backtracked enormously and exhausted CPU used for HTTP/HTTPS serving. This brought down Cloudflare’s core proxying, CDN and WAF functionality. The following graph shows CPUs dedicated to serving HTTP/HTTPS traffic spiking to nearly 100% usage across the servers in our network.CPU utilization in one of our PoPs during the incidentThis resulted in our customers (and their customers) seeing a 502 error page when visiting any Cloudflare domain. The 502 errors were generated by the front line Cloudflare web servers that still had CPU cores available but were unable to reach the processes that serve HTTP/HTTPS traffic.We know how much this hurt our customers. We’re ashamed it happened. It also had a negative impact on our own operations while we were dealing with the incident.It must have been incredibly stressful, frustrating and frightening if you were one of our customers. It was even more upsetting because we haven’t had a global outage for six years. The CPU exhaustion was caused by a single WAF rule that contained a poorly written regular expression that ended up creating excessive backtracking. 
The regular expression that was at the heart of the outage is (?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*))) Although the regular expression itself is of interest to many people (and is discussed more below), the real story of how the Cloudflare service went down for 27 minutes is much more complex than “a regular expression went bad”. We’ve taken the time to write out the series of events that lead to the outage and kept us from responding quickly. And, if you want to know more about regular expression backtracking and what to do about it, then you’ll find it in an appendix at the end of this post.What happenedLet’s begin with the sequence of events. All times in this blog are UTC.At 13:42 an engineer working on the firewall team deployed a minor change to the rules for XSS detection via an automatic process. This generated a Change Request ticket. We use Jira to manage these tickets and a screenshot is below.Three minutes later the first PagerDuty page went out indicating a fault with the WAF. This was a synthetic test that checks the functionality of the WAF (we have hundreds of such tests) from outside Cloudflare to ensure that it is working correctly. This was rapidly followed by pages indicating many other end-to-end tests of Cloudflare services failing, a global traffic drop alert, widespread 502 errors and then many reports from our points-of-presence (PoPs) in cities worldwide indicating there was CPU exhaustion.Some of these alerts hit my watch and I jumped out of the meeting I was in and was on my way back to my desk when a leader in our Solutions Engineering group told me we had lost 80% of our traffic. I ran over to SRE where the team was debugging the situation. 
In the initial moments of the outage there was speculation it was an attack of some type we’d never seen before.Cloudflare’s SRE team is distributed around the world, with continuous, around-the-clock coverage. Alerts like these, the vast majority of which are noting very specific issues of limited scopes in localized areas, are monitored in internal dashboards and addressed many times every day. This pattern of pages and alerts, however, indicated that something gravely serious had happened, and SRE immediately declared a P0 incident and escalated to engineering leadership and systems engineering.The London engineering team was at that moment in our main event space listening to an internal tech talk. The talk was interrupted and everyone assembled in a large conference room and others dialed-in. This wasn’t a normal problem that SRE could handle alone, it needed every relevant team online at once.At 14:00 the WAF was identified as the component causing the problem and an attack dismissed as a possibility. The Performance Team pulled live CPU data from a machine that clearly showed the WAF was responsible. Another team member used strace to confirm. Another team saw error logs indicating the WAF was in trouble. At 14:02 the entire team looked at me when it was proposed that we use a ‘global kill’, a mechanism built into Cloudflare to disable a single component worldwide. But getting to the global WAF kill was another story. Things stood in our way. We use our own products and with our Access service down we couldn’t authenticate to our internal control panel (and once we were back we’d discover that some members of the team had lost access because of a security feature that disables their credentials if they don’t use the internal control panel frequently).And we couldn’t get to other internal services like Jira or the build system. To get to them we had to use a bypass mechanism that wasn’t frequently used (another thing to drill on after the event). 
Eventually, a team member executed the global WAF kill at 14:07 and by 14:09 traffic levels and CPU were back to expected levels worldwide. The rest of Cloudflare's protection mechanisms continued to operate.Then we moved on to restoring the WAF functionality. Because of the sensitivity of the situation we performed both negative tests (asking ourselves “was it really that particular change that caused the problem?”) and positive tests (verifying the rollback worked) in a single city using a subset of traffic after removing our paying customers’ traffic from that location.At 14:52 we were 100% satisfied that we understood the cause and had a fix in place and the WAF was re-enabled globally.How Cloudflare operatesCloudflare has a team of engineers who work on our WAF Managed Rules product; they are constantly working to improve detection rates, lower false positives, and respond rapidly to new threats as they emerge. In the last 60 days, 476 change requests have been handled for the WAF Managed Rules (averaging one every 3 hours).This particular change was to be deployed in “simulate” mode where real customer traffic passes through the rule but nothing is blocked. We use that mode to test the effectiveness of a rule and measure its false positive and false negative rate. But even in the simulate mode the rules actually need to execute and in this case the rule contained a regular expression that consumed excessive CPU.As can be seen from the Change Request above there’s a deployment plan, a rollback plan and a link to the internal Standard Operating Procedure (SOP) for this type of deployment. The SOP for a rule change specifically allows it to be pushed globally. 
This is very different from all the software we release at Cloudflare where the SOP first pushes software to an internal dogfooding network point of presence (PoP) (which our employees pass through), then to a small number of customers in an isolated location, followed by a push to a large number of customers and finally to the world.The process for a software release looks like this: We use git internally via BitBucket. Engineers working on changes push code which is built by TeamCity and when the build passes, reviewers are assigned. Once a pull request is approved the code is built and the test suite runs (again). If the build and tests pass then a Change Request Jira is generated and the change has to be approved by the relevant manager or technical lead. Once approved deployment to what we call the “animal PoPs” occurs: DOG, PIG, and the Canaries.The DOG PoP is a Cloudflare PoP (just like any of our cities worldwide) but it is used only by Cloudflare employees. This dogfooding PoP enables us to catch problems early before any customer traffic has touched the code. And it frequently does.If the DOG test passes successfully code goes to PIG (as in “Guinea Pig”). This is a Cloudflare PoP where a small subset of customer traffic from non-paying customers passes through the new code. If that is successful the code moves to the Canaries. We have three Canary PoPs spread across the world and run paying and non-paying customer traffic running through them on the new code as a final check for errors.Cloudflare software release processOnce successful in Canary the code is allowed to go live. The entire DOG, PIG, Canary, Global process can take hours or days to complete, depending on the type of code change. The diversity of Cloudflare’s network and customers allows us to test code thoroughly before a release is pushed to all our customers globally. 
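The DOG, PIG, Canary, Global sequence described above amounts to a gated loop: deploy to one stage, check health, and only then move on. The sketch below is a simplified illustration under that assumption, not Cloudflare's deployment tooling; `staged_rollout` and the `deploy_ok` callback are hypothetical names.

```python
from typing import Callable, List, Tuple

# Stage names taken from the release process described in the text.
STAGES = ["DOG", "PIG", "Canary", "Global"]


def staged_rollout(
    stages: List[str], deploy_ok: Callable[[str], bool]
) -> Tuple[List[str], bool]:
    """Deploy stage by stage, stopping at the first stage that fails.

    deploy_ok is a hypothetical callback that deploys to one stage and
    returns True only if that stage stays healthy afterwards.
    """
    completed = []
    for stage in stages:
        if not deploy_ok(stage):
            # Halt here: later stages never receive the change.
            return completed, False
        completed.append(stage)
    return completed, True


if __name__ == "__main__":
    # Simulate a change that misbehaves once real customer traffic hits it.
    print(staged_rollout(STAGES, lambda stage: stage != "PIG"))
```

With a gate like this, a change that exhausts CPU at an early stage stops there instead of reaching every location at once.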
But, by design, the WAF doesn’t use this process because of the need to respond rapidly to threats.WAF ThreatsIn the last few years we have seen a dramatic increase in vulnerabilities in common applications. This has happened due to the increased availability of software testing tools, like fuzzing for example (we just posted a new blog on fuzzing here). Source: https://cvedetails.com/What is commonly seen is a Proof of Concept (PoC) is created and often published on Github quickly, so that teams running and maintaining applications can test to make sure they have adequate protections. Because of this, it’s imperative that Cloudflare are able to react as quickly as possible to new attacks to give our customers a chance to patch their software.A great example of how Cloudflare proactively provided this protection was through the deployment of our protections against the SharePoint vulnerability in May (blog here). Within a short space of time from publicised announcements, we saw a huge spike in attempts to exploit our customer’s Sharepoint installations. Our team continuously monitors for new threats and writes rules to mitigate them on behalf of our customers.The specific rule that caused last Tuesday’s outage was targeting Cross-site scripting (XSS) attacks. These too have increased dramatically in recent years.Source: https://cvedetails.com/The standard procedure for a WAF Managed Rules change indicates that Continuous Integration (CI) tests must pass prior to a global deploy. That happened normally last Tuesday and the rules were deployed. At 13:31 an engineer on the team had merged a Pull Request containing the change after it was approved. At 13:37 TeamCity built the rules and ran the tests, giving it the green light. The WAF test suite tests that the core functionality of the WAF works and consists of a large collection of unit tests for individual matching functions. 
After the unit tests run the individual WAF rules are tested by executing a huge collection of HTTP requests against the WAF. These HTTP requests are designed to test requests that should be blocked by the WAF (to make sure it catches attacks) and those that should be let through (to make sure it isn’t over-blocking and creating false positives). What it didn’t do was test for runaway CPU utilization by the WAF and examining the log files from previous WAF builds shows that no increase in test suite run time was observed with the rule that would ultimately cause CPU exhaustion on our edge.With the tests passing, TeamCity automatically began deploying the change at 13:42.QuicksilverBecause WAF rules are required to address emergent threats they are deployed using our Quicksilver distributed key-value (KV) store that can push changes globally in seconds. This technology is used by all our customers when making configuration changes in our dashboard or via the API and is the backbone of our service’s ability to respond to changes very, very rapidly.We haven’t really talked about Quicksilver much. We previously used Kyoto Tycoon as a globally distributed key-value store, but we ran into operational issues with it and wrote our own KV store that is replicated across our more than 180 cities. Quicksilver is how we push changes to customer configuration, update WAF rules, and distribute JavaScript code written by customers using Cloudflare Workers.From clicking a button in the dashboard or making an API call to change configuration to that change coming into effect takes seconds, globally. Customers have come to love this high speed configurability. And with Workers they expect near instant, global software deployment. On average Quicksilver distributes about 350 changes per second.And Quicksilver is very fast. On average we hit a p99 of 2.29s for a change to be distributed to every machine worldwide. Usually, this speed is a great thing. 
It means that when you enable a feature or purge your cache you know that it’ll be live globally nearly instantly. When you push code with Cloudflare Workers it's pushed out at the same speed. This is part of the promise of Cloudflare fast updates when you need them.However, in this case, that speed meant that a change to the rules went global in seconds. You may notice that the WAF code uses Lua. Cloudflare makes use of Lua extensively in production and details of the Lua in the WAF have been discussed before. The Lua WAF uses PCRE internally and it uses backtracking for matching and has no mechanism to protect against a runaway expression. More on that and what we're doing about it below.Everything that occurred up to the point the rules were deployed was done “correctly”: a pull request was raised, it was approved, CI/CD built the code and tested it, a change request was submitted with an SOP detailing rollout and rollback, and the rollout was executed. Cloudflare WAF deployment processWhat went wrongAs noted, we deploy dozens of new rules to the WAF every week, and we have numerous systems in place to prevent any negative impact of that deployment. So when things do go wrong, it’s generally the unlikely convergence of multiple causes. Getting to a single root cause, while satisfying, may obscure the reality. 
Here are the multiple vulnerabilities that converged to get to the point where Cloudflare’s service for HTTP/HTTPS went offline.An engineer wrote a regular expression that could easily backtrack enormously.A protection that would have helped prevent excessive CPU use by a regular expression was removed by mistake during a refactoring of the WAF weeks prior—a refactoring that was part of making the WAF use less CPU.The regular expression engine being used didn’t have complexity guarantees.The test suite didn’t have a way of identifying excessive CPU consumption.The SOP allowed a non-emergency rule change to go globally into production without a staged rollout.The rollback plan required running the complete WAF build twice taking too long.The first alert for the global traffic drop took too long to fire.We didn’t update our status page quickly enough.We had difficulty accessing our own systems because of the outage and the bypass procedure wasn’t well trained on.SREs had lost access to some systems because their credentials had been timed out for security reasons.Our customers were unable to access the Cloudflare Dashboard or API because they pass through the Cloudflare edge.What’s happened since last TuesdayFirstly, we stopped all release work on the WAF completely and are doing the following:Re-introduce the excessive CPU usage protection that got removed. (Done)Manually inspecting all 3,868 rules in the WAF Managed Rules to find and correct any other instances of possible excessive backtracking. (Inspection complete)Introduce performance profiling for all rules to the test suite. (ETA: July 19)Switching to either the re2 or Rust regex engine which both have run-time guarantees. 
(ETA: July 31)Changing the SOP to do staged rollouts of rules in the same manner used for other software at Cloudflare while retaining the ability to do emergency global deployment for active attacks.Putting in place an emergency ability to take the Cloudflare Dashboard and API off Cloudflare's edge.Automating update of the Cloudflare Status page.In the longer term we are moving away from the Lua WAF that I wrote years ago. We are porting the WAF to use the new firewall engine. This will make the WAF both faster and add yet another layer of protection.ConclusionThis was an upsetting outage for our customers and for the team. We responded quickly to correct the situation and are correcting the process deficiencies that allowed the outage to occur and going deeper to protect against any further possible problems with the way we use regular expressions by replacing the underlying technology used.We are ashamed of the outage and sorry for the impact on our customers. We believe the changes we’ve made mean such an outage will never recur. Appendix: About Regular Expression BacktrackingTo fully understand how (?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*))) caused CPU exhaustion you need to understand a little about how a standard regular expression engine works. The critical part is .*(?:.*=.*). The (?: and matching ) are a non-capturing group (i.e. the expression inside the parentheses is grouped together as a single expression). For the purposes of the discussion of why this pattern causes CPU exhaustion we can safely ignore it and treat the pattern as .*.*=.*. When reduced to this, the pattern obviously looks unnecessarily complex; but what's important is any "real-world" expression (like the complex ones in our WAF rules) that ask the engine to "match anything followed by anything" can lead to catastrophic backtracking. Here’s why.In a regular expression, . 
means match a single character, .* means match zero or more characters greedily (i.e. match as much as possible) so .*.*=.* means match zero or more characters, then match zero or more characters, then find a literal = sign, then match zero or more characters.

Consider the test string x=x. This will match the expression .*.*=.*. The .*.* before the equal can match the first x (one of the .* matches the x, the other matches zero characters). The .* after the = matches the final x.

It takes 23 steps for this match to happen. The first .* in .*.*=.* acts greedily and matches the entire x=x string. The engine moves on to consider the next .*. There are no more characters left to match so the second .* matches zero characters (that’s allowed). Then the engine moves on to the =. As there are no characters left to match (the first .* having consumed all of x=x) the match fails.

At this point the regular expression engine backtracks. It returns to the first .* and matches it against x= (instead of x=x) and then moves onto the second .*. That .* matches the second x and now there are no more characters left to match. So when the engine tries to match the = in .*.*=.* the match fails. The engine backtracks again.

This time it backtracks so that the first .* is still matching x= but the second .* no longer matches x; it matches zero characters. The engine then moves on to try to find the literal = in the .*.*=.* pattern but it fails (because it was already matched against the first .*). The engine backtracks again.

This time the first .* matches just the first x. But the second .* acts greedily and matches =x. You can see what’s coming. When it tries to match the literal = it fails and backtracks again.

The first .* still matches just the first x. Now the second .* matches just =. But, you guessed it, the engine can’t match the literal = because the second .* matched it. So the engine backtracks again. 
Remember, this is all to match a three character string.

Finally, with the first .* matching just the first x, the second .* matching zero characters the engine is able to match the literal = in the expression with the = in the string. It moves on and the final .* matches the final x.

23 steps to match x=x. Here’s a short video of that using the Perl Regexp::Debugger showing the steps and backtracking as they occur.

That’s a lot of work but what happens if the string is changed from x=x to x=xx? This time it takes 33 steps to match. And if the input is x=xxx it takes 45. That’s not linear. Here’s a chart showing matching from x=x to x=xxxxxxxxxxxxxxxxxxxx (20 x’s after the =). With 20 x’s after the = the engine takes 555 steps to match! (Worse, if the x= was missing, so the string was just 20 x’s, the engine would take 4,067 steps to find the pattern doesn’t match).

This video shows all the backtracking necessary to match x=xxxxxxxxxxxxxxxxxxxx.

That’s bad because as the input size goes up the match time goes up super-linearly. But things could have been even worse with a slightly different regular expression. Suppose it had been .*.*=.*; (i.e. there’s a literal semicolon at the end of the pattern). This could easily have been written to try to match an expression like foo=bar;.

This time the backtracking would have been catastrophic. To match x=x takes 90 steps instead of 23. And the number of steps grows very quickly. Matching x= followed by 20 x’s takes 5,353 steps. Here’s the corresponding chart. Look carefully at the Y-axis values compared to the previous chart.

To complete the picture here are all 5,353 steps of failing to match x=xxxxxxxxxxxxxxxxxxxx against .*.*=.*;

Using lazy rather than greedy matches helps control the amount of backtracking that occurs in this case. If the original expression is changed to .*?.*?=.*? then matching x=x takes 11 steps (instead of 23) and so does matching x=xxxxxxxxxxxxxxxxxxxx. That’s because the ? 
after the .* instructs the engine to match the smallest number of characters first before moving on.

But laziness isn’t the total solution to this backtracking behaviour. Changing the catastrophic example .*.*=.*; to .*?.*?=.*?; doesn’t change its run time at all. x=x still takes 90 steps and x= followed by 20 x’s still takes 5,353 steps.

The only real solution, short of fully re-writing the pattern to be more specific, is to move away from a regular expression engine with this backtracking mechanism. Which we are doing within the next few weeks.

The solution to this problem has been known since 1968 when Ken Thompson wrote a paper titled “Programming Techniques: Regular expression search algorithm”. The paper describes a mechanism for converting a regular expression into an NFA (non-deterministic finite automaton) and then following the state transitions in the NFA using an algorithm that executes in time linear in the size of the string being matched against.

Thompson’s paper doesn’t actually talk about NFA but the linear time algorithm is clearly explained and an ALGOL-60 program that generates assembly language code for the IBM 7094 is presented. The implementation may be arcane but the idea it presents is not.

Here’s what the .*.*=.* regular expression would look like when diagrammed in a similar manner to the pictures in Thompson’s paper.

Figure 0 has five states starting at 0. There are three loops which begin with the states 1, 2 and 3. These three loops correspond to the three .* in the regular expression. The three lozenges with dots in them match a single character. The lozenge with an = sign in it matches the literal = sign. State 4 is the ending state, if reached then the regular expression has matched.

To see how such a state diagram can be used to match the regular expression .*.*=.* we’ll examine matching the string x=x. The program starts in state 0 as shown in Figure 1. 
The key to making this algorithm work is that the state machine is in multiple states at the same time. The NFA will take every transition it can, simultaneously.

Even before it reads any input, it immediately transitions to both states 1 and 2 as shown in Figure 2.

Looking at Figure 2 we can see what happens when it considers the first x in x=x. The x can match the top dot by transitioning from state 1 and back to state 1. Or the x can match the dot below it by transitioning from state 2 and back to state 2.

So after matching the first x in x=x the states are still 1 and 2. It’s not possible to reach state 3 or 4 because a literal = sign is needed.

Next the algorithm considers the = in x=x. Much like the x before it, it can be matched by either of the top two loops transitioning from state 1 to state 1 or state 2 to state 2, but additionally the literal = can be matched and the algorithm can transition state 2 to state 3 (and immediately state 4). That’s illustrated in Figure 3.

Next the algorithm reaches the final x in x=x. From states 1 and 2 the same transitions are possible back to states 1 and 2. From state 3 the x can match the dot on the right and transition back to state 3. At that point every character of x=x has been considered and because state 4 has been reached the regular expression matches that string. Each character was processed once so the algorithm was linear in the length of the input string. And no backtracking was needed.

It might also be obvious that once state 4 was reached (after x= was matched) the regular expression had matched and the algorithm could terminate without considering the final x at all.

This algorithm is linear in the size of its input.
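
To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the WAF's code and not how Perl, PCRE, re2 or the Rust regex engine are implemented. Patterns are restricted to .* tokens and single literal characters, the function names backtrack_match and nfa_match are made up for this example, and states are numbered by position in the token list rather than as in the figures. The counts it reports are in the sketch's own units, so they will not reproduce the Perl step counts quoted above; only the growth pattern is comparable.

```python
# Illustrative sketch only: a toy matcher for patterns built from ".*" tokens
# and single literal characters. Both functions are hypothetical helpers.

def backtrack_match(tokens, text):
    """Greedy backtracking match anchored at position 0.
    Returns (matched, calls), where calls counts recursive attempts."""
    calls = 0

    def attempt(ti, pos):
        nonlocal calls
        calls += 1
        if ti == len(tokens):
            return True                      # every token has been matched
        tok = tokens[ti]
        if tok == ".*":
            # Greedy: try the longest run first, then give characters back.
            for end in range(len(text), pos - 1, -1):
                if attempt(ti + 1, end):
                    return True
            return False
        # Literal token: it must equal the next character of the input.
        return pos < len(text) and text[pos] == tok and attempt(ti + 1, pos + 1)

    return attempt(0, 0), calls


def nfa_match(tokens, text):
    """Thompson-style simulation: keep the set of live states and read each
    character at most once. State i means 'tokens[:i] have been matched'.
    Returns (matched, chars_read)."""
    accept = len(tokens)

    def closure(states):
        # A ".*" may match zero characters, so state i also implies state i + 1.
        out = set()
        for s in states:
            out.add(s)
            while s < accept and tokens[s] == ".*":
                s += 1
                out.add(s)
        return out

    current = closure({0})
    chars_read = 0
    for ch in text:
        if accept in current or not current:
            break                            # already matched, or nothing left alive
        chars_read += 1
        nxt = set()
        for s in current:
            if tokens[s] == ".*":
                nxt.add(s)                   # loop: ".*" consumes this character
            elif tokens[s] == ch:
                nxt.add(s + 1)               # literal consumed, move forward
        current = closure(nxt)
    return accept in current, chars_read


PATTERN = [".*", ".*", "=", ".*"]            # stands in for .*.*=.*
for n in (10, 20, 40):
    text = "x" * n                           # no "=", so the match must fail
    print(n, backtrack_match(PATTERN, text)[1], nfa_match(PATTERN, text)[1])
```

On inputs with no = sign the backtracking count in this sketch grows roughly with the square of the input length, while the state-set simulation reads each character once. Appending ";" to the token list gives the toy equivalent of the .*.*=.*; example.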

To learn more about the origins of The Network is the Computer®, I spoke with John Gage, the creator of the phrase and the 21st employee of Sun Microsystems. John had a key role in shaping the vision of Sun and had a lot to share about his vision for the future. Listen to our conversation here and read the full transcript below. [00:00:13]John Graham-Cumming: I’m talking to John Gage who was what, the 21st employee of Sun Microsystems, which is what Wikipedia claims and it also claims that you created this phrase “The Network is the Computer,” and that's actually one of the things I want to talk about with you a little bit because I remember when I was in Silicon Valley seeing that slogan plastered about the place and not quite understanding what it meant. So do you want to tell me what you meant by it or what Sun meant by it at the time?[00:00:40]John Gage: Well, in 2019, recalling what it meant in 1982 or 83’ will be colored by all our experience since then but at the time it seemed so obvious that when we introduced the first scientific workstations, they were not very powerful computers. The first Suns had a giant screen and they were on the Internet but they were designed as a complementary component to supercomputers. Bill Joy and I had a series of diagrams for talks we’d give, and Bill had the bi-modal, the two node picture. The serious computing occurred on the giant machines where you could fly into the heart of a black hole and the human interface was the workstation across the network. So each had to complement the other, each built on the strengths of the other, and each enhanced the other because to deal in those days with a supercomputer was very ugly. And to run all your very large computations, you could run them on a Sun because we had virtual memory and series of such advanced things but not fast. 
So the speed of scientific understanding is deeply affected by the tools the scientist has — is it a microscope, is it an optical telescope, is it a view into the heart of a star by running a simulation on a supercomputer? You need to have the loop with the human and the science constantly interacting and constantly modifying each other, and that’s what the network is for, to tie those different nodes together in as seamless a way as possible. Then, the instant anyone that’s ever created a programming language says, “so if I have to create a syntax of this where I’m trying to let you express, do this, how about the delay on the network, the latency”? Does your phrase “The Network is the Computer” really capture this hundreds, thousands, tens of thousands, millions perhaps at that time, now billions and billions and billions today, all these devices interacting and exchanging state with latency, with delay. It’s sort of an oversimplification, and that we would point out, but it’s just network is the computer. Four words, you know, what we tried to do is give a metaphor that allows you to explore it in your mind and think of new things to do and be inspired.[00:03:35]Graham-Cumming: And then by a sort of strange sequence of events, that was a trademark of Sun. It got abandoned. And now Cloudflare has swooped in and trademarked it again. So now it's our trademark which sort of brings us full circle, I suppose.[00:03:51]Gage: Well, trademarks are dealing with the real world, but the inspiration of Cloudflare is to do exactly what Bill Joy and I were talking about in 1982. It's to build an environment in which every participant globally can share with security, and we were not as strong. Bill wrote most of the code of TCP/IP implemented by every other computer vendor, and still these questions of latency, these questions of distributed denial of service which was, how do you block that? 
I was so happy to see that Cloudflare invests real money and real people in addressing those kinds of critical problems, which are at the core, what will destroy the Internet. [00:04:48]Graham-Cumming: Yes, I agree. I mean, it is a significant investment to actually deal with it and what I think people don't appreciate about the DDoS attack situation is that they are going on all the time and it's just a continuous, you know, just depends who the target is. It's funny you mentioned TCP/IP because about 10 years after, so in about ‘92, my first real job, I had to write a TCP/IP stack for an obscure network card. And this was prior to the Internet really being available everywhere. And so I didn't realize I could go and get the BSD implementation and recompile it. So I did it from scratch from the RFCs. [00:05:23]Gage: You did! [00:05:25]Graham-Cumming: And the thing I recommend here is that nobody ever does that because, you know, the real world, real code that really interacts is really hard when you're trying to work it with other things, so. [00:05:36]Gage: Do you still, John, do you have that code? [00:05:42]Graham-Cumming: I wonder. I have the binary for it. [00:05:46]Gage: Do hunt for it, because our story was at the time DARPA, the Defense Advanced Research Projects Agency, that had funded networking initiatives around the world. I just had a discussion yesterday with Norway and they were one of the first entities to implement using essentially Bill Joy’s code, but to be placed on the ARPANET. 
And a challenge went out, and at that time the slightly older generation, the Bolt Beranek and Newman Group, Vint Cerf, Bob Kahn, those names, as Vint Cerf was a grad student at UCLA where he had built one of the four first Internet sites and the DARPA offices were in Arlington, Virginia, they had massive investments in detection of nuclear underground tests, so seismological data, and the moment we made the very first Suns, I shipped them to DARPA, we got the network up and began serving seismic data globally. Really lovely visualization of events. If you’re trying to detect something, those things go off and then there’s a distinctive signature, a collapse of the underground cavern after. So DARPA had tried to implement, as you did, from the spec, from the RFC, the components, and Vint had designed a lot of this, all the acknowledgement codes and so forth that you had to implement in TCP/IP. So Bill, as a graduate student at Berkeley, we had a meeting in Arlington at DARPA headquarters where BBN and AT&T Bell Labs and a number of other people were in the room. Their code didn’t work, this graduate student from Berkeley named Bill Joy, his code did work, and when Bob Kahn and Vint Cerf asked Bill, “Well, so how did you do it?” What he said was exactly what you just said, he said, “I just read the spec and wrote the code.” [00:08:12]Graham-Cumming: I do remember very distinctly because the company I was working at didn’t have a TCP/IP stack and we didn’t have any IP machines, right, we were doing actually stuff that was all IBM networking, SNA stuff. Somehow we bought what was at that point a HP machine, it was an Apollo workstation and a Sun workstation. I had them on Ethernet and talking to each other. And I do distinctly remember the first time a ping packet came back from that Sun box, saying, yes I managed to send you an IP packet, you managed to send me ICMP response and that was pretty magical. And then I got to TCP and that was hard. 
[00:08:55]Gage: That was hard. Yeah. When you get down to the details, the spec can be wrong. I mean, it will want you to do something that’s a stupid thing to do. So Bill has such good taste in these things. It would be interesting to do a kind of a diff across the various implementations of the stack. Years and years later we had maybe 50 companies all assemble in a room, only engineers, throw out all the marketing people and all the Ps and VPs and every company in this room—IBM, Hewlett-Packard—oh my God, Hewlett-Packard, fix your TCP—and we just kept going until everybody could work with everybody else in sort of a pact. We’re not going to reveal, Honeywell, that you guys were great with earlier absolute assembly code, determinate time control stuff but you have no clue about how packets work, we’ll help you, so that all of us can make every machine interoperate, which yielded the network show, Interop. Every year we would go put a bunch of fiber inside whatever, you know, Geneva, or pick some, Las Vegas, some big venue. [00:10:30]Graham-Cumming: I used to go to Vegas all the time and that was my great introduction to Vegas was going there for Interop, year after year. [00:10:35]Gage: Oh, you did! Oh, great. [00:10:36]Graham-Cumming: Yes, yes, yes. [00:10:39]Gage: You know in a way, what you’re doing with, for example, just last week with the Verizon problem, everybody implementing what you’re doing now that is not open about their mistakes and what they’ve learned and is not sharing this, it’s a problem. And your global presence to me is another absolutely critical thing. We had about, I forget, 600 engineers in Beijing at the East Gate of Tsinghua a lot of networking expertise and lots of those people are at Tencent and Huawei and those network providers throughout the rest of the world, politics comes and goes but the engineering has to be done in a way that protects us. And so these conversations globally are critical. 
[00:11:33]Graham-Cumming: Yes, that's one of the things that’s fascinating actually about doing real things on the real Internet is there is a global community of people making computers talk to each other and you know, that it's a tremendously complicated thing to actually make that work, and you do it across countries, across languages. But you end up actually making them work, and that's the Internet we're sitting on, that you and I are talking on right now that is based on those conversations around the world.[00:12:01]Gage: And only by doing it do you understand more deeply how to do it. It’s very difficult in the abstract to say what should happen as we begin to spread. As Sun grew, every major city in Africa had installations and for network access, you were totally dependent on an often very corrupt national telco or the complications dealing with these people just to make your packet smooth. And as it turned out, many of the intelligence and military entities in all of these countries had very little understanding of any of this. That’s changed to some degree. But the dangerous sides of the Internet. Total surveillance, IPv6, complete control of exact identity of origins of packets. We implemented, let’s see, you had an early Sun. We probably completed our IPv6 implementation, was it still fluid in the 90s, but I remember 10 years after we finished a complete implementation of IPv6, the U.S. was still IPv4, it’s still IPv4. [00:13:25]Graham-Cumming: It still is, it still is. Pretty much. Except for the mobile carriers right now. I think in general the mobile phone operators are the ones who've gone more into IPv6 than anybody else.[00:13:37]Gage: It was remarkable in China. We used to have a conference. We’d bring a thousand Chinese universities into a room. Professor Wu from Tsinghua who built the Chinese Education and Research Network, CERNET. And now a thousand universities have a building on campus doing Internet research. 
We would get up and show this map of China and he kept his head down politically, but he managed at the point when there was a big fight between the Minister of Telecom and the Minister of Railways. The Minister of Railways said, look, I have continuity throughout China because I have rail lines. I’ve just made a partnership with the People’s Liberation Army, and they are essentially slave labor, and they’re going to dig the ditches, and I’m going to run fiber alongside the railways and I don’t care what you, the Minister of Telecommunications, has to say about it, because I own the territory. And that created a separate pathway for the backbone IPv6 network in China. Cheap, cheap, cheap, get everybody doing things. [00:14:45]Graham-Cumming: Yes, now of course in China that’s resulted in an interesting situation where you have China Telecom and China Unicom, who sort of cooperate with each other but they’re almost rivals which makes IP packets quite difficult to route inside China. [00:14:58]Gage: Yes exactly. At one point I think we had four hunks of China. Everyone was geographically divided. You know there were meetings going on, I remember the moment they merged the telecom ministry with the electronics ministry and since we were working with both of them, I walk in a room and there’s a third group, people I didn’t know, it turns out that’s the People’s Liberation Army. [00:15:32]Graham-Cumming: Yes, they’re part of the team. So okay, going back to this “Network is the Computer” notion. So you were talking about the initial things that you were doing around that, why is it that it's okay that Cloudflare has gone out and trademarked that phrase now, because you seem to think that we've got a leg to stand on, I guess. [00:15:56]Gage: Frankly, I’d only vaguely heard of Cloudflare. 
I’ve been working in areas, I’ve got a project in the middle of Nairobi in the slum where I’ve spent the last 15 years or so learning a lot about clean water and sewage treatment because we have almost 400,000 people in a very small area, biggest slum in East Africa. How can you introduce sanitary water and clean sewage treatment into a very, an often corrupt, a very difficult environment, and so that’s been a fascination of mine and I’ve been spending a lot of time. What's a computer person know about fluid dynamics and pathogens? There’s a lot to learn. So as you guys grew so rapidly, I vaguely knew of you but until I started reading your blog about post-quantum crypto and how do we devise a network in these resilient denial of service attacks and all these areas where you’re a growing company, it’s very hard to take time to do serious advanced research-level work on distributed computing and distributed security, and yet you guys are doing it. When Bill created Java, the subsequent step from Java for billions and billions of devices to share resources and share computations was something we call Jini which is a framework for validation of who you are, movement of code from device to device in a secure way, total memory control so that someone is not capable of taking over memory in your device as we’ve seen with Spectre and the failures of these billions of Intel chips out there that all have a flaw on take all branches parallel compute implementations. So the very hardware you’re using can be insecure so your operating systems are insecure, the hardware is insecure, and yet you’re trying to build on top with fallible pieces in infallible systems. And you’re in the middle of this, John, which I’m so impressed by. [00:18:13]Graham-Cumming: And Jini sort of lives on as called Apache River now. It moved away from Sun and into an Apache project. 
[00:18:21]Gage: Yes, very few people seem to realize that the name Apache is a poetic phrasing of “a patchy system.” We patch everything because everything is broken. We moved a lot of it, Brian Behlendorf and the Apache group. Well, many of the innovations at Sun, Java is one, file systems that are far more secure and far more resilient than older file systems, the SPARC implementation, I think the SPARC processor, even though you’re using the new ARM processors, but Fujitsu, I still think keeps the SPARC architecture as the world’s fastest microprocessor. [00:19:16]Graham-Cumming: Right. Yes. Being British of course, ARM is a great British success. So I'm honor-bound to use that particular architecture. Clearly.[00:19:25]Gage: Oh, absolutely. And the power. That was the one always in a list of what our engineering goals are. We wanted to make, we were building supercomputers, we were building very large file servers for the telcos and the banks and the intelligence agencies and all these different people, but we always wanted to make a low power and it just fell off the list of what you could accomplish and the ARM chips, their ratios of wattage to packets treated are—you have a great metric on your website someplace about measuring these things at a very low level—that’s key.[00:20:13]Graham-Cumming: Yes, and we had Sophie Wilson, who of course is one of the founders of ARM and actually worked on the original chip, tell this wonderful story at our Internet Summit about how the first chip they hooked up was operating fine until they realized they hadn't hooked the power up and they were asked to. It was so low power that it was able to use the power that was coming in over the logic lines to actually power the whole chip. And they said to me, wait a minute, we haven't plugged the power in but the thing is running, which was really, I mean that was an amazing achievement to have done that.[00:20:50]Gage: That’s amazing. 
We open sourced SPARC, the instruction set, so that anybody doing crypto that also had Fab capabilities could implement detection of ones and zeroes, sheep and goats, or other kinds of algorithms that are necessary for very high speed crypto. And that’s another aspect that I’m so impressed by Cloudflare. Cloudflare is paying attention at a machine instruction level because you’re implementing with your own hardware packages in what, 180 cities? You’re moving logistically a package into Ulan Bator, or into Mombasa and you’re coming up live. [00:21:38]Graham-Cumming: And we need that to be inexpensive and fast because we're promising people that we will make their Internet properties faster and secure at the same time and that's one of the interesting challenges which is not trading those two things off. Which means your crypto better be fast, for example, and that requires a lot of fiddling around at the hardware level and understanding it. In our case because we're using Intel, really what Intel chips are doing at the low level.[00:22:10]Gage: Intel did implement a couple of things in one or another of the more recent chips that were very useful for crypto. We had a group of the SPARC engineers, probably 30, at a dinner five or six months ago discussing, yes, we set the world standard for parallel execution branching optimizations for pipelines and chips, and when the overall design is not matched by an implementation that pays attention to protecting the memory, it’s a fundamental, exploitable flaw. So a lot of discussion about this. Selecting precisely which instructions are the most important, the risk analysis with the ability to make a chip specifically to implement a particular algorithm, there’s a lot more to go. We have multiples of performance ahead of us for specific algorithms based on a more fluid way to add instructions that are necessary into a specific piece of hardware. And then we jump to quantum. Oh my.[00:23:32]Graham-Cumming: Yes. 
To talk about that a little bit, the ever-increasing speed of processors and the things we can do; Do you think we actually need that given that we're now living in this incredibly distributed world where we are actually now running very distributed algorithms and do we really need beefier machines?[00:23:49]Gage: At this moment, in a way, it’s you making fun of Bill Joy for only wanting a megabit in Aspen. When Steve Jobs started NeXT, sadly his hardware was just terrible, so we sent a group over to boost NeXT. In fact we sort of secretly slipped him $30 million to keep him afloat. And I’d say, “Jobs, if you really understood something about hardware, it would really be useful here.” So one of the main team members that we sent over to NeXT came to live in Aspen and ended up networking the entire valley. At a point, megabit for what you needed to do, seemed reasonable, so at this moment, as things become alive by the introduction of a little bit of intelligence in them, some little flickering chip that’s able to execute an algorithm, many tasks don’t require. If you really want to factor things fast, quantum, quantum. Which will destroy our existing crypto systems. But if you are just bringing the billions of places where a little bit of knowledge can alter locally a little bit of performance, we could do very well with the compute power that we have right now. But making it live on the network, securely, that’s the key part. The attacks that are going on, simple errors as you had yesterday, are simple errors. In a way, across Cloudflare’s network, you’re watching the challenges of the 21st century take place: attacks, obscure, unknown exploits of devices in the power and water control systems. 
And so, you are in exactly the right spot to not get much sleep and feel a heavy responsibility. [00:26:20]Graham-Cumming: Well it certainly felt like it yesterday when we were offline for 27 minutes, and that’s when we suddenly discovered, we sort of know how many customers we have, and then we really discover when they start phoning us. Our support line had its own DDoS basically where it didn’t work anymore because so many people signed in. But yes, I think that it's interesting your point about a little bit extra on a device somewhere can do something quite magical and then you link it up to the network and you can do a lot. What we think is going on partly is some things around AI, where large amounts of machine learning are happening on big beefy machines, perhaps in the cloud, perhaps groups of machines, and then devices are doing their own little bits of inference or recognizing faces and stuff like that. And that seems to be an interesting future where we have these devices that are actually intelligent in our pockets. [00:27:17]Gage: Oh, I think that’s exactly right. There’s so much power in your pocket. I’m spending a lot of time trying to catch up that little bit of mathematics that you thought you understood so many years ago and it turns out, oh my, I need a little bit of work here. And I’ve been reading Michael Jordan’s papers and watching his talks and he’s the most cited computer scientist in machine learning and he will always say, “Be very careful about the use of the phrase, ‘Artificial Intelligence’.” Maybe it’s a metaphor like “The Network is the Computer.” But, we’re doing gradient descent optimization. Is the slope going up, or is the slope going down? That’s not smart. It’s useful and the real time language translation and a lot of incredible work can occur when you’re doing phrases. There’s a lot of great pattern work you can do, but he’s out in space essentially combining differentiation and integration in a form of integral. 
And off we go. Are your hessians rippling in the wind? And what’s the shape of this slope? And is this actually the fastest path from here to there to constantly go downhill. Maybe it’s sometimes going uphill and going over and then downhill that’s faster. So there’s just a huge amount of new mathematics coming in this territory and each time, as we move from 2G to 3G to 4G to 5G, many people don’t appreciate that the compression algorithms changed between 2G, 3G, 4G and 5G and as a result, so much more can move into your mobile device for the same amount of power. 10 or 20 times more for the same amount of power. And mathematics leads to insights and applications of it. And you have a working group in that area, I think. I tried to probe around to see if you’re hiring. [00:30:00]Graham-Cumming: Well you could always just come around to just ask us because we'll probably tell you because we tend to be fairly transparent. But yes, I mean compression is definitely an area where we are interested in doing things. One of the things I first worked on at Cloudflare was a thing that did differential compression based on the insight that web pages don’t actually change that much when you hit ‘refresh’. And so it turns out that if you compress based on the delta from the last thing you served to someone you can actually send many orders of magnitude less data and so there's lots of interesting things you can do with that kind of insight to save a tremendous amount of bandwidth. And so yeah, definitely compression is interesting, crypto is interesting to us. We’ve actually open sourced some of our compression improvements in zlib which was a very popular compression algorithm and now it's been picked up. It turns out that in neuroscience, because there's a tremendous amount of data which needs compression and there are pipelines used in neuroscience where actually having better compression algorithms makes you work a lot faster. 
So it's fascinating to see the sort of overspill of things we’re doing into other areas where I know nothing about what goes on inside the brain. [00:31:15]Gage: Well isn’t that fascinating, John. I mean here you are, the CTO of Cloudflare working on a problem that deeply affects the Internet, enabling a lot more to move across the Internet in less time with less power, and suddenly it turns into a tool for brain modeling and neuroscientists. This is the benefit. There’s a terrific initiative. I’m at Berkeley. The Jupyter notebooks created by Fernando Perez, this environment in which you can write text and code and share things. That environment, taken up by machine learning. I think it’s a major change. And the implementation of diagrams that are causal. These forms of analysis of what caused what. These are useful across every discipline and for you to model traffic and see patterns emerge and find webpages and see the delta has changed and then intelligently change the pattern of traffic in response to it, it’s all pretty much the same thing here. [00:32:53]Graham-Cumming: Yes and then as a mathematician, when I see things that are the same thing, I can't help wondering what the real deep structure is underneath. There must be another layer another layer down or something. So as you know it's this thing. There's some other deeper layer below all this stuff. [00:33:12]Gage: I think this is just endlessly fascinating. So my only recommendations to Cloudflare: first, double what you’re doing. That’s so hard because as you go from 10 people to 100 people to 1,000 people to 10,000 people, it’s a different world. You are a prime example, you are global. Suddenly you’re able to deal with local authorities in 60-70 countries and deal with some of the world’s most interesting terrain and with network connectivity and moving data, surveillance, and some security of the foundation infrastructure of all countries. 
You couldn’t be engaged in more exciting things. [00:34:10]Graham-Cumming: It's true. I mean, one of the most interesting things to me is that I have grown up with the Internet. I got an email address using the crazy JANET scheme in the UK, where the DNS names were backwards. I was in Oxford and they gave me an email address, and I think it was JGC at uk dot ac dot ox dot prg, and then at some point it flipped around when it looked like DNS had won. For a long time my address was the wrong way around. I think that's a typically British decision, to be slightly different to everybody else. [00:35:08]Gage: Well, Oxford’s always had that style, that we’re going to do things differently. There’s an Oxford Center for the 21st century that was created by the money from a wonderful guy who had donated maybe $100 million. And they just branched out into every possible research area. But when you went to meetings, you would enter a building that was built at the time of the Raj. It was the India temple of colonialism. [00:35:57]Graham-Cumming: There's quite a few of those in the UK. Are you thinking of the Martin School? James Martin. And he gave a lot of money to Oxford. Well, the funny thing about the Programming Research Group was that the one thing they didn't really teach us as undergraduates was how to program, which was one of the most fascinating things about it, because that was a bit like getting your hands dirty, so you needed to learn all the theory. So we learnt all the theory, we did a little bit of functional programming, and that was the extent of it, which set me up really badly for a career in industry. In my first job I had to pretend I knew how to program in C and learn very quickly. [00:36:42]Gage: Oh my. Well now you’ve been writing code in Go. [00:36:47]Graham-Cumming: Yes. Well the thing about Go, the other Oxford thing of course is Tony Hoare, who is a professor of computer science there. 
He had come up with this thing called CSP (Communicating Sequential Processes) so that was a whole theory around how you do parallel execution. And so of course everybody used his formalism and I did in my doctoral thesis and so when Go came along and they said oh this how Go works, I said, well clearly that’s CSP and I know how to do this. So I can do it again. [00:37:23]Gage: Tony Hoare occasionally would issue a statement about something and it was always a moment. So few people seem to realize the birth of so much of what we took in the 60s, 70s, 80s, in Silicon Valley and Berkeley, derived from the Manchester Group, the virtual memory work, these innovations. Today, Whit Diffie. He used to love these Bletchley stories, they’re so far advanced. That generation has died off. [00:38:37]Graham-Cumming: There’s a very peculiar thing in computer science and the real application of computing which is that we both somehow sit on this great knowledge of the past of computing and at the same time we seem to willfully forget it and reinvent everything every few years. We go through these cycles where it's like, let’s do centralized computing, now distributed computing. No, let’s have desktop PCs, now let’s have the cloud. We seem to have this collective amnesia and then on occasion people go, “Oh, Leslie Lamport wrote this thing in 1976 about this problem”. What other subject do we willfully forget the past and then have to go and doing archaeology to discover again?[00:39:17]Gage: As a sociological phenomenon it means that the older crowd in a company are depressing because they’ll say, “Oh we tried that and it didn’t work”. Over the years as Sun grew from 15 people or so and ended up being like 45,000 people before we were sold off to Oracle and then everybody dumped out because Oracle didn’t know too much about computing. So Ivan Sutherland, Whit Diffie. Ivan actually stayed on. He may actually still have an Oracle email. 
Almost all of the research groups, certainly the chip group went off to Intel, Fujitsu, Microsoft. It’s funny to think now that Microsoft’s run by a Sun person. [00:40:19]Graham-Cumming: Well that's the same thing. Everyone’s forgotten that Microsoft was the evil empire not that long ago. And so now it’s not. Right now it’s cool again. [00:40:28]Gage: Well, all of the embedded stuff from Microsoft is still that legacy that Bill Gates who’s now doing wonderful things with the Gates Foundation. But the embedded insecurity of the global networks is due to, in large part, the insecurities, that horrible engineering of Microsoft embedded everywhere. You go anywhere in China to some old industrial facility and there is some old not updated junky PC running totally insecure software. And it’s controlling the grid. It’s discouraging. It’s like a lot of the SCADA systems. [00:41:14]Graham-Cumming: I’m completely terrified of SCADA systems. [00:41:20]Gage: The simplest exploits. I mean, it’s nothing even complicated. There are a series of emerging journalists today that are paying attention to cybersecurity and people have come out with books even very recently. Well, now because we’re in this China, US, Iran nightmare, a United States presidential directive taking the cybersecurity crowd and saying, oops, now you’re an offensive force. Which means we got some 20-year-old lieutenant somewhere who suddenly might just for fun turn off Tehran’s water supply or something. This is scary because the SCADA systems are embedded everywhere, and they’re, I don’t know, would you say totally insecure? Just the simple things, just simple exploits. One of the journalists described, I guess it was the Russians who took a bunch of small USB sticks and at a shopping center near a military base just gave them away. And people put them into their PCs inside SIPRNet, inside the secure U.S. Department of Defense network. 
Instantly the network was taken over just by inserting a USB device into something on the net. And there you are, John, protecting against this. [00:43:00]Graham-Cumming: Trying hard to protect against these things, yes absolutely. It's very interesting because you mentioned before how rapidly Cloudflare had grown over the last few years. And of course Sun also really got going pretty rapidly, didn’t it? [00:43:00]Gage: Well, yes. The first year we were just some students from Berkeley, hardware from Stanford, Andy Bechtolsheim, software from Berkeley, Berkeley Unix BSD, Bill Joy. Combine the two, and 10 of us or so, and we were, I think the first year was 12 million booked, the second year was 50 or 60 million booked, and the third year was 150 or so million booked and then we hit 500 million and then we hit a billion. And now, it’s selling boxes, we were a manufacturing company so that’s different from software or services, but we also needed lots of people and so we instantly raided the immense variety of people in the San Francisco Bay Area, with Berkeley and Stanford. We had students in computer science, and mechanical engineering, and physics, and mathematics from every country in the world and we recruited from every country in the world. So a great part of Sun’s growth came, as yours does, from expanding internationally, and at one point I think we ran most of the telcos of the world, we ran China Mobile. 900 million subscribers on China Mobile, all Sun stuff in the back. Throughout Africa, every telco was running Sun and Cisco until Huawei knocked Cisco out. It was an amazing time. [00:44:55]Graham-Cumming: You ran the machine that ran LaTeX, that let me get my doctoral thesis done. [00:45:01]Gage: You know that’s how I got into it, actually. I was in econometrics and mathematics at Berkeley, and I walk down a hallway and outside a room was that funny smell from photographic paper from something, and there was perfectly typeset mathematics. 
Troff and nroff, all those old UNIX utilities for the Bell Technical Journal, and I open the door and I’ve got to get in there. There’s two hundred people sitting in front of these beehive-like little terminals all typing away on a UNIX system. And I want to get an account and I walk down the hall and there's this skinny guy who types about 200 words a minute named Bill Joy. And I said, I need an account, I’ve got to typeset integral signs, and he said, what’s your name. I tell him my name, John Gage, and he goes voop, and I’ve never seen anybody type as fast as him in my life. This is a new world, here. [00:45:58]Graham-Cumming: So he was rude then? [00:46:01]Gage: Yeah he was, he was. Well, it’s interesting, since the arrival of a device at Berkeley was to complement the arrival of an MIT professor who had implemented in LISP not the typesetting of mathematics, but actual mathematics: Macsyma. To get Professor Fateman, the Macsyma god from MIT, to come to Berkeley and live in a UNIX environment, we had to put a LISP up outside on the PDP. So Bill took that machine which had virtual memory and implemented the environment for significant computational mathematics. And Steve Wolfram took that to Caltech, and the Institute for Advanced Study in Princeton, and now we have Mathematica. So in a way, all of Sun and the UNIX world derived from attempting to do executable mathematics. [00:47:17]Graham-Cumming: Which in some ways is what computers are doing. I think one of the things that people don’t really appreciate is the extent to which it's all numbers underneath. [00:47:28]Gage: Well that’s just this discrete versus continuous problem that Michael Jordan is attempting to address. My current total puzzlement and complete ignorance is: what in the world is symplectic integration? And how do Lyapunov functions work? Oh, no clue. [00:47:50]Graham-Cumming: Are we going to do a second podcast on that? Are you going to come back and teach us? [00:47:55]Gage: Try it. We’re on, you’re on, you’re on. 
Absolutely. But you’ve got to run a company. [00:48:00]Graham-Cumming: Well I've got some things to do. Yeah. But you can go do that and come tell us about it.[00:48:05]Gage: All right, Great John. Well it was terrific to talk to you.[00:48:08]Graham-Cumming: So yes it was wonderful speaking to you as well. Thank you for helping me dig up memories of when I was first fooling around with Sun Systems and, you know, some of the early days and of course “The Network is the Computer,” I'm not sure I fully yet understand quite the metaphor or even if maybe I do somehow deeply in my soul get it, but we’re going to try and make it a reality, whatever it is.[00:48:30]Gage: Well, I count it as a complete success, because you count as one of our successes because you‘re doing what you’re doing, therefore the phrase, “The Network is the Computer,” resides in your brain and when you get up in the morning and decide what to do, a little bit nudges you toward making the network work.[00:48:51]Graham-Cumming: I think that's probably true. And there's the dog, the dog is saying you've been yakking for an hour and now we better stop. So listen, thank you so much for taking the time. It was wonderful talking to you. You have a good day. Thank you very much.Interested in hearing more? 
Listen to my conversations with Ray Rothrock and Greg Papadopoulos of Sun Microsystems:
Ray Rothrock
Greg Papadopoulos
To learn more about Cloudflare Workers, check out the use cases below:
Optimizely - Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
Cordial - Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
AO.com - AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
Pwned Passwords - Troy Hunt’s popular "Have I Been Pwned" project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
Timely - Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
Quintype - Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

Last week I spoke with Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to discuss his time at Sun and how the Internet has evolved. In this conversation, Ray discusses the importance of trust as a principle, the growth of Sun in sales and marketing, and that time he gave Vice President Bush a Sun demo. Listen to our conversation here and read the full transcript below. [00:00:07]John Graham-Cumming: Here I am very lucky to get to talk with Ray Rothrock who was I think one of the first investors in Cloudflare, a Series A investor and got the company a little bit of money to get going, but if we dial back a few earlier years than that, he was also at Sun as the Director of CAD/CAM Marketing. There is a link between Sun and Cloudflare. At least one, but probably more than one, which is that Cloudflare has recently trademarked, “The Network is the Computer”. And that was a Sun trademark, wasn’t it?[00:00:43]Ray Rothrock: It was, yes.[00:00:46]Graham-Cumming: I talked to John Gage and I asked him about this as well and I asked him to explain to me what it meant. And I'm going to ask you the same thing because I remember walking around the Valley thinking, that sounds cool; I’m not sure I totally understand it. So perhaps you can tell me, was I right that it was cool, and what does it mean?[00:01:06]Rothrock: Well it certainly was cool and it was extraordinarily unique at the time. Just some quick background. In those early days when I was there, the whole concept of networking computers was brand new. Our competitor Apollo had a proprietary network but Sun chose to go with TCP/IP which was a standard at the time but a brand new standard that very few people know about right. 
So when we started connecting computers and doing some intensive computing, which is what I was responsible for (CAD/CAM in those days was extremely intensive, whether it was electrical CAD/CAM, or mechanical CAD/CAM, or even simulation, solid design modeling and things), having a little extra power from other computers was a big deal. And so this concept of “The Network is the Computer” essentially said that you had one window into the network through your desktop computer in those days. There was no mobile computing at that time; this was like ’84, ’85, ’86 I think. And so if you had the appropriate software you could use other people's computers (for CPU power), and so you could do very hard problems that a single computer could not do, because you could offload some of that CPU to the other computers. Now that was very nerdy, very engineering intensive, and not many people did it. We’d go to SIGGRAPH, which was a huge graphics show in those days, and we would demonstrate ten Sun computers, for example, doing some graphic rendering of a 3D wireframe that had been created in CAD/CAM software of some sort. And it was hard, and that was on the mechanical side. On the electrical side, Berkeley had some software that was called Magic; it’s still around and is a very popular EDA software that’s been incorporated in those concepts. But to imagine calculating the paths in a very complicated PCB or a very complicated chip, one computer couldn't do it, but Sun had the fundamental technology. So from my seat at Sun at the time, I had access to what could be infinite computing power, even though I had a single application running, and that was a big selling point for me when I was trying to convince EDA and MDA companies to put their software on the Sun. That was my job. [00:03:38] Graham-Cumming: And hearing it now, it doesn’t sound very revolutionary, because of course we’re all doing that now. 
I mean I get my phone out of my pocket and connect to goodness knows what computing power which does image recognition and spots faces and I can do all sorts of things. But walk me through what it felt like at the time.[00:03:56]Rothrock: Just doing a Google search, I mean, how many data stores are being spun up for that? At the time it was incredible, because you could actually do side by side comparisons. We created some demonstrations, where one computer might take ten hours to do a calculation, two computers might take three hours, five computers might take 30 minutes. So with this demo, you could turn on computers and we would go out on the TCP/IP network to look for an available CPU that could give me some time. Let's go back even further. Probably 15 years before that, we had time sharing. So you had a terminal into a big mainframe and did all this swapping in and out of stuff to give you a time slice computing. We were doing the exact same thing except we were CPU slicing, not just time slicing. That’s pretty nerdy, but that's what we did. And I had to work with the engineering department, with all these great engineers in those days, to make this work for a demo. It was so unique, you know, their eyes would get big. You remember Novell...[00:05:37]Graham-Cumming: I was literally just thinking about Novell because I actually worked on IPX and SPX networking stuff at the time. I was going to ask you actually, to what extent do you think TCP/IP was a very important part of this revolution?[00:05:55]Rothrock: It was huge. It was fundamentally huge because it was a standard, so it was available and if you implemented it, you didn’t have to pay for it. When Bob Metcalfe did Ethernet, it was on top of the TCP stack. Sun, in my memory, and I could be wrong, was the first company to put a TCP/IP stack on the computer. And so you just plugged in the back, an RJ45 into this TCP/IP network with a switch or a router on it and you were golden. 
They made it so simple and so cheap that you just did it. And of course if you give an engineer that kind of freedom and it opens up. By the way, as the marketing guy at Sun, this was my first non-engineering job. I came from a very technical world of nuclear physics into Sun. And so it was stunning, just stunning.[00:06:59]Graham-Cumming: It’s interesting that you mentioned Novell and then you mentioned Apollo before that and obviously IBM had SNA networking and there were attempts to do all those networking things. It's interesting that these open standards have really enabled the explosion of everything else we've seen and with everything that's going on in the Internet.[00:07:23]Rothrock: Sun was open, so to speak, but this concept of open source now that just dominates the conversation. As a venture capitalist, every deal I ever invested in had open source of some sort in it. There was a while when it was very problematic in an M&A event, but the world’s gotten used to it. So open, is very powerful. It's like freedom. It's like liberty. Like today, July 4th, it’s a big deal. [00:07:52] Graham-Cumming: Yes, absolutely. It’s just interesting to see it explode today because I spent a lot of my career looking at so many different networking protocols. The thing that really surprises me, or perhaps shouldn’t surprise me when you’ve got these open things, is that you harness so many people's intelligence that you just end up with something that’s just better. It seems simple.[00:08:15]Rothrock: It seems simple. I think part of the magic of Sun is that they made it easy. Easy is the most powerful thing you can do in computing. Computing can be so nerdy and so difficult. But if you just make it easy, and Cloudflare has done a great job with that at that; they did it with their DNS service, they did it with all the stuff we worked on back when I was on the board and actively involved in the company. You’ve got to make it easy. 
I mean, I remember when Matthew and Lee worked like 20 hours a day on how to switch your DNS from whoever your provider was to Cloudflare. That was supposed to be one click, done. A to B. And that DNA was part of the magic. And whether we agree that Sun did it that way, to me at least, Sun did it that way as well. So it's huge, a huge lift.[00:09:08]Graham-Cumming: It’s funny you talk about that because at the time, how that actually worked is that we just asked people to give us their username and password. And we logged in and did it for them. Early on, Matthew asked me if I’d be interested in joining Cloudflare when it was brand new and because of other reasons I’d moved back to the UK and I wasn’t ready to change jobs and I’d just taken another job. And I remember thinking, this thing is crazy this Cloudflare thing. Who's going to hand over their DNS and their traffic to these four or five people above a nail salon in Palo Alto? And Matthew’s response was, “They’re giving us their passwords, let alone their traffic.” Because they were so desperate for it.[00:09:54]Rothrock: It tells you a lot about Matthew and you know as an attorney, I mean he was very sensitive to that and believes that one of the one of the founding principles is trust. His view was that, if I ever lose the customer’s trust, Cloudflare is toast. And so everything focused around that key value. And he was right.[00:10:18] Graham-Cumming: And you must have, at Sun, been involved with some high performance computing things that involved sensitive customers doing cryptography and things like that. So again trust is another theme that runs through there as well.[00:10:33]Rothrock: Yeah, very true. As the marketing guy of CAD/CAM, I was in the field two-thirds of the time, showing customers what was possible with them. My job was to get third party software onto the Sun box and then to turn that into a presentation to a customer. 
So I visited many government customers, many aerospace, power, all these very highfalutin sort of behind-the-firewall kinds of guys in those days. So yes, trust was huge. It would come up: “Okay, so I’m using your CPU, how is it that you can’t use mine? And how do you convince me that you've not violated something?” In those days it was a whole different conversation than it is today, but it was nonetheless just as important. In fact I remember I spent quite a bit of time at NCSA at the University of Illinois Urbana-Champaign. Larry Smarr was the head of NCSA. We spent a lot of time with Larry. I think John was there with me, John Gage and Vinod and some others, but it was a big deal talking about high performance computing, because that's what they were doing and doing it with Sun. [00:11:50]Graham-Cumming: So just to dial forward, so you’re at Venrock and you decide to invest in Cloudflare. What was it that made you think that this was worth investing in? Presumably you saw some things that were in some of Sun’s vision. Because Sun had a very wide-ranging vision about what was going to be possible with computing. [00:12:11]Rothrock: Yeah. Let me sort of touch on a few points probably. Certainly Sun was the first computer company I worked for after I got out of the nuclear business, and the philosophy of the company was very powerful. Not only did we have this cool 19 inch black and white giant Macintosh, essentially, although the Mac wasn't even born yet, but it had this ease of use that was powerful and it was open. I mean, we preached that all the time and we made that possible. And Cloudflare, the related philosophy of Matthew and Michelle's genius, was that they wanted to make security and distribution of data as free and easy as possible for the long tail. That was the first thinking, because you didn't have access if you were in the long tail; if you were a small company, you were just going to get whipped around by the big boys. 
And so there was a bit of, “We're here to help you, we're going to do it.” It's a good thing that the long tail get mobilized if you will or emboldened to use the Internet like the big boys do. And that was part of the attractiveness. I didn't say, “Boy, Matthew, this sounds like Sun,” but the concept of open and liberating which is what they were trying to do with this long tail DNS and CDN stuff was very compelling and seemed easy. But nothing ever is. But they made it look easy.[00:13:52]Graham-Cumming: Yeah, it never is. One of the parallels that I’ve noticed is that I think early on at Sun, a lot of Sun equipment went to companies that later became big companies. So some of these small firms that were using crazy work stations ended up becoming some of the big names in the Valley. To your point about the long tail, they were being ignored and couldn’t buy from IBM even if they wanted to. [00:14:25]Rothrock: They couldn’t afford SNA and they couldn’t do lots of things. So Sun was an enabler for these companies with cool ideas for products and software to use Sun as the underpinning. workstations were all the rage, because PCs were very limited in those days. Very very limited, they were all Intel based. Sun was 68000-based originally and then it was their own stuff, SPARC. You know in the beginning it was a cheap microprocessor from Motorola.[00:15:04]Graham-Cumming: What was the growth like at Sun? Because it was very fast, right?[00:15:09]Rothrock: Oh yes, it was extraordinarily fast. I think I was employee 130 or something like that. I left Sun in 1986 to go to business school and they gave me a leave of absence. Carol Bartz was my boss at that moment. The company was like at 2000 people just two and a half years later. So it was growing like a weed. I measured my success by how thick the catalyst—that was our catalog name and our program—how thick and how quickly I could add bonafide software developers to our catalog. 
We published on one sheet of paper front to back. When I first got there, our catalyst catalogue was a sheet of paper, and when I left, it was a book. It was about three-quarters of an inch thick. My group grew from me to 30 people in about a year and a half. It was extraordinary growth. We went public during that time, had a lot of capital and a lot of buzz. That openness mattered, because our competition was all proprietary, just like you were citing there, John. IBM and Apollo were all proprietary networks. You could buy a NIC and stick it into your PC and talk to a Sun. And vice versa. And you couldn’t do that with IBM or Apollo. Do you remember those? [00:16:48]Graham-Cumming: I do because I was talking to John Gage. In my first job out of college, I wrote a TCP/IP stack from scratch, for a manufacturer of network cards. The test of this stack was I had an HP Apollo box and I had a Sun workstation and there was a sort of magical, can I talk to these devices? And can I ping them? And that was already magical, the first ping as it went across the network. And then, can I Telnet to one of these? So you know, getting the networking actually running was sort of the key thing. How important was networking for Sun in the early days? Was it always there? [00:17:35]Rothrock: Yeah, it was there from the beginning, the idea of having a network capability. When I got there it was network; the machine wasn’t standalone at all. We sort of mimicked the mainframe world where we had green screens hooked into a Sun in a department for example. And there was time sharing. But as soon as you got a Sun on your desk, which was rare because we were shipping as many as we could build, it was fantastic. I was sharing information with engineering and we were working back and forth on stuff. But I think it was fundamental: you have a microprocessor, you’ve got a big screen, you’ve got a graphic UI, and you have a network that hooks into the greater universe. 
In those days, to send an all-Sun email around the world, modems spun up everywhere. The network wasn’t what it is now. [00:18:35]Graham-Cumming: I remember in about 89’, I was at a conference and Whit Diffie was there. I asked him what he was doing. He was in a little computer room. I was trying to typeset something. And he said, “I’m telnetting into a machine which is in San Diego.” It was the first time I’d seen this and I stepped over and he was like, “look at this.” And he’s hitting the keyboard and the keys are getting echoed back. And I thought, oh my goodness, this is incredible. It’s right across the Atlantic and across the country as well. [00:19:10]Rothrock: I think, and this is just me talking having lived the last years and with all the investing and stuff I did, but you know it enabled the Internet to come about, the TCP/IP standard. You may recall that Microsoft tried to modify the TCP/IP stack slightly, and the world rejected it, because it was just too powerful, too pervasive. And then along comes HTTP and all the other protocols that followed. Telnetting, FTPing, all that file transfer stuff, we were doing that left, right, and center back in the 80s. I mean you know Cloudflare just took all this stuff and made it better, easier, and literally lower friction. That was the core investment thesis at the time and it just exploded. Much like when Sun adopted TCP/IP, it just exploded. You were there when it happened. My little company that I’m the CEO of now, we use Cloudflare services. First thing I did when I got there was switched to Cloudflare. [00:20:18]Graham-Cumming: And that was one of the things when I joined, we really wanted people get to a point where if you’re putting something on the web, you just say, well I’m going to put Cloudflare or a thing like Cloudflare just on it. Because it protects it, it makes it faster, etc. And of course now what we've done is we’ve given people compute facility. 
Right now you can write code and run it in our in our machines worldwide which is another whole thing. [00:20:43]Rothrock: And that is “The Network is the Computer”. The other thing that Sun was pitching then was a paperless office. I remember we had posters of paper flying out of a computer window on a Sun workstation and I don't think we've gotten there yet. But certainly, the network is the computer. [00:21:04]Graham-Cumming: It was probably the case that the paperless office was one of those things that was about to happen for quite a long time. [00:21:14]Rothrock: It's still about to happen if you ask me. I think e-commerce and the sort of the digital transformation has driven it harder than just networking. You know, the fact that we can now sign legal documents over the Internet without paper and things like that. People had to adopt. People have to trust. People have to adopt these standards and accept them. And lo and behold we are because we made it easy, we made it cheap, and we made it trustworthy.[00:21:42]Graham-Cumming: If you dial back through Sun, what was the hardest thing? I’m asking because I’m at a 1,000-person company and it feels hard some days, so I’m curious. What do I need to start worrying about? [00:22:03]Rothrock: Well yeah, at 1,000 people, I think that’s when John came into the company and sort of organized marketing. I would say, holding engineering to schedules; that was hard. That was hard because we were pushing the envelope our graphics was going from black and white to color. The networking stuff the performance of all the chips into the boards and just the performance was a big deal. And I remember, for me personally, I would go to a trade show. I'd go to Boston to the Association of Mechanical Engineers with the team there and would show up at these workstations and of course the engineers want to show off the latest. So I would be bringing with me tapes that we had of the latest operating system. 
But getting the engineers to be ready for a tradeshow was very hard because they were always experimenting. I don't believe the word “code freeze” meant much to them, frankly, but we would we would be downloading the software and building a trade show thing that had to run for three days on the latest and greatest and we knew our competitor would be there right across the aisle from us sort of showing their hot stuff. And working with Eric Schmidt in those days, you know, Eric you just got to be done on this date. But trade shows were wonderful. They focused the company’s endpoints if you will. And marketing and sales drove Sun; Scott McNealy’s culture there was big on that. But we had to show. It’s different today than it was then, I don’t know about the Cloudflare competition, but back then, there were a dozen workstation companies and we were fighting for mindshare and market share every day. So you didn't dare sort of leave your best jewels at home. You brought them with you. I will give John Gage high, high marks. He showed me how to dance through a reboot in case the code crashed and he’s marvelous and I learned how to work that stuff and to survive. [00:24:25] Rothrock: Can I tell you one sort of sales story? [00:24:28] Graham-Cumming: Yes, I’m very interested in hearing the non-technical stories. As an engineer, I can hear engineering stories all the time, but I’m curious what it was like being in sales and marketing in such an engineering heavy company as Sun. [00:24:48] Rothrock: Yeah. Well it was challenging of course. One of the strategies that Sun had in those days was to get anyone who was building their own computer. This was Computer Vision and Data General and all those guys to adopt the Sun as their hardware platform and then they could put on whatever they wanted. So because I was one of the demo gods, my job was to go along with the sales guys when they wanted to try to convince somebody. 
So one of the companies we went after was Data General (DG) in Massachusetts. And so I worked for weeks on getting this whole demo suite running MDA, EDA, word processing, I had everything. And this was a big, big, big deal. And I mean like hundreds of millions of dollars of revenue. And so I went out a couple of days early and we were going to put up a bunch of Suns and I had a demo room at DG. So all the gear showed up and I got there at like 5:30 in the morning and started downloading everything, downloading software, making it dance. And at about 8:00 a.m. the CEO of Data General walks in. I didn't know who he was but it turned out to be Ed de Castro. And he introduces himself and I didn’t know who he was and he said, “What are you doing?” And I explained, “I’m from Sun, I’m getting ready for a big demo. We’ve got a big executive presentation. Mr. McNealy will be here shortly, etc.” And he said, “Well, show me what you’ve got.” So I’m sort of still in the middle of downloading this software and I start making this thing dance. I’ve got these machines talking to each other and showing all kinds of cool stuff. And he left. And the meeting was about 10 or 11 in the morning. And so when the executive team from Sun showed up they said, “Well, how's it going?” I said, “Well, I gave a demo to a guy,” and they asked, “Who's the guy?” and I said, “It was Ed de Castro.” And they went, “Oh my God, that was the CEO.” Well, we got the deal. I thought Ed had a little tactic there to come in early, see what he could see, maybe get the true skinny on this thing and see what’s real. I carried the day. But anyway, I got a nice little bonus for that.

But Vinod and I would drop into Lockheed down in Southern California. They wanted to put Suns on P-3 airplanes and we'd go down there with an engineer and we’d figure out how to make it. Those were just incredible times. You may remember back in the 80s everyone dressed up except on Fridays. It was dress-down Fridays.
And one day I dressed down and Carol Bartz, my boss, saw me wearing blue jeans and just an open-collared shirt and she said, “Rothrock, you go home and put on a suit! You never know when a customer is going to walk in the front door.” She was quite right. Kodak shows up. Kodak made a big investment in Sun when it was still private. And I gave that demo and then AT&T, and then interestingly Vice President Bush back in the Reagan administration came to Sun to see the manufacturing and I gave the demo to the Vice President with Scott and Andy and Bill and Vinod standing there.

[00:28:15] Graham-Cumming: Do you remember what he saw?

[00:28:18] Rothrock: It was my standard two-minute Sun demo that I can give in my sleep. We were on the manufacturing floor. We picked up a machine and I created a demo for it and my executive team was there. We have a picture of it somewhere, but it was fun. As John Gage would say, he’d say, “Ray, your job is to make the computer dance.” So I did.

[00:28:44] Graham-Cumming: And one of the other things I wanted to ask you about is at some point Sun was almost Amazon Web Services, wasn't it? There was a rent-a-computer service, right?

[00:28:53] Rothrock: I don't know. I don't remember the rent-a-computer service. I remember we went after the PC business aggressively and went after the data centers, which were brand new in those days, pretty aggressively, but I don’t remember the rent-a-computer business that much. It wasn’t in my domain.

[00:29:14] Graham-Cumming: So what are you up to these days?

[00:29:18] Rothrock: I’m still investing. I do a lot of security investing. I did 15 deals while I was at Venrock. Cloudflare was the last one I did, which turned out really well of course. More to come, I hope. And I’m CEO of one of Venrock’s portfolio companies that had a little trouble a few years back but I fixed that and it’s moving up nicely now. But I’ve started thinking about more of a science base.
I’m on the board of the Carnegie Institute of Science. I'm on the board of MIT and I just joined the board of the Nuclear Threat Initiative in Washington which is run by Secretary Ernie Moniz, former secretary of energy. So I’m doing stuff like that. John would be pleased with how well that played through. But I'll tell you, it is these fundamental principles, just tying it all back to Sun and Cloudflare, and this sort of open, cheap, easy, enabling humans to do things without too much friction, that is exciting. I mean, look at your phone. Steve Jobs was the master of design to make this thing as sweet as it is.

[00:30:37] Graham-Cumming: Yes, and as addictive.

[00:30:39] Rothrock: Absolutely, right. I haven’t been to a presentation from Cloudflare in two years, but every time I see an announcement like the DNS service, I immediately switched all my DNS here at the house to 1.1.1.1. Stuff like that. Because I know it’s good and I know it’s trustworthy, and it’s got that philosophy built in the DNA.

[00:31:09] Graham-Cumming: Yes, definitely. Taking it back to what we talked about at the beginning, trustworthiness is definitely something that Cloudflare has cared about from the beginning and continues to care about. We’re sort of the guardians of the traffic that passes through it.

[00:31:25] Rothrock: Back when the Internet started happening and when Sun was doing Java, I mean, all those things in the 90s, I was of course at Venrock, but I was still pretty connected to [Edward] Zander and [Scott] McNealy. We were hoping that it would be liberating, that it would create a world which was much more free and open to conversation and we’ve seen the dark side of some of that. But I continue to believe that transparency and openness is a good thing and we should never shut it down. I don't mean to get all waxing philosophical here but way more good comes from being open and transparent than bad.

[00:32:07] Graham-Cumming: Listen, it's July 4th.
It's evening here in London. We can be waxing philosophical as much as we like. Well listen, thank you for taking the time to chat with me. Are there any other reminiscences of Sun that you think the public needs to know in this oral history of “The Network is the Computer”?

[00:32:28] Rothrock: Well, you know, the only thing I'd say is, having landed in the Silicon Valley in 1981 and getting on with Sun, I can say this given my age and longevity here: everything is built on somebody else's great ideas. And starting with TCP/IP and then we went to this HTML protocol and browsers, it’s just layer on layer on layer on layer and so Cloudflare is just one of the latest to climb on the shoulders of those giants who put it all together. I mean, we don’t even think about the physical network anymore. But it is there and thank goodness companies like Cloudflare keep providing that fundamental service on which we can build interesting, cool, exciting, and mind-changing things. And without a Cloudflare, without Sun, without Apollo, without all those guys back in the day, it would be different. The world would just be so, so different. I did the New York Times crossword puzzle. I could not do it without Google because I have access to information I would not have unless I went to the library. It’s exponential and it just gets better. Thanks to Michelle and Matthew and Lee for starting Cloudflare and allowing Venrock to invest in it.

[00:34:01] Graham-Cumming: Well, thank you for being an investor. I mean, it helped us get off the ground and get things moving. I very much agree with you about the standing on the shoulders of giants because people don't appreciate the extent to which so much of this fundamental work that we did was done in the 70s and 80s.

[00:34:19] Rothrock: Yeah, it’s just like the automobile and the airplane. We reminisce about the history but boy, there were a lot of giants in those industries as well. And computing is just the latest.
[00:34:32] Graham-Cumming: Yep, absolutely. Well, Ray, thank you. Have a good afternoon.

Interested in hearing more? Listen to my conversations with John Gage and Greg Papadopoulos of Sun Microsystems:

John Gage
Greg Papadopoulos

To learn more about Cloudflare Workers, check out the use cases below:

Optimizely - Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
Cordial - Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
AO.com - AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
Pwned Passwords - Troy Hunt’s popular "Have I Been Pwned" project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
Timely - Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
Quintype - Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

Industry Buzz

The Industry Buzz section is divided into three major sections, which are then subdivided into smaller sections.

Corporate Blogs include official blogs from web hosts, registrars, search engines and other related sites.

Magazines & Blogs include interesting websites related to the hosting industry, but not necessarily from official company blogs.

Industry Leaders include personal blogs from important industry leaders, such as employees from Google and WordPress. These blogs sometimes include insights on how industry leaders think, but also may contain topics not related to hosting.