DNS is a critical piece of infrastructure used to facilitate communication across networks. It’s often described as a phonebook: in its most basic form, DNS provides a way to look up a host’s address by an easy-to-remember name. For example, looking up the domain name stripe.com will direct clients to the IP address 53.187.159.182, where one of Stripe’s servers is located. Before any communication can take place, one of the first things a host must do is query a DNS server for the address of the destination host. Since these lookups are a prerequisite for communication, maintaining a reliable DNS service is extremely important. DNS issues can quickly lead to crippling, widespread outages, and you could find yourself in a real bind.
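
As a rough illustration of the lookup described above, here is a minimal sketch that asks the operating system's resolver for the addresses behind a name. The helper name `resolve` is hypothetical, and the result depends entirely on the resolver configuration of the machine running it, so no particular address is implied.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the unique IP addresses the system resolver reports for hostname.

    `resolve` is a hypothetical helper used only for illustration.
    """
    # getaddrinfo delegates to the OS resolver, which consults local
    # configuration (such as a hosts file) and DNS servers as configured.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of sockaddr is the address string
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# Example usage: resolve("localhost") typically includes a loopback address.
# A failed lookup raises socket.gaierror, which is the kind of error a
# client sees when DNS is unavailable.
```

If the lookup fails, `socket.getaddrinfo` raises `socket.gaierror` before any connection is attempted, which is why DNS problems surface so early in a request.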

It’s important to establish good observability practices for these systems so that when things go wrong, you can clearly understand how they’re failing and act quickly to minimize any impact. Well-instrumented systems provide visibility into how they operate; establishing a monitoring system and gathering robust metrics are both essential to responding to incidents effectively. They are just as critical for post-incident analysis, when you’re trying to understand the root cause and prevent recurrences.

In this post, I’ll describe how we monitor our DNS systems and how we used an array of tools to investigate and fix an unexpected spike in DNS errors that we encountered recently.

Stripe uses machine learning to respond to our users’ complex, real-world problems. Machine learning powers Radar to block fraud, and Billing to retry failed charges on the network. Stripe serves millions of businesses around the world, and our machine learning infrastructure scores hundreds of millions of predictions across many machine learning models. These models are powered by billions of data points, with hundreds of new models being trained each day. Over time, the volume, quality of data, and number of signals have grown enormously as our models continuously improve in performance.

Running infrastructure at this scale poses a very practical data science and ML problem: how do we give every team the tools they need to train their models without requiring them to operate their own infrastructure? Our teams also need a stable and fast ML pipeline to continuously update and train new models as they respond to a rapidly changing world. To solve this, we built Railyard, an API and job manager for training these models in a scalable and maintainable way. It’s powered by Kubernetes, a platform we’ve been working with since late 2017. Railyard enables our teams to independently train their models on a daily basis with a centrally managed ML service.

In many ways, we’ve built Railyard to mirror our approach to products for Stripe’s users: we want teams to focus on their core work of training and developing machine learning models rather than operating infrastructure. In this post, we’ll discuss Railyard and the best practices for operating machine learning infrastructure that we’ve discovered while building this system.

Stripe has engineering hubs in San Francisco, Seattle, Dublin, and Singapore. We are establishing a fifth hub that is less traditional but no less important: Remote. We are doing this to situate product development closer to our customers, improve our ability to tap the 99.74% of talented engineers living outside the metro areas of our first four hubs, and further our mission of increasing the GDP of the internet.

Stripe will hire over a hundred remote engineers this year. They will be deployed across every major engineering workstream at Stripe.

Our users are everywhere. We have to be, too.

Our remotes keep us close to our customers, which is key to building great products. They are deeply embedded in the rhythms of their cities. They see how people purchase food differently in bodegas, konbini, and darshinis. They know why it is important to engineer robustness in the face of slow, unreliable internet connections. They have worked in and run businesses that don’t have access to global payments infrastructure.

Stripe has had hundreds of extremely high-impact remote employees since inception. Historically, they’ve reported into teams based in one of our hubs. We had a strong preference for managers to be located in-office and for teams to be office-centric, to maximize face-to-face bandwidth when doing creative work.

As we have grown as a company, we have learned some things.

One is that the technological substrate of collaboration has gotten shockingly good over the last decade. Most engineering work at Stripe happens in conversations between engineers, quiet thinking, and turning those thoughts into artifacts. Of these, thinking is the only one that doesn’t primarily happen online.

There was a time when writing on a whiteboard had substantially higher bandwidth than a Word doc over email. Thankfully Google Docs, Slack, git, Zoom, and the like deliver high-bandwidth synchronous collaboration on creative work. The experience of using them is so remarkably good that we only notice it when something is broken. Since you write code via pull requests and not whiteboards, your reviewer needs to have access to the same PR; having access to the same whiteboard is strictly optional.

While we did not initially plan to make hiring remotes a huge part of our engineering efforts, our remote employees have outperformed all expectations. Foundational elements of the Stripe technology stack, our products, our business, and our culture were contributed by remotes. We would be a greatly diminished company without them.

Stripe’s new remote engineering hub

We have seen such promising results from our remote engineers that we are greatly increasing our investment in remote engineering.

We are formalizing our Remote engineering hub. It is coequal with our physical hubs, and will benefit from some of our experience in scaling engineering organizations. For example, there will be dedicated engineering teams in the Remote hub that exist in no other hub. (Some individuals report to a team located in a different hub, and we expect this will remain common, but the bulk of high-bandwidth coworker relationships are within-hub.) We also have a remote engineering lead, analogous to the site leads we have for our physical hubs.

We are expanding the scope we will hire for remotely. In addition to hiring engineers, we plan to begin hiring remote product managers, engineering managers, and technical program managers later this year. (We will continue hiring remote employees in non-engineering positions across the company as well.)

We intend to expand our remote engineering hiring aggressively. We will hire at least a hundred remote engineers this year. We expect to be constrained primarily by our capacity to onboard and support new remote engineers, and we will work to increase that capacity.

We will continue to improve the experience of being a remote. We have carefully tracked the experience of our remote employees, including in our twice-annual employee survey. Most recently, 73% of engineers at Stripe believe we do a good job of integrating remote employees.

Great user experiences are made in the tiny details. We care about the details to a degree that is borderline obsessive. A recent example: we wrote code to attach a videoconferencing link to every calendar invitation by default, so that remotes never feel awkward having to ask for one.

More to come

There are still some constraints on our ambitions. In our first phase, we will be focused primarily on remote engineers in North America, starting with the US and Canada. While we are confident that great work is possible within close time zones, we don’t yet have structures to give remotes a reliably good experience working across large time zone differences. And though we intend to hire remote engineers in Europe and Asia eventually, our hubs in Dublin and Singapore are not sufficiently established to support remotes just yet.

Most engineers working at Stripe are full-time employees, with a full benefits suite. There is substantial organizational, legal, and financial infrastructure required to support each new jurisdiction we hire in, so we have to be measured in how quickly we expand. We can support most US states today, and plan to expand our hiring capabilities to include jurisdictions covering more than 90% of the US population as quickly as possible. We intend, over the longer term, to be everywhere our customers are.

We will continue encouraging governments worldwide to lower barriers to hiring. Our customers, from startups to international conglomerates, all feel the pain of this. We think making it easier for companies to hire would produce a step-function increase in global GDP.

We want to talk to you

We would love to talk about our Remote hub or remote positions at Stripe. Our CEO and co-founder, Patrick Collison, and I will host a remote coffee on May 22, 2019; sign up to be invited to it. We are also, and always, available on the internet.

Since our launch last April, we’ve seen a wide range of businesses use Stripe Billing to manage their recurring revenue and send invoices, including European businesses like Deliveroo, Front, Channel 4, Shadow, and Typeform. Today, we’re launching new features for Stripe Billing to help recurring revenue businesses in Europe expand internationally and minimise the impact of upcoming Strong Customer Authentication (SCA) regulatory requirements.

Minimise churn with SCA-ready tools

SCA requirements go into effect in September 2019 and will require additional authentication for many European online payments. This will be particularly challenging for recurring revenue businesses because most subscription payments are processed automatically using stored payment information. When these payments require SCA, businesses will need to contact their subscribers and ask them to provide two-factor authentication (i.e. they may need to enter a password or verify the payment on their phone). Banks will begin to decline payments that have not collected this additional authentication when it’s required.

Stripe Billing makes meeting these new requirements easier in a few ways:

If a recurring charge requires SCA, Stripe Billing can now email subscribers automatically with a link to complete two-factor authentication via 3D Secure 2. These emails can be customised to match your brand. (Soon, you’ll also be able to send them from your own domain.)

Under the hood, these new features take advantage of Stripe’s new Payment Intents API, which allows us to automatically apply any SCA exemptions available for a given payment. Our goal is to help you comply with SCA requirements while minimising the number of charges requiring additional authentication.
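
To make the flow above concrete, here is a minimal sketch of how an integration might decide what to do after a recurring charge is attempted. It assumes a payment intent represented as a plain dictionary with a `status` field; the status strings follow the values used by the Payment Intents API, while the function name `next_step` and the returned action labels are hypothetical and only for illustration. It is not a description of how Stripe Billing is implemented.

```python
def next_step(payment_intent: dict) -> str:
    """Map a payment intent's status to a follow-up action label.

    The action labels returned here are hypothetical; a real integration
    would trigger its own fulfilment or notification logic instead.
    """
    status = payment_intent.get("status")
    if status == "succeeded":
        # No additional authentication was needed, or it was completed.
        return "fulfil"
    if status == "requires_action":
        # The bank asked for additional authentication (for example 3D Secure 2),
        # so the subscriber has to be contacted with a link to complete it.
        return "email_authentication_link"
    if status == "requires_payment_method":
        # The attempt failed; a new payment method is needed.
        return "ask_for_new_payment_method"
    # Any other status (for example "processing") needs no action yet.
    return "wait"
```

For an off-session renewal, the `requires_action` branch corresponds to the case where Stripe Billing emails the subscriber automatically.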

Send customised and compliant invoices in multiple languages

When scaling internationally, it can be challenging for recurring revenue businesses to comply with local accounting rules and meet global customer expectations. These challenges are much more complicated for B2B businesses that sign custom pricing deals and send invoices for manual payment. To help make international invoicing easier, we’ve made our invoices much more flexible:

In addition to matching the logo and colour scheme of the invoice to your brand, Stripe Billing now lets you add memos, footers, and custom fields to your invoices.

Sequential and unique invoice numbers can be customised with a customer prefix.

Credit notes can be issued to refund an invoice or reduce the amount owed.

Invoices can now be sent in 13 different languages: French, German, Spanish, Italian, Dutch, Danish, Swedish, Finnish, Norwegian, Hebrew, Arabic, and Japanese, as well as English.

PDF and hosted invoices in French

Collect and report on VAT without leaving the Dashboard

We’ve added a new tax rates feature to help businesses collect the right amount of tax and remit it to the government.

You can now create inclusive or exclusive tax rates for VAT, GST, and US sales tax for different jurisdictions. These tax rates can then be applied to individual invoice line items, invoice subtotals, or all of the invoices for a subscription. Invoices can also now display the customer’s tax ID and Stripe Billing now automatically validates EU VAT numbers to make sure they are correct. As you’d expect, CSV reports detailing which tax rates were applied and how much tax was collected can be downloaded from the Dashboard.
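
The difference between inclusive and exclusive rates is easiest to see with numbers. The sketch below is an illustration of the arithmetic only: the function name `tax_breakdown` is hypothetical, and rounding half up to the nearest cent is an assumption rather than a statement of how Stripe Billing rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def tax_breakdown(amount: Decimal, rate_percent: Decimal, inclusive: bool) -> dict:
    """Split an amount into net, tax, and total for a given tax rate.

    inclusive=True:  `amount` already contains the tax (total is fixed).
    inclusive=False: tax is added on top of `amount` (net is fixed).
    Rounding half up to the cent is an assumption made for this sketch.
    """
    rate = rate_percent / Decimal("100")
    if inclusive:
        total = amount
        # Back out the net price from the tax-inclusive total.
        net = (amount / (Decimal("1") + rate)).quantize(CENT, rounding=ROUND_HALF_UP)
        tax = total - net
    else:
        net = amount
        # Compute tax on the net price and add it on top.
        tax = (amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)
        total = net + tax
    return {"net": net, "tax": tax, "total": total}
```

With a 20% rate, an exclusive price of 100.00 produces 20.00 of tax and a total of 120.00, while an inclusive price of 120.00 breaks down into the same 100.00 net and 20.00 tax.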

To get started with Stripe Billing, visit the quickstart guide—you can use all of these new features either from the Dashboard or via the API. If you’re already using Stripe Billing and want to learn more about updating your integration to be SCA-ready, we’ve put together a short migration guide.

We’ll continue to add more features to Stripe Billing to help you manage the complexity of running a global business.

Please let us know if you have any questions or feedback—we’d love to hear from you.

Starting in September, new regulatory requirements called Strong Customer Authentication (SCA) will be rolled out for businesses that have customers in Europe. SCA is a pillar of the EU’s second Payment Services Directive (PSD2) and will require two-factor authentication on most payments made by European customers in an effort to decrease online fraud. Complying with SCA will be complex, as it will be implemented differently by individual banks and payment providers across Europe. And beyond the compliance burden, these new rules will also come with a cost: the new authentication step can add friction to checkout, reducing conversion.

Spurred by SCA and other similar global regulations on the horizon (the requirements of SCA resemble those in India, and similar rules have been proposed in Australia and other countries), we’ve upgraded our products across the board to help insulate businesses from this complexity and to minimize the impact on conversion. As soon as SCA is mandated, our payments platform will analyze each transaction to ensure that additional authentication is requested only when it’s required. And when it is required, we use new technologies to make the authentication as user-friendly as possible. Our new SCA-ready products and updates include:

The new Payment Intents API to help businesses more easily build fully customized, dynamic payment flows that are ready for SCA

Stripe exists to grow the GDP of the internet. Simply put, this means making it easier for businesses to transact with customers anywhere in the world. Increasingly, fulfilling this mission entails helping our users overcome the challenge of navigating a payments regulatory landscape that is becoming both more complex and more Balkanized. To make this transition easy for businesses built on Stripe, we’re making our products more powerful and dynamic than ever before. We’ll be making a host of additional improvements to our products in 2019 to help our users prepare for regulatory changes, ensuring that operating a global internet business is as easy and as seamless as possible.

Stripe now supports 3D Secure 2, a new card authentication standard which introduces an improved user experience and frictionless authentication flows. Read our guide to learn more about this new version of 3D Secure, or get started with 3D Secure 2 using our new Payment Intents API or Checkout.

Stripe builds economic infrastructure, and we’re designing for a global audience and market. In doing so, we carefully consider our technology and tools, organizational structure, and employee representation. Successful global organizations establish this mindset for different reasons. For some, it’s foundational: their mission, product, and addressable market cross time zones. Others develop an international customer base, hire remote employees, or begin to open offices abroad to extend their physical presence.

Stripe subscribes to all these definitions—and, in some markets, has been shaped by these choices. For example, Stripe first launched its products in Ireland in 2013. Two years later, we set up a Dublin-based office, which is now home to over 140 Stripes. And last summer, a landing team arrived in Dublin to establish our first engineering hub outside the United States.

We’ve learned that building a global company means building global products—and that those products improve even faster when they’re developed on the ground, closer to customers. In five months, we’ve helped our first customers go live in Estonia, Poland, Greece, Lithuania, and Latvia. We’re building products for Europe, and scoping our entry into the Middle East and Africa.

Scaling an engineering team requires new strategies for hiring engineers in new markets, developing products across time zones, and nurturing a distributed engineering culture. It’s not a sure fix, but we’ve found the best primer for success starts by forming and deploying a great landing team. Here’s what we’ve learned.

Many of our product features, like sending invoices or setting up plans with Stripe Billing, applying rules in Stripe Radar, or creating custom reports in Stripe Sigma, can be used directly from the Dashboard. We’ve recently made a number of updates to improve common workflows in the Dashboard and to make it easier to manage your business—no API requests required.

New: Bulk refunds from the Dashboard

We’ve made it much faster to refund several payments at the same time from the Dashboard. You can now select multiple payments and refund them with a single click. Try out bulk refunds from the Payments page, in search results, the customer details page, or the Radar Reviews page.

New: Customizable columns in exports from the Dashboard

If you’re downloading reports from the Dashboard, you can now customize the export by selecting exactly which columns of data you’d like to include. (This works on any page with lists of data.)

New: Issue refunds from the Android Dashboard app

Android users can use our Android Dashboard app to view earnings and payouts or search for particular payments and customers. With the latest version, you can also issue refunds directly from the app.

New: Subscription management in the iOS Dashboard app

The latest version of our iOS Dashboard app lets you manage your subscriptions on the go. You can now start or cancel a subscription for any customer directly from your iPhone.

We hope that these updates help you do even more with the Dashboard. These features were built based on feedback from our users, so if you have any ideas or flows you’d like us to improve, please let us know!