Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

7.
State of SoA networking in industry
A really big and confusing mess...

8.
State of SoA networking in industry
● Likely already in a world of hurt or rapidly approaching that point.
● Debugging is difficult or impossible (each application exposes different stats
and logs with no tracing).
● Limited visibility into infra components such as hosted load balancers,
databases, caches, network topologies, etc.).
● Multiple and partial implementations of circuit breaking, retry, and rate limiting
(If I had a $ for every time someone told me that retries are “easy” …).
● Furthermore, if you do have a good solution, you are likely using a library and
are locked into a particular technology stack essentially forever.
● Libraries are incredibly painful to upgrade. (Think CVEs).

9.
State of SoA networking in industry
● Ultimately, robust observability and easy debugging are everything.
● As SoAs become more complicated, it is critical that we provide a common
solution to all of these problems or developer productivity grinds to a halt (and
the site goes down … often).
Can we do better?

11.
What is Envoy
The network should be transparent to applications. When
network and application problems do occur it should be easy to
determine the source of the problem.
This sounds great! But it turns out it’s really, really hard.

12.
What is Envoy
● Out of process architecture: Let’s do a lot of really hard stuff in one place and
allow application developers to focus on business logic.
● Modern C++11 code base: Fast and productive.
● L3/L4 filter architecture: A TCP proxy at its core. Can be used for things other
than HTTP (e.g., MongoDB, redis, stunnel replacement, TCP rate limiter, etc.).
● HTTP L7 filter architecture: Make it easy to plug in different functionality.
● HTTP/2 first! (Including gRPC and a nifty gRPC HTTP/1.1 bridge).
● Service discovery and active health checking.
● Advanced load balancing: Retry, timeouts, circuit breaking, rate limiting,
shadowing, etc.
● Best in class observability: stats, logging, and tracing.
● Edge proxy: routing and TLS.

13.
Envoy service to service topology
Service Cluster
Envoy
Service
Discovery
Service Cluster
Envoy
Service
External Services
HTTP/2
REST / GRPC

16.
Eventually consistent service discovery
● Fully consistent service discovery systems are very popular (ZK, etcd, consul,
etc.).
● In practice they are hard to run at scale.
● Service discovery is actually an eventually consistent problem. Let’s
recognize that and design for it.
● Envoy is designed from the get go to treat service discovery as lossy.
● Active health checking used in combination with service discovery to produce a
routable overlay.
Discovery Status HC OK HC Failed
Discovered Route Don’t Route
Absent Route Don’t Route / Delete

18.
Observability
● Observability is by far the most important thing that Envoy provides.
● Having all SoA traffic transit through Envoy gives us a single place where we
can:
○ Produce consistent statistics for every hop
○ Create and propagate a stable request ID
○ Consistent logging
○ Distributed tracing

23.
Performance matters for a service proxy
● Throughput is important ultimately for cost and operational scaling, but for many
companies developer time is worth more than infra costs.
● BUT: Latency and predictability is what matters. And in particular tail latency
(P99+).
● We already deal with incredibly confusing deployments (virtual IaaS, multiple
languages and runtimes, languages that use GC, etc.). All of these niceties
improve productivity and reduce upfront dev costs, but they make debugging
really difficult.
● What is leading to sporadic error? The IaaS? The app? GC?
● Ability to reason about overall performance and reliability is critical.

24.
Performance matters for a service proxy
● A service proxy provides invaluable benefits, but if the proxy itself has tail
latencies that are hard to reason about, most of the debugging benefits go out
the window and you are back to square one.
● Anyone that is trying to sell you a service infra that does not consider the above
points is selling you a dream...

30.
Q&A
● Thanks for coming!
● We are super excited about building a community around Envoy. Talk to us if
you need help getting started.
● https://lyft.github.io/envoy/
● Lyft is hiring: Contact us if you want to work on hard scaling problems in a fast
moving company: https://www.lyft.com/jobs