This guide provides an introduction to the flexible environment for those who
are familiar with the standard environment. It explains the similarities and
key differences between the environments and also provides general architectural
recommendations for applications that use both environments.

Similarities and key differences

Both environments provide you with App Engine’s deployment, serving, and scaling
infrastructure. The key differences are the way the environment executes your
application, how your application accesses external services, how you run your
application locally, and how your application scales. For a high-level summary
of these differences, see Choosing an environment.

Application execution

In the standard environment, your application runs on a lightweight instance
inside of a sandbox. This sandbox restricts what your application can do. For
example, your application cannot write to disk or use non-whitelisted binary
libraries. The standard environment also limits the CPU and memory options
available to your application. Because of these restrictions, most App
Engine standard applications tend to be stateless web applications that respond
to HTTP requests quickly.

In contrast, the flexible environment runs your application in Docker containers
on Google Compute Engine virtual machines (VMs) and imposes fewer restrictions.
For example, you can use any programming language
of your choice, write to disk, use any library you'd like, and even run multiple
processes. The flexible environment also allows you to choose any Compute Engine
machine type for your instances so that your application has access to more
memory and CPU.

Accessing external services

In the standard environment, your application typically accesses services such
as Cloud Datastore via the built-in google.appengine APIs. However, in
the flexible environment, these APIs are not available. Instead, use the
Google Cloud client libraries. Because these client libraries also work outside
App Engine, your application is more portable: if needed, applications that run
in the flexible environment can usually run on Google Kubernetes Engine or
Compute Engine without heavy modification.
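As a minimal sketch, assuming the google-cloud-datastore Python package and a hypothetical Greeting kind with a message property, reading an entity through the Cloud client library looks like this. Nothing in it is specific to App Engine, which is what makes it portable. The lookup takes the client as a parameter so any object with key() and get() methods can stand in for it in tests; the real client is imported lazily in make_client().

```python
from typing import Any, Optional


def make_client() -> Any:
    """Create a Cloud Datastore client (requires google-cloud-datastore).

    The import is deferred so this module loads even where the package is not
    installed. The client reads the project and credentials from the environment.
    """
    from google.cloud import datastore

    return datastore.Client()


def get_greeting(client: Any, greeting_id: str) -> Optional[str]:
    """Return the `message` property of a hypothetical Greeting entity, or None."""
    key = client.key("Greeting", greeting_id)  # kind + name identify the entity
    entity = client.get(key)  # None when no entity has this key
    if entity is None:
        return None
    return entity.get("message")  # Datastore entities behave like dicts
```

In a deployed service you would call get_greeting(make_client(), "hello"); the same call works unchanged on GKE, Compute Engine, or a workstation.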

Local development

In the standard environment, you typically run your application locally using
the App Engine SDK. The SDK handles running your application and emulates the
App Engine services. In the flexible environment, the SDK is no longer used to
run your application. Instead, applications written for the flexible environment
should be written like standard web applications that can run anywhere. As
mentioned, the flexible environment runs your application in a Docker
container, so to test the application locally, you run the application
directly. For example, to run a Python application using Django, you would run
python manage.py runserver.
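To illustrate the "runs anywhere" point without a framework, here is a minimal sketch of a WSGI application built only on Python's standard library. It assumes the platform supplies the listening port through a PORT environment variable and falls back to 8080 locally; the route and response text are illustrative.

```python
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A plain WSGI app with no App Engine-specific dependencies."""
    if environ.get("PATH_INFO", "/") == "/":
        status, body = "200 OK", b"Hello from anywhere\n"
    else:
        status, body = "404 Not Found", b"Not found\n"
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]


def serve() -> None:
    """Serve `app` on $PORT (assumed to be set by the platform), else 8080."""
    port = int(os.environ.get("PORT", "8080"))
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling serve() from the file's entry point runs the same code with a plain python command on your workstation and as a container's start command. For production traffic you would typically put a WSGI server such as gunicorn in front of app rather than wsgiref.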

Another key difference is that flexible environment applications running locally
use actual Cloud Platform services, such as Cloud Datastore. Use a separate
project for local testing and, when available, use emulators.
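One way to enforce this is a start-up check that refuses to run locally against the production project unless an emulator is configured. This is a sketch under assumptions: GAE_INSTANCE is treated as the marker of a deployed App Engine instance, DATASTORE_EMULATOR_HOST is the variable exported by the Datastore emulator tooling, and the project ID is hypothetical.

```python
from typing import Mapping

PRODUCTION_PROJECT = "my-prod-project"  # hypothetical project ID


def check_local_target(env: Mapping[str, str]) -> str:
    """Return the project to use, refusing to touch production data locally."""
    project = env.get("GOOGLE_CLOUD_PROJECT", "")
    on_app_engine = "GAE_INSTANCE" in env  # assumed marker of a deployed instance
    using_emulator = "DATASTORE_EMULATOR_HOST" in env
    if not on_app_engine and project == PRODUCTION_PROJECT and not using_emulator:
        raise RuntimeError(
            "Local run is pointed at the production project; use a separate "
            "test project or start the Datastore emulator."
        )
    return project
```

At start-up you would call check_local_target(os.environ) before creating any clients.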

Scaling characteristics

While both environments use App Engine’s automatic scaling infrastructure, the
way in which they scale is different. The standard environment can scale from
zero instances up to thousands very quickly. In contrast, the flexible
environment must have at least one instance of your application running and can
take longer to scale up in response to traffic.

The standard environment uses a custom-designed autoscaling algorithm. The
flexible environment uses the Compute Engine Autoscaler, but it does not support
all of the autoscaling options that are available to Compute Engine. Developers
should test their application behavior under a range of conditions. For example,
verify how autoscaling responds when a CPU-bound application becomes I/O-bound
during periods when calls to remote services have elevated latency.
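Little's law (requests in flight = arrival rate × time in system) helps explain why the CPU-bound to I/O-bound shift matters. The sketch below uses illustrative numbers, not measurements, and assumes a thread-per-request server. CPU utilization depends only on CPU time per request, while thread occupancy grows with remote latency, so an autoscaler that watches CPU may see nothing wrong while requests queue.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Load:
    qps: float                 # requests per second across the whole service
    cpu_s: float               # CPU seconds consumed per request
    wait_s: float              # seconds per request spent blocked on remote calls
    instances: int
    cores_per_instance: int
    threads_per_instance: int  # assumed: one worker thread per in-flight request


def cpu_utilization(load: Load) -> float:
    """Share of total CPU capacity in use; note that wait_s does not appear."""
    return load.qps * load.cpu_s / (load.instances * load.cores_per_instance)


def thread_occupancy(load: Load) -> float:
    """In-flight requests per worker thread (Little's law); above 1.0 means queueing."""
    in_flight = load.qps * (load.cpu_s + load.wait_s)
    return in_flight / (load.instances * load.threads_per_instance)


normal = Load(qps=100, cpu_s=0.02, wait_s=0.05, instances=4,
              cores_per_instance=1, threads_per_instance=10)
slow = replace(normal, wait_s=1.0)  # same traffic, remote calls now take 1 s

print(cpu_utilization(normal), cpu_utilization(slow))    # 0.5 0.5
print(thread_occupancy(normal), thread_occupancy(slow))  # roughly 0.18, then about 2.55
```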

Health checks

The standard environment does not use health checks to determine whether to
send traffic to an instance. The flexible environment lets application
developers write their own health check handlers, which the load balancer uses
to decide whether to send traffic to an instance and whether the instance
should be autohealed. Be careful when adding logic to health checks. For
example, if the health check calls an external service, then a temporary
failure in that service can cause all instances to become unhealthy, possibly
leading to a cascading failure.
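A minimal sketch of shallow health checks as plain WSGI: liveness reports only that the process can answer, and readiness reflects local state (here a hypothetical warmed_up flag) rather than calling a remote service, so a downstream outage cannot mark every instance unhealthy at once. The /liveness_check and /readiness_check paths are assumptions; match them to the health check configuration you actually use.

```python
import threading

# Hypothetical local state, set once start-up work (caches, config) is done.
warmed_up = threading.Event()


def health_app(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path == "/liveness_check":
        # The process is up and able to answer; no external calls here.
        status = "200 OK"
    elif path == "/readiness_check":
        # Ready once local initialization finishes. Downstream services are
        # deliberately not consulted, to avoid correlated failures.
        status = "200 OK" if warmed_up.is_set() else "503 Service Unavailable"
    else:
        status = "404 Not Found"
    start_response(status, [("Content-Type", "text/plain")])
    return [status.encode()]
```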

Dropping requests when overloaded

Applications can drop requests when overloaded as part of a strategy to avoid
cascading failures. In the standard environment, this capability is built into
the traffic routing layer. We recommend that developers of very high QPS
applications in the flexible environment build this capability into their
applications by limiting the number of concurrent requests.
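A minimal sketch of limiting concurrent requests, written as WSGI middleware for a threaded server. When every slot is taken, a request gets an immediate 503 instead of queueing behind slow work. The class name and the limit you pass in are illustrative; a real limit should come from load testing.

```python
import threading


class ConcurrencyLimiter:
    """WSGI middleware that sheds requests beyond a fixed number in flight."""

    def __init__(self, app, max_concurrent: int):
        self.app = app
        self.slots = threading.BoundedSemaphore(max_concurrent)

    def __call__(self, environ, start_response):
        if not self.slots.acquire(blocking=False):
            # No free slot: reject cheaply rather than piling up work.
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain"), ("Retry-After", "1")])
            return [b"Overloaded, please retry\n"]
        try:
            result = self.app(environ, start_response)
            try:
                # Buffer the body so the slot is held until the work is done.
                return list(result)
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()
        finally:
            self.slots.release()
```

Buffering the body keeps the accounting simple. For streaming responses you would instead release the slot when the server finishes iterating the body.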

You can verify that your flexible environment application is not susceptible to
this type of failure by creating a version with a cap on the maximum number of
instances and then steadily increasing traffic until requests are dropped. Make
sure that your application does not fail health checks while it is overloaded.
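The procedure above can be sketched as a small driver loop. send_step and health_ok are hypothetical hooks that you would back with your load-testing tool and an HTTP probe of the health check endpoint. The function reports the first rate at which requests were dropped and whether the health checks held up to that point.

```python
from typing import Callable, Iterable, Optional, Tuple


def ramp_until_dropped(
    send_step: Callable[[int], int],
    health_ok: Callable[[], bool],
    rates: Iterable[int],
) -> Optional[Tuple[int, bool]]:
    """Raise the request rate step by step until the service starts shedding load.

    send_step(rate) drives one step of traffic at `rate` requests per second and
    returns how many requests were dropped; health_ok() probes the health check
    right after each step, while instances are still loaded. Returns (first rate
    with drops, whether every probe so far passed), or None if nothing dropped.
    """
    all_healthy = True
    for rate in rates:
        dropped = send_step(rate)
        healthy = health_ok()  # probe every step, even after an earlier failure
        all_healthy = all_healthy and healthy
        if dropped > 0:
            return rate, all_healthy
    return None
```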

Instance sizes

Flexible environment instances can have higher CPU and memory limits than
standard environment instances, so they can run more memory- and CPU-intensive
applications. However, the larger number of threads within a single instance
may increase the likelihood of concurrency bugs.

Maximum request timeout

The standard environment imposes a 60-second request deadline for versions that
use automatic scaling. Because the flexible environment does not impose such a
deadline, make sure that every call to an external service specifies a timeout.
Otherwise, requests can hang indefinitely and eventually use up all threads on
the web server.
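A minimal sketch of an outbound call with an explicit timeout, using only Python's standard library. The 5-second default and the mapping of timeouts to a 504 status are illustrative choices, not requirements.

```python
import socket
import urllib.error
import urllib.request


def call_backend(url: str, timeout_s: float = 5.0):
    """Call a downstream service with an explicit timeout; returns (status, body).

    Without `timeout`, urlopen uses the global socket default, which is None
    (block forever) unless socket.setdefaulttimeout() was called, so a stalled
    backend could hold this thread indefinitely.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status, resp.read()
    except (TimeoutError, socket.timeout):
        # Raised when the response does not arrive within timeout_s.
        return 504, b"upstream timed out"
    except urllib.error.URLError as err:
        # Timeouts while connecting arrive wrapped in URLError.
        if isinstance(err.reason, (TimeoutError, socket.timeout)):
            return 504, b"upstream timed out"
        raise
```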

Application developers can implement their own servlet filter to kill requests
that take longer than 60 seconds in the flexible environment. When killing a
request thread, it is important to handle cleanup correctly so that the
application is not left in an inconsistent state.
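Python threads cannot be killed from outside, so a Python counterpart to such a filter has to be cooperative. This swaps forced termination for cooperative cancellation. In the sketch below, which uses hypothetical names throughout, a middleware stores a deadline in the WSGI environ, handlers check it between units of work, and try/finally blocks undo partial work when the deadline fires.

```python
import sys
import time


class DeadlineExceeded(Exception):
    """Raised by handler code that finds its request has run out of time."""


def check_deadline(environ) -> None:
    """Call between units of work; raises once the request's budget is spent."""
    if time.monotonic() > environ["app.deadline"]:
        raise DeadlineExceeded()


class DeadlineMiddleware:
    """Attach a per-request deadline (60 s by default) and turn overruns into 504s.

    Only exceptions raised while the wrapped app runs are caught, so handlers
    should finish their work before returning a body (not lazily in a generator).
    """

    def __init__(self, app, budget_s: float = 60.0):
        self.app = app
        self.budget_s = budget_s

    def __call__(self, environ, start_response):
        environ["app.deadline"] = time.monotonic() + self.budget_s
        try:
            return self.app(environ, start_response)
        except DeadlineExceeded:
            # Passing exc_info lets the server replace headers not yet sent.
            start_response("504 Gateway Timeout",
                           [("Content-Type", "text/plain")], sys.exc_info())
            return [b"request exceeded its deadline\n"]


def slow_report(environ, start_response):
    """Hypothetical handler that works in small steps and cleans up on abort."""
    scratch = []  # stands in for temporary state (files, locks, partial writes)
    try:
        for step in range(1000):
            check_deadline(environ)
            scratch.append(step)
            time.sleep(0.01)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"report ready\n"]
    finally:
        scratch.clear()  # runs on success and on DeadlineExceeded alike
```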

Traffic migration

The standard environment provides a traffic migration feature that gradually
moves traffic to a new version to minimize latency spikes. See the Traffic
Migration docs for ways to avoid a latency spike when switching traffic to a new
version.

Single zone failures

Standard environment applications are single-homed, meaning that all instances
of the application live in a single availability zone. If that zone fails, the
application starts new instances in a different zone in the same region, and the
load balancer routes traffic to the new instances. You will see a latency spike
due to loading requests, and Memcache will be flushed.

Flexible environment applications use Regional Managed Instance Groups, meaning
that instances are distributed among multiple availability zones within a
region. If a single zone fails, the load balancer stops routing traffic to that
zone. If you have set autoscaling to run your instances as hot as possible, you
will see a brief period of overload before autoscaling creates more instances.
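The size of that overload can be estimated with simple arithmetic. As a sketch, assuming instances are spread evenly across zones (three zones are used below purely for illustration) and the survivors absorb all traffic until autoscaling catches up, per-instance load rises by a factor of zones / (zones - 1).

```python
def utilization_after_zone_loss(target_utilization: float, zones: int) -> float:
    """Per-instance load right after one of `zones` equal zones goes down.

    Assumes instances are spread evenly across zones and that the surviving
    zones absorb all traffic before autoscaling adds capacity.
    """
    if zones < 2:
        raise ValueError("a zone failure is only survivable with two or more zones")
    return target_utilization * zones / (zones - 1)


def max_safe_target(zones: int, ceiling: float = 1.0) -> float:
    """Highest target utilization that stays at or below `ceiling` after a zone loss."""
    return ceiling * (zones - 1) / zones


print(utilization_after_zone_loss(0.9, 3))  # about 1.35: overloaded until scale-up
print(utilization_after_zone_loss(0.6, 3))  # about 0.9: tight but within capacity
print(max_safe_target(3))                   # about 0.67
```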

Cost comparisons

Many factors are involved in a cost comparison between workloads running on
the standard and flexible environments. These include:

Price paid per MCycle.

CPU platform capabilities, which affect the work that can be done per MCycle.

How hot you can run instances on each platform.

Cost of deployments, which may differ on each platform and can be significant
if you are using Continuous Deployment for your application.

Runtime overhead.

You will need to run experiments to determine the cost of your workload on each
platform. In the flexible environment, you can use QPS per core as a proxy for
the cost efficiency of your application when running experiments to determine
whether a change affects costs. The standard environment does not provide a
comparable real-time metric of cost efficiency; you have to make a change and
wait for the daily billing cycle to complete.
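A minimal sketch of the QPS-per-core comparison. The request counts, window, and instance shape are hypothetical. Reading a higher value as "cheaper per request" assumes cost tracks provisioned cores and ignores memory, disk, and network charges.

```python
def qps_per_core(requests: int, window_s: float, instances: int,
                 cores_per_instance: float) -> float:
    """Average requests per second served by each provisioned core."""
    return requests / window_s / (instances * cores_per_instance)


def efficiency_change(before: float, after: float) -> float:
    """Relative change in QPS per core; positive suggests lower cost per request."""
    return (after - before) / before


# Hypothetical one-hour measurements on 4 instances with 2 cores each.
baseline = qps_per_core(360_000, 3600, instances=4, cores_per_instance=2)   # 12.5
candidate = qps_per_core(432_000, 3600, instances=4, cores_per_instance=2)  # 15.0
print(f"{efficiency_change(baseline, candidate):+.0%}")  # +20%
```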

Microservices

The standard environment allows secure authentication between applications
using the X-Appengine-Inbound-Appid request header. The flexible environment
does not have such a feature; the recommended approach for secure
authentication between applications is to use OAuth.
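As one possible shape for this, here is a sketch that assumes the calling service sends a Google-signed ID token as a Bearer token and that the token carries an email claim identifying the caller. The allowlist and audience are hypothetical, and headers are shown as a plain mapping rather than WSGI's HTTP_AUTHORIZATION key. Token verification is injected so the authorization logic can be tested offline; google_id_token_verifier() wires in verify_oauth2_token from the google-auth package, imported lazily.

```python
from typing import Callable, Collection, Mapping


def authenticate_caller(
    headers: Mapping[str, str],
    verify: Callable[[str], Mapping[str, object]],
    allowed_callers: Collection[str],
) -> str:
    """Return the verified caller identity or raise PermissionError."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise PermissionError("missing bearer token")
    claims = verify(token)  # must check signature, expiry, and audience
    caller = claims.get("email")  # assumed claim naming the calling service
    if caller not in allowed_callers:
        raise PermissionError("caller not allowed")
    return str(caller)


def google_id_token_verifier(audience: str) -> Callable[[str], Mapping[str, object]]:
    """Build a verifier backed by the google-auth package (imported lazily)."""
    from google.auth.transport import requests as google_requests
    from google.oauth2 import id_token

    request = google_requests.Request()
    return lambda token: id_token.verify_oauth2_token(token, request, audience)
```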

Deployment

Deployments in the standard environment are generally faster than deployments
in the flexible environment. In the flexible environment, it is faster to scale
up an existing version than to deploy a new one, because network programming for
a new version is normally the slowest step of a deployment. One strategy for
quick rollbacks in the flexible environment is to keep a known good version
scaled down to a single instance. You can then scale up that version and route
all traffic to it using Traffic Splitting.

When to use the flexible environment

The flexible environment is intended to be complementary to the standard
environment. If you have an existing application running in the standard
environment, it’s not usually necessary to migrate the entire application to the
flexible environment. Instead, identify the parts of your application that
require more CPU, more RAM, a specialized third-party library or program, or
that need to perform actions that aren’t possible in the standard environment.
Once you’ve identified these parts of your application, create small App Engine
services that use the flexible environment to handle just those parts. Your
existing service running in the standard environment can call the other services
using HTTP, Cloud Tasks (beta), or Cloud Pub/Sub.

For example, if you have an existing web application running in the standard
environment and you want to add a new feature to convert files to PDFs, you can
write a separate microservice that runs in the flexible environment and handles
only the conversion to PDF. This microservice can be a simple program
consisting of just one or two request handlers, and it can install and use any
available Linux program to aid in the conversion, such as unoconv.
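A sketch of such a single-handler service in Python. It assumes unoconv (and the LibreOffice installation it drives) is present in the container image, and it uses unoconv's -f (output format) and -o (output path) options. The handler name, the .docx default suffix, and the 120-second timeout are hypothetical choices.

```python
import pathlib
import subprocess
import tempfile
from typing import List


def unoconv_command(src: pathlib.Path, dest: pathlib.Path) -> List[str]:
    """Build the unoconv invocation that converts `src` to PDF at `dest`."""
    return ["unoconv", "-f", "pdf", "-o", str(dest), str(src)]


def convert_to_pdf(data: bytes, suffix: str = ".docx", timeout_s: float = 120.0) -> bytes:
    """Write the upload to a temp dir, run unoconv on it, return the PDF bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / f"input{suffix}"
        dest = pathlib.Path(tmp) / "output.pdf"
        src.write_bytes(data)
        # check=True raises CalledProcessError on failure; timeout bounds the call.
        subprocess.run(unoconv_command(src, dest), check=True, timeout=timeout_s)
        return dest.read_bytes()


def pdf_app(environ, start_response):
    """Single WSGI handler: POST a document body, receive a PDF."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST")])
        return [b""]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    pdf = convert_to_pdf(environ["wsgi.input"].read(length))
    start_response("200 OK", [("Content-Type", "application/pdf"),
                              ("Content-Length", str(len(pdf)))])
    return [pdf]
```

Writing to a temporary directory relies on the flexible environment's ability to write to disk, and the subprocess timeout keeps a stuck conversion from holding a worker thread forever.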

Your main application remains in the standard environment and can call this
microservice directly via HTTP, or if you anticipate the conversion will take a
long time, the application can use Cloud Tasks (beta) or
Cloud Pub/Sub to queue the requests.