Saturday, October 13, 2012

Why are you using embedded mode of mod_wsgi?

If you are using Apache/mod_wsgi and when you set it up all you cared about was getting something working, but didn't care much for understanding how things worked and how you should set it up, chances are you are running your WSGI application in embedded mode of mod_wsgi. If you don't understand how to setup Apache, using embedded mode is nearly always a bad idea. Much more so if the Apache installation you are using is the default installation offered up by many Linux distributions, because in that case you are likely running the Apache prefork MPM, a choice which only compounds the problems you can experience.

The preferred mode of running a WSGI application under mod_wsgi if you have no idea of how to setup and tune Apache to match your specific web application, is daemon mode. Even if you reckon you do know how to setup Apache it is still safer to use daemon mode.

If daemon mode is a better option, why isn't it the default?

This unfortunate situation whereby embedded mode is the default came about because in the very first incarnation of mod_wsgi it was designed to mimic what mod_python did. As a result, it only supported the concept of embedded mode. This is where the WSGI application runs within the actual Apache child processes, the same processes which are also handling serving of static file requests.

Although daemon mode, which is more akin to how FASTCGI works with the WSGI application running in separate dedicated processes, was added later, embedded mode was already the default and it was hard to change at that point. Daemon mode also needed additional configuration whereas with embedded mode, things would at least run out of the box. Under Windows only embedded mode is supported, so having daemon mode be the default on UNIX systems but embedded mode the default on Windows was also seen as confusing.

Why is running in embedded mode so bad?

The problems with embedded mode aren't so much due to the fact that the WSGI application is running in the actual Apache child processes, but that management of the processes is done by Apache and as such is subject to the general MPM settings of Apache. For the typical default Apache configuration the MPM settings are set up for serving of static files. The settings are not necessarily going to work very well for a dynamic web application with a large memory footprint that performs better when kept persistent in memory, as is the case for the majority of Python web applications.

PHP gets away with running okay within the Apache child processes because of how PHP was designed to work. Specifically in PHP, any application code is effectively reloaded on each request and so it has been optimised in various ways to perform adequately under that scenario. Python being a general purpose programming language adapted to run web applications has a much larger startup cost for both the interpreter and for loading up a web application. Certain aspects of how the Python interpreter is implemented and loading of Python modules managed, also means it is not possible to use some of the techniques that PHP uses around preloading of the interpreter and code modules prior to forking of the Apache child processes. Python in Apache can simply therefore never match PHPs efficiency in this respect.

The big problem therefore is simply the default configuration. If the MPM settings are properly setup and tuned for your specific Python web application running under embedded mode, then embedded mode will perform better than daemon mode. Use the default settings or don't configure it properly and you risk setting yourself up for a world of hurt.

How should embedded mode be configured?

How best to configure the MPM settings of Apache when running a WSGI application in embedded mode is not the point of this post. I will deal with that another time. The point of this post is to help you identify when you are running embedded mode and show you how to setup daemon mode in its basic configuration instead. You can at least then shift away from using embedded mode if you didn't even realise you were using it and avoid causing problems for yourself.

Determining if embedded mode is being used.

To determine if your WSGI application is running in embedded mode, replace its WSGI script with the test WSGI script as follows:

If the configuration is such that the WSGI application is running in embedded mode, then you will see:

mod_wsgi.process_group = ''

That is, the process group definition will be an empty string.If instead you are already running in the preferred daemon mode, you would see a non empty string giving the name of the daemon process group.

Identifying which Apache MPM you are using.

Even if your WSGI application is running in daemon mode, if the only thing you are using Apache for is to host the WSGI application and serve static files, then it is also recommended that you use worker MPM rather than prefork MPM as worker MPM will cut down on memory use by the Apache child processes.

To determine if you are using prefork MPM or worker MPM, you could try and work it out by looking at what operating system packages are installed, but the definitive way of doing it, is to run the Apache binary with the '-V' option.

$ /usr/sbin/httpd -V

Server version: Apache/2.2.14 (Unix)

Server built: Feb 10 2010 22:22:39

Server's Module Magic Number: 20051115:23

Server loaded: APR 1.3.8, APR-Util 1.3.9

Compiled using: APR 1.3.8, APR-Util 1.3.9

Architecture: 64-bit

Server MPM: Prefork

threaded: no

forked: yes (variable process count)

Server compiled with....

-D APACHE_MPM_DIR="server/mpm/prefork"

-D APR_HAS_SENDFILE

-D APR_HAS_MMAP

-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)

-D APR_USE_FLOCK_SERIALIZE

-D APR_USE_PTHREAD_SERIALIZE

-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT

-D APR_HAS_OTHER_CHILD

-D AP_HAVE_RELIABLE_PIPED_LOGS

-D DYNAMIC_MODULE_LIMIT=128

-D HTTPD_ROOT="/usr"

-D SUEXEC_BIN="/usr/bin/suexec"

-D DEFAULT_PIDLOG="/private/var/run/httpd.pid"

-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"

-D DEFAULT_LOCKFILE="/private/var/run/accept.lock"

-D DEFAULT_ERRORLOG="logs/error_log"

-D AP_TYPES_CONFIG_FILE="/private/etc/apache2/mime.types"

-D SERVER_CONFIG_FILE="/private/etc/apache2/httpd.conf"

The 'Server MPM' field will tell you which MPM your Apache has been compiled for.

If for some reason you can't work out which is the Apache binary, because your Linux distribution calls it something other than 'httpd', or they have modified it so it will not run unless some magic environment variables are set, then you can also guess what is running by using the following WSGI script.

import sys

def application(environ, start_response):

status = '200 OK'

output = 'wsgi.multithread = %s' % repr(environ['wsgi.multithread'])

response_headers = [('Content-type', 'text/plain'),

('Content-Length', str(len(output)))]

start_response(status, response_headers)

return [output]

If you get the output:

wsgi.multithread = True

you are likely running worker MPM and otherwise are running prefork MPM.

Running a WSGI application in daemon mode.

To force a WSGI application to run in daemon mode, the WSGIDaemonProcess and WSGIProcessGroup directives would need to be defined. For example, to setup a daemon process group containing two multithreaded processes one could use:

WSGIDaemonProcess example.com processes=2 threads=15

WSGIProcessGroup example.com

The WSGIDaemonProcess directive specifies the details of the daemon process group. The WSGIProcessGroup indicates that any WSGI application specified within the same context is to be delegated to run in that daemon process group.

A complete virtual host configuration for this type of setup would therefore be something like:

After appropriate changes have been made Apache will need to be restarted. For this example, the URL 'http://www.example.com/' would then be used to access the the WSGI application.

Note that you obviously should substitute the paths and hostname with values appropriate for your system.

After making the changes, use the test WSGI script described before to verify the WSGI application is in fact running in daemon mode. It is a common mistake to see people use WSGIDaemonProcess but then not use WSGIProcessGroup or other configuration mechanisms to ensure that the WSGI application is in fact delegated to the daemon process group, so double check you got the configuration correct.

Immediate benefits of using daemon mode.

When you use daemon mode, the number of processes and threads is static. This is one of the immediate benefits of using daemon mode. Specifically, that process management is more predictable. One of the big problems with using embedded mode is that Apache can decide to create additional processes or kill off existing ones. For a web application with large startup costs this is not a good idea as you could suddenly see increased CPU usage due to more processes being started right at the time you don't need it such as when a throughput spike occurs. This can actually cause performance to degrade in the short term rather than improve.

If using embedded mode and you need to update the code for your Python web application, you have no choice but to restart the whole of Apache. If using daemon mode, you can avoid restarting the whole of Apache and can instead simply touch the WSGI script file to update its modification date. This will have the side effect of causing the daemon processes to restart on the next request. This is also convenient when using Apache/mod_wsgi as a development environment to ensure parity with your production environment.

Additional reference documentation.

The information above gives a quick heads up on how to check whether you are running in embedded mode and how instead to get your WSGI application running in daemon mode. For additional information also read:

The most up to date version of mod_wsgi 3.4. This was only released recently and the majority of distributions will not as yet have it as a packaged binary. You should at least aim to use mod_wsgi 3.3. If you are on a Linux distribution which is still only supplying mod_wsgi 2.8 or older, you really should think about upgrading to a more modern operating system distribution as 2.8 was released almost 3 years ago.

10 comments:

Speaking of WSGIProcessGroup, is the order important? In other words, if I have a WSGIScriptAlias directive above the WSGIDaemonProcess and WSGIProcessGroup directives inside the same , will they apply to that WSGIScriptAlias?

The only ordering issue is that if you have multiple WSGIProcessGroup directives in the same scope, the last one will be used. In other words, determined by scope and last one encountered for that scope is used.

Further, if you have a nested scope, any defined in the nested scope overrides that in the parent scope. Thus if you have WSGIProcessGroup at VirtualHost scope and then also set WSGIProcessGroup inside of a Location or Directory context, and a specific request matches in some way that nested context, the WSGIProcessGroup from that nested context is used instead.

This means for example you could have two WSGIDaemonProcess directives for different named groups. You might have default WSGIProcessGroup at VirtualHost context select one of them, but then for a specific URL subset, use Location and a nested WSGIProcessGroup to override that requests in that case go to the other daemon process group.

Such a mechanism might be used if some subset of URLs used a C extension module for Python which wasn't thread safe. The default daemon process group could be default single process and multithreaded (thus less memory) and the second daemon process group could be a couple of single threaded processes.

So have flexibility to segment parts of an application by URLs using Location directive across multiple daemon process groups, each configured differently to suite the requirements for those URLs.

Just a minor point, I am using embedded mode (I don't have the daemon directives, and I also double checked using the checking wsgi_app given). Nonetheless, when I change my python code, the changes are immediately visible - without restarting apache (in the post it is asserted that you only get this behaviour in daemon mode).

@Michael Are you sure the next request isn't just being picked up by a different Apache child worker process that hasn't loaded your code yet? Depending on your Apache MPM configuration, Apache can recycle processes that are idle and so easily give this impression. In embedded mode, mod_wsgi by itself definitely doesn't reload the whole process automatically under any circumstances. The only way you would get it is by the separate code on mod_wsgi site which checks for changes from a background thread and kills the processes when such changes are detected, something which I wouldn't recommend in embedded mode if other stuff is running in the same Apache, such as static file serving or PHP applications etc. Anyway, all details of how reloading works is documented in http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode

Hmm, I ran the daemon vs embedded test, and got a process group back, which I assume means I'm running this as a daemon (with apache prefork, though). However, I notice that only some changes seem to be immediately reflected upon reload of a webpage without an apache restart.

Python will cache all code in the process and doesn't automatically reload code after it has changed. Whether templates are reloaded automatically can depend on the web framework being used or how you wrote your code. Read http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode on details of how to trigger reloading.

I'm having no luck with configured daemon mode on my prefork single threaded apache2 installation (after several attempts). Hence I'd like to optimize my embedded mode - you mention that this is an appropriate approach, and that you'd write about how to do it elsewhere. Is there some reference you can point me to about the best apache configuration for embedded mode? Thanks for your work!

@Michael If you need help there is the mod_wsgi mailing list. Comments on blog posts are not the appropriate place for debugging problems. I would very much suggest you use that list to resolve your issues with your daemon mode configuration before giving up on it.

I am using embedded mode of mod_wsgi for my django application. Also i am using daemons to do some background processing for my application.Problem is my processes are running twice everytime.I have used all the directives.Please point me in the direction to solve this

@awais If you need help there is the mod_wsgi mailing list. Comments on blog posts are not the appropriate place for debugging problems. I would very much suggest you use that list to resolve your issue.