Optimizing Apache can be a simple task, or it can be a massive pain. The first thing to check is what version of Apache is in use. Always use the latest version of Apache since updated versions are more secure and usually perform better than previous versions.

To check which version of Apache you are running, use the following command. If you are not running Apache 2.4.x or later, you should upgrade.

httpd -v

After you have made sure that Apache is up to date, the next item to look into is which MPM Apache is using. The default option used to be Prefork, but currently the best Apache MPM is Event. I'll cover each of these below and explain what each MPM is good for.

Prefork is appropriate for sites that need to avoid threading for compatibility with non-thread-safe libraries. It is also the best MPM for isolating each request, so that a problem with a single request will not affect any other.

This MPM is very self-regulating, so it is rarely necessary to adjust its configuration directives. Most important is that MaxClients be big enough to handle as many simultaneous requests as you expect to receive, but small enough to assure that there is enough physical RAM for all processes.

The most important setting for Prefork is MaxClients. If the server has limited RAM, keep this value low so that the server does not run out of RAM.
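As an illustration, a Prefork block might look like this (all values are examples, not tuned recommendations; in Apache 2.4 MaxClients was renamed MaxRequestWorkers):

```apache
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    # Cap on simultaneous request-handling processes. Keep this low enough
    # that (MaxClients x per-process memory) fits in physical RAM.
    MaxClients          150
    MaxRequestsPerChild 1000
</IfModule>
```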

Pros

Tried and true. This is the most compatible MPM, and you can use DSO, suPHP, and most other PHP handlers with it.

Also the most flexible MPM in terms of application compatibility.

Cons

It's old and is not able to handle as many requests as Worker or Event.

Not the most efficient on systems with multiple CPUs / Cores

Not considered to be a high performance MPM; large or busy sites might get better performance with Worker or Event

Summary
For the most part, this just works and performs well, assuming you do not have a massive amount of traffic to your server. There is nothing wrong with using Prefork, and if performance is an issue you might benefit more from using a caching plugin like W3 Total Cache along with Memcached instead of moving to the Worker MPM.

The Worker MPM implements a hybrid multi-process, multi-threaded server. By using threads to serve requests, it is able to serve a large number of requests with fewer system resources than a process-based server. However, it retains much of the stability of a process-based server by keeping multiple processes available, each with many threads.

The most important directives used to control this MPM are ThreadsPerChild, which controls the number of threads deployed by each child process, and MaxClients, which controls the maximum total number of threads that may be launched.
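A sketch of how those two directives fit together (values are examples only, not recommendations):

```apache
<IfModule mpm_worker_module>
    StartServers          2
    # Each child process runs this many threads.
    ThreadsPerChild      25
    # Upper bound on the total number of threads,
    # and therefore on simultaneous requests.
    MaxClients          150
</IfModule>
```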

Event is preferred for high traffic sites because it is able to serve a lot of requests using little RAM. It does this by using threads to process requests. There is still some isolation, since Event spawns multiple processes which each have their own threads. Because threads within a process share memory, libraries do not need to be loaded over and over, so less RAM is used.

If a single thread caused issues, it would only affect other threads within that process; it would not affect the other processes and their threads.

Event has a dedicated thread that only handles keep alive connections. It then passes requests to the child threads if one of the keep alive connections actually makes another request. Because the child threads don't care about keep alive, they can immediately move on after they process the request instead of having to wait to see if another request comes.

PHP is NOT thread safe, so it's not safe to embed the PHP interpreter into Event's threaded processes. To get around this issue you would use something like PHP-FPM to handle the PHP processes. This means that Event really only handles static content and passes PHP processing along to PHP-FPM, which runs its own set of processes.

In general, Event uses less memory than Prefork and provides much better performance in terms of requests per second.

This is just a quick example of what is needed to modify Apache Event server settings. You might want to lower these settings if you are running a server with little RAM; then again, this configuration already uses little RAM: with 2 start servers, Apache uses about 57MB total when it is not being hammered with requests.
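A sketch of what such an Event configuration might look like (values are assumptions for illustration, not tuned advice):

```apache
<IfModule mpm_event_module>
    # Two processes at startup, matching the roughly 57MB idle
    # footprint mentioned above.
    StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxClients          150
</IfModule>
```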

I've found that in general, using the default settings for Apache Event is really all you need to do to optimize Apache. Most of the time, PHP and MySQL are the slowest links in the chain, so spending time optimizing PHP and MySQL is the best place to start.

The mod_pagespeed module can significantly help website performance. It has a large number of "filters" that are applied to a website when a request is made to help optimize the site. This module can help to improve load time and can also help to lower the server load.

WARNING: You can add PageSpeed to a cPanel server; however, I have not done a lot of testing with this. Use caution!

This module controls the setting of the Expires HTTP header and the max-age directive of the Cache-Control HTTP header in server responses.

Most of the time this module is enabled by default. To make sure it is enabled, run the following:

httpd -M | grep -i expire

If this is enabled, then you can configure this in a site's .htaccess file. Doing this can significantly improve the user's load time by telling the browser to cache static images and files in the browser so that they do not have to re-load this information the next time they visit the site.

The mod_deflate module provides the DEFLATE output filter that allows output from your server to be compressed before being sent to the client over the network.

This should be enabled by default, however to make sure it is installed and enabled, run the following command:

httpd -M | grep -i deflate

If this is enabled, you can tell Apache to compress files before they get sent to the browser. This will help to reduce bandwidth usage. This can increase CPU usage on the server, however if the server is using a decent CPU then this is safe to enable.

You can enable this in .htaccess. For more detail, please see the .htaccess section below.

Recently, I noticed that cPanel decided it would be a good idea to specify a TON of possible directory indexes for Apache to look for in the main httpd.conf file. This is great for compatibility; however, I found that if you don't place the most common indexes first, Apache will search the requested directory for every index option listed, in order. I was able to see this activity when I used sysdig to view the system calls Apache makes when a webpage is requested.

What that means is that if my www/ directory for my site is located in

/home/$user/www/

Apache is going to look for every possible index file until it finds a match. If I use "index.php" but I don't specify it first in the list, then Apache is going to search for like 20 files every time a request is made.

does /home/$user/www/index.html.var exist?
no
does /home/$user/www/index.htm exist?
no
does /home/$user/www/index.html exist?
no
does /home/$user/www/index.shtml exist?
no
does ........
OMG index.php exists, time to serve it up!

The point is that if you only use index.php or index.html and you never use any of the other types of indexes, then make sure you put index.php and index.html at the front of the list so that Apache doesn't have to search for 20-something types of indexes every time you make a request.

Enabling mod_deflate can help reduce the size of your website, which makes it much faster to load in your browser. This can use a decent amount of CPU to compress the resources, however if your server is not CPU bound then this is generally a very good thing to enable. Most of the current processors should be able to handle this without issue. By current I mean E3-1200 series and E5-2600 series.
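A minimal .htaccess sketch for mod_deflate, compressing the usual text-based types (adjust the MIME type list to suit your site):

```apache
<IfModule mod_deflate.c>
    # Compress text-based responses before they are sent to the browser.
    AddOutputFilterByType DEFLATE text/html text/plain text/css
    AddOutputFilterByType DEFLATE application/javascript application/json
</IfModule>
```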

You can configure many different types of files to be cached by default, however a good starting point is to focus on images and add more as needed. Assuming that mod_expires is already enabled for Apache, all you need to do is slap this in the site's .htaccess and you should be good to go.
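For example, a minimal mod_expires block for images (the cache lifetimes here are assumptions; tune them for how often your content changes):

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # Tell browsers to cache images for a month before re-requesting them.
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png  "access plus 1 month"
    ExpiresByType image/gif  "access plus 1 month"
</IfModule>
```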

To optimize PHP, the first place to start is with opcode caching which will significantly improve PHP performance and lower CPU usage at the same time. Enabling OPcache with PHP-FPM or mod_fcgid can improve performance by 4 to 5 times in some cases. If you are not using PHP 5.5 with OPcache you are missing out. If you still want more performance out of PHP I suggest you look into using HHVM if possible.
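A minimal php.ini sketch for enabling OPcache (the values are illustrative starting points, not universal recommendations):

```ini
; Enable the opcode cache so compiled scripts are reused between requests.
opcache.enable=1
; Shared memory for cached bytecode, in megabytes.
opcache.memory_consumption=128
; How many script files can be cached.
opcache.max_accelerated_files=4000
; How often (in seconds) to check scripts on disk for changes.
opcache.revalidate_freq=60
```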

To optimize MySQL you should make sure you are using innodb_buffer_pool_size and configuring it in a way that makes sense. If you have 2GB of InnoDB tables that are actively used you should have an innodb_buffer_pool_size of at least 2GB.
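In my.cnf, following the 2GB example above (the size is an assumption; match it to your actively used data set):

```ini
[mysqld]
# Large enough to hold the actively used InnoDB data and indexes in RAM.
innodb_buffer_pool_size = 2G
```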

Regardless of what happens on the client or server side, there will always be a small amount of latency that is ruled by the speed of light. It takes about 21ms for a signal to travel from New York to San Francisco, and about 28ms from New York to London. Keep in mind that these are one-way measurements; a round trip (RTT) takes double this time. This also assumes an optical cable run between the two points, which is a best-case scenario.

Users have reported a delay, or lag, when 100 - 200 milliseconds are added to a request, and a delay of 1 second can become extremely annoying. This means that shaving off anything over 100 ms will help to improve the user's experience.

For today's applications, responding to a request within a few hundred milliseconds is a must.

This means that when a user accesses a website the page needs to be able to load completely, or at least start to display content in UNDER 1 second.

Last Mile Latency is often the slowest part of a request. This is because the request moves through the core routing infrastructure of an ISP, where many other requests are also being handled.

To determine how much latency is caused by the last few hops, use a command such as traceroute to see where the latency is highest.

The 3 way handshake that occurs for every new connection adds latency to the request. Since the 3 way handshake must be completed before data can be sent, it is critical that the application reuse connections as much as possible. If the application is designed to create a new connection after every new request it will be adding in additional latency every time, and those new connections are not sending data, they are simply reconnecting.

TCP Window Scaling was created to prevent networks from collapsing on themselves. This was happening when two networks were talking to each other but one was significantly slower than the other. Client or server requests would begin to pile up in router / switch buffers and would eventually get so delayed that they would be dropped. This would happen over and over, which prevented the network from serving requests.

To avoid this, servers and clients now implement Window Scaling, which is designed to let both sides determine how much data they can handle at one time. If the network connectivity is fast and good, this gets raised so that more data is sent per trip, which increases speed.

To check whether TCP window scaling is enabled on a Linux system, you can use the sysctl command.
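For example, `sysctl net.ipv4.tcp_window_scaling` should report a value of 1 when scaling is enabled; reading the flag straight from /proc shows the same thing:

```shell
# Prints 1 when TCP window scaling is enabled, 0 when it is not.
cat /proc/sys/net/ipv4/tcp_window_scaling
```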

Slow Start was designed to address the limits of the underlying network capacity. TCP Window Scaling helped to prevent the client and server from getting overloaded, however it did not prevent the network itself from getting flooded. Slow Start was one solution to this issue. It is governed by the Congestion Window Size (cwnd) on the server, which determines the maximum number of network segments that can be "in flight" at any given time without receiving an ACK from the receiving end. This starts out small, with a limit of 4 segments, but as the client sends more ACKs the limit is raised further and further.

If there is a network issue, this value gets dropped back to the last known value and starts over the process again.

It is important to keep this in mind for applications. Since the Window size starts off slow, as well as the number of network segments in flight, the first few connections will have a higher latency than the following connections, which is WHY IT IS IMPORTANT TO KEEP CONNECTIONS OPEN AS LONG AS POSSIBLE.

The initial Congestion Window size was set at 4 segments in Linux for a while. Starting in 2013 it is being raised to a value of 10. Tuning this value and raising it to 10 could help to improve initial response times!

To set a custom value for the interface you want to change:

ip route show
## Look for a line that starts with "default" like below
default via 192.168.1.1 dev eth0 proto static
## To set a custom cwnd, change the route by pasting in the line above and adding the value at the end
ip route change default via 192.168.1.1 dev eth0 proto static initcwnd 10

Slow Start has a restart mechanism that resets the congestion window if a connection has been idle for a short period of time. For web servers that use Keep Alive, this can cause performance issues, since the scaling process has to start over again. To prevent this, it is recommended that this reset behavior be disabled on Linux web servers.
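To disable it, set the following kernel parameter (persist it in /etc/sysctl.conf, or apply it at runtime with `sysctl -w`):

```ini
# Do not collapse the congestion window after a keep-alive connection idles.
net.ipv4.tcp_slow_start_after_idle = 0
```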

The first thing to do if you want to optimize your server is to make sure that the Operating System is up to date! Since a lot of the TCP/IP standards are constantly being updated, the most efficient way to make sure that your server is optimized is to make sure you are running the latest Kernel version (stable release of course!).

Increase TCP Initial Congestion Window Starting at a larger value will help to send more data sooner, which accelerates window growth; this is especially effective on servers that handle small, bursty activity. For how to change this, see the section above.

Slow Start Restart Disabling this restart on idle connections will improve performance of long lived connections. This can help improve Keep Alive performance. Again, see the section above for instructions on how to do this.

Window Scaling On most servers this is enabled, but if you are going to tune a server, it's a good idea to check this setting to make sure it is enabled. Enabling this helps the server send more data per trip, which reduces the impact of network latency. This is also covered in the section above.

A useful Linux command to view current connections and their settings is:

ss --options --extended --memory --processes --info

Below is a checklist of things to check and consider when attempting to optimize server and application performance:

To start off, there are a few items that can easily be disabled, enabled, or configured to help reduce the latency and number of requests a server has to handle.

Disable DNS Lookups This can be done for Apache or Nginx, or any web server. By doing this you reduce latency and the amount of work the server has to handle.
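In Apache this is a single directive (shown here as an example; it keeps Apache from reverse-resolving client IPs on every request):

```apache
# Log client IP addresses instead of doing a reverse DNS lookup per request.
HostnameLookups Off
```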

Use a CDN Static content gets served from a server that is located closest to the client. This also reduces the server load and helps to improve the responsiveness of the website.

Add Expires Header and configure ETags Enable this for certain types of files that appear on multiple pages, or for images and static content that does not change often. This reduces the amount of redundant data that needs to be transferred for each request. Configuring ETags provides cache revalidation and adds a fingerprint, or timestamp, to the files.

Use Gzip All text based resources should be compressed before transfer. This reduces bandwidth usage.

Enable KeepAlive If this is not enabled, then each time something is requested the client and server need to make a new TCP connection, which means another 3 way handshake each time. This is like driving to the store to buy some lunch meat, returning home, then driving back to buy some bread so you can make a sandwich. You don't do this in real life, and since the Internet is real life, don't do it here either. Enable this for Apache, Nginx, or whatever else you use!
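For Apache, a typical keep-alive setup looks like this (the values are examples; very long timeouts can tie up workers on busy Prefork servers):

```apache
KeepAlive On
# How many requests a single connection may serve before it is closed.
MaxKeepAliveRequests 100
# Seconds to wait for the next request on an idle connection.
KeepAliveTimeout 5
```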

The examples below may or may not help performance. Since these options can add complexity to the application, they are not something that should be blindly applied to an application.

Concatenation of JavaScript or CSS By combining multiple JavaScript or CSS files into a single resource, you can reduce the number of additional requests / trips that the browser must make to the server.

Spriting Similar to the previous method, combine images into a larger, composite image to reduce the number of requests / trips that the browser needs to make.

Reuse TCP Connections Use Keep Alive wherever possible. Doing this reduces the number of TCP handshakes that must occur for new connections, which in turn eliminates unnecessary latency, makes your site faster, and makes your end users happier!

Reduce the number of redirects Every time you redirect someone to another hostname, an additional DNS lookup takes place. Again, this increases latency.

Use a CDN End users make requests to servers that are closer to them, reducing latency. This can be done for both static and dynamic content.

Eliminate Unnecessary Requests No request is faster than the one that was never made! Is there some commented out novel in your code that doesn't need to be there? Remove it! Every byte matters.

PageSpeed can be added to any server runtime and applied dynamically to any application. This is available as a module for Apache and Nginx. This helps to optimize resources based on a lot of "web optimization filters".

By default, the module uses "CoreFilters", a safe set of rules that most sites benefit from. If you want to disable everything and only enable certain filters, you can do this globally by switching the rewrite level from CoreFilters to PassThrough; however, it's recommended to just enable / disable filters per vhost using .htaccess or a vhost directive.
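For instance, in the module's configuration file (the filter choice below is only an example; mod_pagespeed ships many more):

```apache
# Turn off the default CoreFilters set...
ModPagespeedRewriteLevel PassThrough
# ...then opt in to individual filters globally, per vhost, or in .htaccess.
ModPagespeedEnableFilters collapse_whitespace,remove_comments
```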