
Optimizing System Performance

In this second part of a two-part series, you’ll learn how to use debuggers and optimize performance. It is excerpted from Chapter 12 of the book Zend PHP Certification, written by George Schlossnagle et al. (Sams; ISBN: 0672327090).

Logging and Debugging

Displaying error messages to the browser is a problem from many points of view. First, you’re letting your visitors know that something in your code is broken, thus shaking their confidence in the solidity of your website. Second, you’re exposing yourself to potential security vulnerabilities because some of the information in the output might be used to hack into your system. Third, you’re preventing yourself from finding out what error occurred so that you can fix it, because only the visitor ever sees the message.

A good solution to this problem consists of changing your php.ini settings so that errors are not displayed on the screen but stored in a log file. This is done by setting display_errors to Off and log_errors to On, and by naming the file in which the error messages are stored through the error_log option. You can then open a shell and use tail -f to follow the PHP log.
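The same three settings can also be applied at run time with ini_set(), which may be handy when php.ini is out of reach. The following is a minimal sketch; the log file location is an assumption chosen only for illustration.

```php
<?php
// Hypothetical log location; pick a path the web server user can write to.
$logFile = sys_get_temp_dir() . '/php_errors.log';

ini_set('display_errors', '0');   // do not send errors to the browser
ini_set('log_errors', '1');       // record them instead
ini_set('error_log', $logFile);   // file that receives the messages

// error_log() writes a line to the configured file.
error_log('Test entry written by the logging sketch');
```

Settings changed this way only take effect once the script is running, so errors raised before that point (a parse error in the same file, for instance) still follow the php.ini values.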

If you want to go a step further, you could use the set_error_handler() function to define your own error handlers and log additional information that you might find useful when trying to troubleshoot the problem.
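As a rough illustration of a custom handler, the sketch below appends each error to a file together with the requested URI. The handler name, the APP_ERROR_LOG constant, and the log path are all hypothetical.

```php
<?php
// Hypothetical log file; in practice keep it outside the web root.
define('APP_ERROR_LOG', sys_get_temp_dir() . '/app_errors.log');

/**
 * Custom handler: records the error plus some extra context.
 * Returning true tells PHP not to run its built-in handler.
 */
function logErrorWithContext($severity, $message, $file, $line)
{
    $entry = sprintf(
        "[%s] severity=%d %s in %s:%d uri=%s\n",
        date('Y-m-d H:i:s'),
        $severity,
        $message,
        $file,
        $line,
        isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : 'cli'
    );
    // Message type 3 appends the entry to the given file.
    error_log($entry, 3, APP_ERROR_LOG);
    return true;
}

set_error_handler('logErrorWithContext');

// Raise a user-level warning to exercise the handler.
trigger_error('Could not load user profile', E_USER_WARNING);
```

Any other detail that helps with troubleshooting, such as the session ID or the logged-in user, could be added to the entry in the same way.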

Naturally, you can also use the error-suppression operator @ to prevent PHP from displaying or logging the error. Although this is an easy way to make the message go away, it can cause trouble in production, where you do need to find out when an error occurs so that you can fix it.

Using Debuggers

Ultimately, not all bugs can be solved just by staring really hard at the code (although it often helps to). In some cases, you just need to "see" the program running to discover what’s causing it not to perform properly. What you need is a debugger.

A lot of debuggers exist, starting with the ancient DBG (now integrated into NuSphere’s PhpED) and progressing to APD, Xdebug, and the debugger that’s integrated into the Zend Studio IDE. Most debuggers let you set breakpoints on specific lines of your code and define watches, which let you inspect the values of your PHP variables, including those in the global scope, while the script is paused.

Using a debugger, you can step through each line of your application and see exactly how it flows. You should be familiar with at least one, because some day you are going to need it.

Optimizing Performance

Performance is a "happy problem" until the day it falls in your lap. Nothing can ruin your day like a pointy-haired manager screaming in your ears because the website is not responding well to an increase in traffic.

Although it won’t have an immediate impact on your ability to go live, measuring the performance of a website is an important step that will come in handy on the day you need to troubleshoot it.

Hardware Issues

Naturally, the easiest way to help a system that is ailing because of too much traffic is to throw more hardware at it. You could increase your onboard RAM or the speed of your hard disks, or you could even add another server altogether.

Another good idea is to ensure that your data is all stored in the right place. By saving the logs on a different disk or partition from the one that holds your main application files, you can help the operating system optimize its caching mechanisms and provide higher performance.

Although a well-configured computer is a great starting point as far as ensuring that your application is performing to the best of its capabilities, eventually you are going to find that an alternative solution is required since you obviously can’t simply add new servers to your farm indefinitely.

Web Server Issues

Proper web server configuration goes a long way toward improving performance. A good starting point is to turn off reverse DNS resolution since you don’t need it at the time when your web server is simply logging information about site access. You can always perform that operation offline when you analyze your logs.

It’s also a good idea to familiarize yourself with how the web server you’re using works. For example, Apache 1.3.x is a forking web server—meaning that it creates copies of its own process as children. Each child process waits for a connection (for example from a web browser) and, if there are more connections than available idle children, the server creates new ones as needed.

In its default configuration, Apache pre-forks 5 children and has a maximum of 150. If you consider that each child requires between 2 and 5 megabytes of memory to run (assuming your scripts don’t require even more), this could easily lead to a performance bottleneck if the traffic on your server goes up. At maximum load, 150 child processes could require between 300MB and 750MB of RAM. And, if you run out of physical memory, the operating system will switch to its virtual memory, which is significantly slower.
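The arithmetic behind those figures can be sketched as follows. The function name and the 512MB memory budget are assumptions made for illustration; the other numbers come from the paragraph above.

```php
<?php
/**
 * Estimate the RAM needed when every child process is busy.
 * Returns array(lowest, highest) in megabytes.
 */
function estimateChildMemory($maxChildren, $minMbPerChild, $maxMbPerChild)
{
    return array($maxChildren * $minMbPerChild, $maxChildren * $maxMbPerChild);
}

list($low, $high) = estimateChildMemory(150, 2, 5);
printf("150 children need between %dMB and %dMB\n", $low, $high);
// prints "150 children need between 300MB and 750MB"

// Working backwards: how many children fit in a given memory budget?
// The 512MB budget is an assumed figure, not a recommendation.
$budgetMb = 512;
$safeMax = (int) floor($budgetMb / 5);   // plan for the worst case per child
printf("With %dMB to spare, cap the server at about %d children\n", $budgetMb, $safeMax);
```

Running the calculation in reverse like this is one way to arrive at a maximum-clients value that keeps the server out of virtual memory.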

This problem can also become self-compounding. As more and more child processes are created and the system is forced to rely increasingly on virtual memory, the average response time will increase. This, in turn, will cause even more child processes to be created to handle new connections, eventually exhausting all your system resources and causing connection failures.

As a result, a careful read of your web server’s documentation is probably one of the cheapest (and smartest) investments that you can make. Do take the time to tune the configuration options for the minimum and maximum number of clients, and compile or activate only those web server modules you really need, to reduce memory consumption.

Avoid Overkill

If you’re dealing with a mixture of static and dynamic content, it’s a good idea to keep things simple and let a lightweight web server handle the static data. Because serving static files doesn’t require any of the advanced features provided by PHP and Apache, using a different server that needs fewer system resources to run will increase your performance. You can even move the static data to a different server altogether and neatly divide the work across multiple machines.

Zip It Up

HTML is a very verbose language. As a result, web pages are often rather large—although maybe not as large as, say, a video or audio stream. Still, even a 20KB page will take its sweet time across a slow dial-up connection.

PHP makes it possible to compress the output of a script so that it can travel faster to the user. This can be done in a number of ways—for example, you can enable the GZIP buffer handler in your php.ini file or turn it on directly from within your scripts:

ob_start("ob_gzhandler");

Naturally, the output of your scripts will only be compressed if the browser that is requesting the document supports the GZIP compression standard.
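To get a feel for how much there is to gain, the sketch below compresses a deliberately repetitive HTML page with gzencode() and compares the sizes. It assumes the zlib extension is available; the markup is made up for the example.

```php
<?php
// Build a repetitive HTML table, similar in shape to real markup.
$rows = '';
for ($i = 0; $i < 200; $i++) {
    $rows .= "<tr><td class=\"name\">Item $i</td><td class=\"price\">9.99</td></tr>\n";
}
$html = "<html><body><table>\n$rows</table></body></html>";

// gzencode() returns the page as gzip-format data.
$compressed = gzencode($html);

printf(
    "Original: %d bytes, compressed: %d bytes\n",
    strlen($html),
    strlen($compressed)
);
```

The exact ratio depends on the content, but markup with many repeated tags and attributes tends to compress very well.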

Database Optimizations

Although we’ve briefly discussed databases in Chapter 9, "PHP and Databases," it’s a good idea to start thinking about them in terms of performance. When you execute a database query, you depend on an external resource to perform an operation and, if that operation is slow, your entire website will suffer.

There is no predetermined "maximum number of queries" that you should use when writing database-driven websites. Generally speaking, the higher the number, the slower a page will be, but a single badly written query can slow down a web page more than 20 well-written ones. As a general guideline, most developers try to keep the number of queries performed in every page below five—however, many websites use a higher number without suffering any significant performance degradation.

Optimizing the tables that your queries use is the first step toward ensuring fast data access. This means that you will have to normalize your database so that a particular field is stored only in one table and each table is properly linked with the others through foreign keys. In addition, you will have to make sure that all your tables have been properly indexed, so that the queries you execute can take full advantage of the DBMS’s capability to organize data in an efficient way.

Naturally, your optimizations should not come at the expense of security. Always make sure that you escape all user input properly (as discussed in Chapter 9) and that the statements you perform are safe even if the database itself changes.

For example, consider this simple query:

INSERT into my_table
values (10, 'Test')

This query expects that my_table will always have two fields. If you extend the table to include additional columns, the query will fail. This might seem like a far-fetched scenario, but it really isn’t. A complex application often includes hundreds, or even thousands, of queries, and it’s easy to forget that one exists when making such sweeping changes.

On the other hand, it’s easy enough to fix this problem by simply rewriting the query so that it specifies which fields it intends to insert data in:

INSERT into my_table (id, name)
values (10, 'Test')

In this case, it will be a lot more difficult for an error to crop up—but by no means impossible. If the new fields you have added to my_table do not accept null values and have no default values defined, the query will still fail because the database won’t accept empty columns. Thus, you really have to be careful when making changes to your database!
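The difference between the two forms of the query can be tried out with a short script. This sketch assumes the PDO SQLite driver is installed and uses an in-memory database purely so that it is self-contained; the created column is a hypothetical addition to the table.

```php
<?php
// In-memory SQLite database, used only to keep the sketch self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE my_table (id INTEGER, name TEXT)');
$db->exec("INSERT INTO my_table VALUES (10, 'Test')");   // fine with two columns

// The table is later extended with a third, nullable column.
$db->exec('ALTER TABLE my_table ADD COLUMN created TEXT');

// The query without a field list no longer matches the table.
$positionalFailed = false;
try {
    $db->exec("INSERT INTO my_table VALUES (11, 'Test')");
} catch (PDOException $e) {
    $positionalFailed = true;
}

// The query that names its fields keeps working.
$db->exec("INSERT INTO my_table (id, name) VALUES (12, 'Test')");

$count = (int) $db->query('SELECT COUNT(*) FROM my_table')->fetchColumn();
```

After the script runs, the table holds the rows with ids 10 and 12, and the attempt to insert id 11 has been rejected.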

Keep Your Code Simple

If you’re coming from a Java background, you might be used to writing a large infrastructure of classes that rely on each other to perform a particular task.

Don’t try this with PHP! PHP’s OOP features work best when your framework is small and efficient. Creating objects in PHP is a rather slow process, so objects should be used judiciously.

Caching Techniques

Sometimes, it’s just not possible to optimize your code beyond a certain point. It might be that your queries are too complicated or that you depend on a slow external resource, such as a web service, over which you have no control.

In these cases, you might want to think about using a caching solution that "saves" the output of an operation and then allows you to access it without performing that operation again.

There are several types of cache; for example, you can save the results of a database query, or even an entire web page. The latter means that you generate your pages normally at predetermined intervals and save them in the cache. When a page is requested by a user, it is actually retrieved from the cache instead of being generated from scratch.
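A minimal file-based output cache might look like the sketch below. It is a variation on the scheme just described: rather than regenerating pages on a schedule, it regenerates an entry lazily once the entry is older than a given lifetime. The function name, the key, and the 300-second lifetime are assumptions.

```php
<?php
/**
 * Return the cached result for $key if it is younger than $ttl seconds;
 * otherwise call $producer, store its output, and return that.
 */
function cachedOutput($key, $ttl, $producer, $cacheDir)
{
    $file = $cacheDir . '/' . md5($key) . '.cache';

    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);       // cache hit
    }

    $output = call_user_func($producer);        // cache miss: do the slow work
    file_put_contents($file, $output, LOCK_EX);
    return $output;
}

// Usage: the closure stands in for a slow query or page render.
$cacheDir = sys_get_temp_dir();
$calls = 0;
$render = function () use (&$calls) {
    $calls++;
    return '<h1>Report</h1>';
};

$key = 'report-page-' . uniqid();   // unique key so each run starts cold
$first  = cachedOutput($key, 300, $render, $cacheDir);
$second = cachedOutput($key, 300, $render, $cacheDir);
```

The second call is served from the file, so the closure runs only once.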

You can find several packages in the PEAR repository that are useful for output caching of various types. Naturally, there are also commercial solutions that perform a similar task, such as the ones provided by Zend.

Bytecode Caches

When PHP runs your scripts, it does so in two steps. First, it parses the script itself, transforming it into a sort of intermediate language referred to as bytecode. Then, it actually interprets the bytecode (which is simpler than PHP itself) and executes it. If your scripts don’t change between one execution and the next, the first step could easily be skipped, and only the second step would have to be taken.

This is what "bytecode caches" do. They are usually installed as simple extensions to PHP that act in a completely transparent way, caching the bytecode versions of your script and skipping the parsing step unless it is necessary—either because the script has never been parsed before (and, therefore, can’t be in the cache yet) or because the original script has changed and the cache needs refreshing.

A number of commercial and open-source bytecode caches (also called accelerators) are available on the market, such as the one contained in the Zend Performance Suite, or the open-source APC. Most often, they also modify the bytecode so as to optimize it by removing unnecessary instructions.

Bytecode caching should always be the last step in your optimization process because no matter how efficient your code is, it’s always going to provide you with the same performance boost. And, as a result, it could trick you into a false sense of security that would prevent you from looking at the other performance optimization techniques available.

Exam Prep Questions

How can the following line of code be improved?

$db->query("insert into foo values($id,$bar)")

A. Use addslashes and sprintf to avoid security holes and make the code cleaner

B. Split the query over several lines

C. Use mysql_query() instead of $db->query()

D. Define the table fields that will be affected by the INSERT statement

E. Use mysql_query() instead of $db->query() and addslashes to avoid security holes

Answers A, B, and D are correct. First of all, you need to ensure that the query is secure; this is done by executing addslashes() (or the equivalent escaping function for your DBMS of choice) to prevent SQL injection attacks. If your query is long, it’s not a bad idea to split it over several lines to get a better overview of your code. Use sprintf() where possible to make the code cleaner. Finally, it’s always a good idea to define the table fields that will be filled by an INSERT statement to prevent unexpected errors if the table changes.

You developed a big application accessed by several thousand users at the same time. Suddenly, your web server stops responding and users are getting connection errors. What could have happened?

A. The database server was terminated because of the unusually high amount of database accesses.

B. The web server was misconfigured so that it ran into virtual memory usage and consequent resource starvation because of too many child processes.

C. You didn’t optimize your code design properly.

Answer B is correct. Although it could be possible that the database server was killed because of the many requests from the users, they should at least be able to see the HTML pages from the website because the web server would still be running. If connections are timing out, it is likely that the server ran into swap space because of misconfiguration of the number of concurrent web server child processes and crashed because of resource starvation.

You are in a team of developers working on a number of different business applications. Your project manager tells you that in two weeks another three PHP developers will join the team and that you have to ensure that they will be ready to dive in to the current PHP code without problems. What could you do?

A. Write proper end user documentation on how to use the web front end.

B. Write proper end user documentation and generate proper PHPDoc comments inside the code to get an API documentation.

C. The absence of documentation will actually encourage the new developers to delve more deeply into the code.

Answer B is correct, or at least as correct as you can get in a general situation. The key here is that you should write proper documentation at the same time as you’re writing your code. You could then use a tool such as phpDocumentor to generate nicely formatted API documentation in HTML or PDF and make it available to any new developers who join your team.
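For reference, a PHPDoc comment is a specially formatted block placed directly above the element it describes. The function below is a made-up example, not taken from any real application.

```php
<?php
/**
 * Calculates the gross price of an order line.
 *
 * @param float $netPrice Net price of a single item
 * @param int   $quantity Number of items ordered
 * @param float $taxRate  Tax rate as a fraction, e.g. 0.2 for 20%
 *
 * @return float Gross price, rounded to two decimal places
 */
function grossPrice($netPrice, $quantity, $taxRate)
{
    return round($netPrice * $quantity * (1 + $taxRate), 2);
}
```

A documentation generator reads the description and the @param and @return tags and turns them into the API reference.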

Suppose that you are receiving input from the user in the form of the string "0mydeviceid" for a field for which you only allow valid numeric values. You want to test if this variable is equal to 0 and, if it isn’t, output an error. Which comparison operation should you use?

A. (0 = "0mydeviceid")

B. (0 == "0mydeviceid")

C. (0 === "0mydeviceid")

D. None of the above

Answer D is correct. Because PHP automatically tries to convert the string "0mydeviceid" to 0 when comparing it with the equality operator ==, the condition in answer B evaluates to true even though the user input is not a valid numeric value. (This is the behavior of PHP versions before 8.0; from PHP 8.0 onward, a number and a non-numeric string are compared as strings, so B evaluates to false for this input.) The expression in answer C, on the other hand, correctly determines that the user input is not a valid integer, but it will always do so: you are likely to receive user input in the form of a string, so the identity test fails even when that string can be converted to an integer value. Answer A is an assignment rather than a comparison and is not valid PHP.
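One way to perform the check properly is to validate the input first and compare the validated value afterward. The sketch below uses filter_var() with FILTER_VALIDATE_INT for that purpose; the function name isValidZero is hypothetical, and other validation approaches would work as well.

```php
<?php
/**
 * Returns true only when $input holds a valid integer whose value is zero.
 * Anything else, including "0mydeviceid", is rejected.
 */
function isValidZero($input)
{
    // FILTER_VALIDATE_INT returns the integer on success, false otherwise.
    $value = filter_var($input, FILTER_VALIDATE_INT);
    return $value === 0;
}

var_dump(isValidZero('0'));            // bool(true)
var_dump(isValidZero('0mydeviceid'));  // bool(false)
var_dump(isValidZero('42'));           // bool(false)
```

Because the comparison runs on the validated integer with ===, the result does not depend on how a particular PHP version converts strings to numbers.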