The example is quite self-explanatory, but let’s take a look and go step-by-step to see what happened exactly. First, we create a Process instance, which is the main class of the component. Besides passing the command we want to execute, it is also possible to pass the working directory, environment variables or a timeout.

As soon as we call the run() method, the command is executed and the PHP interpreter waits until it finishes. If the execution was not successful (an exit code other than 0), an exception is thrown. If it was successful, the command output is simply printed.

The isSuccessful() method also relies on the exit code to determine whether a command ran successfully: it returns true if the exit code is 0 and false otherwise.

Long Running Processes

When we have to deal with long-running processes, things tend to get a little trickier, as we have to take into account things like timeouts, incremental output, responsiveness, and signals. The Process class provides ways to make these problems manageable.

Timeouts

There are two available timeouts: the process timeout (maximum runtime) and the process idle timeout (maximum time since the last output). In the following code, as the ping command on Unix systems runs indefinitely (unless we specify the “-c” option), a ProcessTimedOutException will be thrown after 10 seconds:

This would not be true with the idle timeout, as most of the time the ping command outputs new information in less than 10 seconds. In this case, the process will probably run until it exceeds the memory_limit setting:
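The difference between the two timeouts can be sketched independently of the component. The following Python snippet is only an illustration, not the component's code: the timed_out() function and its parameters are hypothetical names. It shows which limit trips for a process that keeps producing output, like ping.

```python
from typing import Optional


def timed_out(start: float, last_output: float, now: float,
              timeout: Optional[float],
              idle_timeout: Optional[float]) -> Optional[str]:
    """Return which limit has been exceeded, or None if neither has."""
    if timeout is not None and now - start > timeout:
        return "timeout"        # total runtime exceeded
    if idle_timeout is not None and now - last_output > idle_timeout:
        return "idle_timeout"   # too long since the last output
    return None


# A ping-like process started at t=0 that prints a line every second.
# With only a 10-second process timeout, the limit trips right after t=10.
print(timed_out(start=0, last_output=10, now=10.5, timeout=10, idle_timeout=None))
# prints: timeout

# With only a 10-second idle timeout, the limit never trips, because the
# last output is always about one second old, even after an hour.
print(timed_out(start=0, last_output=3599, now=3600, timeout=None, idle_timeout=10))
# prints: None
```

A process that goes silent is the opposite case: the idle timeout trips as soon as more than 10 seconds pass without output, no matter how short the total runtime is.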

Outputs

In long-running processes we need some sort of “real time” output so the user can see that the process is still running and is not dead. There are two ways to do this: printing the command output as soon as it becomes available, or printing some “loading” or “in progress” message. Let’s see an example of both approaches, again using the ping command:

In the following example, we pass a PHP callable to run(), which calls it whenever there is some output available on STDOUT (standard output) or STDERR (standard error). Each time output is available, we print a dot so the user knows the command did not hang and is still running.

It is also possible to print the command output itself by defining a second parameter in the callable. This time, the script will print the command output as soon as it becomes available:

Executing PHP Code

The component provides the class PhpProcess (which extends from Process), to execute PHP code in isolation. That means that it is run in a different process so no variables or open resources are shared between them.

Under the hood

If you are following the posts of this series you know that we always like to dive a little bit deeper and find out how the component is made internally. Usually, the official documentation for Symfony components is excellent so we try to give back to the community by explaining them in a different way and trying to share roughly how they work internally.

Internally, the Process class makes use of the proc_open() function to execute a command and open file pointers for input/output. The proc_open() function is not straightforward to use, as it needs a descriptor specification.

The descriptor specification is an indexed array to tell the function how we want to handle stdin, stdout and stderr. By default, in the component, pipes are used, but it can be configured to use a file for stdout instead. These are the parameters that proc_open() receives when executing “ls -lh”:

The “suppress_errors” option applies only to Windows systems and suppresses errors generated by the proc_open() function, while “binary_pipes” forces pipes to be opened in binary mode instead of using the usual stream_encoding.

The Process component is one of the oldest in the Symfony framework, and it is quite interesting to see how it has changed over 4 years: take a look at how simple it was (254 LoC) and how much more complete it became (1446 LoC). This is a big reason why it is great to reuse well-developed and well-tested libraries.

Finally, as a curiosity, the PhpProcess class, which is used to execute PHP code, already supports HHVM. The PhpExecutableFinder class, used to find the PHP binary, checks whether HHVM is being used by reading the HHVM_VERSION constant, only available when HHVM is the current engine.

Upcoming Conferences

Even in today’s world, with fast networks and almost unlimited storage, data compression is still relevant, especially for mobile devices and countries with poor Internet connections. This post covers the de-facto lossless compression method for compressing text data in websites: GZIP.

Morse code is one of the first lossless compression standards. More frequent letters get shorter codes.

GZIP compression

GZIP provides lossless compression, that is, we can recover the original data when decompressing it. It is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding.

The LZ77 algorithm replaces repeated occurrences of data with references. Each reference has two values: the jump and the length. Let’s see an example:

As you can see, the strings “ hosting ” and “ PHP ” are repeated, so the second time the substring is found, it is replaced by a reference. There are other repetitions too, like “er”, but as there would not be any gain, the original text is left.
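To make the jump/length idea concrete, here is a minimal Python sketch of LZ77-style references. It is a simplification for illustration, not the encoder used by DEFLATE: the function names, the window size, the minimum match length and the sample sentence are all assumptions of ours.

```python
from typing import List, Tuple, Union

Token = Union[str, Tuple[int, int]]  # a literal character, or (jump, length)


def lz77_compress(text: str, window: int = 255, min_len: int = 4) -> List[Token]:
    """Replace repeated substrings with (jump, length) references."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        best_len, best_jump = 0, 0
        # Look for the longest earlier match inside the sliding window.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(text) and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_jump = length, i - j
        if best_len >= min_len:       # short matches give no gain, keep literals
            tokens.append((best_jump, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def lz77_decompress(tokens: List[Token]) -> str:
    """Rebuild the text by copying `length` characters from `jump` positions back."""
    out: List[str] = []
    for token in tokens:
        if isinstance(token, tuple):
            jump, length = token
            for _ in range(length):
                out.append(out[-jump])
        else:
            out.append(token)
    return "".join(out)


# Hypothetical sample sentence with " hosting " and " PHP " repeated.
sample = "ServerGrove, the PHP hosting company, provides hosting for PHP apps"
tokens = lz77_compress(sample)
print(tokens)
print(lz77_decompress(tokens) == sample)
```

The min_len threshold plays the role described above: a repetition like “er” is found, but a reference to it would not be shorter than the two literal characters, so it is left as is.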

Huffman coding is a variable-length coding method that assigns shorter codes to more frequent “characters”. The usual problem with variable-length codes is that the decoder needs a way to know where one code ends and the next one starts. Huffman coding solves this by creating a prefix code, where no codeword is a prefix of another one. It can be understood more easily with an example:

ASCII is a fixed-length character-encoding scheme, so the letter “e”, which appears three times and is also the most frequent letter in the English language, has the same size as the letter “G”, which only appears once. Using this statistical information, Huffman coding can create a more efficient scheme:

Huffman: "1110 00 01 10 00 01 1111 01 110 10 00" (27 bits)

The Huffman method allows us to get shorter codes for “e”, “r” and “v”, while “S” and “G” get the longest ones. Explaining how to build a Huffman code is out of the scope of this post, but if you are interested I recommend checking this great video from Computerphile.
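For readers who prefer code to video, this is a small Python sketch of how such a prefix code can be built with a priority queue. It is an illustration under assumptions: the function names are ours, and we assume the sample word is “ServerGrove”, which matches the 11 codewords shown above. Ties between equal frequencies can be broken in different ways, so the exact codewords may differ from the ones above, but the total length is the same 27 bits.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict:
    """Build a prefix code by repeatedly merging the two least frequent groups."""
    tie = count()  # tie-breaker so the heap never has to compare the dicts
    heap = [(freq, next(tie), {symbol: ""}) for symbol, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prepend one bit: 0 for the left group, 1 for the right group.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def huffman_decode(bits: str, codes: dict) -> str:
    """Decode bit by bit; the prefix property makes the first match the only match."""
    reverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


word = "ServerGrove"  # assumed sample word
codes = huffman_codes(word)
encoded = "".join(codes[ch] for ch in word)
print(codes)
print(len(encoded), "bits, versus", 8 * len(word), "bits at 8 bits per character")
print(huffman_decode(encoded, codes))
```

Note that the decoder needs no separators between codewords: since no codeword is a prefix of another, reading bits until a codeword matches is unambiguous.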

DEFLATE, which is the algorithm used for GZIP compression, is a combination of both these algorithms.

Is GZIP the best compression method?

The answer is NO. There are other compression methods that get higher compression ratios, but there are a few good reasons to use it.

First, even though GZIP is not the best compression method, it provides a good tradeoff between speed and ratio. Compressing and decompressing data with GZIP is fast, and the ratio is quite decent.

Second, it is not easy to add a new global compression method that everyone can use. Browsers would need to be updated, which today is much simpler thanks to self-update mechanisms. However, browsers are not the only problem. Chromium tried to add support for bzip2, a better compression method based on the Burrows–Wheeler transform, but had to cancel it because some old intermediate proxies corrupted the data: they were not able to understand the bzip2 header and tried to gzip the contents. You can see the bug report here.

GZIP + HTTP

The process between the client (browser) and the server to get the content gzipped is simple. If the browser has support for GZIP/DEFLATE, it lets the server know through the “Accept-Encoding” request header. Then, the server can choose whether to send the contents gzipped or raw.
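The server side of this negotiation can be sketched in a few lines of Python. This is a simplified illustration, not how any particular web server does it: the function names are hypothetical, and the header parsing ignores details such as the “*” wildcard. The server marks a compressed body with the “Content-Encoding: gzip” response header so the browser knows it has to decompress it.

```python
import gzip
from typing import Dict, Tuple


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a non-zero q-value."""
    for part in accept_encoding.split(","):
        name, _, params = part.strip().partition(";")
        if name.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0   # "gzip;q=0" means "do not send gzip"
            except ValueError:
                return False
        return True
    return False


def build_response(body: bytes, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Compress the body only when the client announced gzip support."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


page = b"<html><body>" + b"Hello, ServerGrove! " * 100 + b"</body></html>"
for header in ("gzip, deflate", "identity"):
    headers, payload = build_response(page, header)
    print(header, "->", headers.get("Content-Encoding", "raw"), len(payload), "bytes")
```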

Implementations

The DEFLATE specification provides some freedom to developers to implement the algorithm using different approaches, as long as the resulting stream is compatible with the specification.

GNU GZIP

The GNU implementation is the most common and was designed to be a replacement for the compress utility, free from patented algorithms. To compress a file using the GNU GZIP utility:

$ gzip -c file.txt > file.txt.gz

There are 9 levels of compression, with “1” being the fastest with the lowest compression ratio and “9” the slowest with the best compression ratio. By default, “6” is used. If we want maximum compression at the cost of using more memory and time in the process, the -9 flag (or --best) can be used:

$ gzip -9 -c file.txt > file.txt.gz
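To get a feeling for what the levels do, the same comparison can be run with Python's standard gzip module, which exposes the same 1 to 9 scale through its compresslevel argument. The sample data below is hypothetical and highly repetitive, so the numbers it prints say little about real files; the point is only to show how to measure the tradeoff.

```python
import gzip

# Hypothetical, highly repetitive sample data.
data = b"ServerGrove provides PHP hosting. " * 500

sizes = {}
for level in (1, 6, 9):
    compressed = gzip.compress(data, compresslevel=level)
    # Lossless: decompressing always gives back the original bytes.
    assert gzip.decompress(compressed) == data
    sizes[level] = len(compressed)
    print(f"level {level}: {len(data)} -> {sizes[level]} bytes")
```

Running it against your own assets is a quick way to decide whether the extra CPU time of the higher levels is worth it.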

7-zip

7-zip implements the DEFLATE algorithm differently and usually achieves higher compression ratios. To compress a file with the maximum compression:

$ 7z a -mx9 file.txt.gz file.txt

7-zip is also available in Windows and provides implementations for other compression methods such as 7z, xz, bzip2, zip and others.

Zopfli

Zopfli is ideal for one-time compression, for example, in build processes where the file is compressed once and served many times. It is ~100x slower, but compresses around 5% better than other compressors.

Enabling GZIP compression

Apache

The mod_deflate module provides support for GZIP compression, so the response is compressed on the fly before being sent to the client over the network.

It is also possible to serve pre-gzipped files instead of compressing them on the fly every time. This is especially useful for files that don’t change between requests, such as JavaScript or CSS files, which can be compressed using a slow algorithm and then served directly. In your .htaccess, include this:

What we are doing here is telling Apache that files with .gz extensions should be served with the gzip encoding-type (line 2), checking that the browser accepts gzip (line 3) and if the gzipped file exists (line 4), we append .gz to the requested filename.

OpenSSL on VPS

If you are running a VPS with CentOS 6.x, Ubuntu 12.04, or Debian 7, we highly recommend that you upgrade OpenSSL immediately. If you have a previous OS version you are not vulnerable, unless you upgraded OpenSSL manually. To be sure, check your OpenSSL version. Here’s how to do it:

Please note that in order to use the updated library, you will need to restart Apache or other servers that use SSL.

UPDATE: Due to the severity of the issue we have upgraded all VPS with OpenSSL installations that we found to be vulnerable.

OpenSSL on Shared Hosting

On our shared hosting servers we run a previous version of OpenSSL which is not vulnerable so no update is required.

What steps should you take to secure your applications?

If your system was affected, it is advisable to take steps to secure your application. Even though there is no way to know if your system was compromised, the safest option is to act as if it were.

1) If you are using SSL certificates for your sites there is a risk that your certificates have been compromised. So we recommend that you ask your certificate provider to re-issue your certificates and then replace your certificates with the new ones.
2) Change any passwords or other credentials that were encrypted by your old SSL certificates.
3) If your application has user accounts, we recommend you change the passwords on all user accounts.
4) If you’re using phpMyAdmin or phpPgAdmin on our servers you should change these passwords.
5) You may want to invalidate all current sessions after requesting your users change their passwords to rule out any potential session hijacking.

You can find more information about the Heartbleed bug at http://heartbleed.com

This is the 10th post in our series on Symfony2 components and we will cover the latest component added to Symfony: the ExpressionLanguage component. This component was added in version 2.4 and provides a way to have dynamic aspects in static configurations. For example, it can be used to evaluate expressions in configuration files, create a DSL, or build a business rules engine.

The ExpressionLanguage component adds a bit of “color” to static data

Installation

Simple example

Imagine we want to create a blog system where users can create their own blogs. We would also like to give users some flexibility by letting them define whether a given article is featured or not based on almost anything. It could be based on the number of visits that the article has received, the category, or even something as unusual as the current time. The expression that determines at run time whether a given article is featured would be saved in the database too.

Doing this in a classic way would be cumbersome: we would need to define fixed rules and force users to choose one of them… unless we use eval().

The number of visits of the post is 15 and the expression to make it featured is “$article->getVisits() > 10”, so it returns true when evaluated. The problem with this approach is that we are using eval(), and we all know that eval is evil as it allows execution of arbitrary PHP code. In this example, eval() works fine and, by adding a return statement, we get the result of the comparison “15 > 10”, but that will not always be the case. Since we are letting users define their own expressions that will be executed by the PHP engine, a malicious user could configure their blog with something like “exec(‘rm -fr *’)”.

To quote Rasmus Lerdorf, “if eval() is the answer, you’re almost certainly asking the wrong question”.

The ExpressionLanguage component elegantly solves this issue. Since it has its own engine, no raw PHP is executed. Never. The only operations that will work are those defined and whitelisted. This is the same example, but now using the ExpressionLanguage component:

We created an instance of the ExpressionLanguage class to safely evaluate the expression “article.getVisits() > 10”. The evaluate() method evaluates the expression and optionally accepts an array of input parameters. The engine will only have access to the passed parameters, avoiding one of the problems of eval(), which has access to the current scope where it is being executed. It also solves the potential security problem with code execution, as the component does not execute PHP code, but a pseudo-language, which is limited and sandboxed.
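The whitelist idea is not specific to PHP. As a language-neutral illustration, here is a tiny Python evaluator that parses an expression into a tree and only evaluates the node types it explicitly allows. It is not how the ExpressionLanguage component is implemented, it uses Python's own expression syntax rather than the component's, and the evaluate() function and its whitelist are our own hypothetical choices.

```python
import ast
import operator

# Whitelisted operations; anything not listed here is rejected.
BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
COMPARE = {ast.Gt: operator.gt, ast.Lt: operator.lt, ast.Eq: operator.eq}


def evaluate(expression: str, variables: dict):
    """Evaluate an expression using only whitelisted nodes and passed variables."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]          # only explicitly passed values
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](visit(node.left), visit(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in COMPARE):
            return COMPARE[type(node.ops[0])](visit(node.left),
                                              visit(node.comparators[0]))
        raise ValueError(f"expression element not allowed: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))


print(evaluate("visits > 10", {"visits": 15}))   # prints: True

try:
    evaluate("__import__('os').system('rm -fr *')", {})
except ValueError as error:
    print("rejected:", error)
```

The malicious expression is parsed but never executed: a function call is not on the whitelist, so the evaluator stops with an error instead of running it.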

Evaluate != compile

The ExpressionLanguage class provides two methods to deal with expressions: evaluate and compile.

The evaluate method evaluates the expression and returns its value. The return value can be a PHP variable of any type, even objects:

Ok, it is a bit confusing why we are kind of “repeating” the function body… the register() method takes three arguments: the name of the function and two closures, one for compiling the function (converting it into PHP code) and another for evaluating. We defined the sum_digits function, which calculates the sum of the digits of a string, and works as expected:

Caching

Parsing expressions can be slow, so the component adds a cache layer to save parsed expressions (ParsedExpression). This way the same expressions are not parsed twice in the same request. This is achieved by the parser cache: ArrayParserCache, which caches parsed expressions in an array.

These parsed expressions can also be persisted to be used between requests. We can implement our own cache layer by implementing the ParserCacheInterface, which has the methods save() and fetch(). For example, to create a simple file cache:

The save() method saves the serialized parsed expression in a file, while fetch() checks if the file exists and then reads its contents. As the key may not be suitable for file names, we use sha1() to create a hash that will act as the filename.
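The same save()/fetch() pattern can be sketched in Python to show the idea of hashing the key into a safe filename. This is an illustration only, not the ParserCacheInterface: the FileCache class is hypothetical, and pickle stands in for PHP's serialization.

```python
import hashlib
import os
import pickle
import tempfile


class FileCache:
    """Store one serialized value per file, named after the SHA-1 of its key."""

    def __init__(self, directory: str):
        self.directory = directory

    def _path(self, key: str) -> str:
        # The key (an expression) may contain characters that are not valid
        # in file names, so we use its hash as the filename instead.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest)

    def save(self, key: str, value) -> None:
        with open(self._path(key), "wb") as handle:
            pickle.dump(value, handle)

    def fetch(self, key: str):
        path = self._path(key)
        if not os.path.exists(path):
            return None                    # cache miss
        with open(path, "rb") as handle:
            # Only unpickle files this application wrote itself.
            return pickle.load(handle)


cache = FileCache(tempfile.mkdtemp())
cache.save("article.getVisits() > 10", {"parsed": ["article.getVisits()", ">", 10]})
print(cache.fetch("article.getVisits() > 10"))
print(cache.fetch("unknown expression"))   # prints: None
```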

To use this parser cache instead of the default one, we inject it when creating the engine object. Both evaluate() and compile() accept strings (what we are passing so far) or ParsedExpression instances:

The basic idea behind the parser is that it converts a sequence of tokens into a node tree, understanding how operators work and relate to each other (unary/binary, associativity and precedence). For example, the following operations are equivalent:
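Any parser that honors precedence and associativity behaves this way, so the idea can be checked with Python's own parser as a stand-in. This is only an analogy (it is not the component's parser, and the expressions are our own): two spellings of an expression are equivalent when they produce the same node tree.

```python
import ast


def tree(expression: str) -> str:
    """Return a textual dump of the node tree for an expression."""
    return ast.dump(ast.parse(expression, mode="eval"))


# Precedence: multiplication binds tighter than addition.
print(tree("1 + 2 * 3") == tree("1 + (2 * 3)"))   # prints: True
print(tree("1 + 2 * 3") == tree("(1 + 2) * 3"))   # prints: False

# Associativity: subtraction groups from left to right.
print(tree("10 - 4 - 3") == tree("(10 - 4) - 3"))  # prints: True
```

The parentheses disappear in the tree because they carry no information beyond the grouping, which the tree structure already encodes.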

compile() is faster than evaluate()

It may not be obvious, but actually, compile() is faster than evaluate(). Both methods need to tokenize and parse the expression, but compile() just returns the string containing the PHP code while evaluate() loops through the tree nodes to evaluate them on the fly.

Who’s using it?

The Symfony2 full-stack framework, as of version 2.4, uses expressions extensively in service definitions, access control rules, caching, routing and validation. But as the component is quite new, there are not many projects using it yet. Here are a few:

It is well known that Facebook is currently the largest site built using PHP, but not everyone knows that they have created their own new language called Hack. It’s based on PHP and contains several new features and improvements. You can start learning about Hack with their online tutorial.

To run Hack they created a new runtime platform, HHVM. HHVM can provide big performance gains, which allowed Facebook to run their site on fewer servers, saving lots of money. Both HHVM and Hack have been released to the public so everyone outside of Facebook can use them.

HHVM is not fully compatible with PHP yet. The HHVM development team is working hard to increase compatibility and they already support most of the major PHP frameworks. Having said this, there is a lot of custom PHP code that may not run on HHVM, so before you go ahead and install HHVM on your production servers, test your applications and sites really well.

As we said, Hack is very similar to PHP. A simple script looks like this:

In fact, you can run Composer with HHVM, which will provide a big boost in performance:

$ hhvm composer.phar update

HHVM also includes a web server and a FastCGI server, giving you the option to run a standalone HHVM server or configure your Apache or NGINX using FastCGI.

Installing HHVM & Hack on ServerGrove

We have created customized packages for our customers so they can install HHVM on their VPS by following some simple steps. Currently HHVM is available for CentOS 6.x and Ubuntu 12.04 VPSes. We are preparing packages for additional distributions.

For CentOS 6.x:

$ yum install hhvm

For Ubuntu 12.04:

$ apt-get update && apt-get install hhvm

Once installed, you can find HHVM configuration files in /etc/hhvm. The log files will be located in /var/log/hhvm. All the HHVM commands are located in /opt/hhvm/bin which is added to your PATH configuration.

Configuring HHVM

The /etc/hhvm/server.hdf file contains most configuration options pertaining to the web and FastCGI servers.

HHVM & FastCGI

If you want to use the FastCGI mode, you will need to set this configuration in server.hdf:

Server {
  Type = fastcgi
  FileSocket = /var/run/hhvm/sock
  Port = 9000
}

This will make the server available on port 9000 or at /var/run/hhvm/sock. Then you need to configure Apache with mod_fastcgi support. Install and enable mod_fastcgi and add this to the Apache configuration:

FastCgiExternalServer /var/www/html -host 127.0.0.1:9000

Our installation also includes hh_single_type_check, hh_client and hh_server, which are used to run the type checker.

Using our packages outside of ServerGrove

If you want to use our packages on your servers or VMs outside of ServerGrove, you are more than welcome. Follow the instructions on setting up our repository on your server and then install HHVM as described above:
- CentOS 6.x
- Ubuntu 12.04

Conclusion

HHVM and Hack bring a new perspective to PHP and provide big performance improvements. We will keep updating our repository with new versions as they come out. At the time of publishing this post, the packages install HHVM 3.0.1. And don’t forget to share your experience with HHVM & Hack with us!