
March 23, 2013

We use Solr a lot at InterNations. Besides the usual full text searches, we use it every time we need to fetch documents at next to no cost. It is fast, it is stable and, after some wrestling, our data import works very well, too.

For more than a year we have been adding more and more functionality to a component for building Solr/Lucene queries programmatically. We provide two different ways to create queries: an expression builder for complex queries and a string-based class for simpler ones. A huge advantage of a programmatic API is security: while Lucene’s query language is read-only and therefore non-destructive, query injections can still lead to serious data breaches, which both components help to avoid by escaping input strings.
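The escaping both components perform can be illustrated with a minimal sketch. The function name escapeLuceneTerm() is hypothetical and not the component's actual API. It backslash-escapes the characters that carry special meaning in the Lucene query syntax, and whitespace as well, so user input cannot smuggle in operators such as OR or match-all queries:

```php
<?php
/**
 * Hypothetical helper (not the InterNations component's API): backslash-escapes
 * every character with a special meaning in the Lucene query syntax, plus whitespace.
 */
function escapeLuceneTerm(string $term): string
{
    // A single regex pass, so the backslashes added here are never escaped twice.
    return preg_replace('/([+\-&|!(){}\[\]^"~*?:\\\\\/\s])/', '\\\\$1', $term);
}

// Malicious input trying to turn the query into "match everything".
$userInput = 'foo) OR (*:*';
$query = 'name:' . escapeLuceneTerm($userInput);
```

With the escaping in place the whole input is treated as one literal term of the name field.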

January 17, 2013

PHP has traditionally been a simple, procedural language that took a lot of inspiration from C and Perl, both syntax-wise and in making sure function signatures are as convoluted as possible. PHP 5.0 introduced a proper object model, but you know all of that already. PHP 5.3 introduced closures and PHP 5.4 improved them considerably (hint: $this is now available by default).

What is functional programming anyway?

After a few years of introducing more and more functional elements into my source code, I find that question not that straightforward to answer. I still do not have a coherent definition, but “I know it when I see it”. Let me put it this way: functional programs generally do not alter state but use pure functions. Pure functions take a value and return another value without altering their input argument. The opposite is a typical setter in an object-oriented context.
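A minimal illustration of the difference, with hypothetical names: withPrice() is pure because PHP passes arrays by value, so the caller's array stays untouched, while Item::setPrice() is the typical setter that mutates state:

```php
<?php
// Pure: returns a new array and leaves the argument alone (arrays are passed by value).
function withPrice(array $item, float $price): array
{
    $item['price'] = $price; // modifies the local copy only
    return $item;
}

// Impure: a typical setter alters the object's state in place.
class Item
{
    public $price = 0.0;

    public function setPrice(float $price): void
    {
        $this->price = $price;
    }
}

$original = ['name' => 'Book', 'price' => 10.0];
$changed = withPrice($original, 12.5);

$item = new Item();
$item->setPrice(12.5);
```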

Typical functional programming languages support higher order functions, that is, functions that take or return other functions. Many of them also support two related concepts: currying and partial function application (PFA).
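Both ideas can be expressed with PHP closures. A small sketch with hypothetical names: twice() is a higher order function that takes a function and returns a new one, and $add is an addition curried by hand into a chain of one-argument functions (the chained call $add(1)(2) needs PHP 7 or later):

```php
<?php
// Higher order function: takes a function and returns one that applies it twice.
function twice(callable $f): Closure
{
    return function ($x) use ($f) {
        return $f($f($x));
    };
}

// Currying by hand: a two-argument addition split into two one-argument functions.
$add = function ($a) {
    return function ($b) use ($a) {
        return $a + $b;
    };
};

$addOne = $add(1);        // a function still waiting for its second argument
$addTwo = twice($addOne); // applies $addOne two times
$five = $addTwo(3);
$three = $add(1)(2);      // PHP 7+: invoke the returned closure immediately
```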

Functional programming has a lot of desirable attributes: not fumbling with state makes parallelism easier (not easy, it’s never easy); focusing on the smallest unit of reusable code, the function, can lead to really interesting effects with regard to reusability; and requiring functions to be deterministic is generally a good idea for stable software.

What does PHP have to offer?

PHP is not a “real” or “pure” functional language. Far from it. We don’t have a proper type system, the cool kids make fun of our exotic syntax for closures, and we have array_walk(), which looks functional but allows altering state.
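To make the array_walk() remark concrete, a small comparison (variable names are illustrative): a callback that takes its parameter by reference lets array_walk() modify the array in place, while array_map() always builds a new array:

```php
<?php
$prices = [10, 20, 30];

// The callback takes $price by reference, so array_walk() changes $prices itself.
array_walk($prices, function (&$price) {
    $price *= 2;
});

// array_map() returns a new array and leaves its input untouched.
$original = [10, 20, 30];
$doubled = array_map(function ($price) {
    return $price * 2;
}, $original);
```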

Nevertheless, there are a few interesting building blocks for functional programming. Let’s start with call_user_func(), call_user_func_array() and $callable(). call_user_func() takes a callback and a list of arguments and invokes the callback with those arguments. call_user_func_array() does something similar, except that it takes an array of arguments. That’s pretty much the same as fn.call() and fn.apply() in JavaScript (without passing a scope). A less well-known shiny new thing in PHP 5.4 is the ability to invoke any callable directly, as in $callable(). callable is a meta type in PHP (as in: it consists of various underlying types): a callable can be a string to call simple functions, an array of <string,string> to call static methods, an array of <object,string> to call object methods, an instance of Closure, or any object implementing the __invoke() magic method, also known as a functor. Calling a string callable directly looks like this:

$print = 'printf';
$print("Hello %s\n", 'World');

Additionally, PHP 5.4 introduces a new type hint “callable” that provides a simple contract for the callable meta type.
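A sketch of the different forms of the callable meta type. The classes Greeter and Doubler and the function apply() are made up for illustration; apply() uses the callable type hint together with call_user_func_array():

```php
<?php
class Greeter
{
    public static function greet($name)
    {
        return "Hello $name";
    }

    public function shout($name)
    {
        return strtoupper("Hello $name");
    }
}

// A functor: an object implementing __invoke() can be called like a function.
class Doubler
{
    public function __invoke($x)
    {
        return $x * 2;
    }
}

// The callable type hint (PHP 5.4+) accepts every form of the callable meta type.
function apply(callable $f, array $args)
{
    return call_user_func_array($f, $args);
}

$results = [
    apply('strrev', ['abc']),                              // string: plain function
    apply(['Greeter', 'greet'], ['World']),                // <string,string>: static method
    apply([new Greeter(), 'shout'], ['World']),            // <object,string>: object method
    apply(function ($a, $b) { return $a + $b; }, [1, 2]),  // Closure
    apply(new Doubler(), [21]),                            // functor
];

// Since PHP 5.4, array callables can be invoked directly, too.
$static = ['Greeter', 'greet'];
$direct = $static('PHP');
$single = call_user_func('strtoupper', 'abc');
```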

PHP also supports anonymous functions. As I said before, the Haskell community makes fun of us, but hey, we finally have them. The jokes are to be expected, because the syntax is rather verbose. Let’s see a simple Python example:

map(lambda v: v * 2, [1, 2, 3])

Nice. Let’s see a little Ruby example:

[1, 2, 3].map{|x| x * 2}

Also pretty, although strictly speaking we use a block here and not a lambda expression. Ruby has lambdas as well, but Array#map happens to take a block, not a lambda. Next up is Scala:

List(1, 2, 3).map((x: Int) => x * 2)

For a statically typed language that is pretty compact. Now look at PHP:

array_map(function ($x) {return $x * 2;}, [1, 2, 3]);

The function keyword and the lack of an implicit return are what make it look quite cumbersome. But anyway, it works: another building block for functional programming.

array_map() is a good start, and array_reduce() is another important one.

A real world functional example

Let’s start with a simple program to calculate totals in a shopping cart:

Now we no longer alter state, not even in the function itself. array_map() returns a new array of cart positions with gross, tax and net amounts, and array_reduce() puts together the totals array. But can we go further? Can we make it much simpler?
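A minimal sketch of the array_map()/array_reduce() version described above. The cart format (price, quantity and taxPercentage keys) is an assumption, and the tax is simply taken as a percentage of the gross amount:

```php
<?php
// Hypothetical cart: unit price, quantity and a tax percentage per position.
$cart = [
    ['price' => 10.0, 'quantity' => 2, 'taxPercentage' => 20],
    ['price' => 5.0, 'quantity' => 4, 'taxPercentage' => 20],
];

// array_map() builds a new list of positions; $cart itself is never modified.
$positions = array_map(
    function (array $position) {
        $gross = $position['price'] * $position['quantity'];
        $tax = $gross * $position['taxPercentage'] / 100;
        return ['gross' => $gross, 'tax' => $tax, 'net' => $gross - $tax];
    },
    $cart
);

// array_reduce() folds the positions into a fresh totals array.
$totals = array_reduce(
    $positions,
    function (array $totals, array $position) {
        return [
            'gross' => $totals['gross'] + $position['gross'],
            'tax' => $totals['tax'] + $position['tax'],
            'net' => $totals['net'] + $position['net'],
        ];
    },
    ['gross' => 0.0, 'tax' => 0.0, 'net' => 0.0]
);
```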

What if we decompose the program further and abstract it to what it really does:

* Sum one element of each array entry multiplied by another element
* Take away a percentage of that sum
* Calculate the difference between the percentage and the sum

Now we need a little helper. This little helper is functional-php, a small library of functional primitives I’ve been developing for a few years now. First, there is Functional\pluck(), which does the same as _.pluck() from underscore.js. Another helpful function is Functional\zip(). It “zips” together two lists, optionally using a callback. Functional\sum() sums the elements of a list.
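The three steps can be sketched in plain PHP. pluck(), zip() and sum() below are simplified stand-ins written for this sketch so that it runs without functional-php; they approximate the library's functions but are not its code. The cart format and the flat tax percentage are assumptions:

```php
<?php
// Stand-ins approximating functional-php's pluck(), zip() and sum().
function pluck(array $list, string $key): array
{
    return array_map(function (array $item) use ($key) {
        return $item[$key];
    }, $list);
}

function zip(array $left, array $right, callable $callback): array
{
    return array_map($callback, $left, $right);
}

function sum(array $list)
{
    return array_sum($list);
}

$cart = [
    ['price' => 10.0, 'quantity' => 2],
    ['price' => 5.0, 'quantity' => 4],
];
$taxPercentage = 20; // hypothetical flat rate for the whole cart

// 1. Sum the price of each position multiplied by its quantity.
$gross = sum(zip(pluck($cart, 'price'), pluck($cart, 'quantity'), function ($price, $quantity) {
    return $price * $quantity;
}));
// 2. Take away a percentage of that sum.
$tax = $gross * $taxPercentage / 100;
// 3. The difference between the sum and the percentage.
$net = $gross - $tax;
```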

A fair counterargument is: is that really easier to read? At first glance: no. At second glance: you’ll get used to it. It took me a while to get used to the syntax of Scala, it took a while to learn object-oriented programming, and it takes a while to grasp functional programs. Is this the perfect solution? No. But it shows what you can do by thinking more in terms of applying functions to data structures instead of using constructs like foreach to work on them.

What else can we do?

Ever had issues with null pointer exceptions (or, in PHP, with calling a method on null)? There is php-option, which provides an implementation of a polymorphic “maybe type” using PHP objects.
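The idea behind such a maybe type fits in a few lines. The classes Maybe, Just and Nothing and the function findUser() below are a hypothetical hand-rolled sketch, not php-option's actual API: a lookup returns Nothing instead of null, and map() and getOrElse() work on both cases without a single null check at the call site:

```php
<?php
abstract class Maybe
{
    // Wraps a possibly-null value.
    public static function fromValue($value): Maybe
    {
        return $value === null ? new Nothing() : new Just($value);
    }

    abstract public function map(callable $f): Maybe;

    abstract public function getOrElse($default);
}

final class Just extends Maybe
{
    private $value;

    public function __construct($value)
    {
        $this->value = $value;
    }

    public function map(callable $f): Maybe
    {
        return Maybe::fromValue($f($this->value));
    }

    public function getOrElse($default)
    {
        return $this->value;
    }
}

final class Nothing extends Maybe
{
    public function map(callable $f): Maybe
    {
        return $this; // nothing to transform
    }

    public function getOrElse($default)
    {
        return $default;
    }
}

// A lookup that may fail returns Nothing instead of null.
function findUser(array $users, string $id): Maybe
{
    return Maybe::fromValue($users[$id] ?? null);
}

$users = ['alice' => 'Alice Smith'];
$name = findUser($users, 'alice')->map('strtoupper')->getOrElse('anonymous');
$missing = findUser($users, 'bob')->map('strtoupper')->getOrElse('anonymous');
```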

Then there is partial application: it transforms a function that takes n parameters into a function that takes fewer than n parameters. Why is that helpful? Think about extracting the first character from each string in a list.

Yes. … (HORIZONTAL ELLIPSIS, U+2026) is a valid function name in PHP. But if you do not like that, use Curryplaceholder() instead.
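The first-character example can also be sketched with a hand-rolled helper. partialRight() is a hypothetical name, not the API of functional-php or any other library; it binds the rightmost arguments of a function and returns a function that expects the remaining ones (variadics and argument unpacking need PHP 5.6 or later):

```php
<?php
// Hypothetical helper: binds the rightmost arguments of $f.
function partialRight(callable $f, ...$bound): Closure
{
    return function (...$args) use ($f, $bound) {
        return $f(...array_merge($args, $bound));
    };
}

// substr($string, 0, 1) with $string left open.
$firstCharacter = partialRight('substr', 0, 1);
$initials = array_map($firstCharacter, ['Haskell', 'Scala', 'PHP']);
```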

The end

Functional programming is a fascinating topic, and if I had to name the single thing I learned the most from in recent years, it would be looking into functional paradigms. It’s so different it will make your brain hurt. But in a good way.
Ah, one last thing: read Real World Functional Programming. It’s full of good advice and real world examples.

Update

Thank you Christopher Jones for fixing the higher order function example (the second step).

Update II

Thank you Anthony Ferrara for pointing out that the array_map example was wrong. Gotta love parameter ordering.

December 20, 2012

Every API has a visible, an invisible and a hidden part. The visible part is obvious: public methods and properties, but also constants and parameter values. That’s the most visible part to any client (read: user) of your API. The invisible part is everything private: you can’t really see it and, more importantly, you can’t use it (unless you resort to reflection). The hidden part consists of all the protected symbols, as you can’t really see them until you extend a class. The other hidden part is exceptions. You can’t really see them, and there is no common expectation of which methods throw which kind of exception. Yes, @throws docblocks help, but that’s mostly all we have.

Exception handling: the problem

The usability of the hidden parts of an API is all about expectations: people love languages like Ruby because once you have learned a certain API (e.g. the string API), you can instinctively infer a large part of other APIs. This is good and keeps learning costs down. PHP, with its historically grown standard extension, is on the opposite side of the fence: parameter order varies and the naming scheme is unreliable at best.
The future is multilingual: you need to know more than one programming language, and speed of learning matters. Like, a lot. Because “X programmers”, for any value of X, are weak players. Which types of exception a class might throw should follow clear expectations for the general case. If you use a hypothetical HTTP client httpFoo, call its method request() and want to handle the exceptional cases, what exactly do you catch?

Talent borrows, genius steals

Zend Framework 2 has a lot of problems, but there are two things they did particularly well: the naming of abstract classes and interfaces, and how they treat exceptions. Every component (well, “component” is a lie here, as they aren’t really standalone components, but I digress) has its own exception subpackage containing component-specific exceptions. Those exceptions all implement a single marker interface called ExceptionInterface. If you use Zend\Something and want to handle all of its exceptions, just catch Zend\Something\Exception\ExceptionInterface.
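The convention can be sketched for the hypothetical httpFoo client mentioned earlier. All namespaces and class names here are made up: specific exceptions extend the matching SPL exceptions and implement the component's marker interface, so a single catch block handles every one of them:

```php
<?php
namespace HttpFoo\Exception;

// Marker interface: declares no methods, it only tags the component's exceptions.
interface ExceptionInterface
{
}

class TimeoutException extends \RuntimeException implements ExceptionInterface
{
}

class InvalidArgumentException extends \InvalidArgumentException implements ExceptionInterface
{
}

namespace HttpFoo;

use HttpFoo\Exception\ExceptionInterface;
use HttpFoo\Exception\InvalidArgumentException;
use HttpFoo\Exception\TimeoutException;

class Client
{
    public function request(string $url): string
    {
        if ($url === '') {
            throw new InvalidArgumentException('URL must not be empty');
        }
        // Pretend the remote server never answers.
        throw new TimeoutException(sprintf('Request to %s timed out', $url));
    }
}

$caught = [];
foreach (['', 'http://example.com/'] as $url) {
    try {
        (new Client())->request($url);
    } catch (ExceptionInterface $e) {
        // One catch block handles every exception of the component.
        $caught[] = get_class($e);
    }
}
```

Because each class still extends an SPL exception, clients that only know about \InvalidArgumentException or \RuntimeException keep working as well.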

Programming transaction costs

Time to relevant data is the new time to market. We no longer optimize for feature-complete products shipping on a certain date, but for relevant changes that generate relevant data as soon as possible. Therefore, programmer round trips matter. I consider everything that is not core domain or core UI a round trip:

I need to create another config file

I need to write another test

I need to add a few more specific exception classes

I need to write a new contract

These steps aren’t worthless, but they are worth less from a business perspective, as they don’t generate revenue right away. However, they are needed to keep revenue flowing over time. So let’s make those things cheaper.

When dealing with Exceptions in Symfony 2 projects, two steps are particularly expensive:

Creating the initial Exception infrastructure for a bundle

Creating new specific Exception classes for bundles

Especially the latter can be simplified quite dramatically.

Simplifying

To simplify exception handling, we just open-sourced a bundle we developed at InterNations. Let’s create a few custom exceptions:

December 2, 2012

Metaprogramming is the writing of computer programs that write or manipulate other programs (or themselves) as their data, or that do part of the work at compile time that would otherwise be done at runtime.

Metaprogramming is quite an interesting sub-discipline, and knowing about certain techniques and tools allows you to cut corners quite dramatically for certain tasks. As always, don’t overdo it; but to find out when you are overdoing it, you first have to start doing it, get excited, overdo it and then find the right dose. Let’s have a look at the tools PHP offers to solve typical metaprogramming problems.

On a lower level you typically interact with a certain kind of syntax tree to answer questions like:

Where is an array declaration happening?

Where does a method start, and where does it end?

Do all switch statements have a break statement?

A third category is adding metadata to the declared types: Java, C# and a few others have first-class annotation support for this kind of thing, but PHP only has userland solutions so far. A few things you need metadata for:

This property is stored in the database as column foo

I need the dependency Bar here

Access to this method should be protected according to the rules of the DSL I put here

This method returns a value of type ABCD

The toolkit

Reflection APIs

PHP core delivers 2.5 key APIs for metaprogramming. The first one is ext/reflection. You can create reflection objects from a lot of things (functions, classes, extensions) and use them to make programmatic decisions based on the APIs you are introspecting.

A simple example to find out the number of required parameters for each method in the class DirectoryIterator:
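A sketch using ext/reflection, namely ReflectionClass::getMethods() and ReflectionMethod::getNumberOfRequiredParameters(). Note that the methods DirectoryIterator inherits from SplFileInfo show up in the result as well:

```php
<?php
// Map every method of DirectoryIterator to its number of required parameters.
$class = new ReflectionClass('DirectoryIterator');
$requiredParameters = [];
foreach ($class->getMethods() as $method) {
    $requiredParameters[$method->getName()] = $method->getNumberOfRequiredParameters();
}
ksort($requiredParameters);

foreach ($requiredParameters as $name => $count) {
    printf("%s: %d\n", $name, $count);
}
```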

Reflection is all nice and shiny, except that everything you want to inspect has to be loaded (included) first. This becomes a problem if you inspect several source trees at once that declare duplicate symbols. For that case there is PHP-Token-Reflection by Ondřej Nešpor. It’s a pretty nifty replacement for ext/reflection, built completely in userland on top of ext/tokenizer, that even copes with invalid declarations. Additionally, it fixes some oddities of the internal reflection API while trying to stay as close to it as possible. I’ve played around with it a bit and I quite like it.

Tokenizer

Another core API, this time much more low-level, is ext/tokenizer. If enabled at compile time, it allows you to parse PHP source code into a list of tokens. Because the API is so low-level, it is quite hard to use without a proper abstraction layer on top of it, and most of the successful projects built upon ext/tokenizer have built one. One of them is phpcs by Greg Sherwood, which built a token stream abstraction on top of ext/tokenizer that allows much more convenient navigation of the token stream. Another one shipping its own token stream abstraction is pdepend by Manuel Pichler. Another noteworthy standalone abstraction is php-manipulator.
For an example of how the raw API can be used: I once wrote this little script that applies a few transformations to source trees to ease their conversion to PHP 5.4.
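In the same spirit, a minimal sketch of working with the raw token list, answering the question of where methods are declared. findFunctions() is a hypothetical helper: it only reports named functions and methods (closures have no name and are skipped) and ignores by-reference declarations such as function &foo():

```php
<?php
// Returns [name => line] for every named function or method in $source.
function findFunctions(string $source): array
{
    $tokens = token_get_all($source);
    $found = [];
    foreach ($tokens as $index => $token) {
        // Single-character tokens are plain strings, everything else is [id, text, line].
        if (!is_array($token) || $token[0] !== T_FUNCTION) {
            continue;
        }
        // Skip whitespace after the "function" keyword, then expect the name.
        for ($next = $index + 1; isset($tokens[$next]); $next++) {
            if (is_array($tokens[$next]) && $tokens[$next][0] === T_WHITESPACE) {
                continue;
            }
            if (is_array($tokens[$next]) && $tokens[$next][0] === T_STRING) {
                $found[$tokens[$next][1]] = $token[2];
            }
            break; // anything else (e.g. "(" of a closure) means: no name
        }
    }
    return $found;
}

$source = "<?php\nfunction foo() {}\n\$f = function () {};\nclass Bar {\n    public function baz() {}\n}\n";
$functions = findFunctions($source);
```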

PHP Parser: a fully fledged AST parser for PHP

Between a high-level API like Reflection and a low-level API like ext/tokenizer there surely is a gap: what if I want to work on an AST data structure? There is this beautiful project, PHP-Parser by Nikita Popov. It is quite interesting for more complex transformations like userland AOP, all kinds of static code analysis and so on. If ext/tokenizer feels way underpowered, have a look at this project.

Aspect oriented programming

While we are talking about AOP: a relative newcomer is PECL AOP, which provides a quite simple API for aspect-oriented programming in PHP. For Zend Framework 2 there is also an AOP module available. Let’s stick to AOP for a moment: for Symfony 2 there is JMSAopBundle by Johannes Schmitt. It provides basic AOP functionality for Symfony 2. JMSSecurityExtraBundle and JMSDiExtraBundle use it to provide annotation support for the Symfony security bundle and the Symfony dependency injection component.

Metadata management

Traditionally, every docblock documentation parser rolled its own annotation system. This changed a little with the rise of Symfony and Doctrine 2. Doctrine 2 allows you to use annotations for persistence definitions and Symfony allows you to use annotations for a lot of things (routes, security, etc.). While Doctrine still ships its own metadata handling component in doctrine-common, there is another library by Johannes Schmitt, Metadata, that aims to consolidate metadata handling for PHP. The API of the Metadata library, as well as that of doctrine-common, is quite simple: you have some sort of annotation reader that maps metadata information to classes. Think about this annotation:

This kind of annotation will map to an instance of My\Annotation\Some with the property $foo set to “bar”.
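A toy reader can illustrate this mapping. The annotation class My\Annotation\Some, the class App\Example and readClassAnnotations() are hypothetical, and the sketch is not the API of doctrine-common or Metadata: it only understands single-line annotations of the form @Fully\Qualified\Name(key="value", ...) in class docblocks and needs PHP 7.1+ for the list destructuring:

```php
<?php
namespace My\Annotation;

// Hypothetical annotation class; its public properties receive the annotation values.
class Some
{
    public $foo;
}

namespace App;

/**
 * @My\Annotation\Some(foo="bar")
 */
class Example
{
}

// Toy reader: maps @Class\Name(key="value") in a class docblock to an object.
function readClassAnnotations(string $className): array
{
    $docComment = (new \ReflectionClass($className))->getDocComment();
    $annotations = [];
    if ($docComment === false) {
        return $annotations;
    }
    preg_match_all('/@([A-Za-z_\\\\]+)\(([^)]*)\)/', $docComment, $matches, PREG_SET_ORDER);
    foreach ($matches as [, $annotationClass, $arguments]) {
        if (!class_exists($annotationClass)) {
            continue; // unknown annotations are ignored in this sketch
        }
        $annotation = new $annotationClass();
        preg_match_all('/(\w+)="([^"]*)"/', $arguments, $pairs, PREG_SET_ORDER);
        foreach ($pairs as [, $property, $value]) {
            $annotation->$property = $value;
        }
        $annotations[] = $annotation;
    }
    return $annotations;
}

$annotations = readClassAnnotations(Example::class);
```

Real readers additionally handle nested annotations, arrays, constants and caching, which is exactly why consolidating on one library pays off.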

Radioactive, specialized or obscure

Ever dreamed of renaming functions, redeclaring classes and so on? Let us not discuss whether this is a good idea or not, but if you would like, look no further: there is runkit for that (I think this is the most current fork).

The future

PHP core itself could really use native support for annotations. This would fix little differences in how annotations are used nowadays by major projects. Another very interesting development is quite definitely PHPAOP. I would consider that a candidate for core inclusion at some point.

The userland libraries could see some consolidation, and now that we have Composer, dependency management isn’t much of a problem anymore. Especially in the Symfony 2 world, reusing the same metadata framework would make total sense. A first step is that Zend Framework 2 uses doctrine-common for annotation support.

June 25, 2012

Learned a lot about HAProxy and keepalived, and am going to replace various load balancing solutions for different protocols with a single one

Learned to love RabbitMQ (Erlang, mnesia, durable queues, nice failover mechanisms) and reimplemented our internal mailing infrastructure. A lot of work, but it looks good so far. Along the way we fixed a number of issues (#62410, #62411, #62412) with PECL amqp and Swiftmailer

Got four new machines up and running and (for the better part) into production.

Learned more about how to structure our site navigation-wise in the future

Where we are

We still use Drupal as a content repository and just consume its content data via web services, letting our application do the complicated rendering. We launched the external part of our content in November 2011 (see our Expat Magazine and our Country & City Guides) and the internal part in December (you need to request an account to see it). Development of both parts was smooth, but we reached some limits of what one can do with Drupal’s Views module, and we needed to adjust our “no custom code” policy to “as little custom code as possible”.

October 17, 2011

As one of my first projects at InterNations we want to introduce rich content management functionality for internal usage. We have a custom made PHP application and want to publish a bunch of content to provide our customers with an even richer experience and greater service. Our requirements can be read along the lines of:

Provide an easy to use interface for content and media management for our editorial team

May 6, 2011

Introduce Parameter

When configuring objects you will stumble upon duplicated configuration. As configuration duplication is as bad as code duplication, making refactorings and maintenance time-intensive and error-prone, we try to avoid it. The occurrences I encountered ranged from defining the same hosts over and over for different services to quasi-hard-coded upload prefixes for files sprinkled all over my configuration. I will illustrate this refactoring with the image upload example. We configure Zend_File_Transfer and add a few validators to allow image uploads, but only of specific types:

When adding validators to Zend_File_Transfer the fourth argument (in this case photo) is the name of the array key of the file. In our case the markup would look like this:

<input type="file" name="photo"/>

The specific key is important if you allow the upload of various file types in one request. Now we change the requirements and allow not only photos but photos and PDFs (in the same input as photos, so that the user does not need to use different inputs based on file formats). To not mislead the next programmer working on this piece of code, we should change the markup to something like this (give me a better name please):

<input type="file" name="photoOrPdf"/>

Now we open our container configuration, change every occurrence of “photo” to “photoOrPdf” and hope not to forget one. Except for the one you’ll find two months later. To avoid this duplication of configuration, we introduce a parameter, and our container configuration changes accordingly.
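The effect of the refactoring can be sketched in plain PHP. FileTransferStub is a hypothetical stand-in for Zend_File_Transfer so the example runs without Zend Framework (its addValidator() takes the file key as fourth argument, as described above), and the parameter name upload.file_key and the validator options are made up. In a Symfony 2 container configuration the same parameter would be declared once and referenced as %upload.file_key%:

```php
<?php
// Hypothetical stand-in for Zend_File_Transfer that only records its validators.
class FileTransferStub
{
    public $validators = [];

    public function addValidator($validator, $breakChainOnFailure = false, $options = null, $files = null)
    {
        $this->validators[] = ['validator' => $validator, 'options' => $options, 'files' => $files];
        return $this;
    }
}

// Introduce Parameter: the file key is defined exactly once ...
$parameters = ['upload.file_key' => 'photoOrPdf'];

// ... and every place that needs it refers to the parameter instead of a literal.
$transfer = (new FileTransferStub())
    ->addValidator('Extension', false, 'jpg,png,pdf', $parameters['upload.file_key'])
    ->addValidator('Size', false, '5MB', $parameters['upload.file_key']);
```

Renaming the input field now means changing a single value.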

To make things even smoother, we could inject that parameter into the view and into the controller to make sure configuration value duplication is no longer an issue in this specific module.

Parameterize Service

Excluded, as I no longer think this is actually a good idea.

Allow Environment-Specific Configuration

When you have a development process where an artefact passes several acceptance stages before it goes into production, these stages typically differ slightly from each other. From different service IP addresses to single-machine vs. multi-machine setups, there will definitely be some variance among them. Typical variances are:

Caching: no caching on development, caching enabled on testing and production stages

Code generation and building: “rebuild on request” on development, once per deployment on testing and production

One way to handle this is to sprinkle conditions all over your application and check which host you are on, but that will lead to an application well beyond manageability. That’s why I was never happy (at least for large applications >100 person-days) with typical PHP application configurations like the preposterous config.inc.php. Having a Turing-complete programming language at hand for configuration will eventually introduce ugly conditionals, making configurations unreadable. But I digress.

There are various models for stage configuration, including inheritance from each former stage, inheritance from a main configuration, standalone configuration and all mixes of these models. All of them can be implemented well with the Symfony 2 dependency injection container. Let’s start with the simplest one, standalone configuration for each stage:
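Outside of the container, the standalone model boils down to one complete configuration per stage and a single place that picks it. A plain-PHP sketch; the stage names, configuration keys and the APP_STAGE environment variable are assumptions:

```php
<?php
// Standalone configuration: every stage is complete on its own, nothing is inherited.
$stages = [
    'development' => ['cache.enabled' => false, 'codegen.rebuild_on_request' => true],
    'testing'     => ['cache.enabled' => true,  'codegen.rebuild_on_request' => false],
    'production'  => ['cache.enabled' => true,  'codegen.rebuild_on_request' => false],
];

function loadConfiguration(array $stages, string $stage): array
{
    if (!isset($stages[$stage])) {
        throw new InvalidArgumentException(sprintf('Unknown stage "%s"', $stage));
    }
    return $stages[$stage];
}

// The stage is chosen in exactly one place instead of host checks all over the code.
$config = loadConfiguration($stages, getenv('APP_STAGE') ?: 'development');
```

The price of this model is duplication between stages (testing and production are identical here), which is what the inheritance-based models address.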

April 22, 2011

I just released 0.9.0 of PECL mogilefs. This release comes with a few small API breaks. Basically, whenever there was no open connection, we used to return false. We no longer do that; instead we throw an exception of type MogileFsException. So the API breakage will be fairly visible. The complete list of changes:

Adding new methods setReadTimeout(float readTimeout) and getReadTimeout(). These can be used to set a read timeout that differs from the connect timeout. In past releases, the connect timeout (to the tracker) was used as the read timeout (to the storage nodes). In my experience, the read timeout should be a little higher than the connect timeout.

Remove PHP max version limit so we no longer have to release a new version when PHP is released. This is what other PECL packages are doing, so I think this will work better.

Comply with the stricter C99 standard. Yeah, nerd stuff

Fixed tests and made them more robust. Try them: PHP_TEST_EXECUTABLE=<php> php tests.php

Optimized mogilefs_sock_read() and introduced a maximum message size (based on a patch from Andre Pascha of kwick.de). Fewer allocs, fewer frees. Good stuff

MogileFs::put() throws more exceptions: as said before

Comments, ideas, patches and anything else are more than welcome. Have fun with this release.

April 19, 2011

Working heavily with the Symfony2 Dependency Injection Container, I feel that we have found some typical refactorings towards a DI container that emerge during the introduction of such a component. I want to write down the preliminary results of trying to systematize them, more or less as a draft. I will use the Symfony2 DI container configuration as an example, but most of the refactorings should be applicable to other containers as well, some of them even to dependency injection without a container.

Make Dependency Explicit

This is typically the first step towards dependency injection: make a dependency explicit. There are three typical ways to do so: constructor injection, setter injection and, less preferred, property injection. I roughly prefer constructor injection for invariant dependencies in my domain and setter injection for infrastructure (setNotifier(), e.g.). Consider this example:
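A minimal sketch of such a hard-coded dependency. Example\Client and Example\Dependency come from the text, while the method run() and the return value of execute() are made up for illustration:

```php
<?php
namespace Example;

class Dependency
{
    public function execute(): string
    {
        return 'executed';
    }
}

// The dependency is hidden: Client instantiates it itself, so neither a test
// nor the configuration can swap it out.
class Client
{
    public function run(): string
    {
        $dependency = new Dependency();
        return $dependency->execute();
    }
}

$result = (new Client())->run();
```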

Client creates a new instance of Dependency and calls execute(). This is bad for testing and for configuration, as Dependency will always be hard-coded there. To make it more manageable, we refactor towards setter injection:
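After the refactoring, Client receives its dependency through a setter. The sketch wires the objects by hand, which is what the container configuration does for us; the method run() and the test double StubDependency are hypothetical and only show why this helps testing:

```php
<?php
namespace Example;

class Dependency
{
    public function execute(): string
    {
        return 'executed';
    }
}

// Setter injection: Client no longer creates its Dependency, it receives one.
class Client
{
    private $dependency;

    public function setDependency(Dependency $dependency): void
    {
        $this->dependency = $dependency;
    }

    public function run(): string
    {
        return $this->dependency->execute();
    }
}

// Hypothetical test double: thanks to the setter it can replace the real thing.
class StubDependency extends Dependency
{
    public function execute(): string
    {
        return 'stubbed';
    }
}

// Wiring by hand what the container configuration would do.
$client = new Client();
$client->setDependency(new Dependency());
$result = $client->run();

$testClient = new Client();
$testClient->setDependency(new StubDependency());
$stubbed = $testClient->run();
```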

We see that the dependency is explicit: we specifically configure Example\Client and pass a specific Example\Dependency object.

Introduce Interface Injection

After a number of Make Dependency Explicit refactorings, our configuration file for the service container will become huge. We will notice that we have common dependencies that are used in various places, an event manager for example. To curb that rapid growth, we use interface injection to ease configuration.

We notice that both Example\Client and Example\AnotherClient depend on Example\Dependency. First of all, we need an interface that declares setDependency(). This is basically the Extract Interface refactoring. We call the newly extracted interface Example\DependencyAware.
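A sketch of the extracted interface and of interface injection written out by hand. How a particular container applies such a rule differs from container to container, so injectDependencies() is a hypothetical stand-in, and the public $dependency properties only exist to make the result easy to inspect:

```php
<?php
namespace Example;

class Dependency
{
}

// Extracted interface: every class that needs a Dependency declares it this way.
interface DependencyAware
{
    public function setDependency(Dependency $dependency);
}

class Client implements DependencyAware
{
    public $dependency;

    public function setDependency(Dependency $dependency)
    {
        $this->dependency = $dependency;
    }
}

class AnotherClient implements DependencyAware
{
    public $dependency;

    public function setDependency(Dependency $dependency)
    {
        $this->dependency = $dependency;
    }
}

// Interface injection: one rule injects the shared Dependency into every
// DependencyAware service instead of configuring each service separately.
function injectDependencies(array $services, Dependency $dependency): void
{
    foreach ($services as $service) {
        if ($service instanceof DependencyAware) {
            $service->setDependency($dependency);
        }
    }
}

$shared = new Dependency();
$services = [new Client(), new AnotherClient()];
injectDependencies($services, $shared);
```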

February 19, 2011

On Wednesday, version 0.8.1 of PECL MogileFs was released. The new version features a few important changes and fixes:

Changing timeout parameter for MogileFs::connect() to float to allow specifying microseconds. This is an important change if you want to do connection pooling for your trackers in PHP. You can now limit the time the client tries to connect to a tracker and connect to an alternative one if this fails

Connect timeout does not set read timeout. This change became necessary with the better connect timeout handling and is the whole reason there is a 0.8.1. The previous assumption was to reuse the connect timeout as read timeout. This is no longer feasible. If somebody needs the functionality of setting a specific read timeout, I would be happy to implement that as a specific option though. I personally have no use for it.

Fixing arginfo for MogileFs::put(). You dawg, I’ve heard you like reflections. So I’ve put some reflection into your reflection so you can reflect while you reflect

September 5, 2010

Ran into a bug yesterday, where http://pecl.php.net/uuid in combination with http://pecl.php.net/imagick yielded a segfault when using uuid_create(). The GDB backtrace looks like this (without the exact place where it happens in libuuid, as there is unfortunately no libuuid1-dbg package in current Ubuntu versions):

For whatever reason this is happening, this is most likely the root cause of the issue.

Solution (sort of)

pecl/uuid was loaded by /etc/php5/conf.d/uuid.ini and pecl/imagick by /etc/php5/conf.d/imagick.ini. As the files are loaded in alphabetical order, imagick was initialized before uuid. Renaming /etc/php5/conf.d/uuid.ini to /etc/php5/conf.d/00-uuid.ini fixed the issue: uuid is then initialized before imagick, and the segmentation fault was gone.
Not sure about that, but maybe it would be a good idea to check in PHP_MINIT(uuid) in pecl/uuid if pecl/imagick has been initialized before and warn the user about it?

August 1, 2010

The Problem

For a project I need non-guessable synthetic primary keys. I will use them to construct URIs, and these URIs need to be non-guessable. If I used the traditional approach of auto-increment integer primary keys or a sequence table, an attacker could easily increment or decrement the integer to find similar items. The next idea was to use UUIDs or GUIDs. These identifiers are globally unique, so they would work as primary keys too. Reading some documentation on the topic brought up the interesting issue of space usage. Storing UUIDs in a CHAR column would be a huge waste of space compared to an integer primary key, and as primary keys are referenced in related tables, this would be a big issue. Finally, I found the trick of storing their binary representation in a BINARY(16) column. Doing that in MySQL is fairly easy:

INSERT INTO items SET id = UNHEX(REPLACE(UUID(), '-', ''));

Selecting a human-readable result is easy too:

SELECT HEX(id) FROM items;

Achieving the same thing in PHP is pretty straightforward too. You need the PECL extension UUID (pecl install uuid) and pack()/unpack():
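A sketch of the conversion with pack() and unpack(). The function names are hypothetical, and a fixed UUID string stands in for the result of uuid_create() from pecl/uuid so the example runs without the extension:

```php
<?php
// Textual UUID -> 16 raw bytes for a BINARY(16) column.
function uuidToBinary(string $uuid): string
{
    return pack('H*', str_replace('-', '', $uuid));
}

// 16 raw bytes -> textual UUID in the 8-4-4-4-12 layout.
function binaryToUuid(string $binary): string
{
    $hex = unpack('H*', $binary)[1];
    return implode('-', [
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12),
    ]);
}

// With pecl/uuid installed this could be: $uuid = uuid_create();
$uuid = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
$binary = uuidToBinary($uuid);
```

uuidToBinary() yields the same bytes as MySQL’s UNHEX(REPLACE(…, '-', '')); note that binaryToUuid() returns lowercase hex with dashes, while MySQL’s HEX() returns uppercase hex without dashes.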

Doctrine2 integration

The next step is integration with Doctrine2. To do so, we need to create a custom mapping type. I’m not using Doctrine2 for database abstraction but for its object relational mapping capabilities, so I ignore portability and concentrate on MySQL.

One issue I stumbled upon was a default mapping in Doctrine2: with MySQL, it maps binary types to intermediate blob types (in the Doctrine2 type system). This default behavior is not configurable, so we need to patch Doctrine\DBAL\Schema\MySqlSchemaManager. I’m sure there is a more elegant way and I would love to receive some remarks here:

The important part here is the createUuid() method, which generates the UUID once before the domain object is persisted. With GeneratedValue(strategy="NONE") we tell Doctrine not to generate the ID by itself, and with HasLifecycleCallbacks we configure Doctrine to scan for lifecycle callback methods, so that this method is called before the entity is persisted.

Fetching an object by ID is as easy as ever, but don’t forget to convert the ID: