Posts

When I write PHP-CLI applications, or use PHP purely for scripting, I often find myself needing to create a safe block.

The block is basically a try-catch block that encapsulates the whole code.
In PHP-CLI applications it often allows certain steps to be retried via fgets(STDIN).
In scripting applications it mostly sends an error mail and issues a ROLLBACK command to the database.
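
A minimal sketch of such a safe block for the scripting case. The function name and the two callbacks are hypothetical: $rollback stands in for the database ROLLBACK and $notify for the error mail.

```php
<?php
// Hypothetical helper: runs $task inside one try-catch "safe block".
// $rollback and $notify are stand-ins for the ROLLBACK command and the error mail.
function runInSafeBlock(callable $task, callable $rollback, callable $notify): bool
{
    try {
        $task();
        return true;
    } catch (Exception $e) {
        // Undo database changes first, then report what went wrong.
        $rollback();
        $notify(get_class($e) . ': ' . $e->getMessage() . "\n" . $e->getTraceAsString());
        return false;
    }
}
```

In a CLI tool, the catch branch could instead ask the user via fgets(STDIN) whether the step should be retried.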

If the project I am working on is not framework-based but spans multiple files, I often register error handlers to get alerted when anything doesn’t work as expected. My classic code looks roughly like this:

<?php

function bindCustomErrorHandler() {

    // We should use our custom function to handle errors.
    error_reporting(-1);
    ini_set('display_startup_errors', 1);
    // ...
}

It has very few comments, but I think it is self-explanatory. Basically, for every error that can be detected from inside PHP, I register a handler that tells me that an error occurred.
If possible I attach dumps and tracebacks. For E_NOTICE I continue processing.

One important thing I learned is to unregister the error handler while sending the mail, or better yet to suppress all errors. When third-party libraries are involved, an unexpected E_NOTICE can be quite inconvenient here 😉
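
A minimal sketch of that idea. The function name and the $notify callback (standing in for the mail routine) are hypothetical, and throwing an ErrorException for everything above a notice is an assumption of this sketch.

```php
<?php
// Hypothetical sketch: $notify stands in for the mail-sending routine.
function bindNotifyingErrorHandler(callable $notify): void
{
    set_error_handler(function (int $errno, string $errstr, string $errfile = '', int $errline = 0) use ($notify): bool {
        // Suppress all errors while notifying, so a notice raised inside a
        // third-party mail library cannot disturb the report.
        $previousLevel = error_reporting(0);
        try {
            $notify("[$errno] $errstr in $errfile:$errline");
        } finally {
            error_reporting($previousLevel);
        }

        // For notices, continue processing.
        if ($errno === E_NOTICE || $errno === E_USER_NOTICE) {
            return true;
        }

        // Assumption of this sketch: anything more severe stops the current flow.
        throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
    });
}
```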

What this code can’t rescue are failures that lead to HTTP 500 errors. Therefore: always watch your logs!

The new way in PHP 7

However, it is always better to catch errors where they occur. There you can attach better dumps and have more context.

In PHP 7 this is now also possible for runtime errors, such as accessing an element in an unexpected way (json_decode[‘x’], I am looking at you!) or syntax errors in included files.
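
A small sketch of the json_decode case. The helper name and the JSON payload are made up for illustration. In PHP 7 and later, such failures are thrown as Error objects (ParseError for syntax errors in included files is a subclass), so they can be caught right where they happen.

```php
<?php
// Hypothetical helper name; the payload passed in is made up for illustration.
function readField(string $json): string
{
    try {
        $data = json_decode($json);   // returns a stdClass object by default
        return (string) $data['x'];   // array access on an object throws an Error in PHP 7+
    } catch (Error $e) {
        // We are still in context here and can attach whatever dump is useful.
        return 'caught ' . get_class($e) . ': ' . $e->getMessage();
    }
}

echo readField('{"x": 1}'), "\n";
```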

Introduction

Recently I found myself, AGAIN, facing the problem of validating an email address in PHP.

You might think that this problem has been addressed a million times before and that there is a foolproof solution out there.

One of the best results Google yields is certainly this Stack Overflow article.
It gives a very good outline of why a simple regex is, in most languages, not enough to comply with the very complex format outlined in RFC 5321.

However, PCREs (Perl Compatible Regular Expressions), which are available in PHP, can do the job according to the Stack Overflow post. The article links to the following page: http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

As you can see, it contains a ridiculous regex which, as the author states, has been autogenerated. When I tried to run the regex, PHP complained with the error: Warning: preg_match(): Compilation failed: unmatched parentheses at offset 1551 in

The author also provided the original Perl module, which is pretty easy to compile. I tried that too; more on the results later.

Another promising approach was http://isemail.info/about , where the author has written a PHP function that is very easy to integrate. It basically uses a for loop to iterate over the email address and validate it according to his implementation of RFC 5321.
This class can also do DNS lookups on the domain to check whether an MX record is present.

The final and most promising approach was a PHP state machine extracted from Barebones CMS (Ultimate Email Toolkit), which can also suggest what the correct email address might be if validation fails.
You could use such a class to be very helpful to the user.

Example: User types “arne.tarara.@googlemail.com” – A classic!

The class would then yield “Validation failed! Suggestion: arne.tarara@googlemail.com”

The class can also do DNS lookups, so it is basically a one-size-fits-all solution!

Alright, let’s dive into the tests.

Testing Results

Candidates

Basic PCRE, often used ‘/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/’

Testing set

I used the tests bundled with is_email() on https://code.google.com/p/isemail/downloads/detail?name=is_email-3.01.zip&can=2&q= and I also used the tests bundled with the Perl module on http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Both contained negatives and positives. However, I had to eliminate all the positives from the Perl module, because it claimed “abigail@example.com ” (note the space at the end) to be a valid email address, which it clearly is not. Other such cases were “abigail @example.com” etc.
The most confusing case was “*()@[]”. How is that a valid email? The local part, well, maybe, but the hostname?

Please note: I indexed the array with “true” for valid emails and “false” for invalid ones.
Since not all libraries had DNS checking available, the “true” array contains some emails that are syntactically correct but do NOT have a matching MX record.
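
A minimal sketch of how such a test array can be run against any candidate. The function name, the result keys and the tiny sample array are hypothetical, and the keys are written as the strings 'true' and 'false' because PHP would cast boolean array keys to 1 and 0.

```php
<?php
// Hypothetical harness: $cases holds valid addresses under 'true' and invalid ones
// under 'false'; $validator is any callable returning a boolean for one address.
function collectFailures(array $cases, callable $validator): array
{
    $failures = ['false_negatives' => [], 'false_positives' => []];

    foreach ($cases['true'] as $address) {
        if (!$validator($address)) {
            $failures['false_negatives'][] = $address; // valid, but rejected
        }
    }
    foreach ($cases['false'] as $address) {
        if ($validator($address)) {
            $failures['false_positives'][] = $address; // invalid, but accepted
        }
    }

    return $failures;
}

// Usage with a deliberately naive validator, just to show the shape of the result.
$cases = ['true' => ['test@iana.org'], 'false' => ['test@iana..com']];
$naive = function (string $address): bool {
    return strpos($address, '@') !== false;
};
print_r(collectFailures($cases, $naive));
```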

Basic Regex

This basic regex fails to validate the new TLD-less domain names such as “test@io” and also does not like domains such as “test@iana.a”. Domains with double dots, on the other hand, show up as false positives: “test@iana..com”

True, “.a” is not a valid TLD at the moment, but it could become one. So it would be better to leave this check to MX record matching …

Also “!#$%&`*+/=?^`{|}~@iana.org” failed, although it is syntactically correct according to the test set from http://isemail.info/about

However, it can handle IPv6 addresses like “test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]” without choking.
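
The domain-related cases from this section can be reproduced with a few lines. The pattern is the basic PCRE from the candidates list; the helper name is hypothetical.

```php
<?php
// The basic PCRE from the candidates list, unchanged.
$basicPattern = '/^([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+@{1}([a-zA-Z0-9_\-\+\~\^\{\}]+[\.]?)+\.[A-Za-z0-9]{2,}$/';

// Hypothetical helper name: true if the address matches the given pattern.
function matchesPattern(string $pattern, string $address): bool
{
    return preg_match($pattern, $address) === 1;
}

foreach (['test@io', 'test@iana.a', 'test@iana..com', 'test@iana.org'] as $address) {
    printf("%-16s %s\n", $address, matchesPattern($basicPattern, $address) ? 'accepted' : 'rejected');
}
```

The pattern insists on a literal dot followed by at least two characters at the end, which explains the first two rejections. The optional dot inside the repeated group is what lets the double dot slip through.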

PHP internal function filter_var()

The accuracy is way better than that of the simple PCRE. It fails on fewer addresses, but some of them are the same ones the simple PCRE could not handle, such as “test@[IPv6:1111:2222:3333:4444:5555:6666::8888]”.
The filter_var() function also fails on IPv6 addresses such as “test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]” and “test@[IPv6:::]”.
It also cannot handle escaped spaces, as in “"test\ test"@iana.org”, and it rejects “test@io”.

It also produces false positives such as “test@org”, which is invalid.

Another blog ran a much more in-depth test of this function and other regexes: http://fightingforalostcause.net/content/misc/2006/compare-email-regex.php

The author states that the filter_var function is the best he has in his database. However, I could not find his test set as an easy download, so I could not really compare it with my data.
Still, it seems that this is a very solid regex.
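
For reference, a minimal sketch of how filter_var() is called for this purpose. The wrapper name is hypothetical and 'not-an-email' is a made-up input.

```php
<?php
// Hypothetical wrapper name around the built-in filter.
function isSyntacticallyValid(string $address): bool
{
    // filter_var() returns the filtered string on success and false on failure,
    // so the result is compared against false explicitly.
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isSyntacticallyValid('test@iana.org'));
var_dump(isSyntacticallyValid('not-an-email'));
```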

PHP state machine for email testing, extracted from Barebones CMS

https://barebonescms.com/documentation/ultimate_email_toolkit/

Well, no data? Yep.

The problem is that the function gets into an infinite loop while parsing the address "test\ test"@iana.org

Looking at the source code, a non-terminating while loop is the culprit:

It may be due to the fact that the tests are built for a different RFC, whereas the Perl module is built for RFC 822.

Problem: RFC 822 is not the right one. According to isemail.info and Wikipedia, RFC 5322 is the right place to look (http://en.wikipedia.org/wiki/Email_address)

I am not going into detail about which addresses failed and which did not, because the result is too poor.

Results:

Keep it simple: The winner is is_email() from isemail.info

Conclusion and Usage

isemail.info provides a solid PHP-based email testing class in accordance with RFC 5322, which is the relevant RFC for SMTP email.

The class has a light footprint and is easy to use. It can also do DNS lookups and factor the presence or absence of an MX record into the result of the is_email() function.

As theoretically perfect as this result is, there is a drawback in the implementation of the function itself.
It uses the internal PHP function dns_get_record() to do the DNS check. If that function is not available (because PHP was compiled without it), the check is simply omitted.

The problem with “dns_get_record()” is that it relies on the underlying mechanics of the OS and does not expose a PHP stream to work with. As a result, you cannot set a timeout for the function call.

So let’s say you have an unresponsive DNS server: the call will run forever without hitting a timeout. I tested this myself by setting up a DNS server on my localhost that keeps the connection open forever.

The result was that the function did not return. Using set_time_limit(1) had no effect either, since time spent in system calls does NOT COUNT toward the running time of the script.

As good as “is_email()” is at syntactic checks, it is just as bad in a real-life scenario. If you want to live-check an email, let’s say in a webshop, the page becomes unresponsive and the user will eventually abort the registration!

The golden solution: Patching is_email()

I decided to patch the is_email() class and integrate the well-known Net_DNS2 class (http://pear.php.net/package/Net_DNS2)

This class uses a 5 second timeout and can even be more graceful if the domain has an A record but no MX.

Since all the classes are under the BSD license, I attached my modded class along with a short usage example.
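
The graceful MX/A handling can be sketched independently of any DNS library. Everything here is hypothetical: the function name, the return values, and the $lookup callable, which is assumed to wrap a resolver with a timeout and to return an array of records for a domain and record type. It is not the Net_DNS2 API and not the patched class.

```php
<?php
// Hypothetical sketch: $lookup($domain, $type) is assumed to return an array of
// records (empty if none) and to enforce its own timeout.
function classifyMailDomain(string $domain, callable $lookup): string
{
    if (count($lookup($domain, 'MX')) > 0) {
        return 'mx';      // a proper mail exchanger exists
    }
    if (count($lookup($domain, 'A')) > 0) {
        return 'a-only';  // no MX, but the host itself resolves: be graceful
    }
    return 'none';        // nothing resolves: the domain cannot receive mail
}

// Usage with a fake resolver that only knows one A record.
$fakeLookup = function (string $domain, string $type): array {
    return ($domain === 'example.org' && $type === 'A') ? ['192.0.2.1'] : [];
};
echo classifyMailDomain('example.org', $fakeLookup), "\n";
```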

Example:

However some legacy webspaces still have this feature, and also open_basedir is often active.

When using cURL this can be a bit of a bummer, because it prevents you from following redirects. This may be due to the fact that cURL, as a native extension, would otherwise be able to follow symlinks in the filesystem and access files it should not be allowed to access.

The problem

You encounter the following error:

curl_setopt_array() [function.curl-setopt-array]: CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode is enabled or an open_basedir is set

The solution

You will find many great approaches to circumventing this restriction when it comes to HTTP connections.
This is done by parsing the Location header directly from the returned data and issuing a new request.

On php.net you will find many solutions. While some are broken, many also work. However, I could not find a solution that worked for my problem:

I wanted to follow a redirect on a site that needed cookies and needed a correct switch of the request-method from POST to GET.

Typically the workarounds copy the cURL handle, which makes it lose the cookies. They also do not reset the request method the way normal browsers do.
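
The approach can be sketched as follows. This is a minimal illustration with hypothetical function names: it reuses the same handle so the cookies survive, switches POST to GET the way browsers do, and does not resolve relative Location values.

```php
<?php
// Returns the value of the first Location header in a raw header block, or null.
function extractLocation(string $rawHeaders): ?string
{
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $match) === 1) {
        return $match[1];
    }
    return null;
}

// Browsers turn a POST into a GET on 301, 302 and 303; 307 and 308 keep the method.
function methodAfterRedirect(int $status, string $method): string
{
    if ($method === 'POST' && in_array($status, [301, 302, 303], true)) {
        return 'GET';
    }
    return $method;
}

// Assumes the caller already configured $ch (POST fields and so on).
function requestWithManualRedirects($ch, string $url, string $method, int $maxRedirects = 5): string
{
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true, // keep the headers in the returned data
        CURLOPT_COOKIEFILE     => '',   // enable the in-memory cookie engine on this handle
    ]);

    for ($i = 0; $i <= $maxRedirects; $i++) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $response = curl_exec($ch);
        if ($response === false) {
            throw new RuntimeException(curl_error($ch));
        }

        $status     = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = (int) curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        $location   = extractLocation(substr($response, 0, $headerSize));

        if ($status < 300 || $status >= 400 || $location === null) {
            return substr($response, $headerSize); // final body
        }

        $newMethod = methodAfterRedirect($status, $method);
        if ($newMethod !== $method) {
            curl_setopt($ch, CURLOPT_HTTPGET, true); // resets the handle to GET
            $method = $newMethod;
        }
        $url = $location; // relative Location values are not resolved in this sketch
    }

    throw new RuntimeException('Too many redirects');
}
```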

The code

This code worked for me. Hopefully it works for you.

Note that this code is an improvement on the code from zsalab originally posted on php.net

Recently I made a new campaign using the German Yahoo! Sponsored Search.

What I expected was for the ad to be delivered on de.search.yahoo.com as well as on bing.com, because this is what the Search Alliance is all about.

As we all love about Yahoo!, the ads cannot be blocked from showing on weird third-party sites that Yahoo! thinks are interesting for my campaign. Google allows restricting ads to Google Search only.

Have a look at the screenshot and see what I mean. Do you think the campaign I’m running on Yahoo! is interesting for customers on erkaltung.com (German for common cold) AND consultdomain.de? And why on earth is my site showing on buzzdock.com although the campaign is targeted for Germany? Probably, we will never know …

Yahoo! Sponsored Search Metrics

However, what confused me the most is that Google.de is showing up as a URL where the ad got delivered. I do know about the Yahoo-Google Advertising Agreement, but I thought this only means that ads from Google can get delivered on Yahoo!, not the other way round?

Does this mean that I can place the over-long ads from Yahoo! on Google Search?

Post a comment under this post if you have any idea why this is happening.

I just wanted to let you know that the Wunderlist project for Blackberry has not been dropped.

Currently I am facing major issues in deploying the application to the Blackberry. It seems to me that the code signing servers from RIM are still not really working as they should.

Only 1 out of 10 signing attempts works in my virtual machine, and for now I have given up developing further in my virtual machine.

Only 1 out of 10 attempts can complete the request to the signing servers

In November I will be getting my new Windows 7 machine and will restart working on this project. Until then it has to be frozen, because at the moment it is pure HORROR to develop for Blackberry.

Why do I have to sign an application that I want to run on my own phone, and why is it only possible to work under Windows? RIM, I am telling you, if you make it this hard for developers you will definitely lose your place in the market.

Hopefully development will go easier on my Windows Machine in November. I will keep you guys up to date.

Maybe some of you are using Wunderlist on OS X or on any other supported platform.

So am I, and I can just say that I love it since it is free, and is very good for implementing GTD.

My last GTD tool, iGTD, has really gotten a bit old, since its sync options are totally weird. Syncing with my iCal and then syncing to my Blackberry Bold 9780 never really worked. Sometimes I got the appointments and tasks twice, sometimes they were in the wrong calendar and so on …

Wunderlist does a great job, but it is still lacking Blackberry support. Since I love Blackberry and certainly don’t wanna move to an iPhone or even pay for one, I decided to develop my own app that runs on Blackberry and can sync my Wunderlist data.

Until now, only the roadmap has been set, but it is fairly straightforward. If you wanna download the plugin already … well, you are quite too early. But check back later, or catch my RSS feed, and you will be informed when the plugin is ready. You can also post a comment on this post, and I will email you when it is ready.

For all you interested guys, here is how the plugin will work:

Syncing the wunderlist.db via Dropbox

Reading the wunderlist.db via native HTML5 database support

Blackberry integration via PhoneGap

There are quite some steps to go, but the proof of concept has already been done. In the first version that I will be releasing next week, the plugin can only read data from the wunderlist.db. In later versions you will be able to make new notes to your Inbox.

I will never implement the whole category, tag and so on functionality, because I think the basic functionality does the trick when you are on your mobile. Organizing the tasks can still be done at home.