I'm a web developer in Norfolk. This is my blog...

Recently I had the occasion to check out MariaDB’s implementation of full-text search. As it’s a relatively recent arrival in MySQL and MariaDB, it doesn’t seem to get all that much attention. In this post I’ll show you how to use it, with a few Laravel-specific pointers. We’ll be using the default User model in a new Laravel installation, which has columns for name and email.

Our first task is to create the fulltext index, which is necessary to perform the query. Run the following command:

ALTER TABLE users ADD FULLTEXT (name, email);

As you can see, we can specify multiple columns in our table to index.

If you’re using Laravel, you’ll want to create the following migration for this:

<?php

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;

class AddFulltextIndexForUsers extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        // Add a single full-text index covering both columns
        DB::statement('ALTER TABLE users ADD FULLTEXT(name, email)');
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        // The index takes its name from its first column, hence "name"
        DB::statement('ALTER TABLE users DROP INDEX IF EXISTS name');
    }
}

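Note that the index is named after the first column passed to it, so when we drop it we refer to it as name. The IF EXISTS clause is MariaDB syntax; MySQL doesn’t accept it when dropping an index, so leave it off there. Then, to actually query the index, you should run a query something like this:

SELECT * FROM users WHERE MATCH(name, email) AGAINST ('jeff' IN NATURAL LANGUAGE MODE);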

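Note that NATURAL LANGUAGE MODE is actually the default, so you can leave it off if you wish. We also have to specify the columns to match against, and they must match the column list of a FULLTEXT index on the table (here, name and email together).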

If you’re using Laravel, you may want to create a reusable local scope for it:

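public function scopeSearch($query, $search)
{
    // No search term, so return the query unchanged
    if (!$search) {
        return $query;
    }

    // Bind the term as a parameter rather than interpolating it into the SQL
    return $query->whereRaw('MATCH(name, email) AGAINST (?)', [$search]);
}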

Then you can call it as follows:

User::search('jeff')->get();

I’ve personally noticed that the query using MATCH seems to be far more performant, with response times between a fifth and a tenth of those for a similar query using LIKE. However, this observation isn’t very scientific (plus, we are talking about queries that still run in a fraction of a second). Still, if you’re doing a particularly expensive query that currently uses a LIKE statement, you may get better results by switching to a MATCH statement. That said, full-text search probably isn’t all that useful in this context; it’s only once we’re talking about longer text, such as blog posts, that advantages like support for stopwords come into play.

From what I’ve seen, this implementation of full-text search is a lot simpler than PostgreSQL’s, which has its ups and downs. On the one hand, it’s a lot easier to implement; on the other, it’s less useful, since there’s no obvious way to perform a full-text search against joined tables. However, it does seem to be superior to using a LIKE statement, so it’s probably a good fit for smaller sites where something like Elasticsearch would be overkill.

PHP isn’t the first language that springs to mind when it comes to machine learning. However, it is practical to use PHP for machine learning purposes. In this tutorial I’ll show you how to build a pipeline for classifying letters.

The brief

Before I was a web dev, I was a clerical worker for an FTSE-100 insurance company, doing a lot of work that nowadays can be automated away, if you know how. When they received a letter or other communication from a client, it would be sent to be scanned in. Once scanned, a human would have to look at it to classify it (e.g. was it a complaint, a request for information, a request for a quote, or something else?) as well as assign it to a policy number. Let’s imagine we’ve been asked to build a proof of concept for automating this process. This is a good example of a real-world problem that machine learning can help with.

As this is a proof of concept, we aren’t looking to build a web app; for simplicity’s sake this will be a command-line application. Unlike emails, letters don’t come in an easily machine-readable format, so we will be receiving them as PDF files (since they would have been scanned in, this is a reasonable assumption). Feel free to mock up your own example letters using your own classifications, but I will be classifying letters into four groups:

Complaints - letters expressing dissatisfaction

Information requests - letters requesting general information

Surrender quotes - letters requesting a surrender quote

Surrender forms - letters requesting surrender forms

Our application will therefore take in a PDF file at one end, and perform the following actions on it:

Convert the PDF file to a PNG file

Use OCR (optical character recognition) to convert the letter to plain text

Strip out unwanted whitespace

Extract any visible policy number from the text

Use a machine learning library to classify the letter, having taught it using prior examples

Sound interesting? Let’s get started…

Introducing pipelines

As our application will be carrying out a series of discrete steps on our data, it makes sense to use the pipeline pattern for this project. Fortunately, the PHP League have produced an excellent package implementing this. We can therefore create a single class for each step in the process and have it handle that step in isolation.
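If the pattern is new to you, here is a rough sketch of the idea in plain PHP. I’ve borrowed the pipe() and process() method names from league/pipeline, but SimplePipeline, TrimText and UppercaseText are hypothetical classes of my own for illustration, not the package’s code. Each stage is just a callable, and whatever one stage returns becomes the next stage’s input.

```php
<?php

// A hypothetical, minimal pipeline for illustration only; this is not
// league/pipeline's code, although the method names echo its style.
final class SimplePipeline
{
    /** @var callable[] */
    private $stages = [];

    // Return a copy with the stage appended, so the original is unchanged
    public function pipe(callable $stage): self
    {
        $copy = clone $this;
        $copy->stages[] = $stage;
        return $copy;
    }

    // Pass the payload through each stage in turn: one stage's output is the next one's input
    public function process($payload)
    {
        foreach ($this->stages as $stage) {
            $payload = $stage($payload);
        }
        return $payload;
    }
}

// Two toy stages, written as invokable classes like the real stages in this post
final class TrimText
{
    public function __invoke(string $text): string
    {
        return trim($text);
    }
}

final class UppercaseText
{
    public function __invoke(string $text): string
    {
        return strtoupper($text);
    }
}

$result = (new SimplePipeline())
    ->pipe(new TrimText())
    ->pipe(new UppercaseText())
    ->process('  dear sir  ');

echo $result, PHP_EOL; // DEAR SIR
```

Because pipe() returns a copy, you can build a base pipeline once and extend it in different ways without the variants interfering with each other.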

We’ll also use the Symfony Console component to implement our command-line application. For our machine learning library we will be using PHP ML, which requires PHP 7.1 or greater. For OCR, we will be using Tesseract, so you will need to install the underlying Tesseract OCR library, as well as support for your language. On Ubuntu you can install these as follows:

$ sudo apt-get install tesseract-ocr tesseract-ocr-eng

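This assumes you are using English; however, you should be able to find packages to support many other languages. Finally, we need ImageMagick installed in order to convert PDF files to PNGs, along with the Imagick PHP extension that our code talks to (php-imagick on Ubuntu) and Ghostscript, which ImageMagick relies on to read PDFs.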

Your composer.json should look something like this:

{
    "name": "matthewbdaly/letter-classifier",
    "description": "Demo of classifying letters in PHP",
    "type": "project",
    "require": {
        "league/pipeline": "^0.3.0",
        "thiagoalessio/tesseract_ocr": "^2.2",
        "php-ai/php-ml": "^0.6.2",
        "symfony/console": "^4.0"
    },
    "require-dev": {
        "phpspec/phpspec": "^4.3",
        "psy/psysh": "^0.8.17"
    },
    "autoload": {
        "psr-4": {
            "Matthewbdaly\\LetterClassifier\\": "src/"
        }
    },
    "license": "MIT",
    "authors": [
        {
            "name": "Matthew Daly",
            "email": "matthewbdaly@gmail.com"
        }
    ]
}

Next, let’s write the outline of our command-line client. We’ll load a single class for our processor command. Save this as app:

Note how our command accepts the file name as an argument. We then instantiate our pipeline and pass it through a series of classes, each of which has a single role. Finally, we retrieve our response and output it.

With that done, we can move on to implementing our first step. Save this as src/Stages/ConvertPdfToPng.php:

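<?php

namespace Matthewbdaly\LetterClassifier\Stages;

use Imagick;

class ConvertPdfToPng
{
    public function __invoke($file)
    {
        // Create a temporary file to hold the PNG and look up its path
        $tmp = tmpfile();
        $uri = stream_get_meta_data($tmp)['uri'];

        // Rasterise the PDF at 300 DPI, which gives OCR a reasonably sharp image
        $img = new Imagick();
        $img->setResolution(300, 300);
        $img->readImage($file);
        $img->setImageDepth(8);
        $img->setImageFormat('png');
        $img->writeImage($uri);

        // Return the handle, not the path: the temp file is deleted once the handle is closed
        return $tmp;
    }
}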

This stage takes the file passed in, converts it into a PNG, stores it as a temporary file, and returns a reference to it. The output of this stage then forms the input of the next. This is how pipelines work: they make it easy to break up a complex process into multiple steps that can be reused in different places, which makes your code simpler to understand and reason about.
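One detail worth spelling out is why the stage returns the handle from tmpfile() rather than the path. PHP deletes a tmpfile() file automatically once its handle is closed (or no longer referenced, or the script ends), so passing the handle along keeps the PNG alive until the next stage has read it. This sketch demonstrates that with plain PHP and no Imagick; the string written to the file is just a stand-in for image data.

```php
<?php

// Create a temporary file and look up its path, as ConvertPdfToPng does
$tmp = tmpfile();
$uri = stream_get_meta_data($tmp)['uri'];

// Stand-in for $img->writeImage($uri): write some bytes to that path
file_put_contents($uri, 'fake PNG data');

// While the handle is open, the next stage can read the file by its path
$existsWhileOpen = file_exists($uri);
$contents = file_get_contents($uri);

// Closing the handle deletes the file
fclose($tmp);
clearstatcache(); // avoid a stale cached result from the earlier file_exists()
$existsAfterClose = file_exists($uri);

var_dump($existsWhileOpen, $contents, $existsAfterClose);
```

If the stage returned only $uri, $tmp would go out of scope when the function returned and the PNG could vanish before Tesseract ever saw it.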

Our next step carries out optical character recognition. Save this as src/Stages/ReadFile.php:

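<?php

namespace Matthewbdaly\LetterClassifier\Stages;

use thiagoalessio\TesseractOCR\TesseractOCR;

class ReadFile
{
    public function __invoke($file)
    {
        // Look up the path of the temporary PNG from its file handle
        $uri = stream_get_meta_data($file)['uri'];

        // Run Tesseract on the image and return the recognised text
        $ocr = new TesseractOCR($uri);
        return $ocr->lang('eng')->run();
    }
}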

As you can see, this accepts the temporary file handle from the previous stage, looks up its path, and runs Tesseract on it to retrieve the text. Note that we specify a language of eng; if you want to use a language other than English, you should specify it here.

At this point we should have some usable text, but it may contain unpredictable amounts of whitespace, so our next step uses a regex to collapse it. Save this as src/Stages/StripTabs.php:

<?php

namespace Matthewbdaly\LetterClassifier\Stages;

class StripTabs
{
    public function __invoke($content)
    {
        // Collapse every run of whitespace (spaces, tabs, newlines) into one space
        return trim(preg_replace('/\s+/', ' ', $content));
    }
}
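To see what that regex does, here it is applied to a made-up scrap of OCR output (the sample text is invented). Despite the class name, \s matches spaces, tabs and newlines alike, so every run of whitespace collapses to a single space before trim() removes any left at the ends.

```php
<?php

// The same expression StripTabs uses, applied to invented OCR-style text
$ocrOutput = "  Dear Sir,\n\n\tI wish to   complain\r\nabout my policy.  ";

$cleaned = trim(preg_replace('/\s+/', ' ', $ocrOutput));

echo $cleaned, PHP_EOL; // Dear Sir, I wish to complain about my policy.
```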

With our whitespace issue sorted out, we now need to retrieve the policy number the communication should be filed under. These are generally regular alphanumeric patterns, so regexes are a suitable way of matching them. As this is a proof of concept, we’ll assume a very simple pattern for policy numbers in that they will consist of between seven and nine digits. Save this as src/Stages/GetPolicyNumber.php:

<?php

namespace Matthewbdaly\LetterClassifier\Stages;

class GetPolicyNumber
{
    public function __invoke($content)
    {
        $matches = [];
        $policyNumber = '';

        // Take the first run of seven to nine digits as the policy number
        preg_match('/\d{7,9}/', $content, $matches);
        if (count($matches)) {
            $policyNumber = $matches[0];
        }

        // Pass both the text and the policy number on to the next stage
        return [
            'content' => $content,
            'policy' => $policyNumber
        ];
    }
}
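One thing to watch with \d{7,9}: without anchors it will happily match the first nine digits of a longer number, such as a phone number. A possible tweak, which is my suggestion rather than part of the original code, is to add word boundaries so that only standalone runs of seven to nine digits count. findPolicyNumber() is a hypothetical helper, and the sample letter text is invented.

```php
<?php

// Hypothetical helper: the same logic as GetPolicyNumber, but the \b anchors
// stop it matching part of a longer run of digits.
function findPolicyNumber(string $content): string
{
    $matches = [];
    if (preg_match('/\b\d{7,9}\b/', $content, $matches)) {
        return $matches[0];
    }
    return '';
}

$letter = 'Please call me on 01603123456 about policy 12345678.';

// Unanchored pattern from the post: grabs the first nine digits of the phone number
preg_match('/\d{7,9}/', $letter, $unanchored);
echo $unanchored[0], PHP_EOL;            // 016031234

// Anchored version skips the 11-digit phone number
echo findPolicyNumber($letter), PHP_EOL; // 12345678
```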

Finally, we’re onto the really tough part - using machine learning to classify the letters. Save this as src/Stages/Classify.php:

In our constructor, we train up our model by passing our sample data through the following steps:

First, we use the token count vectorizer to convert our samples to a vector of token counts - replacing every word with a number and keeping track of how often that word occurs.

Next, we use TfIdfTransformer to get statistics about how important a word is in a document.

Then we instantiate our classifier and train it on a random subset of our data.

Finally, we pass our message to our now-trained classifier and see what it tells us.
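For anyone else new to this, here is a rough from-scratch illustration of what the first two steps compute. It is not PHP-ML’s code, and PHP-ML’s TfIdfTransformer may weight things slightly differently; this simply uses the textbook formula tf × log(N / df), where N is the number of documents and df is the number of documents containing the word. PHP-ML turns each sample into a fixed-length numeric vector over a shared vocabulary, whereas I’ve kept the words as array keys so the output stays readable. The three sample “letters” are made up.

```php
<?php

// Step 1: turn each document into counts of each word (a bag of words)
function tokenCounts(array $documents): array
{
    $counts = [];
    foreach ($documents as $i => $doc) {
        $words = preg_split('/\W+/', strtolower($doc), -1, PREG_SPLIT_NO_EMPTY);
        $counts[$i] = array_count_values($words);
    }
    return $counts;
}

// Step 2: weight each count by how rare the word is across all documents
function tfIdf(array $counts): array
{
    $n = count($counts);

    // How many documents does each word appear in?
    $documentFrequency = [];
    foreach ($counts as $docCounts) {
        foreach (array_keys($docCounts) as $word) {
            $documentFrequency[$word] = ($documentFrequency[$word] ?? 0) + 1;
        }
    }

    $weights = [];
    foreach ($counts as $i => $docCounts) {
        foreach ($docCounts as $word => $tf) {
            // A word found in every document gets log(1) = 0, i.e. no weight
            $weights[$i][$word] = $tf * log($n / $documentFrequency[$word]);
        }
    }
    return $weights;
}

$letters = [
    'I want to complain about my policy',
    'Please send me a surrender quote for my policy',
    'Please send me surrender forms for my policy',
];

$weights = tfIdf(tokenCounts($letters));

// "policy" appears in every letter, so it tells us nothing
printf("%.3f\n", $weights[0]['policy']);   // 0.000
// "complain" appears in only one letter, so it is highly distinctive
printf("%.3f\n", $weights[0]['complain']); // 1.099
```

This is why the classifier can learn from words like “complain” or “quote” while largely ignoring words that every letter shares.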

Now, bear in mind I don’t have a background in machine learning and this is the first time I’ve done anything with it, so I can’t tell you much more than that; if you want to know more, I suggest you investigate on your own. In figuring this out I was helped a great deal by this article on Sitepoint, so you might want to start there.

The finished application is on GitHub, and the repository includes a CSV file of training data, as well as the examples folder, which contains some example PDF files. You can run it as follows:

$ php app process examples/Quote.pdf

I found that once I had trained it up using the CSV data from the repository, it was around 70-80% accurate, which isn’t bad at all considering the comparatively small size of the dataset. If this were genuinely being used in production, there would be an extremely large dataset of historical scanned letters to use for training purposes, so it wouldn’t be unreasonable to expect much better results under those circumstances.

Exercises for the reader

If you want to develop this concept further, here are some ideas:

We should be able to correct the model when it’s wrong. Add a separate command to train the model by passing through a file and specifying how it should be categorised, e.g. php app train File.pdf quote.

Try processing information from different sources. For instance, you could replace the first two stages with a stage that pulls all unread emails from a specified mailbox using PHP’s IMAP support, or one that fetches data from the Twitter API. Or you could set up a telephony service such as Twilio as your voicemail, automatically transcribe the messages, and then pass the text to PHP ML for classification.

If you’re multilingual, you could try adding a step to sort letters by language, and have separate models for classifying in each language.

Summary

It’s actually quite a sobering thought that already it’s possible to use techniques like these to produce tools that replace people in various jobs, and as the tooling matures more and more tasks involving classification are going to become amenable to automation using machine learning.

This was my first experience with machine learning and it’s been very interesting for me to solve a real-world problem with it. I hope it gives you some ideas about how you could use it too.

Recently I’ve had the occasion to add a series of console commands to a legacy application. This can be made straightforward by using the Symfony console component. In this post I’ll demonstrate how to write a simple console command for clearing a cache folder.

The first step is to install the Console component:

$ composer require symfony/console

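Then we write the main script for the application. I usually save mine as console. Since we don’t want to have to type out a file extension, we start it with a shebang line instead; once the file is made executable (chmod +x console), this lets you run it directly as ./console: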

#!/usr/bin/env php
<?php

require __DIR__.'/vendor/autoload.php';

use Symfony\Component\Console\Application;

// Make the application root available to commands
define('CONSOLE_ROOT', __DIR__);

$app = new Application();
$app->run();

In this case, I’ve defined CONSOLE_ROOT as the directory containing the console script; that way, the commands can use it to refer to the application root.

We can then run our console application as follows:

$ php console
Console Tool

Usage:
  command [options] [arguments]

Options:
  -h, --help            Display this help message
  -q, --quiet           Do not output any message
  -V, --version         Display this application version
      --ansi            Force ANSI output
      --no-ansi         Disable ANSI output
  -n, --no-interaction  Do not ask any interactive question
  -v|vv|vvv, --verbose  Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

Available commands:
  help  Displays help for a command
  list  Lists commands

This displays the available commands, but you’ll note that there are none except for help and list. We’ll remedy that. First, we’ll register a command:

$app->add(new App\Console\ClearCacheCommand);

This has to be done in console, after we create $app, but before we run it.

Don’t forget to update the autoload section of your composer.json to register the namespace, then run composer dump-autoload:

"autoload": {
    "psr-4": {
        "App\\Console\\": "src/Console/"
    }
},

Then create the class for that command. This class must extend Symfony\Component\Console\Command\Command, and will typically implement two methods:

configure()

execute()

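In addition, the execute() method must accept two arguments: an instance of Symfony\Component\Console\Input\InputInterface, and an instance of Symfony\Component\Console\Output\OutputInterface. These are used to retrieve input and display output. In recent versions of the component (Symfony 5 onwards), execute() must also return an integer exit code, such as 0 for success.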

As you can see, in the configure() method, we set the name, description and help text for the command.

The execute() method is where the actual work is done. In this case, we have some code that needs to be called recursively, so we have to pull it out into a private method. Once that’s done we use $output->writeln() to write a line to the output.
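The recursive part of a command like this is easy to sketch in plain PHP. The function below is a hypothetical stand-in for that private method: it empties a directory, recursing into subdirectories, while leaving the directory itself in place. In the command you might call it from execute() on a path such as CONSOLE_ROOT.'/cache' (that location is my assumption), then report back with $output->writeln('Cache cleared').

```php
<?php

// Hypothetical helper: delete everything inside $dir, recursing into
// subdirectories, but leave $dir itself in place.
function clearDirectory(string $dir): void
{
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }

        $path = $dir . DIRECTORY_SEPARATOR . $entry;

        if (is_dir($path) && !is_link($path)) {
            // Empty the subdirectory first (the recursive call), then remove it
            clearDirectory($path);
            rmdir($path);
        } else {
            // Plain files, and symlinks (which we deliberately don't follow)
            unlink($path);
        }
    }
}

// Demo: build a throwaway cache folder with a nested subfolder, then clear it
$cache = sys_get_temp_dir() . '/cache-demo-' . uniqid();
mkdir($cache . '/views/compiled', 0777, true);
file_put_contents($cache . '/config.php', 'cached config');
file_put_contents($cache . '/views/compiled/home.php', 'cached view');

clearDirectory($cache);

$remaining = array_diff(scandir($cache), ['.', '..']);
echo count($remaining), PHP_EOL; // 0

// Tidy up the demo folder itself
rmdir($cache);
```

Not following symlinks is a deliberate choice: a link inside the cache folder pointing elsewhere shouldn’t cause files outside it to be deleted.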

Now, if we run our console task, we should see our new command:

$ php console
Console Tool

Usage:
  command [options] [arguments]

Options:
  -h, --help            Display this help message
  -q, --quiet           Do not output any message
  -V, --version         Display this application version
      --ansi            Force ANSI output
      --no-ansi         Disable ANSI output
  -n, --no-interaction  Do not ask any interactive question
  -v|vv|vvv, --verbose  Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

Available commands:
  help         Displays help for a command
  list         Lists commands
 cache
  cache:clear  Clears the cache

And we can see it in action too:

$ php console cache:clear

Cache cleared

For commands that need to accept additional arguments, you can define them in the configure() method (remembering to import Symfony\Component\Console\Input\InputArgument):

$this->addArgument('file', InputArgument::REQUIRED, 'Which file do you want to delete?');

Then, you can access it in the execute() method using InputInterface:

$file = $input->getArgument('file');

This tutorial only skims the surface of what you can do with the Symfony Console component; indeed, many other console interfaces, such as Laravel’s Artisan, are built on top of it. If you have a legacy application built in a framework that lacks any sort of console interface, such as CodeIgniter, you can quite quickly produce basic console commands for working with that application. The documentation is very good, and with a little work you can soon have something up and running.

This was a bit of a weird post to write. It started out as an explanation of how I resolved an issue years ago on a CodeIgniter site, adapted to work with Laravel. In the process, I realised it made sense to implement it as middleware, and I ended up pulling it out into a package. However, it’s still useful to understand the concept behind it, even if you prefer to just install the complete package, because your needs might be slightly different to mine.

On web development forums, it’s quite common to see variants of the following question:

How do I redirect a user on a mobile device to a mobile version of the site?

It’s quite surprising that this is still an issue that crops up. For many years, it’s been widely accepted that the correct solution for this problem is responsive design. However, there are ways in which this may not be adequate for certain applications. For instance, you may have an application where certain functionality only makes sense in a certain context, or your user interface may need to be optimised for specific environments.

The trouble is that a dedicated mobile site isn’t a good idea either. Among other things, it means that users can’t easily use the same bookmarks between desktop and mobile versions, and can result in at least some of the server-side logic being duplicated.

Fortunately, there is another way: dynamic serving allows you to render different content based on the user agent. You can also easily enable users to switch between desktop and mobile versions themselves if their client isn’t detected correctly, or if they just prefer the other one. I implemented this years ago for a CodeIgniter site. Here’s how you might implement it in Laravel; if you understand the principle behind it, it should be easy to adapt for any other framework.

Don’t try to implement mobile user agent detection yourself. Instead, find an implementation that’s actively maintained and install it with Composer. That way you can be reasonably sure that as new mobile devices come onto the market the package will detect them correctly as long as you keep it up to date. I would be inclined to go for Agent, since it has Laravel support baked in.

We could just use Agent to serve up different content based on the user agent. However, user agent strings are notoriously unreliable - if a new mobile device appears and it doesn’t show up correctly in Agent, users could find themselves forced to use the wrong UI. Instead, we need to check for a flag in the session that indicates if the session is mobile or not. If it’s not set, we set it based on the user agent. That way, if you need to offer functionality to override the detected session type, you can just update that session variable to correct that elsewhere in the application. I would be inclined to use a button in the footer that makes an AJAX request to toggle the flag, then reloads the page.
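Stripped of the framework, the logic looks something like this. In this sketch a plain array stands in for the session, and $naiveDetector is a deliberately crude, hypothetical stand-in for Agent (in the real middleware the check is $agent->isMobile() || $agent->isTablet()). ensureMobileFlag() and toggleMobileFlag() are illustrative names of my own; the toggle is what the AJAX endpoint behind a footer button might do.

```php
<?php

// Set the flag only if it isn't already in the session, so a manual
// override always wins over user agent detection.
function ensureMobileFlag(array &$session, callable $isMobileAgent, string $userAgent): void
{
    if (!array_key_exists('mobile', $session)) {
        $session['mobile'] = (bool) $isMobileAgent($userAgent);
    }
}

// Hypothetical handler for a "switch to desktop/mobile version" button
function toggleMobileFlag(array &$session): void
{
    $session['mobile'] = !($session['mobile'] ?? false);
}

// Crude stand-in for a maintained detection library such as Agent
$naiveDetector = function (string $ua): bool {
    return stripos($ua, 'Mobile') !== false;
};

$session = [];
$iphone = 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) Mobile/15A372';

ensureMobileFlag($session, $naiveDetector, $iphone);
var_dump($session['mobile']); // bool(true)

// The user prefers the desktop layout and clicks the toggle
toggleMobileFlag($session);

// On the next request detection is skipped, so the stored choice is kept
ensureMobileFlag($session, $naiveDetector, $iphone);
var_dump($session['mobile']); // bool(false)
```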

You also need to set the HTTP response header Vary: User-Agent to notify clients (including not only search engines, but also proxies at either end of the connection, such as Varnish or Squid) that the response will differ by user agent, in order to prevent users being served the wrong version.

Middleware is the obvious place to do this. Here’s a middleware that sets the session variable and the appropriate response headers:

<?php

namespace App\Http\Middleware;

use Closure;
use Jenssegers\Agent\Agent;
use Illuminate\Contracts\Session\Session;

class DetectMobile
{
    protected $agent;

    protected $session;

    public function __construct(Agent $agent, Session $session)
    {
        $this->agent = $agent;
        $this->session = $session;
    }

    /**
     * Handle an incoming request.
     *
     * @param  \Illuminate\Http\Request  $request
     * @param  \Closure  $next
     * @return mixed
     */
    public function handle($request, Closure $next)
    {
        // Only detect once per session, so a manual override isn't overwritten
        if (!$this->session->exists('mobile')) {
            if ($this->agent->isMobile() || $this->agent->isTablet()) {
                $this->session->put('mobile', true);
            } else {
                $this->session->put('mobile', false);
            }
        }

        $response = $next($request);

        // Tell caches and search engines that the response varies by user agent
        return $response->setVary('User-Agent');
    }
}

You could work with the session directly to retrieve the mobile flag, but as you may be working in the view, it makes sense to create helpers for this (for instance, in a file loaded via the files array in the autoload section of composer.json):

<?php

if (!function_exists('is_mobile')) {
    function is_mobile()
    {
        $session = app()->make('Illuminate\Contracts\Session\Session');
        return $session->get('mobile') == true;
    }
}

if (!function_exists('is_desktop')) {
    function is_desktop()
    {
        $session = app()->make('Illuminate\Contracts\Session\Session');
        return $session->get('mobile') == false;
    }
}

Now, if you want to serve up completely different views, you can use these helpers in your controllers. If you’d rather selectively show and hide parts of the UI based on the device type, you can use them in your views to determine which parts of the page should be shown.

Agent offers more functionality than just detecting if a user agent is a mobile or desktop device, and you may find this useful as a starting point for developing middleware for detecting bots, or showing different content to users based on their device type or operating system. If you just need to detect if a user is a mobile or desktop client, this middleware should be sufficient.

I’m not going to sugarcoat it. As a developer, I think Wordpress is shit, and I’m not alone in that opinion. Its code base dates from a time before many of the developments of the last few years that have hugely improved PHP as a language, as well as the surrounding ecosystem such as Composer and PHP-FIG, and it’s likely it couldn’t adopt many of those without making backward-incompatible changes that would affect its own ecosystem of plugins and themes. It actively forces you to write code that is far less elegant and efficient than what you might write with a proper framework such as Laravel, and the quality of many of the plugins and themes around is dire.

Unfortunately, it’s also difficult to avoid. Over a quarter of all websites run Wordpress, and most developers will have to work with it at some point in their careers. However, there are ways to improve your experience of working with it. In this post I’m going to share some methods you can use to make Wordpress less painful to use.

This isn’t a post about the obvious things like “Use the most recent version of PHP you can”, “Use SSL”, “Install this plugin”, “Use Vagrant/Lando” etc - I’m assuming you already know stuff like that for bog standard Wordpress development. Nor is it about actually developing Wordpress plugins or themes. Instead, this post is about bringing your Wordpress development workflow more into line with how you develop with MVC frameworks like Laravel, so that you have a better experience working with and maintaining Wordpress sites. We can’t solve the fundamental issues with Wordpress, but we can take some steps to make it easier to work with.

Use Bedrock

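Bedrock is a Wordpress boilerplate from Roots that restructures a standard Wordpress installation so that:

The Wordpress core, plugins and themes can be managed with Composer for easier updates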

The configuration can be done with a .env file that can be kept out of version control, rather than putting it in wp-config.php

The web root is isolated to limit access to the files

In short, it optimizes Wordpress for how modern developers work. Arguably that’s at the expense of site owners, since it makes it harder for non-developers to manage the site; however, for any Wordpress site complex enough to need development work, that’s a trade-off worth making. I’ve been involved in projects where Wordpress was used alongside an MVC framework for some custom functionality, and in my experience it caused a world of problems when updating plugins and themes, because version control would get out of sync. Using Composer to manage them instead would have been a huge win.

Using Bedrock means that if you have a parent theme you use all the time, or custom plugins of your own, you can install them using Composer by adding the Git repositories to your composer.json, making it easier to re-use functionality you’ve already developed. It also makes recovery easier in the event of the site being compromised, because everything not installed by Composer will be in version control, and you can delete the Composer-managed files and re-run composer install to replace the rest. By comparison, with a regular Wordpress install, if it’s compromised you can’t always be certain you’ve found all of the files that have been changed. Also, keeping Wordpress up to date becomes a simple matter of running composer update regularly, verifying it hasn’t broken anything, and then deploying it to production.

Bedrock uses WPackagist, which regularly scans the Wordpress Subversion repository for plugins and themes, so at least for plugins and themes published on the Wordpress site, it’s easy to install them. Paid plugins may be more difficult - I’d be inclined to put those in a private Git repository and install them from there, although I’d be interested to know if anyone else uses another method for that.

If you can’t use Bedrock, use WP CLI

If for any reason you can’t use Bedrock for a site, then have a look at WP CLI. On the server, you can use it to install and manage both plugins and themes, as well as the Wordpress core.

It’s arguably even more useful locally, as it can be used to generate scaffolding for plugins, themes (including child themes based on an existing theme), and components such as custom post types or taxonomies. In short, if you do any non-trivial amount of development with Wordpress you’ll probably find a use for it. Even if you can use Bedrock, you’re likely to find WP CLI handy for the scaffolding.

Upgrade the password hashing

I said this wouldn’t be about using a particular plugin, but this one is too important. Wordpress’s password hashing still relies on MD5, which is far too weak to be considered safe. Unfortunately, Wordpress still supports PHP versions as old as 5.2, and until they drop it they can’t really switch to something more secure.

wp-password-bcrypt overrides the password functionality of Wordpress to use Bcrypt, which is what modern PHP applications use. As a result, the hashes are considerably stronger. Given that Wordpress is a common target for hackers, it’s prudent to ensure your website is as secure as you can possibly make it.
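To illustrate what bcrypt gives you, here is PHP’s built-in password API in action. This is a generic PHP example rather than the plugin’s code, and the cost of 10 is just an example value: each hash gets its own random salt, and the cost factor can be raised over time as hardware gets faster.

```php
<?php

$password = 'correct horse battery staple';

// bcrypt with an explicit cost factor; each call generates a fresh random salt
$hash = password_hash($password, PASSWORD_BCRYPT, ['cost' => 10]);
$again = password_hash($password, PASSWORD_BCRYPT, ['cost' => 10]);

echo $hash, PHP_EOL;        // something like $2y$10$ followed by 53 more characters
var_dump($hash === $again); // bool(false): different salts give different hashes

// The algorithm, cost and salt are stored in the hash, so verification needs nothing else
var_dump(password_verify($password, $hash));        // bool(true)
var_dump(password_verify('wrong password', $hash)); // bool(false)

// If you raise the cost later, old hashes can be detected and rehashed at login
var_dump(password_needs_rehash($hash, PASSWORD_BCRYPT, ['cost' => 11])); // bool(true)
```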

If you use Bedrock, it uses this plugin by default, so it’s already taken care of for you.

Use a proper templating system

PHP is a weird hybrid of a programming language and a templating system. As such, it’s all too easy to wind up with too much logic in your view layer, so it’s a good idea to use a proper templating system if you can. Unfortunately, Wordpress doesn’t support that out of the box.

However, there are some third-party solutions for this. Sage uses Laravel’s Blade templating system (and also comes with Webpack preconfigured), while Timber lets you use Twig.

Use the Wordpress REST API for AJAX where you can

Version 4.7 of Wordpress introduced the Wordpress REST API, allowing the data to be exposed via RESTful endpoints. As a result, it’s now possible to build more complex and powerful user interfaces for that data. For instance, if you were using Wordpress to build a site for listing items for sale, you could create a single-page web app for the front end using React.js and Redux, use the API to submit new items, and then show the submitted items.

I’m not a fan of the idea the Wordpress developers seem to have of trying to make it some kind of all-singing, all-dancing universal platform for the web, and the REST API seems to be part of that idea, but it does make it a lot easier than it was in the past to do something a bit out of the ordinary with Wordpress. In some cases it might be worth using Wordpress as the backend for a headless CMS, and the REST API makes that a practical approach. For simpler applications that just need to make a few AJAX calls, using the REST API is generally going to be more elegant and practical than any other approach to AJAX with Wordpress. It’s never going to perform as well or be as elegant as a custom-built REST API, but it’s definitely a step forward compared to the hoops you used to have to jump through to handle AJAX requests in Wordpress.

Summary

Wordpress is, and will remain for the foreseeable future, a pain in the backside to develop for compared to something like Laravel, and I remain completely mystified by the number of people who seem to think it’s the greatest thing since sliced bread. However, it is possible to make things better if you know how - it’s just that some of this stuff seems to be relatively obscure. In particular, discovering Bedrock is potentially game-changing because it makes it so much easier to keep the site under version control.

About me

I'm a web and mobile app developer based in Norfolk. My skillset includes Python, PHP and Javascript, and I have extensive experience working with CodeIgniter, Laravel, Django, Phonegap and Angular.js.