Posts tagged with: php

I have been happily working as a self-employed semantic web developer for the last seven years. With steady progress, I dare to say, but the market is still evolving a little bit too slowly for me (well, at least here in Germany) and I can't afford to keep investing. So I am looking for new challenges and an employer who would like to utilize my web technology experience (semantic or not). I have created a new personal online profile with detailed information about me, my skills, and my work.

My dream job would be in the social and/or data web area; I'm particularly interested in front-end development for data-centric or stream-oriented environments. I also love implementing technical specifications (probably some gene defect).

The potential show-stopper: I can't really relocate, for private reasons. I am happy to (tele)commute or travel, though. And I am looking for full-time employment (or a full-time, longer-term contract). I am already applying for jobs, mainly here in Düsseldorf so far, but I thought I'd send out this post as well. You never know :)

I created the extension file for SQLite3 (/etc/php.d/sqlite3.ini), added a pointer to the sqlite3.so, and now (after restarting Apache) SQLite 3 is available via PHP's PDO interface. Stack Overflow++ :)
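For reference, the extension file itself is just a one-liner (the path is the one mentioned above; the module directory may differ on other distributions):

```ini
; /etc/php.d/sqlite3.ini
extension=sqlite3.so
```

After restarting Apache, `php -m` should list the module among the loaded extensions.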

Gave a talk and a workshop in NYC about SemWeb technologies for PHP developers

I'm back from New York, where I was given the great opportunity to talk about two of my favorite topics: Semantic Web Development with PHP, and (not necessarily semantic) Software Development using RDF Technology. I was especially looking forward to the second one, as that perspective is not only easier to understand for people from a software engineering context, but also because it is still a much neglected marketing "back-door": If RDF simplifies working with data in general (and it does), then we should not limit its use to semantic web apps. Broader data distribution and integration may naturally follow in a second or third step once people use the technology (so much for my contribution to Michael Hausenblas' list of RDF MalBest Practices ;)

The talk on Thursday at the NY Semantic Web Meetup was great fun. But the most impressive part of the event were the people there. A lot to learn from on this side of the pond. Not only very practical and professional, but also extremely positive and open. Almost felt like being invited to a family party.

The positive attitude was even true for the workshop, which I clearly could have made more effective. I didn't expect (but should have) that many people would come w/o a LAMP stack on their laptops, so we lost a lot of time setting up MAMP/LAMP/WAMP before we started hacking ARC, Trice, and SPARQL.

Marco brought up a number of illustrative use cases. He maintains an (unofficial, sorry, can't provide a pointer) RDF wrapper for any group on meetup.com, so the workshop participants could work directly with real data. We explored overlaps between different Meetup groups, the order in which people joined selected groups, inferred new triples from combined datasets via CONSTRUCT, and played with not-yet-standard SPARQL features like COUNT and LOAD.
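To illustrate the CONSTRUCT step, here is a sketch of the kind of query we ran. The vocabulary terms (the `mu:` namespace and its properties) are made up for this example, not the wrapper's actual terms:

```sparql
PREFIX mu: <http://example.org/meetup-wrapper#>

# infer a co-membership triple from two group-membership triples
CONSTRUCT {
  ?person1 mu:sharesGroupWith ?person2 .
}
WHERE {
  ?person1 mu:memberOf ?group .
  ?person2 mu:memberOf ?group .
  FILTER (?person1 != ?person2)
}
```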

And having done the workshop should finally give me the last kick to launch the Trice site now. The code is out, and it's apparently not too tricky to get started even when the documentation is still incomplete. Unfortunately, I have a strict "no more non-profits" directive, but I think Trice, despite being FOSS, will help me get some paid projects, so I'll squeeze an official launch in sometime soon-ish.

Below are the slides from the meetup. I added some screenshots, but they are probably still a bit boring without the actual demos (I think a video will be put up in a couple of days, though).

The Linked Data meme is spreading and we have strong indications that web developers who understand and know how to apply practical semantic web technologies will soon be in high demand. Not only in enterprise settings but increasingly for mainstream and agency-level projects where scripting languages like PHP are traditionally very popular.

I can't really afford travelling to promote the interesting possibilities around RDF and SPARQL for PHP coders, so I'm more than happy that Meetup master Marco Neumann invited me to come over to New York and give a talk at the Meetup on May 21st. Expect a fun mixture of "Getting started" hints, demos, and lessons learned. In order to make this trip possible, Marco is organizing a half-day workshop on May 22nd, where PHP developers will get a hands-on introduction to essential SemWeb technologies. I'm really looking forward to it (and big thanks to Marco).

So, if you are a PHP developer wondering about the possibilities of RDF, Linked Data & Co, come to the Meetup, and if you also want to get your hands dirty (or just help me pay the flight ticket ;) the workshop could be something for you, too. I'll arrive a few days earlier, by the way, in case you want to add another quaff:drankBeerWith triple to your FOAF file ;)

When it comes to server administration, I'm one of those rather incompetent persons who are used to running their apps on shared hosts, with FTP being the main deployment tool. This actually worked quite nicely for many years, but during the recent months I've slowly moved into areas where I need more powerful setups. I develop Semantic Web applications in pure PHP, so I need support for long-running PHP background processes to pull in and process data from distributed sources. For improved performance (and scalability), I've built a CMS that lets me separate application servers from RDF stores and allows me to spread the RDF stores across multiple MySQL servers.

With cloud/utility computing increasingly marketed as being ready for the masses, I started testing the various offerings, looking for a solution that would be (almost) as easy to manage as shared hosts. After experiments with Amazon EC2 (way too complicated for me, at least back then), Mosso (unfortunately only for US residents, the UI looks great), Flexiscale (found it unusable UI-wise), and Media Temple gs (great UI, but the grid was entirely unstable. Every 5th request was a 5xx), I came across GoGrid. They offer a simple control panel for managing servers and load balancers, fair pricing, various support channels (with Michael Sheehan doing a great job on Twitter), and with a bit of help from The Google, I managed to get everything up and running "in the Cloud".

I have two apps running on GoGrid servers now and didn't notice any downtimes (apart from self-caused ones): paggr doesn't have much traffic yet as it's still in private alpha, but the server has been running for 3 months now without interruption. The 2nd application is Knowee, a SemWeb system with about 500 bots running as PHP background processes and feeding data into about 200 RDF stores spread across a few servers.

This post describes how to set up a similar GoGrid system, including a Load Balancer, a main PHP App Server, and two MySQL Database Servers. I hope the hints are useful for others, too. (Note: This post is not about horizontal MySQL scaling or replication, the two DBs contain independent data).

Setting up the servers

Activating the Load Balancer, an App Server, and the 2 DB Servers is easy. Just click on "add" in the GoGrid control panel, specify RAM, OS, an IP (from the list of IPs assigned to your account), and an image.

Save the settings and start the servers (right-click + "start"). One thing that I find a little annoying and that is hopefully going to be improved soon is that you can't change the settings of a server once it is deployed, e.g. to increase RAM. A stopped server also continues to be billed (apart from traffic). You always have to delete and re-build servers for changes or temporary down-scaling.

When you're done, your setup should look like this:

I'd suggest adding a Load Balancer even if you only have a single App Server. This way you can experiment with different App Servers without having to change the main public IP. And GoGrid's Load Balancing is free!

DNS

There is a page in the GoGrid Help pages about setting up DNS, but if you are maintaining your domains at an external provider, you can simply point the domain at the Load Balancer's IP and things will just work.

App Server: PHP/MySQL setup

For some reason, the server images (even the LAMP ones) don't come with MySQL client libraries, so we have to install them first. Luckily, this is simple on CentOS. Get the necessary root password by right-clicking on the server in the GoGrid control panel, then SSH into your App Server (using the Terminal on a Mac or a tool like Putty on Windows).

ssh root@server.ip.address.here

When you're logged in, install PHP with MySQL support via yum and follow the instructions:

yum install php-mysql

App Server: Optional php.ini tweaks

Should you want to change PHP settings, the php.ini is located at /etc/php.ini. I usually tweak max_execution_time and memory_limit.
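The exact values depend on the app; the lines below are just the sort of adjustment I make for long-running import scripts (pick your own numbers):

```ini
; /etc/php.ini (excerpt)
max_execution_time = 300   ; seconds, the default is 30
memory_limit = 64M
```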

App Server: httpd.conf tweaks

You'll find the Apache configuration at /etc/httpd/conf/httpd.conf. Here we have to set at least the ServerName to the site's domain. You may also want to enable .htaccess files in certain directories or disable directory browsing. When you're done, restart Apache:
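A minimal sketch of those edits; the domain and document root path below are placeholders, adjust them to your setup:

```apacheconf
# /etc/httpd/conf/httpd.conf (excerpt)
ServerName www.example.com

# allow .htaccess overrides and disable directory listings
<Directory "/var/www/html">
    AllowOverride All
    Options -Indexes
</Directory>
```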

service httpd restart

DB Servers: MySQL user and database setup

The default MySQL setup provides unprotected access to the database server, so the first thing we have to define is a root password. SSH into the DB server and log into MySQL (This should work without a password, i.e. don't append "-p"):
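On a fresh image, logging in is simply `mysql -u root` (no password set yet). Defining the root password could then look like this (MySQL 5.0-era syntax; choose your own password, of course):

```sql
-- run inside the mysql shell, logged in as root
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('new_root_password_here');
```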

Create a user account for your DB and make MySQL accept requests from the App Server:

GRANT ALL PRIVILEGES ON db_name_here.* TO 'db_user_here'@'app.server.ip.here'
IDENTIFIED BY 'db_user_password_here';

The username and password can be freely defined. If you have multiple App Servers that should be able to connect to MySQL, you can define an IP pattern instead, for example app.server.ip.%, or an IP range, or use a domain name (See the MySQL docs for access options).

(By the way, if we were using a local MySQL server, with PHP and MySQL running on the same machine, the command would have been almost identical, we'd just have used "localhost" instead of the App Server's IP.)

Flush the privileges or restart MySQL, then leave the MySQL interface:

FLUSH PRIVILEGES;
exit

DB Servers: Enabling remote access

We are using dedicated MySQL servers in our setup. In order to connect to them from the PHP App Server, we have to enable remote access. There is a detailed how-to in the GoGrid Knowledge Base, but you basically just have to comment out the socket=... line in /etc/my.cnf and restart MySQL:
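After the change, the relevant part of /etc/my.cnf looks roughly like this (the socket path shown is the CentOS default):

```ini
# /etc/my.cnf (excerpt)
[mysqld]
datadir=/var/lib/mysql
# commented out to allow TCP connections from the App Server:
# socket=/var/lib/mysql/mysql.sock
```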

service mysqld restart

Done

You can now install your app and connect to MySQL from your App Server using the usual PHP commands and the DB servers' IP as MySQL host (instead of the usual "localhost").
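A minimal connection sketch using the (PHP 5-era) mysql_* functions; the host IP, credentials, and database name are placeholders:

```php
<?php
// connect to the remote DB server (use the DB server's IP, not "localhost")
$link = mysql_connect('db.server.ip.here', 'db_user_here', 'db_user_password_here');
if (!$link) {
  die('Could not connect: ' . mysql_error());
}
mysql_select_db('db_name_here', $link);
// ... run queries via mysql_query() ...
mysql_close($link);
?>
```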

In case of "Lost Connection" errors

If you notice frequently dropped MySQL connections, this might be related to a bug in the MySQL 5.0.x versions. I found a couple of forum posts suggesting to use hostnames instead of IPs to connect to the MySQL server, which indeed solved the problem for me (I also had some broken bots not closing connections, but that's another story ;). You first have to assign domain names to your DB servers (e.g. db1.yoursite.com) and can then set the host parameter to this domain in mysql_connect().

Bottom Line

You do need some terminal hacking to run your sites in the GoGrid Cloud, but I don't think it's more work than configuring a dedicated host. After getting used to the few required shell commands, I'm now able to activate additional servers in a few minutes. Compared to other services, I found GoGrid to be a very efficient and cost-effective solution for setting up and deploying a multi-server app environment.

A (personally) interesting thing about the webinale is its co-location with the International PHP Conference, and the (new) Dynamic Languages World Europe, and that registering for one conference includes free access to any of the others. It's the perfect audience to talk about practical SemWeb Scripting with ARC and PHP.

About a year ago, I received some funds which allowed me to re-write the ARC toolkit, and also to bring Trice (a semantic web application framework for PHP) to production-readiness. However, Semantic Web Development is generally still very new, especially in the Web Agency market where I'm coming from. It's not that easy yet to keep things self-sustaining.

May well be that I should blog less about bleeding-edge experiments, but rather about how RDF and SPARQL allow me to deploy extensible websites at a fraction of the time it used to take in the past. "Release Early", "Data First", "Evolve on the Fly", and all those patterns that SemWeb technology enables in a web development context.

Anyway, to keep things short: I'm actively (read: urgently ;-) looking for more paid projects. I'm a Web development all-rounder with particular interest in scripting languages and quite some experience in delivering RDF and frontend solutions (more details on my profile page). While it would of course be great to work on stuff where I can use my tools, I'm available for more general web development as well. I'm most productive when I can work from my office, but travelling temporarily is basically fine, too. The Düsseldorf Airport is just minutes away.

The ARC WordPress Extension adds an RDF Store to the WordPress Blogging System

Together with Morten Frederiksen and Dan Brickley (who is revisiting his SparqlPress idea), I've created a WordPress extension (called "RDF Tools") that adds an (ARC-based) RDF Store and SPARQL Endpoint to the blogging system. The store is kept separate from the WP tables (i.e. it's not a wrapper), but you can use WP's nice admin screens to configure it (screenshot), and given the amount of developer-friendly hooks that WP offers, I'm curious what can be done now, possibly in combination with other extensions such as those Alexandre Passant is working on. It could perhaps also be handy as a deployment accelerator for knowee.

The webinale slides are online now. The session went OK, I'd say. I always make the mistake of looking at the high conference prices and then end up trying to squeeze too much information into my talks to give people some value for their money. It also was a bit hard to predict what the audience of the newly introduced webinale would be like. I did receive some great feedback from PHP coders (sneaking in from the co-located IPC) who already had specific questions and asked about RAP and ARC. But I could see from many faces right after the session that a very basic talk may have been better. Leo suggested skipping the ontology stuff entirely; the number of different flavours (SKOS, RDF Schema, OWL Lite/DL/Full/+/-/1.1) is surely a whole mess marketing-wise. Next time I'll try to stick to the more intuitive stuff. At least I had a convincing demo about how (low-level) ontologies can be useful to greatly reduce custom application code.

I had a short chat with pageflakes' CEO Christoph Janz. Semantic Web technologies are not on their radar yet (maybe they are now ;), but we talked a bit about the possibility to add some RDF functionality to their widgets (which they call "flakes"). They may let us try some things in the context of the knowee project, e.g. a flake that could store contact data retrieved via GRDDL or a SPARQL endpoint. Might be worth checking out their SDK.

A spontaneous invitation to DrupalCon got me driving to Brussels yesterday to finally meet the CivicActions folks I've been working for during the last months. Unfortunately, I missed Jonathan Hendler's NINA presentation about adding ARC's SPARQL API to Drupal for building a faceted browser, but we chatted quite a bit about it after lunch. I still have to learn a lot about Drupal, but one of the really interesting things is that it provides an extension called Content Construction Kit (CCK) that simplifies defining flexible forms and their elements. Drupal generates an HTML page for every resource ("node" in Drupal-speak) created via CCK. The thing that's missing is mapping the structured CCK nodes to RDF to enable optimized SPARQL querying while keeping editing simple and integrated. We discussed the potential of not only ex- but also importing RDF data into CCK. And how cool it could be to directly convert RDFS/OWL to CCK field definitions. Good news is that there are several hooks to RDF-enhance Drupal without running into synchronization issues or forcing the replacement of built-in components.

CivicActions was a gold sponsor and Dan Robinson introduced me to some of the core Drupal developers. And as it turned out, some of them are already thinking about direct RDF support for Drupal (partly triggered by TimBL using Drupal for blogging, partly because Drupal's internal structure isn't really far away from a graph-based model). I'm aware of three efforts now to add RDF to Drupal in some way, there may be more.

But it's not only the Drupal crowd which is looking at SemWeb technology. At lunch, I met Johan Janssens, lead developer of the Mambo spin-off Joomla!, who told me about a SemWeb project proposal for their 2006 Google Summer of Code. (There is another one in the ideas section.) The project took more than just this summer (welcome to RDF development ;), and the outcome is not going to be added to Joomla! anytime soon, but obviously the PHP community is getting aware of RDF's potential benefits and is starting to play with RDF, OWL, and SPARQL. And it's approaching the SemWeb from a practical point of view which just can't be bad.

A first version of ARC RDF Store is now available. It's written entirely in PHP and has been optimized for basic LAMP environments where install and system configuration privileges are often not available. As with the other ARC components, I tried to keep things modular for easier integration in other PHP/MySQL-based systems.
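To give a first impression, usage looks roughly like the sketch below. I'm basing the method names on the later ARC2 API, so treat them as assumptions; the details may differ in this first version:

```php
<?php
// hypothetical usage sketch; class and method names follow the later ARC2 API
include_once('ARC2.php');

$config = array(
  'db_host'    => 'localhost',
  'db_name'    => 'my_db',
  'db_user'    => 'my_user',
  'db_pwd'     => 'my_password',
  'store_name' => 'my_store', // used as a table prefix
);

$store = ARC2::getStore($config);
if (!$store->isSetUp()) {
  $store->setUp(); // creates the MySQL tables
}

// load some RDF and query it via SPARQL
$store->query('LOAD <http://example.org/data.rdf>');
$rows = $store->query('SELECT ?s WHERE { ?s ?p ?o } LIMIT 10', 'rows');
?>
```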

Just in case you wondered why your PHP-driven XML parser doesn't work any more after switching to PHP5. There is a bug in the libxml2 character encoding detection that causes timeouts when the xml parser is created without providing a should-be-optional encoding. The bug seems to be fixed in PHP 5.0.4.

For the moment, I've set the encoding of the ARC parser I'm using at beta.bla.org to UTF-8, so that I could activate the RDF import feature I mentioned earlier today. Guess I'll have to put some more effort into the Web reader, so that I can detect the encoding before initializing the parser.
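Pinning the encoding when creating the parser avoids triggering the broken detection code; in plain PHP that's just the optional first argument:

```php
<?php
// passing the encoding explicitly works around the libxml2 detection bug
$parser = xml_parser_create('UTF-8');
// ... set handlers, feed data via xml_parse(), etc. ...
xml_parser_free($parser);
?>
```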

A couple of weeks ago I posted a hack to enable URIQA functionality on average hosted web servers. Hannes Gassert pinged me yesterday to tell me that an upcoming PHP release is going to support arbitrary HTTP verbs. Using unknown HTTP methods won't prevent PHP scripts from being called. This is basically a nice idea as it would allow me to use my standard CMS rewrite scripts to process URIQA requests (actually it means that URIQA can easily be implemented on any PHP-enabled web server). But it also means that PHP sites which use default rewrite rules (e.g. where any request is passed to a central PHP script) will handle an MGET request just like a GET request if they don't do any method-specific checks.
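Sites with such catch-all rewrite rules can guard against this with an explicit method check; a minimal sketch (the 501 response is my choice here, not a URIQA requirement):

```php
<?php
// reject HTTP methods we don't explicitly handle (e.g. MGET),
// instead of silently treating them like GET
$allowed = array('GET', 'HEAD', 'POST');
if (!in_array($_SERVER['REQUEST_METHOD'], $allowed)) {
  header('HTTP/1.1 501 Not Implemented');
  exit;
}
?>
```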

URIQA's main advantage compared to other resource description discovery approaches (e.g. doing a HEAD before requesting metadata) is its efficiency: non-understood requests are meant to either return a (short) HTTP error or machine-readable, useful data. PHP seems to be adding potential ambiguity now, which doesn't devalue URIQA's utility, but unfortunately its efficiency argument.

Note to self: Don't mention that on rdf-interest. 2005 is still too young to reawaken that thread. Hm, or maybe it's the right time now to propose my absolutely unambiguous offline approach again...