Blog posts tagged "php"

Spent a couple hours last night writing the core of a stripped down, PHP4 compatible API library for Amazon SimpleDB (in the style of my flickr simple library; I’m just not a fan of abstraction for its own sake). In the process I discovered that Amazon had revved the version on their “Signature Method”. Which is good news, as SignatureVersion 1 contains a classic crypto-blunder in its design, namely it encourages collisions. (more details, also why you care about collisions) To date the solution was to use SSL, and wait patiently, very patiently. So yay for Amazon fixing this! And in fairness, the first couple of drafts of the OAuth spec contained a similar issue, though it got ironed out quickly. Yay for many eyes and the open web.

“OAuth-compatible” signing

Great, things are more secure, good news and all, but that isn’t what caught my eye. This block of text did:

Here is what’s different about forming the string to sign for signature version 2:

You include additional components of the request in the string to sign

You include the query string control parameters (the equals signs and ampersands) in the string to sign

You sort the query string parameters using byte ordering

You URL encode the query string parameters and their values before signing the request

You really have to be an OAuth-dork to find anything special in that paragraph, but if you were, you’d notice that those 4 bullets are an incredibly succinct description of generating an OAuth signature. (in fact a more succinct description than appears anywhere in the OAuth documentation)

Which meant that my SimpleDB library can reuse most of the logic from my OAuth library to do the trickiest part of the API call, namely the signing. (Additionally it means that security reviews of both protocols support each other)

So my AWS signing method is approximately a dozen characters different than my OAuth method and as straightforward as:

(this uses my personal OAuth library, but your library should have similar methods)
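For readers without an OAuth library handy, here is a rough standalone sketch of version 2 signing that follows the four bullets above. It is not my library code: the function name aws_sign_v2 is made up for illustration, the key and parameter values are placeholders, and it assumes PHP 5.3 or later (hash_hmac, and a rawurlencode that leaves "~" alone), so it is not PHP4 compatible.

```php
<?php
// Hypothetical helper, for illustration only.
// Builds a Signature Version 2 style signature for a query request.
function aws_sign_v2($method, $host, $path, array $params, $secret)
{
    // Sort the query string parameters by name using byte ordering.
    ksort($params, SORT_STRING);

    // URL encode names and values, keeping the "=" and "&" separators.
    $pairs = array();
    foreach ($params as $name => $value) {
        $pairs[] = rawurlencode($name) . '=' . rawurlencode($value);
    }
    $query = implode('&', $pairs);

    // The additional request components go in front of the query string.
    $string_to_sign = strtoupper($method) . "\n"
        . strtolower($host) . "\n"
        . $path . "\n"
        . $query;

    // HMAC-SHA256 with raw binary output, then base64.
    return base64_encode(hash_hmac('sha256', $string_to_sign, $secret, true));
}

// Placeholder request values, not real credentials.
$params = array(
    'Action'           => 'ListDomains',
    'AWSAccessKeyId'   => 'EXAMPLEKEYID',
    'SignatureMethod'  => 'HmacSHA256',
    'SignatureVersion' => '2',
    'Timestamp'        => '2008-05-01T00:00:00Z',
);
$signature = aws_sign_v2('GET', 'sdb.amazonaws.com', '/', $params, 'example-secret');
echo $signature . "\n";
```

Because the parameters are sorted before signing, the order you build the array in doesn’t matter, which is exactly the property that makes the OAuth logic reusable.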

Sure made my job of implementing a library easier. If you’re going to invent a new crypto protocol, please consider doing like Amazon, and re-using the basic building blocks. (which also happen to be best practices)

I’m a big believer in Norvig’s “Code is liability” maxim. Which is how I justify my ugly, but functional Flickr API implementation, in 40 lines of PHP (not the most expressive of languages), which I wrote in about 15 minutes one evening, and which I now use for all of my Flickr side projects. And all apropos of digging through other folks’ Flickr API impls, trying to get them working on GAE. Thankfully blech is already there.

The docs I found on Yahoo’s SearchMonkey were all arcane XSLT, and strange feed formats ne’er before seen on the Web. But this example from Sam shows how easy it can be – a couple of regexes, and a couple of lines of PHP.

This morning I needed to read from a file line by line from the bottom. In PHP. Perl, of course, has a module to do this. A quick view source decided that I didn’t want to get into file seeks before breakfast. Very happy with my solution:
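One seek-free way to do this, sketched under the assumption that the file comfortably fits in memory (the helper name lines_from_bottom is made up for illustration):

```php
<?php
// Hypothetical helper: returns the lines of a file, last line first.
// Loads the whole file into memory, so only suitable for modest files.
function lines_from_bottom($path)
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return array();
    }
    return array_reverse($lines);
}

// Usage with a throwaway temp file.
$tmp = tempnam(sys_get_temp_dir(), 'rev');
file_put_contents($tmp, "first\nsecond\nthird\n");
foreach (lines_from_bottom($tmp) as $line) {
    echo $line . "\n"; // third, second, first
}
unlink($tmp);
```

For log files too big to slurp you would still need the seek-based approach the Perl module takes.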

Both are early enough in the dev cycle to be called proof of concepts.

Mostly I wrote it because I had always envisioned there being wrapper libraries around the low level OAuth implementations that wrapped the calls, and constants, and as Mike graciously went out and wrote a low level library I felt compelled to write a wrapper.

Also twittclient, an interactive client for getting an authed access token, essential to bootstrapping development.

And nota bene, HRO currently only supports the MD5 signing algorithm, which is undefined in the core spec, and subject to change. (Just in case you didn’t believe me about the early state of things.)

update 2008/4/18

This code no longer works because Twitter has taken down their (slightly non-compliant) OAuth endpoint. When they add OAuth support back in, I’ll link to it.

If you’ve played with the F8 platform from Facebook you’ve probably found the documentation incomplete, the examples inaccurate, and a dearth of running code to learn from. F8 is not view sourceable. So Jeff’s app should help you get started.

Looking over the PHP5.2 changelog I noticed that somewhere along the way PHP5 seems to have picked up a provocatively named pair of classes, DateTime and DateTimeZone.

There is something fundamentally brash, brazen even, to releasing a class named DateTime. As a calendar geek I imagine upon seeing “new DateTime()” I feel something akin to what an old thespian feels when they see a company putting on a production of the Scottish play: it’s a decidedly mixed emotion. But I’m going to bump my way through learning how to use this new DateTime lib, bringing all my preconceptions about how it should work. The odds of this being interesting to you are probably nil unless you’re in one or two very small cliques, so feel free to read on, or browse away.

I’m primarily working in PHP4 right now, so my first step was to grab a copy of MAMP 1.5b getting me a nice PHP5.2 sandbox to play with.

Hey! timezonedb! First fence cleared! A timezone database compiled into a native format based on Olson is the one true solution, and I can update it independently, the most recent release being based on 2007b. Sweet.

The constructor takes an initialization string that it passes to strtotime(), and an optional DateTimeZone object. Defaults to “now”.

$date = new DateTime();
echo $date . "\n";
> Object of class DateTime could not be converted to string

Oops, no __toString() method defined. You’ll need to use the format() instance method. If you end up using the DateTime objects, you’ll be seeing a lot of format(), more on that in a bit.
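A minimal sketch of the format() call, using the predefined DATE_RFC3339 constant. The variable names are just for illustration, and the exact output depends on when and where you run it:

```php
<?php
$date = new DateTime();

// DATE_RFC3339 is a predefined constant that holds a format string.
$formatted = $date->format(DATE_RFC3339);
echo $formatted . "\n"; // something shaped like 2007-03-05T12:00:00-05:00

// Quoting the name instead passes the letters D, A, T, E, ... as format codes.
$odd = $date->format('DATE_RFC3339');
echo $odd . "\n";
```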

Note: that’s a constant. If you pass in the string ‘DATE_RFC3339’ instead, you’ll get odd looking results.

Here we can see the default constructor sets both the time and a timezone — correctly, for the moment, identifying my timezone as America/New_York. That’s somewhat contentious behaviour, some people will tell you that dates with unspecified timezones should either be in UTC or be “floating”, divorced from any timezone. Why? At least in part because across platforms and boxes timezone guessing is going to be non-deterministic — the script that worked when you ran it locally on your Mac laptop in New York, might fail on your ISP’s servers. You get a hint of this reading over the timezone guessing rules on date_default_timezone_get. There is also the fact that I’m currently moving at about 400mph and will be in a different timezone real soon now. However you can set the default to something reasonable in a script, or in the php.ini. (consider this my recommendation)
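Setting the default explicitly in a script looks roughly like this (UTC is just my example choice; the php.ini equivalent is the date.timezone setting):

```php
<?php
// Set an explicit default rather than relying on timezone guessing.
date_default_timezone_set('UTC');

$date = new DateTime('2007-03-05 12:00:00');
echo date_default_timezone_get() . "\n"; // UTC
echo $date->format(DATE_RFC3339) . "\n"; // 2007-03-05T12:00:00+00:00
```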

Siiiigh. setTimezone() is not smart enough to cast strings into DateTimeZone objects (holds true for the constructor as well, so no new DateTime('now', 'UTC')). Now it’s time to learn how to use DateTimeZone.

Working with DateTimeZone, All Hail Olson

I mentioned briefly earlier that PHP is now shipping with an extension, timezonedb, which is a compiled version of the Olson database. The Olson database is a massive, largely volunteer effort to catalog the various timezones, both those in use and those that have been in the past. Time is a political issue, particularly daylight saving, and as such the rules governing it are arbitrary, whimsical, and subject to frequent change. (p.s. gotten a panicked memo yet about new daylight saving compliance for March 11th? No? Where did you say you worked?)

Note: Olson also uses a longer form of the zone names than we usually see in the U.S.; this is to combat ambiguity. See Appendix H for a list of timezone names, including some handy shortcuts.
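Roughly, building a DateTimeZone from an Olson name and applying it looks like this (a sketch with a fixed date and zones picked for illustration, so the output is predictable):

```php
<?php
$date = new DateTime('2007-03-05 12:00:00', new DateTimeZone('America/New_York'));

// Timezones are objects, built from an Olson name.
$tz = new DateTimeZone('Europe/London');
$date->setTimezone($tz);

echo $date->format('e') . "\n";          // Europe/London
echo $date->format(DATE_RFC3339) . "\n"; // 2007-03-05T17:00:00+00:00
```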

This is starting to get long winded, but, hey, PHP5 supports object dereferencing on returns. Maybe this will work.

echo $date->setTimezone($tz)->format(DATE_RFC3339) . "\n";
> Call to a member function format() on a non-object

Nope. Oh well.

Date vs Datetime?

Say I’ve got a nice platonic date, say November 11th. There is no time element associated with this, so timezones are kind of irrelevant. I mean Nov. 11th starts at different times through out the world, but Nov. 11th is universal. (as long as you’re using the same version of Gregorian as most of the rest of us) Ideally this date would float above timezone issues, but that isn’t how PHP does it, 2007-11-11 is treated internally as midnight on the 11th, which is certainly simpler, but disappointing. You can prove this like so:
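A sketch of one way to demonstrate it, with the zones chosen purely for illustration: build the bare date in one zone, then look at it from another.

```php
<?php
$date = new DateTime('2007-11-11', new DateTimeZone('America/New_York'));
echo $date->format(DATE_RFC3339) . "\n"; // 2007-11-11T00:00:00-05:00

// Shift zones and the "date" slides back to the previous day.
$date->setTimezone(new DateTimeZone('America/Los_Angeles'));
echo $date->format(DATE_RFC3339) . "\n"; // 2007-11-10T21:00:00-08:00
```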

Daylight Saving, March 11th, and Why Programmers Are a Grouchy Lot

Note: getOffset, which returns a timezone’s offset in seconds from UTC, takes a DateTime obj because offsets can be date sensitive due to daylight savings. Really without daylight saving this stuff would all be pretty straightforward. Let’s test to make sure the offsets are correct at the boundary.
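A boundary test along these lines, assuming the 2007 US rules (the switch happens on March 11th) and using New York as the example zone:

```php
<?php
$tz = new DateTimeZone('America/New_York');

$before = new DateTime('2007-03-10 12:00:00', $tz); // day before the switch
$after  = new DateTime('2007-03-12 12:00:00', $tz); // day after the switch

echo $tz->getOffset($before) . "\n"; // -18000 (UTC-5, standard time)
echo $tz->getOffset($after) . "\n";  // -14400 (UTC-4, daylight time)
```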

The Basics: Accessors and Mutators

So what are some other basic desires?

Get epoch seconds! Except for their kind of limited range epoch seconds are great, and have helped a generation of programmers put off worrying about timezones as long as possible. They’re also the backbone of PHP’s traditional date/time methods.

Alas, there isn’t an accessor method for getting epoch seconds, you’ll have to use format().

In fact DateTime doesn’t expose any of the accessors you’d expect, so you’ll be using format a lot if you want to access pieces of your date. (for you know, display purposes, or manipulation, or building queries, or pretty much doing anything you’d want to do with a date)
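In practice that means something like the following sketch, where format() stands in for every accessor and you cast the strings it returns yourself (the variable names are mine):

```php
<?php
$date = new DateTime('2007-03-05 12:00:00', new DateTimeZone('UTC'));

// format() always returns a string, so cast when you want numbers.
$epoch = (int) $date->format('U'); // epoch seconds
$year  = (int) $date->format('Y');
$month = (int) $date->format('n'); // month, no leading zero
$day   = (int) $date->format('j'); // day of month, no leading zero

echo "$epoch $year $month $day\n";
```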

So what is an ISODate? I’m unclear, and so is PHP’s documentation. The docs show the call signature taking a $year, $week, and optional $day, while the description talks about $year, $month, $day. Looking at the code, it looks like $week is the proper call, and $month is a cut and paste error from setDate(). So I guess this is a method for setting the day by the “week of the year”, a concept more popular in Europe than in the US. Not sure what ISO has to do with it. So what is our current week of the year?
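For what it’s worth, ISO 8601 defines week-numbered dates, which is presumably where the name comes from. A small sketch of both directions, with a fixed date so the output is predictable:

```php
<?php
$date = new DateTime('2007-01-01', new DateTimeZone('UTC'));

// Arguments are ISO year, ISO week number, and day of week (1 = Monday).
$date->setISODate(2007, 10, 1);
echo $date->format('Y-m-d') . "\n"; // 2007-03-05

// 'W' is the ISO 8601 week number of the year.
echo $date->format('W') . "\n"; // 10
```

Swap in new DateTime() to get the week number for today.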

At least the relative date format is super flexible and expressive. As far as I know the closest thing to documentation is from the GNU tar manual on date input formats. (just like CVS) Btw, if you ever want nightmares, take a look at the scan method in PHP’s parse_date.c and be thankful that it isn’t your job to maintain.
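A few relative formats passed to modify(), as a sketch starting from a fixed Monday so the results are easy to check by hand:

```php
<?php
$date = new DateTime('2007-03-05 12:00:00', new DateTimeZone('UTC')); // a Monday

$date->modify('+7 days');
echo $date->format('Y-m-d') . "\n"; // 2007-03-12

$date->modify('next friday');
echo $date->format('Y-m-d') . "\n"; // 2007-03-16

$date->modify('-1 month');
echo $date->format('Y-m-d') . "\n"; // 2007-02-16
```

Note that modify() changes the object in place, so each call builds on the previous one.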

Date Math: Comparison and Differences

Beyond adding deltas (“+7 days”), the other common date math is comparing two datetimes, to find out which is more recent, and getting the difference between them. DateTime supports no methods for comparing two datetimes. The simplest solution for doing comparison is to compare epoch seconds.
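A sketch of the epoch comparison, with two example datetimes a day apart:

```php
<?php
$utc = new DateTimeZone('UTC');
$d1 = new DateTime('2007-03-05 12:00:00', $utc);
$d2 = new DateTime('2007-03-06 12:00:00', $utc);

// format('U') returns a string, so cast before comparing as numbers.
$d2_is_later = (int) $d2->format('U') > (int) $d1->format('U');
var_dump($d2_is_later); // bool(true)
```

Later PHP releases also let you compare DateTime objects directly with < and >, which avoids the round trip through format().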

Note: This method only works for dates that can be represented by epoch seconds. PHP uses a signed int for epoch seconds, so the range is limited by the size of the max int on your platform. On a 32-bit platform you get approximately 136 years, 1901 to 2038. There are other schemes besides epoch seconds for mapping dates to an easily comparable number, MJDs and TAI being two. See also Dershowitz & Reingold 1997.

If you’re going to be comparing a large number of dates you might consider a memoization technique like the Schwartzian transform.
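Here is a rough decorate-sort-undecorate sketch for DateTime objects. The function names are made up for illustration; the point is that format('U') runs once per date rather than once per comparison.

```php
<?php
// Hypothetical helper: sorts DateTime objects oldest first.
function sort_datetimes(array $dates)
{
    // Decorate: pair each date with its precomputed epoch.
    $decorated = array();
    foreach ($dates as $date) {
        $decorated[] = array((int) $date->format('U'), $date);
    }

    // Sort on the cached epoch only.
    usort($decorated, 'compare_decorated');

    // Undecorate: strip the epochs back off.
    $sorted = array();
    foreach ($decorated as $pair) {
        $sorted[] = $pair[1];
    }
    return $sorted;
}

// Compares the precomputed epochs held in slot 0 of each pair.
function compare_decorated($a, $b)
{
    if ($a[0] == $b[0]) {
        return 0;
    }
    return ($a[0] < $b[0]) ? -1 : 1;
}

$utc = new DateTimeZone('UTC');
$sorted = sort_datetimes(array(
    new DateTime('2007-03-12', $utc),
    new DateTime('2007-03-05', $utc),
    new DateTime('2007-03-11', $utc),
));
foreach ($sorted as $d) {
    echo $d->format('Y-m-d') . "\n"; // 2007-03-05, 2007-03-11, 2007-03-12
}
```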

We can get the difference in seconds using the same hack of casting to epochs.

echo ($d2->format('U') - $d1->format('U')) . "\n";
> 86400

Ideally we’d then divide the difference seconds to get the difference in hours, days, weeks, or months. However the following naive solution won’t work.

$diff / (60*60*24); // calculate difference in days, **BADLY**

Why not? Because days don’t always have 24 hours. Sometimes they have 23 hours, sometimes they have 25. Daylight saving strikes again. (If you want to be even more pedantic, minutes are also not 60 seconds long, sometimes they’re 61 seconds long if we have a leap second)

Basically you need to break yourself of thinking of datetime units as being fungible. You can’t simply calculate minutes from seconds, or days from hours. Just like you can’t divide days by 30 to get an accurate number of months. There are solutions, but they’re a bit beyond this blog post.
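To see the problem concretely, here is a sketch that measures midnight to midnight across the 2007 US switch to daylight time (New York chosen as the example zone):

```php
<?php
$tz = new DateTimeZone('America/New_York');

// Midnight to midnight across the switch to daylight time.
$d1 = new DateTime('2007-03-11 00:00:00', $tz);
$d2 = new DateTime('2007-03-12 00:00:00', $tz);

$diff = (int) $d2->format('U') - (int) $d1->format('U');
echo $diff . "\n"; // 82800, i.e. 23 hours

// The naive division says zero whole days, though the calendar moved by one.
$naive_days = (int) floor($diff / (60 * 60 * 24));
echo $naive_days . "\n"; // 0
```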

new DateTime from Epoch Seconds

So, non-fungible, remember that.

But sometimes you’ve cast DateTimes down to epochs to do math. And then you’ll want to cast back to a DateTime.

Alas DateTime doesn’t have a constructor that takes an epoch, and passing an epoch to the default constructor will throw an exception. Rather, you want:
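A sketch of the trick, which relies on the date parser’s "@" prefix for Unix timestamps (the epoch value here is just an example):

```php
<?php
$epoch = 1173096000;

// A leading "@" tells the date parser the rest is a Unix timestamp.
$date = new DateTime('@' . $epoch);
echo $date->format(DATE_RFC3339) . "\n"; // 2007-03-05T12:00:00+00:00

// The result comes back in UTC; move it to a named zone if you need one.
$date->setTimezone(new DateTimeZone('America/New_York'));
echo $date->format(DATE_RFC3339) . "\n"; // 2007-03-05T07:00:00-05:00
```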

Probably surprises no one but me, but the work on Magpie has been moving slowly again lately. Rather than continue futzing with it in the rare spare moments, I’m going to push it out in a very raw state, where folks can give some feedback. (and because folks keep asking about it) This is a preview release, very alpha.

Alpha

Alpha means it’s broken, there are missing features (many of which are actually in the current Magpie 0.7x releases), and stuff will change. Whether you find that exciting or off-putting is a personal thing.

Getting Started

So um, yeah, as of yet no documentation. That said I’ve included the classic magpie_simple.php script, unchanged except for the require statement. For really simple scripts that is all that is required.

Grab the current code, Magpie 2.0-alpha-PR1. This isn’t its final home, but I am taking this opportunity to finally break free of Sourceforge which has been a long standing goal.

(I know, we all prefer a good var_dump() plus source reading to docs, but their current non-existence won’t continue)

Goals for 2.0

There were three over-arching design goals in this rewrite, plus a slew of secondary goals.

1. Support new namespaces and elements, easily

Rather than go on trying to push the universal rules for mapping unknown elements to datastructures (the Magpie 1.0 approach) I’ve focused on making it simple to register custom parsing logic, and having intelligent defaults. (aka more like what Feedparser.py does) I expect that we’ll handle most known namespaces in short order, and barring another total upheaval of the landscape ala the Pie/Atom project, this should stand us in good stead as feed use continues to become more sophisticated.

2. Pluggable components

You should be able to easily swap out the caching layer (database caches anyone?), the HTTP layer (multiplexed curl?), even the parser. Besides the added flexibility, the theory is this will make embedding simpler.

Who knows, maybe someone will even contribute a pluggable parser that can handle something other than well formed XML.

3. Mostly backwards compatible

For simple stuff, your scripts should go on working. There is still a 1 function interface (if you liked that), and it still busts everything down into a couple of nested arrays for easy looping and echoing. Even where different it should feel familiar.

Known Issues

Parser doesn’t support xml:base nor xml:lang nor Atom inheritance. It will, but these features still annoy me, and as long as no one is using the code I can’t seem to get motivated to support them.

I’m also not doing all the normalization between feed types that I do in “Magpie classic”. Again, it’s coming.

Not sanitizing content yet.

Not just a new parser, but a new HTTP client as well. Basic HTTP auth support is there, digest isn’t. Haven’t added back in SSL support yet.

No documentation per se.

Incompatibilities and Gotchas

Just use $item['content'] instead of $item['content']['encoded'] or $item['atom_content'].

If caching is turned on, and Magpie can’t write to its cache, it will throw a fatal error, rather than quietly working in a degraded state. This might change, but it’s been a major support issue.

Tests. Most of them based on Mark’s FP tests. More added all the time. Currently not distributing with Magpie as I haven’t really figured out the license issues.

Confusing new licensing. Stated goal is to license under a dual GPL/BSD license. That means you get to choose if you’re using the software under the GPL, or the BSD. In addition you can upgrade your license to GPL from BSD (as you can with any BSD licensed software) merely by wishing it to be so.

Realized the other morning that I had half an article written just looking at my notes from some recent PHP hacking. Looked around a bit for somewhere to publish it, but most of the places I might have sent a PHP article a few years ago don’t seem to be around, or at least not accepting submission. Suggestions on a good venue?

Afternoon

Tim’s talk on atompub was good, if basic. I’m still trying to figure out how to push it beyond basic publishing. An interesting challenge, but later Tim pointed out that there is some intentional wiggle room left in the spec, and one or two holes that you could drive a truck through with enough determination. (don’t think there was a slides link?) Oddest new thought: we need a mime type for Markdown, or maybe a container type for the whole class of human readable markup.

On The Floor

Wandered the exhibit hall a bit. Blah. Just not feeling the vendor love. No surprise, but it slams home what a commercial conference OSCON is, with very little of the raw delight and innovation of the smaller events. All the good shirts cost $$ this year; wonder what that means for t-shirt driven development, and the t-shirt economic metric. Finally met Jason from Apress. They have a Flickr book out in August.

Too Many Codepoints

Andrei’s PHP6 and Unicode talk was impressive, and overwhelming. ICU is being baked deep into the core string object. An .ini setting to determine default behaviour, with Unicode (UTF-16) and binary string types. Automatic stream oriented encodings from input/output/file/cli etc. Look for a preview release this Fall. (Eclipse gets all the glory, but ICU is another amazing IBM open source contribution, worth checking out in its own right) I’m not sure anyone is thinking about the “How do I make charsets work across the PHP4, PHP5, PHP6 spectrum in an open source library?” Doesn’t seem like that out there a question. Looking forward to catching up with him back in Sunnyvale.

Ruby Rodeo!

FreeGeeK goes on being one of the coolest, most inspiring community projects anywhere. Packing it to the gills with hyper excited Ruby hackers certainly didn’t detract. Lucas Carson’s talk on dRB/Rinda was cool and inspiring. Not as polished a delivery as some talks, but he coded up a server-client architecture for discovering primes and automatically deployed it to those of us in the audience running irb. In about 20 minutes. The highlight though was finally getting a chance to catch up with Scott after all these years.
(Rinda may just be good old Linda retreads, but Ruby is so damn slow that distributed computing is worth the effort)

Recently started using Simple Test for a couple of projects (even slipped it in at work, but sssh! don’t tell them!). So far I’m very happy with it. Using PHPUnit was always a bit of a non-starter, it felt heavy, and even which version (fork?) to use was ambiguous.

Simple Test’s documentation beyond the basics starts to trail off, but the code is eminently readable (better than docs any day!), and I found writing a harness to work with the feedparser tests pleasantly straightforward.