Wordpress Files

Working from my desk at home today, since Beth has a shop day. One fun project: make
a RESTful service that gives diffs of all versions of wordpress files. For any install
and any file, it should determine whether the file is core, give its md5 sum, and optionally
give the full text so a diff can be made (to patch changes back to the original state, or to identify malware).
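To make the core / modified / not-core distinction concrete, here is a minimal Ruby sketch. It assumes the service already has a table of known core paths and their md5 sums for the release in question. KNOWN_CORE and classify are hypothetical names, and the sample entry is made up for illustration.

```ruby
require "digest"

# Hypothetical table of known core files for one release:
# relative path => md5 hex digest. In the service this would come from the
# core_files records for the requested version; this entry is made up.
KNOWN_CORE = {
  "index.php" => Digest::MD5.hexdigest("<?php // original")
}.freeze

# Returns :not_core, :core_unmodified, or :core_modified for one file.
def classify(relative_path, contents, known_core = KNOWN_CORE)
  expected = known_core[relative_path]
  return :not_core if expected.nil?

  Digest::MD5.hexdigest(contents) == expected ? :core_unmodified : :core_modified
end

p classify("index.php", "<?php // original")          # => :core_unmodified
p classify("index.php", "<?php eval($_POST['x']);")   # => :core_modified
p classify("wp-content/uploads/x.php", "<?php")       # => :not_core
```

Only a :core_modified result needs the full text, which is when the diff against the original becomes useful.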

First, let’s gather all the release files (there are a lot of them, since this includes betas):
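The original download loop isn’t shown here. As a hedged stand-in, this Ruby sketch assumes the release archive page (https://wordpress.org/download/release-archive/) links each tarball by an absolute https://wordpress.org/wordpress-VERSION.tar.gz URL. That markup is an assumption, so check the page before trusting the regex. tarball_links and fetch are hypothetical helpers.

```ruby
require "fileutils"
require "net/http"
require "uri"

# Assumed link format on the release archive page; not a documented contract.
TARBALL_LINK = %r{https://wordpress\.org/wordpress-[\w.\-]+\.tar\.gz}

# Every distinct .tar.gz link in the page, sorted.
def tarball_links(html)
  html.scan(TARBALL_LINK).uniq.sort
end

# Download one archive into dir, skipping files already fetched.
def fetch(url, dir)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, File.basename(URI(url).path))
  return path if File.exist?(path)

  response = Net::HTTP.get_response(URI(url))
  raise "#{url}: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  File.binwrite(path, response.body)
  path
end
```

A run would look like tarball_links(Net::HTTP.get(URI("https://wordpress.org/download/release-archive/"))).each { |url| fetch(url, "releases") }. The sketch deliberately does not hit the network when loaded.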

So what will this all look like? I’d like to use this as an opportunity to build something usable in Rails, and Rails 5 came out this week, so I think I’ll use Rails 5 to make this API. The client code will probably be Python, because, well, Python is already installed. But the server code could be in COBOL for all anyone knows, as long as the product works.

My model will be that there are wordpress versions (rails g scaffold wordpress version release_date:date), and that these versions have core files (rails g scaffold core_file file_name md5sum size:integer content:text). Lucky for me, these are nested resources, so the following route looks correct (in config/routes.rb):

resources :wordpresses, path: "wordpress" do
  resources :core_files
end
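The core_file scaffold implies one record per file per release. Here is a sketch of building the attribute hash for each file, using the scaffolded column names. It assumes a release has already been unpacked to a directory, and core_file_attributes is a hypothetical helper.

```ruby
require "digest"

# Build one attribute hash per regular file under root (an unpacked release
# such as wordpress-4.5.3/wordpress), keyed by the scaffolded column names.
# Dotfiles are skipped by the default glob. Binary files (images) are read as
# raw bytes; whether they belong in a text column is a separate question.
def core_file_attributes(root)
  Dir.glob("**/*", base: root).sort.select { |rel| File.file?(File.join(root, rel)) }.map do |rel|
    data = File.binread(File.join(root, rel))
    {
      file_name: rel,                         # path relative to root
      md5sum: Digest::MD5.hexdigest(data),
      size: data.bytesize,
      content: data
    }
  end
end
```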

So I added a release date. To fill it in, I could fudge and grab the date from the readme, or take the dates from WP’s Roadmap (major releases only), or, for 2.3 and later, use the mtime on the release download, which matches the release date (2.3’s archive is dated September 24, 2007). All of the earlier releases appear to have been added to the file server at that same time, so archive mtimes are not usable for anything older than 2.3, about 9 years back. Well, I’ve seen much bigger projects lose information on a grander scale than this, so by comparison wordpress looks remarkably stable.

-rw-r--r-- 1 sysadmin wheel 870766 Sep 24 2007 wordpress-2.3.tar.gz

Let’s peel these open to get more information from them. This runs super fast up to about 2.8; until it slowed down, I imagined a bug was skipping the tar step and just creating directories.

extract
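Here is a minimal sketch of the extraction step. It assumes tar is on the PATH and that archives named wordpress-*.tar.gz sit in the current directory. extract_all and target_dir are hypothetical names. Each archive gets its own directory, and a failed tar is reported rather than silently leaving an empty directory behind.

```ruby
require "fileutils"

# wordpress-4.5.3.tar.gz -> wordpress-4.5.3
def target_dir(archive)
  File.basename(archive, ".tar.gz")
end

# Unpack each archive into its own directory, skipping ones already done.
def extract_all(pattern = "wordpress-*.tar.gz")
  Dir.glob(pattern).sort.each do |archive|
    dest = target_dir(archive)
    next if Dir.exist?(dest)

    FileUtils.mkdir_p(dest)
    # system returns true only if tar exits 0; a failure would otherwise
    # leave an empty directory that looks like a successful extraction.
    warn "tar failed for #{archive}" unless system("tar", "-xzf", archive, "-C", dest)
  end
end
```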

Cursory inspection of the file tree suggests that most of the time the archive contained a directory ‘wordpress’ at the head of the tree. In some cases (mu releases before 3.0, and the named releases platinum, blakey, and miles), the top directory was instead wordpress-mu (2.7 to 2.9.2), wordpress-mu-VERSION (up to 2.6.5), wordpress-VERSION-NAME (platinum, miles), or the weirder wordpress-VERSION (blakey). I’ll fix all of these, since they bother me. Then, no matter how a release was packaged, there is a single expected directory name: wordpress-VERSION/wordpress/.

tidy
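Here is a sketch of the tidy step. It assumes each wordpress-VERSION/ directory contains exactly one top-level directory after extraction, and anything else is reported rather than guessed at. tidy_release and tidy_all are hypothetical names.

```ruby
# Normalize one wordpress-VERSION/ directory so its single top-level
# directory is named "wordpress". Returns :already_tidy, :renamed, or
# :unexpected (zero or several top-level directories; left alone).
def tidy_release(release_dir)
  dirs = Dir.children(release_dir).select { |c| File.directory?(File.join(release_dir, c)) }
  return :already_tidy if dirs == ["wordpress"]
  return :unexpected unless dirs.size == 1

  File.rename(File.join(release_dir, dirs.first), File.join(release_dir, "wordpress"))
  :renamed
end

# Tidy every release directory in the current directory.
def tidy_all(pattern = "wordpress-*/")
  Dir.glob(pattern).sort.each do |release_dir|
    status = tidy_release(release_dir)
    warn "#{release_dir}: #{status}" if status == :unexpected
  end
end
```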

Okay, so we now have what I think we want. We want to map these into a release version string (“4.5.2” or “0.71-gold”) and a release date. I see ls -ld */wordpress gives a workable list of dates, so let’s fiddle with that just a bit. Rather than trying to force ls to be nice (it has a command line option for that), I took stat’s output for the modify time:
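The stat output itself isn’t reproduced here. The same information can be read in Ruby with File.mtime. This sketch assumes the tidied wordpress-VERSION/wordpress/ layout, and versions_with_dates is a hypothetical helper. Dates before 2.3 reflect when the archives landed on the file server, not the actual release.

```ruby
# [version, "YYYY-MM-DD"] for each wordpress-VERSION/wordpress directory
# under root, using the directory's modification time (what stat reports),
# formatted in local time.
def versions_with_dates(root = ".")
  Dir.glob(File.join(root, "wordpress-*", "wordpress")).sort.map do |dir|
    version = File.basename(File.dirname(dir)).sub(/\Awordpress-/, "")
    [version, File.mtime(dir).strftime("%Y-%m-%d")]
  end
end
```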

So now what? Rails aside, we want to get these version strings and release dates into the database. As for Rails specifically: Active Record wants an integer/serial id column, but we want our URLs to look like /wordpress/4.5.3/core_files/index.php, not /wordpress/201/core_files/89. Numeric ids are okay if you don’t mind a lot more round trips: fetch the available versions, get the id number for the one you need, fetch its core files, and get details for the ones you need. For a human-friendly view, /wordpress/4.5/core_files/index.php looks correct.
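To show what the human-friendly URL carries, here is a small Ruby sketch that splits one into a version and a file path. CORE_FILE_URL and parse_core_file_url are hypothetical names. The file path keeps its slashes, so it has to be everything after /core_files/. In Rails routing this probably means a glob segment such as *path with format: false (so .php isn’t taken as a format), plus a constraint letting the version segment contain dots. Treat that as an untested assumption.

```ruby
# Version and file path from a URL like /wordpress/4.5.3/core_files/index.php.
# The path may contain slashes, so it is everything after /core_files/.
CORE_FILE_URL = %r{\A/wordpress/(?<version>[^/]+)/core_files/(?<path>.+)\z}

def parse_core_file_url(url)
  m = CORE_FILE_URL.match(url)
  m && { version: m[:version], path: m[:path] }
end

p parse_core_file_url("/wordpress/4.5.3/core_files/wp-includes/version.php")
```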

populating data by trial and error

The next step was to fire up a development copy on the server: running this from my desk was useful, but only until all the data I want is on the server. So I’ll be using the server (where I have all the files) as the development instance. It’s on GitHub now, so shuttling between the desk and the server should be okay. And if you’re me and are whitelisted in the firewall, it’s online.

So how do we put all this into the database? As a test I inserted the first version, and the logs show this was the transaction:

Inspecting the /wordpress/new page, these are the fields in the request: utf8, authenticity_token, and commit=“Create Wordpress”. The first two are hidden inputs, and the last is the commit action (the submit button). Do I need to grab this authenticity token for each post? Should I be using seeds.rb for this instead? (You might say yes, and you might be right.) Let’s continue down the path of most resistance and loop this curl one more time.
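The curl loop itself isn’t reproduced here. The two pieces any scripted POST needs can be sketched in Ruby under stated assumptions: the page contains the standard hidden authenticity_token input, and the scaffold expects wordpress[version] and wordpress[release_date] params. authenticity_token and create_wordpress_body are hypothetical helpers. Rails ties the token to the session, so the session cookie from the GET has to go along with the POST.

```ruby
require "uri"

# Pull the hidden authenticity_token value out of the /wordpress/new page.
# Attribute order inside the tag doesn't matter to this regex.
def authenticity_token(html)
  tag = html[/<input[^>]*name="authenticity_token"[^>]*>/]
  tag && tag[/value="([^"]*)"/, 1]
end

# URL-encoded form body mirroring the fields the scaffold form sends.
# Sending release_date as a plain YYYY-MM-DD string (instead of the form's
# date selects) assumes Active Record type-casts it, which is untested here.
def create_wordpress_body(token, version, release_date)
  URI.encode_www_form(
    "utf8" => "✓",
    "authenticity_token" => token,
    "wordpress[version]" => version,
    "wordpress[release_date]" => release_date,
    "commit" => "Create Wordpress"
  )
end
```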

Or I can go back to seeds.rb, which seems like it’s going to be so much easier. So let’s rewrite our loop to output a seeds.rb-style list of versions and dates, rather than keep watching the stack traces. Here’s a cheap (and horribly unattractive) seeds generation:
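The author’s own generator isn’t shown. A Ruby equivalent, assuming [version, date] pairs like those from the stat step, can lean on inspect for quoting so every emitted line is valid Ruby. seed_lines is a hypothetical name, and the sample pair uses the 2.3 date mentioned earlier.

```ruby
# Turn [version, date] pairs into seeds.rb lines. String#inspect quotes and
# escapes each value, so odd version strings can't break the generated file.
def seed_lines(pairs)
  pairs.map do |version, date|
    "Wordpress.create!(version: #{version.inspect}, release_date: #{date.inspect})"
  end
end

puts seed_lines([["2.3", "2007-09-24"]])
# Wordpress.create!(version: "2.3", release_date: "2007-09-24")
```

Redirecting the output into db/seeds.rb and running rails db:seed would then create the rows.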

This did work, and rails db:seed populated the versions on my development site. Now we can focus on nesting the core_files resource into the views and the controller.

interlude

Since I’m about 3 hours in right now, time to get some weekend beach hours in. My next steps will be validation (wordpress must have version), remapping urls to version (wordpress/4.5.3/ instead of wordpress/215/), nesting resources so that files belong to wordpress, and making an even bigger, uglier seeds.rb file.

validation

Let’s just follow the Rails guide: make the Wordpress model require a version, so it can’t be null.

Let’s take a first pass at core_files now. (Thinking about uniqueness, I’m tempted to call file_name a misnomer; file_path is correct, since a name like index.php appears in several directories of a single release.) What’s missing here is an indication that a file must belong to a valid version. But while I’ve still got that nagging feeling in my mind, I’ll migrate the schema to rename the column.

using version string as url

nesting files

Once you change the path for the resource, a lot of the generated scaffolding breaks, since it used top-level path helpers in the index, form, and redirects. For example:

undefined local variable or method `new_core_file_path' for #<#<Class:0x007f35907736f8>:0x007f3590772320>
Did you mean? new_wordpress_core_file_path

in app/views/core_files/index.html.erb

Shaking these out takes a little while, but the error messages are fairly straightforward: each names the problem and the file and method to fix, so it’s just a little work to reorganize. Critically, this does require adding a reference (a wordpress_id column) to the core_files table, so that @wordpress.core_files knows which rows to return.

populating files data

TBD

how to find version information in the wordpress files

From version 1.2 on, there is a file, wp-includes/version.php, with a string that denotes the wordpress version. For 1.0 and earlier, there are some oddities: 0.71-gold still had b2-includes/b2vars.php, with $b2_version = '0.71';

Version 1.0 scrubbed the b2 names, renaming this path wp-includes/vars.php, with the new variable looking this way: $wp_version = '1.0.1';

Starting in 1.2, we see the vars split by functionality, and a version.php file containing what we are looking for: $wp_version = '1.2';
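Those three layouts suggest a simple lookup order. Here is a Ruby sketch that checks them newest first and pulls the quoted version out of whichever file exists. VERSION_FILES, VERSION_ASSIGNMENT and detect_version are hypothetical names, and the regex assumes the single-quoted assignments shown above.

```ruby
# Where each generation of releases keeps its version string.
VERSION_FILES = [
  "wp-includes/version.php", # 1.2 and later
  "wp-includes/vars.php",    # 1.0.x
  "b2-includes/b2vars.php"   # 0.71-gold
].freeze

# Matches $wp_version = '1.2'; or $b2_version = '0.71';
VERSION_ASSIGNMENT = /\$(?:wp|b2)_version\s*=\s*'([^']+)'/

# First version string found under release_root, or nil.
def detect_version(release_root)
  VERSION_FILES.each do |relative|
    path = File.join(release_root, relative)
    next unless File.file?(path)

    # binread avoids encoding errors on old files that aren't valid UTF-8.
    found = File.binread(path)[VERSION_ASSIGNMENT, 1]
    return found if found
  end
  nil
end
```

Checking version.php first matters because later releases still ship a wp-includes/vars.php that no longer carries the version.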