At the request of the authors of Vault, I've decided to take this content down. It was creating a lot of problems
for people who aren't using Vault in exactly the same way I was, and it was likely causing more harm than good. If you need a way to reliably do
Vault replication, you'll want to look into the Vault Premium package.

There comes a point in the lifecycle of every Puppet setup where you realize
that you’re going to be much better off using other people’s Puppet
modules whenever possible. It’s what makes OSS great – why should I reinvent
the wheel when I can help make your wheel even better? I’ve found what I
think is a very productive setup – it leverages Git (specifically branches,
submodules, and hooks), Gitolite permissions, and Puppet environments to
create a workflow that a team of admins can use to iterate on new features
without disturbing each other.

Most of the pieces of this puzzle are well documented elsewhere; I’ll provide
links where necessary.

Step 1: Establish Dynamic Environment Workflow

The first step is to go read “Git Workflow and Puppet
Environments” written by Adrien Thebo of Puppet Labs. Once you’ve
implemented that setup, you should be able to do the following from your
workstation:

At this point, you have a new environment named ‘mybrokenbranch’ on your
Puppetmaster. You can test the setup by ssh'ing into a client machine and
running:

puppet agent --test --environment mybrokenbranch --noop

That obviously won’t be a happy puppet run. The key
point here being that your other environments are not impacted by the work of
this one admin. Let’s delete the local and remote branch. From your
workstation:

Note that the Puppetmaster says that it’s deleted the environment. Feel free
to verify that by running the above command on the Puppet client; it will
complain that the environment doesn’t exist.

Step 2: Incorporate Git Submodules

With all that set up, let’s go ahead and implement support for git submodules.
I have a pull request off to Adrien to implement this functionality, but until
he merges it, you can use my fork on GitHub. Replace the update
hook on your git server with the updated version. Now, let’s try pulling a git
submodule into our repo. Again, from your workstation:

Note in the output that the Puppetmaster is checking out the
git submodule into the new environment. Go ahead and log into the
Puppetmaster and look in your firewall environment; you should see all the
manifests and whatnot there.

Here’s where I need to stamp a disclosure notice – git submodules aren’t all
milk and honey. There are some funky situations you can get yourself into if
you’re not careful. Thankfully, there aren’t many of those situations you can’t
get yourself out of. I highly recommend reading the Pro Git chapter on
submodules before doing anything with them.

Step 3: Implement Access Controls on Gitolite

This next step is entirely optional, but works out well for us. We have a
setup where I’m the only admin that can write to the master and testing
branches of our git repo, but any sysadmin can create their own branch, test
it, and delete it if need be. Setting up gitolite is far beyond the scope of
this post, but if you have about an hour of free time, you can have it set up
and running. Below I’ve pasted the relevant snippet from
gitolite.conf that enforces those permissions.

Step 4: Profit!

To summarize it all, here’s the workflow for an admin to add a new feature in
our Puppet setup:

Create a new VM which will be the testing ground for the new feature.

Create a local feature branch to implement the new feature in. The admin iterates over this branch (pushing the branch to origin) getting things working with his VM.

Once he’s happy with the results on his VM, he’s required to log in to another sandbox VM and run the agent against the same puppet branch with the ‘--noop’ flag to ensure nothing unintended happens.

At this point, the positive and the negative have been tested, and he then asks me to merge the feature branch into master.

I then do a

git diff ...origin/newfeature

We go over any changes, and I merge it in.

From there, we follow our normal deployment method of tagging a release, and manually checking out the tag on the Puppetmaster.

While it’s certainly not perfect, this workflow setup has allowed us to work
together as a team while still implementing some best practices. In
particular, the dynamic environments allow us to test our features extensively
before releasing them into production. This is especially important in a team
where the admins aren’t Ruby programmers who can write rspec-puppet tests.

Before the integration of git submodules with the dynamic environment
workflow, we were manually merging external repos into our own setup, and it
was an absolute nightmare. Now, to update our repo to use a new version of
someone else’s module, we just create a new feature branch, update the
submodule, test, and merge.

What workflows do you and your team use that make life with Puppet better?
Please share below.

I’ve been running a single node from VPS.net for about a
year now. Please note that my specific experience has been in their “Chicago
Zone D data center”, but if you check out their status
page or search Twitter, you’ll find a lot of others
having the same issues. While there are a lot of good things to write about,
where they fail is the area most important to me: availability.

The pros of using VPS.net include pricing, control panel, and console-level
access. As is typical for a VPS provider, they give you many “add-on” options
such as backup that you can enable – I’ve not investigated them myself.
Perhaps one of the nicest features is the ability to add server resources or
“nodes” on the fly with minimal downtime.

However, it seems that VPS.net has made a horrible choice in selecting the
SAN that backs their VMs. Examine the graphic below: As you
can see, I’m getting less than two nines’ worth of uptime from my node. Each
and every time there’s been an issue, support has been quick to point out that
they’ve had some sort of SAN issue, and that the SAN is ‘resyncing’. The
problem is that while the SAN is resyncing, I/O to my node is so horrible that
I can’t cat a 500-byte file to stdout in less than 10 seconds. So the node
will respond to a ping, but it can’t serve up a static image via Apache. For
all intents and purposes, that’s down in my book. The last SAN
synchronization took the better part of two days, during which time my node
was unusable.

In my experience, the SAN is the most important building block when
architecting a service that’s meant to be highly available. Until VPS.net
addresses their SAN issues, they are likely to continue to have prolonged
downtimes, and until that’s fixed, there’s just no way I can recommend their
services to anyone.
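To put “less than two nines” in perspective, here is a minimal Python sketch that converts an availability percentage into allowed downtime, and an outage back into availability. The 30-day month, the 48-hour outage (an upper bound for “the better part of two days”), and the function names are my own assumptions for illustration, not figures from VPS.net.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_MONTH):
    """Hours of downtime permitted in a period at a given availability."""
    return period_hours * (1 - availability_pct / 100)


def availability_pct(downtime_hours, period_hours=HOURS_PER_MONTH):
    """Availability percentage for a period with the given downtime."""
    return 100 * (1 - downtime_hours / period_hours)


for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime allows {hours:.2f} hours of downtime per month")

# Assumption: one outage of roughly two days (48 hours) in the month.
print(f"A 48-hour outage leaves {availability_pct(48):.2f}% availability")
```

Two nines leaves room for only a few hours of downtime a month, so a single multi-day resync is enough to miss it by a wide margin.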

There’s only a certain amount of bandwidth in a person’s day. As you get
older, that bandwidth seems to become more and more constrained. Kids are
extreme bandwidth hogs :) Over the years I’ve found that I have enough
bandwidth in my life to deal with one obsession that’s not part of my day job
at any given time. For the last couple of years, that obsession has been with
Drupal and specifically with Node Gallery. In my very biased opinion,
it’s the most user-friendly and integrated gallery experience you can have
with Drupal 6.x. Also IMHO, there’s a huge void in Drupal 7 with respect to
butt-kicking gallery modules, one that’s begging to be filled with a Node
Gallery 7.x branch. But I just can’t bring myself to that one simple git
command.

I’ve had several changes at work in the past year, and I’m no longer
working with Drupal and PHP on a regular basis. I’ve become enthralled with
Puppet as of late, and that’s proven to be the gateway drug to the devops
movement for me. I’m reading books on Kanban, learning a bit of Ruby, building
deployment pipelines, and soaking up anything I can on devops. It seems
sysadmins who can code really do have a place in the world, and it appears to
be in devops. It’s not burnout, it’s simply a matter of prioritization on
demands for a limited resource. There’s just no time left over for Drupal
anymore.

Back to the point of this post – Node Gallery needs a co-maintainer
who can take the module into the 7.x branch. The recently released 6.x-3.x
branch has proven to be quite stable, and would likely require only very
minimal maintenance. You can take it for a spin on the demo
site, or read all about its features on
the project page. Here are some quick
points:

It has a reported user base of just under 3,800 sites, which puts it at right around #400 on the top modules list.

It has a great user base that’s proven to be active in the issue queue. Many of the support requests have been resolved by members of the community who have never written a line of code. It has a strong German presence, and has been translated.

It integrates very tightly with Views, and supports bulk uploading with Plupload. It has its own access module in Node Gallery Access, as well as a handful of other modules (all of which are listed on the project page) it integrates with very well.

It’s been engineered to perform well from the start. If your server can handle the load of 100,000 nodes, there’s no reason it shouldn’t be able to handle 100,000 Node Gallery image nodes – even if those are all in one gallery.

The administration UI aims to provide a working gallery setup out-of-the-box that works for 90% of the users, yet provide enough buttons and knobs for the remaining 10% to be able to tweak what they need.

It runs the gamut of technologies in Drupal; making use of caching, Views integration, jQuery and jQuery UI, CCK, Node Access, Batch API, etc.

What differentiates Node Gallery from most other gallery modules is that each and every image in a gallery isn’t just a field, it’s an entire node. This opens up huge possibilities for interactions with other contrib modules. The original reason for me selecting Node Gallery was because it was the only way I could sell individual images using Ubercart.

Who I’m looking for:

This module is likely a bit complex for someone who’s never maintained a module before. If you’ve maintained your own Drupal module (either privately or on d.o), take a look at the code and make sure you can understand what’s going on.

Drupal 7 API experience is a must; experience in migrating D6 modules to D7 is a plus.

Ideally, you need to have an “itch that needs scratching” – in other words, you should probably have a need for an image gallery.

If you’d like to take a crack at bringing Node Gallery to Drupal 7, contact
me, or file an issue in the issue queue.

Heroku has been around for a while now, but has been primarily a Rails host. Well, until recently anyway. With the announcement of their Facebook integration, many others have noted that *any* PHP app can at least parse on Heroku’s cedar stack. I’ll be honest, it took me longer to get ruby+rails set up on my MacBook than it did to get a proof-of-concept installation of Drupal up and running. Here’s what I did:

Get ruby, rails, and the heroku gem installed and running. This page had me up and running pretty quickly on my Mac.

Here’s what makes all this proof-of-concept only. Many of the features used in Drupal core’s .htaccess file assume that the webhost has enabled the “AllowOverride All” option. Heroku doesn’t allow this; it only allows a small subset of overrides. DOING THIS WILL MORE THAN LIKELY COMPROMISE THE SECURITY OF YOUR DRUPAL INSTALL. Open up .htaccess in your editor, and comment out any line that starts with these strings:

Order

Options

DirectoryIndex

php_value
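If you’d rather not do that edit by hand, here is a rough Python sketch of the same step. The function name and the sample text are mine and purely illustrative; the sketch also treats indented lines as matches, which is an assumption on my part since directives can sit indented inside an IfModule block. It only automates the edit, so the security warning above applies just the same.

```python
# Directives listed above that need to be commented out.
PREFIXES = ("Order", "Options", "DirectoryIndex", "php_value")


def comment_out(htaccess_text, prefixes=PREFIXES):
    """Prefix '# ' to every line that starts with a listed directive."""
    out = []
    for line in htaccess_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(prefixes):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}# {stripped}"
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample, not the real Drupal .htaccess.
sample = """Options -Indexes
<IfModule mod_php5.c>
  php_value magic_quotes_gpc 0
</IfModule>
DirectoryIndex index.php
RewriteEngine on
"""

print(comment_out(sample), end="")
```

To apply it for real, you would read .htaccess, pass the text through comment_out, and write the result back before committing.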

Add Drupal to git, and commit:

git add .; git commit -m 'initial commit'

Create your heroku application. You’ll need to have signed up for a free account on http://www.heroku.com, and you’ll need to give the following command your login credentials:

heroku create --stack cedar

Push your code up to heroku (note the URL it gives you back):

git push heroku master

Now, we need to set up the Postgres instance:

heroku addons:add shared-database

Let’s display our Postgres credentials:

heroku config

You can now hit your Drupal instance at the URL given to you by your last git push. Install as you normally would, selecting Postgres as your database, and filling in the user, password, database, and host given to you by ‘heroku config’. Make sure to change the host from localhost under the “Advanced” fieldset.
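If the credentials show up in the ‘heroku config’ output as a single postgres:// URL (an assumption about the output format on my part, and the URL below is entirely made up), a small Python sketch like this splits it into the fields the Drupal installer asks for. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def split_db_url(url):
    """Break a postgres:// URL into the pieces the installer form wants."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
    }


# Hypothetical value; substitute whatever your own config shows.
example = "postgres://someuser:s3cret@db.example.com/somedb"

for field, value in split_db_url(example).items():
    print(f"{field}: {value}")
```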

At this point, you can poke around your install, and start seeing what all else is broken :) ‘heroku logs -t’ is your friend. If you don’t believe me, here’s a D7 instance, and here’s a D6 one.

Seriously, the .htaccess point is a deal-breaker. Unless someone with more time on their hands than I have can suggest a more secure configuration (or heroku allows Drupal to override all), there are some serious security ramifications to commenting out the lines in .htaccess.

Drupal is definitely slow on the free plan for Heroku, but I mean, it’s free; what did you expect? Drupal 6 seemed to work throughout, but I noticed when getting D7 up and running that I couldn’t hit some “heavy” URLs like /admin/config and /admin/reports/status. I could get into other sub-menus such as /admin/config/development/performance. We all know D7 takes a fair amount of horsepower to run, and horses aren’t free :). The whole point of heroku is being able to scale your app by dragging a slider in a web UI, and there’s no reason to believe that Drupal wouldn’t start running much faster given more resources from a non-free plan.

The point of this blog post was to just jot down my notes and save someone else a little time in getting started – hopefully the community can come up with some ideas so we have another awesome choice in Drupal hosting!

At one of my jobs, we recently went through the process of selecting a
CDN (Content Delivery Network) for our site. While the first rule of
CDN’s is that “any CDN is better than no CDN”, it can be argued that
certain CDN’s are a better fit in certain situations than others. This
post is basically a summary of the process we went through when
selecting our CDN. By no means is this a statement of “XYZ is better
than ABC”; it’s simply documentation of the process we went through in
order to select the right one for our business. While most CDN’s are
compatible with Drupal via excellent contrib modules such as CDN,
the information presented in this article is relevant to any website
and isn’t Drupal-specific.

To illustrate the importance of a CDN using real numbers, one image
being fetched from the data center to our office takes about 323ms. That
same image fetched from Seattle is 483ms, and from Washington DC takes
599ms. The worst cases appear overseas: fetching the same
image from Paris takes on average 1,141ms for just that one image.

A Content Delivery Network (CDN) shortens that distance between your
static content and the end-user. While the text on most web pages is
dynamic, most images, JavaScript, and CSS are static. These static
objects make up a large percentage of the total bytes downloaded for
each page view. By using a CDN, you place static content as close to the
end-user as possible. In turn, this decreases the page load time an
end-user experiences by leaps and bounds.

Pre-selection Criteria

There’s a plethora of CDN’s to choose from, and if you don’t filter the
initial list down to five or fewer providers, you’ll end up spending
months in evaluation time. By defining specific must-have features, we
were able to limit the initial number of companies to compare to four.
Many CDN’s provide value-add services above and beyond static objects,
such as “Dynamic Site Acceleration” – this evaluation looked solely at
serving up static file content, e.g. JPEG, GIF, CSS, and JavaScript.

The filtering properties we used to limit scope were:

The CDN must provide “origin pull” or “reverse proxy” support. If
the CDN receives a request for a file that doesn’t exist at the edge, it
applies a customer-defined URL rewrite to the request, and proxies the
request to the origin site. If the image exists at the origin, the edge
server caches the image locally and serves it to the client from there.
For example, the CDN host name might be cdn.example.com (which points to
the edge), and the origin site (my server) would be www.example.com. If
I point my browser to http://cdn.example.com/logo.gif, and that file
doesn’t exist at the edge, the CDN will make a request for
http://www.example.com/logo.gif. If that file exists, it is fetched and
cached. If it doesn’t exist, a 404 is returned to the client. The trade
off is that you don’t have to pre-seed static content to the CDN, but
the first user request for a static object takes a bit longer to
complete (because it results in two requests instead of one). Once the
edge network’s cache is primed, there is no performance difference
between origin pull and CDN origin.

The CDN must propagate cache-related HTTP headers from the origin to
the end-user. We’ve gone to great lengths to use versioning of
filenames so that we can set far-future expires headers on 99% of our
static content as recommended by Yahoo’s “Best Practices for Speeding Up Your Website”.
This results in far fewer HTTP requests to render a
page that has already been requested by the end-user previously,
ultimately decreasing page response time. Some CDN’s that offer origin
pull do not proxy these headers back.

The CDN must use GZip compression on text-based content. Most CDN’s
support this, but it’s something you definitely want to check. When
serving up static text-based content such as CSS or JavaScript, the CDN
can and should compress it for you before sending it to the end-user.
Compression makes the overall page content smaller, and therefore faster
to download.

Response time must be consistent and fast. Performance is a tricky
thing. While having the fastest response time overall didn’t guarantee
that a CDN would “win”, having consistently poor relative performance
would guarantee a CDN would “lose”. Try not to focus too much on
performance numbers – the standard deviation across most of the CDN’s is
less than ten milliseconds. In our research we found
out quickly that there are a lot of features more important to us than 5ms
worth of response time.

100% Uptime SLA. Since a CDN is at its most basic level a
geographically distributed cluster of cache servers, it should be
implied that a CDN can provide 100% uptime. If one POP goes down,
requests should be automatically routed to the next nearest POP. If your
CDN doesn’t offer this, you need a new CDN.

Company financial strength and solvency. This is something often
overlooked when people evaluate, but was very important to us. There are
a lot of CDN’s out there, and we found only 2 or 3 that could put in
writing that they are a profitable corporation. Our implementation
required a fair amount of work, and it would take us some time to switch
to another CDN. If your CDN goes dark in the middle of the night, how
long will it take you to switch?
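To make the first three criteria concrete, here is a toy Python sketch of an edge node that does origin pull, passes the origin’s cache headers through untouched, and gzips text content. The class name, the fetch_from_origin callable, and the sample paths and headers are all hypothetical. A real edge would also check the client’s Accept-Encoding header and honor expiry times, both of which this sketch skips.

```python
import gzip

TEXT_TYPES = ("text/css", "application/javascript")


class EdgeCache:
    """Toy edge node: origin pull, header pass-through, gzip for text."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin(path) -> (status, headers, body); hypothetical hook
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:
            # Cache miss: proxy the request to the origin site.
            status, headers, body = self.fetch_from_origin(path)
            if status != 200:
                return status, {}, b""  # e.g. a 404 goes back to the client
            headers = dict(headers)  # keep Cache-Control etc. from the origin
            if headers.get("Content-Type") in TEXT_TYPES:
                body = gzip.compress(body)
                headers["Content-Encoding"] = "gzip"
            self.cache[path] = (headers, body)
        headers, body = self.cache[path]
        return 200, headers, body


# Stand-in for the origin (www.example.com in the text above).
FAR_FUTURE = "max-age=31536000"
ORIGIN = {
    "/logo.gif": ({"Content-Type": "image/gif", "Cache-Control": FAR_FUTURE},
                  b"GIF89a..."),
    "/site.css": ({"Content-Type": "text/css", "Cache-Control": FAR_FUTURE},
                  b"body { margin: 0; }\n" * 50),
}
origin_hits = []


def fetch_from_origin(path):
    origin_hits.append(path)
    if path in ORIGIN:
        headers, body = ORIGIN[path]
        return 200, headers, body
    return 404, {}, b""


edge = EdgeCache(fetch_from_origin)
edge.get("/logo.gif")  # miss: pulled from the origin, then cached
edge.get("/logo.gif")  # hit: served straight from the edge
print("origin requests for /logo.gif:", origin_hits.count("/logo.gif"))
```

Only the first request for an object reaches the origin, which is the trade-off described under origin pull: nothing to pre-seed, but the first visitor pays for the extra hop.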

Important Features

Whereas not meeting any of the above requirements would result in being
excluded from our comparison, the following features were key points of
consideration. Not meeting them all wouldn’t exclude a CDN, but on the
flip side, implementing them all would put the CDN in very good
standing.

Price. While high prices weren’t going to scare us away, bang
for the buck played a large part in our decision. We weren’t
interested in paying a premium for brand recognition.

Strong international presence. Our guests include international
clients, and poor static object performance for those clients was
the key motivation for implementing a CDN in the first place.

Contract terms. Some CDN’s do month-to-month, some do 12 month,
others require longer as you negotiate price.

Overage fees. CDN’s meter you on the amount of bandwidth you
consume. You pay for a “bucket”. No CDN’s turn your service off
after you exceed that bucket, they just bill you for overages. The
good CDN’s will bill you at the same per-GB rate that you pay for
your monthly bucket. Some CDN’s charge as much as 2x for overages.
Avoid those.

Traffic accounting. One other thing often overlooked with origin
pull CDN’s is whether or not the traffic between the edge POP’s and
the customer origin is counted as traffic against your total. Some
CDN’s count it against your bucket, others don’t.

Setup fees. CDN’s vary wildly on their setup fees. Some are
free, some charge more than $5,000. Make sure you incorporate that
cost into your decision.

User Interface. All CDN’s offer some form of web-based
interface. The quality of the interface greatly differs between
CDN’s. I could swear that some of the interfaces I saw were written
in CGI Perl in the late 90’s. Other interfaces offered everything a
customer could ever want, including detailed analytics and
reporting. Key questions to ask are “If I get a bad image out on the
edge, how do I purge it?”, and “How do I tell how much bandwidth is
being consumed throughout the CDN at any particular point in time?”
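The overage and traffic accounting points are easier to compare with numbers in front of you. Here is a minimal Python sketch of a monthly bill; the bucket size, price, usage figures, and parameter names are all made up for illustration and don’t come from any vendor’s price list.

```python
def monthly_bill(used_gb, bucket_gb, bucket_price, overage_multiplier=1.0,
                 origin_pull_gb=0.0, count_origin_traffic=False):
    """Charge for a metered bucket plus any overage beyond it."""
    billable_gb = used_gb
    if count_origin_traffic:
        # Some CDN's also meter the edge-to-origin traffic.
        billable_gb += origin_pull_gb
    per_gb = bucket_price / bucket_gb
    overage_gb = max(0.0, billable_gb - bucket_gb)
    return bucket_price + overage_gb * per_gb * overage_multiplier


# Hypothetical plan: 1,000 GB for $200, with 1,200 GB actually delivered.
same_rate = monthly_bill(1200, 1000, 200.0)
double_rate = monthly_bill(1200, 1000, 200.0, overage_multiplier=2.0)
with_origin = monthly_bill(1200, 1000, 200.0, origin_pull_gb=50,
                           count_origin_traffic=True)

print(f"overage at bucket rate:   ${same_rate:.2f}")
print(f"overage at 2x rate:       ${double_rate:.2f}")
print(f"origin traffic counted:   ${with_origin:.2f}")
```

Running your own expected traffic through something like this for each vendor’s terms makes the differences in overage and accounting policy show up as dollars.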

External Reporting Data

We chose to invest in one month’s worth of reporting from
CloudHarmony’s CloudReports service. This gave us a quick way to
examine performance of CDN’s to the actual end-user browser behind a
real cable/dsl/dialup connection (not to a datacenter somewhere). While
some might view those reports as expensive, we found it quite a bargain to
have another independent view into the performance of a vast majority of
CDN’s.

The Contenders

Given the above requirements, coupled with the performance data provided
by CloudHarmony, we were able to refine our list of CDN’s to consider.
In alphabetical order:

First elimination: Akamai

Akamai is to the CDN market what Bose is to the home audio market. While
it’s not inherently a bad product, you’re paying a huge premium for the
brand name. While we never got so far as to set up a demo account, the
performance data provided by CloudHarmony and other sources didn’t favor
them at all. My personal opinion (which is little more than a wild
guess) on why Akamai doesn’t perform as well is because of their
product’s age. Their network is by far the largest one out there, and I
can guess that keeping up with the latest optimizations and protocols is
a huge undertaking.

When speaking with Akamai, I got the impression that they really don’t
care to sell their static object delivery product by itself. Their reps
focused mostly on trying to upsell their Dynamic Site Acceleration
product. While DSA might indeed be a great product, it wasn’t what we
were interested in.

In the end, the best price I could get out of Akamai was more than twice
that of the next most expensive CDN in our comparison, and they wanted a
3 year contract at that price. I’m just not that into paying twice as
much for an equal product, so they were eliminated. If we should move to
a Dynamic Site Acceleration type of service later, Akamai will
definitely be re-evaluated at that time.

Second elimination: LimeLight

LimeLight Networks is the 2nd largest CDN provider, behind Akamai. It’s
fitting that they are right behind Akamai, because they came across like
a smaller Akamai to me. Their pricing is much more competitive than
Akamai, and performance appeared to be quite good across the board. They
supposedly have a nice web and reporting interface, but I was unable to
get a demo set up without filling out paperwork that would have required
approval from our legal department. Therein lies the problem with
LimeLight – getting them to do anything outside the everyday norm was
like pulling teeth. Like Akamai, LimeLight is also focused on the upsell
and seemed to me generally uninterested in selling their static
delivery service.

If for some reason, we had to switch away from our primary choice, my
second choice would likely be LimeLight Networks, but only after I was
able to obtain a demo account so that I could verify their performance
was within acceptable range and the functionality of their user
interface.

Independent Performance comparisons

I was able to easily procure demo accounts from EdgeCast and CacheFly,
so I set up some performance testing of our own using Pingdom to
download a typical JPEG image from each Pingdom POP using the origin
pull setup. Note that since Pingdom’s servers are in data centers and
not in actual residences, this isn’t a measure of end-to-end
performance, but rather a way to compare response times, apples to apples, from
various regions around the world. The executive summary here is that
while EdgeCast “edged” out CacheFly, the real message is that any CDN is
so much better than none at all:

| CDN | US/Non-US | Location | # of Polls | Avg Response Time (ms) | Max Response Time (ms) | StdDev (ms) |
|---|---|---|---|---|---|---|
| CacheFly | Non-US | Amsterdam 2, Netherlands | 289 | 68 | 4202 | 285.98 |
| | | Copenhagen, Denmark | 259 | 158 | 461 | 36.02 |
| | | Frankfurt, Germany | 287 | 41 | 567 | 32.38 |
| | | London 2, UK | 290 | 29 | 2489 | 145.26 |
| | | London, UK | 284 | 29 | 127 | 11.30 |
| | | Madrid, Spain | 259 | 201 | 586 | 31.36 |
| | | Manchester, UK | 281 | 129 | 1709 | 184.87 |
| | | Montreal, Canada | 286 | 105 | 3084 | 250.63 |
| | | Paris, France | 286 | 143 | 521 | 60.11 |
| | | Stockholm, Sweden | 286 | 54 | 882 | 80.88 |
| | Non-US Total | | 2807 | 94 | 4202 | 157.88 |
| | US | Atlanta, Georgia | 289 | 16 | 398 | 23.52 |
| | | Chicago, IL | 288 | 56 | 2615 | 158.33 |
| | | Dallas 4, TX | 286 | 40 | 960 | 74.61 |
| | | Dallas 5, TX | 289 | 26 | 1506 | 89.08 |
| | | Dallas 6, TX | 291 | 47 | 1473 | 132.25 |
| | | Denver, CO | 289 | 216 | 925 | 72.18 |
| | | Herndon, VA | 288 | 473 | 3472 | 196.13 |
| | | Houston 3, TX | 289 | 107 | 382 | 18.15 |
| | | Las Vegas, NV | 288 | 74 | 3044 | 180.60 |
| | | Los Angeles, CA | 289 | 12 | 92 | 11.52 |
| | | New York, NY | 289 | 175 | 2571 | 152.29 |
| | | San Francisco, CA | 287 | 28 | 231 | 24.17 |
| | | Seattle, WA | 288 | 174 | 1083 | 108.41 |
| | | Tampa, Florida | 267 | 68 | 3048 | 214.49 |
| | | Washington, DC | 286 | 163 | 1547 | 141.67 |
| | US Total | | 4303 | 112 | 3472 | 170.11 |
| CacheFly Total | | | 7110 | 105 | 4202 | 165.61 |
| EdgeCast Small | Non-US | Amsterdam 2, Netherlands | 284 | 62 | 381 | 27.49 |
| | | Copenhagen, Denmark | 254 | 126 | 1148 | 87.72 |
| | | Frankfurt, Germany | 284 | 40 | 318 | 19.05 |
| | | London 2, UK | 284 | 26 | 975 | 59.59 |
| | | London, UK | 283 | 23 | 191 | 14.38 |
| | | Madrid, Spain | 252 | 176 | 1174 | 112.31 |
| | | Manchester, UK | 275 | 86 | 1494 | 118.26 |
| | | Montreal, Canada | 283 | 163 | 601 | 59.56 |
| | | Paris, France | 283 | 94 | 1537 | 140.76 |
| | | Stockholm, Sweden | 271 | 162 | 967 | 81.87 |
| | Non-US Total | | 2753 | 94 | 1537 | 99.35 |
| | US | Atlanta, Georgia | 284 | 129 | 523 | 34.51 |
| | | Chicago, IL | 284 | 26 | 463 | 35.86 |
| | | Dallas 4, TX | 277 | 30 | 339 | 25.79 |
| | | Dallas 5, TX | 284 | 26 | 581 | 50.32 |
| | | Dallas 6, TX | 284 | 23 | 430 | 33.68 |
| | | Denver, CO | 281 | 244 | 2169 | 150.12 |
| | | Herndon, VA | 280 | 24 | 301 | 20.44 |
| | | Houston 3, TX | 281 | 115 | 441 | 40.02 |
| | | Las Vegas, NV | 281 | 56 | 559 | 34.32 |
| | | Los Angeles, CA | 283 | 11 | 94 | 8.45 |
| | | New York, NY | 284 | 72 | 1134 | 161.16 |
| | | San Francisco, CA | 280 | 23 | 118 | 11.01 |
| | | Seattle, WA | 282 | 131 | 3571 | 333.38 |
| | | Tampa, Florida | 260 | 166 | 4977 | 303.29 |
| | | Washington, DC | 282 | 83 | 686 | 111.97 |
| | US Total | | 4207 | 77 | 4977 | 148.63 |
| EdgeCast Small Total | | | 6960 | 84 | 4977 | 131.64 |
| Data Center | Non-US | Amsterdam 2, Netherlands | 292 | 837 | 1344 | 35.72 |
| | | Copenhagen, Denmark | 262 | 990 | 4195 | 297.90 |
| | | Frankfurt, Germany | 291 | 867 | 1533 | 57.14 |
| | | London 2, UK | 291 | 725 | 1065 | 25.95 |
| | | London, UK | 290 | 811 | 1114 | 49.50 |
| | | Madrid, Spain | 262 | 1005 | 1765 | 75.84 |
| | | Manchester, UK | 281 | 899 | 8928 | 580.11 |
| | | Montreal, Canada | 291 | 342 | 412 | 11.52 |
| | | Paris, France | 293 | 1128 | 2680 | 230.78 |
| | | Stockholm, Sweden | 292 | 1063 | 4056 | 367.89 |
| | Non-US Total | | 2845 | 864 | 8928 | 326.71 |
| | US | Atlanta, Georgia | 291 | 316 | 1017 | 63.48 |
| | | Chicago, IL | 290 | 170 | 253 | 7.02 |
| | | Dallas 4, TX | 292 | 191 | 3214 | 253.67 |
| | | Dallas 5, TX | 292 | 145 | 263 | 14.52 |
| | | Dallas 6, TX | 291 | 147 | 358 | 22.93 |
| | | Denver, CO | 291 | 71 | 272 | 14.63 |
| | | Herndon, VA | 293 | 316 | 487 | 15.43 |
| | | Houston 3, TX | 293 | 177 | 372 | 19.66 |
| | | Las Vegas, NV | 290 | 246 | 3194 | 392.02 |
| | | Los Angeles, CA | 291 | 303 | 1188 | 57.60 |
| | | New York, NY | 290 | 346 | 1120 | 123.55 |
| | | San Francisco, CA | 293 | 229 | 519 | 22.28 |
| | | Seattle, WA | 290 | 489 | 1078 | 170.33 |
| | | Tampa, Florida | 270 | 331 | 4105 | 247.99 |
| | | Washington, DC | 290 | 595 | 1511 | 235.84 |
| | US Total | | 4347 | 271 | 4105 | 208.20 |
| Data Center Total | | | 7192 | 506 | 8928 | 390.52 |

… and to really drive the point home for the PHB’s, we consolidate the
data and give a very telling graph:
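For anyone who wants to redo that consolidation from the numbers, here is a small Python sketch. It takes the Non-US and US total rows from the data above and combines them as a poll-weighted average, which lands on the overall totals reported for each source. The variable and function names are mine, and because the regional averages are already rounded, the results are approximate.

```python
# (number of polls, average response time in ms), from the total rows above
regional = {
    "CacheFly": {"Non-US": (2807, 94), "US": (4303, 112)},
    "EdgeCast Small": {"Non-US": (2753, 94), "US": (4207, 77)},
    "Data Center": {"Non-US": (2845, 864), "US": (4347, 271)},
}


def overall_average(regions):
    """Poll-weighted mean of the regional average response times."""
    total_polls = sum(polls for polls, _ in regions.values())
    weighted = sum(polls * avg for polls, avg in regions.values())
    return weighted / total_polls


overall = {name: overall_average(r) for name, r in regional.items()}
baseline = overall["Data Center"]

for name, avg in overall.items():
    speedup = baseline / avg
    print(f"{name:15s} {avg:6.1f} ms  ({speedup:.1f}x vs. data center)")
```

Either CDN comes out several times faster than serving straight from the data center, which is the “any CDN is better than no CDN” point in one line of output each.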

Third elimination: CacheFly

CacheFly is an up-and-comer in the CDN arena. They have very
aggressive pricing, and have very good performance as well. If the site
in question was a popular blog or community website and was very price
sensitive, I would select CacheFly as my first choice CDN. However,
where they fall short is in reporting and their web interface. The best
way to contact their support department is via email or web-based form.
Their web interface left a huge amount to be desired, and they have very
little documentation on how to use it. There is no reporting whatsoever
– you get raw log files and have to write your own reporting scripts on
top of that data. I couldn’t help but wonder about all the “what ifs”.
What if we get an incorrect image cached and need to have it cleared
from their network? If we see a DDoS at the CDN, how do we know? These
and other similar questions are what ultimately eliminated CacheFly.

In CacheFly’s defense, I was told that they were working on a complete
refactor of the user interface and was offered a chance to help beta it,
but I was under time constraints and declined. The issues I had with the
UI may or may not be present at the time of this writing.

The winner (for us): EdgeCast

It may appear when reading this post that I used the process of
elimination to find the “lesser of all evils”, but understand that’s
just the writing style I chose to convey the process. It isn’t that
EdgeCast simply didn’t lose; it’s that they won. Here’s why:

EdgeCast is routinely in the top tier of CDN’s in terms of
performance.

Their support is very knowledgeable and responsive.

The sales reps care about your business and are willing to work with
you.

They offer the most features of any CDN I evaluated. One such
feature is “rollover” where if you don’t use all your allotted
transfer for one month, the remainder gets added to next month’s
allotment. This is perfect for a business with holiday traffic
spikes such as ours.

While they aren’t the cheapest CDN, they are certainly affordable,
and offered the best “bang for the buck” for the feature set we
needed.

Their UI is fully functional, offering configuration, reporting, and
analytics in an easy to use fashion. The UI includes a fully
functional rules engine (for an additional charge) that allows you to
apply actions such as cache purge, header change, etc based upon
conditions like client IP, HTTP request header, etc.

Last but certainly not least, the company is one of only two
profitable CDN’s in the market today.
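To show why rollover matters for a seasonal business, here is a minimal Python sketch. It assumes any unused transfer, including transfer carried over from earlier months, rolls into the following month; that stacking rule, the bucket size, and the usage numbers are my own assumptions for illustration rather than EdgeCast’s actual terms.

```python
def rollover_allotments(monthly_bucket_gb, usage_gb):
    """Return (allotment, used, overage) per month with unused GB carried over."""
    carried = 0.0
    rows = []
    for used in usage_gb:
        allotment = monthly_bucket_gb + carried
        overage = max(0.0, used - allotment)
        carried = max(0.0, allotment - used)  # unused transfer moves forward
        rows.append((allotment, used, overage))
    return rows


# Hypothetical: a 1,000 GB bucket, two quiet months, then a holiday spike.
rows = rollover_allotments(1000, [600, 700, 1900])
for month, (allotment, used, overage) in enumerate(rows, start=1):
    print(f"month {month}: allotment {allotment:.0f} GB, "
          f"used {used} GB, overage {overage:.0f} GB")
```

In this made-up example the spike month is billed for 200 GB of overage instead of the 900 GB it would be without rollover.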

IT’S NOT THE DESTINATION, IT’S THE JOURNEY!!!

Please don’t read this article and walk away saying “Justin recommends
EdgeCast, that’s who we’re going with”. For one, if you’re letting my
blog posts make business decisions for you instead of doing due
diligence, then you’re doing it wrong.

For our very specific needs EdgeCast was the best fit. For your
needs, you will very possibly arrive at a completely different decision,
and that’s great. By all means, blog about it. What I’m trying to convey
is that there are a lot of points of comparison when going through your
evaluation, and not all of them are obvious. It’s hard to get an
objective point of view when doing this on your own – this is my best
attempt at documenting what I came across.

Hopefully if you haven’t implemented a CDN for your busy sites, this
post will motivate you to do so. If you’re unhappy with your current
CDN, perhaps this post has given you some insight on how to find a
replacement. If you’re happy with your current CDN, please leave
comments as to why.

Lastly, I was in no way influenced monetarily or otherwise by any
vendors, and none of the links in this article contain referral IDs.
This is all my personal opinion and in no way represents the opinion of
my employers.

There’s a blog post to follow with when/why, etc., but without
further ado: I’m moving to a new position at Buckle, and that means
we need a new Lead SysAdmin. It’s a great job at a great company,
in a great place to raise a family (Kearney, NE). You get paid
well, get a good yearly budget for new toys and equipment, and
it’s overall a very fun position. If interested, drop me a
line, and I’ll make sure your resume gets the proper attention.
To apply online, click here, and search for jobs within 5 miles of
zip code 68845. The job title is “Web Development - Lead Systems
Administrator”. Here’s the job posting:

JOB DETAIL

Job Title: Web Development - Lead Systems Administrator

Location: Buckle Corporate Office & Distribution Center, 2407 W
24th Street, KEARNEY, Nebraska 68845-0000

Job Description: Lead Systems Administrator

Position Summary: The Lead Systems Administrator will be responsible for
the deployment and maintenance of Unix/Linux systems and
application software in multiple environments. The ideal candidate
will possess a deep understanding of large scale Unix deployments
and will lead the team responsible for the infrastructure serving
all e-commerce and intranet applications. Additionally, this person
must be able to function effectively in a fast-paced environment
where projects range from maintenance to upgrades to new
deployments and technologies. Our Systems Administrators also serve
as Network Administrators for the smaller networks their systems
reside in, so strong knowledge of ethernet, TCP/IP, and network
security is required.

Responsibilities:

• Maintain all servers and workstations on WSD team, including production,
development, and staging of servers for the e-commerce platform and
company intranet

• Setup, maintain, and manage an enterprise-class
backup strategy for WSD team servers and workstations

• Automate tasks via custom scripting

• Assist in architecting and designing solid server solutions

Requirements:

• Expertise in setting up robust and reliable server architectures.
Additionally, a large appetite for automating the mundane is preferred
and will be encouraged.

• In-depth knowledge of technologies that include but
are not limited to: Linux systems administration, Java VM tuning,
Weblogic Administration, Apache HTTPD/NGINX Administration, All
layers of TCP/IP, subnetting, etc., IPSEC VPNs, ISC BIND, Load
balancing and clustering technologies, Shell scripting, Nagios
monitoring and RRD administration, RPM packaging format and
patching best practices

• A bachelor’s degree in Computer Science or other discipline

• Minimum 4-5 years of previous
system-administration experience in a professional environment

Compensation: Market/negotiable, relocation assistance is a
possibility for the right candidate.

Sorry for the lack of posts as of late – a massive upgrade operation at
$DAYJOB has had me out of commission for a few weeks. Also, I’ve had the great
fortune to be able to be part of a migration to Drupal which exposed me to
migrate and friends. Yes, I said “great
fortune” in the same sentence with “migration” without using a negative -
that’s just how awesome this module is.

My first impression when looking at
the documentation for migrate was that it didn’t seem complete. While it’s
true that the documentation could be better (what module couldn’t use better
documentation?), the problem is that no two migrations are alike. Because of
this, the best documentation is not going to be written by the module authors,
it will be written by the module users - they are the ones that come up with
the recipes to fill the cookbook. There are several good reasons why there
aren’t many recipes available:

Developers don’t like doing migrations. It can be painful, and often takes quite a bit of time.

Users don’t like migrations. They see a migration of data as something easily done, and they often get sticker shock when presented with estimates for a large migration.

Migration code is written in a flurry right before launch, when development crescendos, and then is often never used again (because no two migrations are the same).

This being my first migration, I vowed that I would document my experience,
because I learned so much from it. In this particular migration, we had to
migrate a huge XML file into about 2,200 nodes in 3 content types. Read on for
my contribution to the cookbook!

First, some discussion on the general
workflow and some design decisions. Since I had to get XML into the database
before I could run the migration, I wrote a command line script to do just that.
When you need to manipulate data between your source and destination (e.g.,
change all references to www.olddomain.com to www.newdomain.com), you usually
have to do this via the hooks that the migrate module provides. In my case,
there were a few cases where doing the data munging in the command line script
was much easier than doing it within the hooks. The problem with making
transformations within the command line script is that with every change, I
had to re-run the script. This wasn’t a big deal, as the XML to MySQL script
took around 15 seconds to complete.

I also quickly discovered that if you have
fewer than 10 entities of one type (Story content type, user, etc.), it’s
usually better to just hand-migrate them. The most straightforward migration
will take 1 hour at a minimum to set up and test – if it will take less than
that to copy/paste, save your time and do it the less sexy way.

Since we had
to transform the XML into MySQL tables, and there was a lot of data in the XML
that we didn’t need, I decided the best way to dynamically change what we
imported and what we didn’t was by using hook_install() and Drupal’s DB schema
API. By naming the MySQL table columns the same as the XML attributes, we can
add and remove data to be transformed quite easily.

Lastly, I need to reiterate
that this was my first migration. What I describe here works for me,
but may very well not be the best way to do it. Also, I will not duplicate
what you can learn from the migrate module documentation, so make sure to read
that first. Let me know any suggestions you may have in the comments.
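
Circling back to the column-naming trick for a moment, here is a minimal standalone sketch of the idea (it is not the import script itself). It assumes each record keeps its values in `<field name="...">` elements; the helper name `row_from_xml()` and the sample column names are made up for illustration. Since the loop is driven purely by the list of column names, adding or dropping a column is all it takes to import or ignore a value.

```php
<?php
// Hypothetical helper: build one table row from one XML record by looking up
// a <field name="..."> element for every column name we were given.
function row_from_xml(SimpleXMLElement $record, array $columns) {
  $row = new stdClass;
  foreach ($columns as $column) {
    // Relative XPath: only search the direct children of this record.
    $matches = $record->xpath("field[@name='$column']");
    // No matching element means the column is simply left NULL.
    $row->$column = $matches ? (string) $matches[0] : NULL;
  }
  return $row;
}

// Assumed sample record; "unused" has no matching column, so it is skipped.
$record = simplexml_load_string(
  '<content id="1">' .
  '<field name="title">Hello</field>' .
  '<field name="summary">World</field>' .
  '<field name="unused">ignored</field>' .
  '</content>'
);

$row = row_from_xml($record, array('title', 'summary'));
echo $row->title . ' / ' . $row->summary . "\n";
```

In the real script the column list comes from the table schema rather than a hard-coded array, which is what makes the schema the single place to change.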

Install Module Dependencies

The first step is to install module dependencies. You’ll need
Views, Schema, and Table Wizard (tw). You’ll also want to install
Migrate, and Migrate Extras if you want to do any work
with CCK fields. I must admit that I hadn’t seen Table Wizard before this
project, but it will always be present in my dev installs from here out. If
you find yourself using SQLYog, PHPMyAdmin, or some other tool to simply look
at data in the database, be sure to check it out.

Create Our Custom Module

As I mentioned above, we are relying on the Drupal Schema API to make a lot of
this easy, so let’s make a custom module that sets up our schemas for us.
We’ll call this module my_import. Create a new directory in your modules
directory, and name it my_import. First, create my_import.info with this
inside:

When I created the schema, I took care to make sure
that the column names in my table exactly matched the attributes and elements
I was looking to pull out of the XML file. This saves a lot of coding later.
Any time you change the schema, you can create a hook_update_N() function, or
just change the schema and disable+uninstall+install the custom module. I did
the latter with a drush alias and it worked well. The hook_install() and
hook_uninstall() functions simply add and remove the tables.
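
For readers who have not used the Schema API, a my_import.install along these lines would do the job. This is only a sketch under assumptions: the column list is a small subset guessed from the names the import script refers to, and the types and lengths are placeholders. `drupal_install_schema()` and `drupal_uninstall_schema()` are the Drupal 6 calls for creating and dropping a module's tables.

```php
<?php
// Sketch of my_import.install for Drupal 6. The columns below are assumptions
// for illustration; name yours after the attributes/elements in your own XML.

/**
 * Implementation of hook_schema().
 */
function my_import_schema() {
  $schema['clickability_articles'] = array(
    'description' => 'Staging table for articles parsed out of the XML export.',
    'fields' => array(
      'id' => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE),
      'status' => array('type' => 'varchar', 'length' => 32, 'not null' => FALSE),
      'author' => array('type' => 'varchar', 'length' => 255, 'not null' => FALSE),
      'placement' => array('type' => 'varchar', 'length' => 255, 'not null' => FALSE),
      // 'not null' => FALSE lets the import leave a missing image empty.
      'image' => array('type' => 'text', 'not null' => FALSE),
    ),
    'primary key' => array('id'),
  );
  return $schema;
}

/**
 * Implementation of hook_install(): create the tables defined above.
 */
function my_import_install() {
  drupal_install_schema('my_import');
}

/**
 * Implementation of hook_uninstall(): drop the tables again.
 */
function my_import_uninstall() {
  drupal_uninstall_schema('my_import');
}
```

Adding a column to the 'fields' array and reinstalling the module is then enough to make a new value available to the import.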

Setup the Command Line Script to Import the XML into the DB

Create the file myimport.php in your module directory, and paste in the
following:

#!/usr/bin/php
<?php

// get the path to our XML file
$args = getopt("f:");

// Bootstrap Drupal
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Make sure my_import is enabled
if (!module_exists('my_import')) {
  echo "I need the my_import module enabled! Exiting.\n";
  exit(1);
}

/*
 * Make sure our media directory exists.
 * We will import from this directory into whatever directory filefield is configured for
 * so we should remove this dir when done with the migration.
 */
$media_dir = file_directory_path() . '/migrated';
echo "Media dir = $media_dir\n";
if (!is_dir($media_dir)) {
  mkdir($media_dir);
}

// Slurp in our XML file. If your XML file is huge, watch your PHP memory limits
$xml = simplexml_load_file($args['f']);
echo "XML Loaded.\n";

$rowcount = 0;
$schema = array();
// Here we iterate over each child of the root of the XML, which in our case is a Article
foreach ($xml->children() as $content) {
  // Setup our $obj object which represents a row in the DB. The $schema array above
  // caches table schemas so we do not abuse drupal_get_schema().
  $obj = new stdClass;

  // Dereference our child from the parent $xml, or xpath performance sucks hard
  $content = simplexml_load_string($content->asXML());

  $table = NULL;
  $content_type = NULL;
  switch ((string)$content['type']) {
    // Add more case statements for more content types as needed
    case 'Article':
      $table = 'clickability_articles';
      $content_type = 'article';
      break;
    // All cases below are silently ignored - we are not importing them
    case 'Book Reviews':
    case 'Blog Topic':
    case 'Event':
    case 'Job':
      // Ignored
      break;
    default:
      // Any content type not accounted for gets reported
      echo "Warning: unknown content of type " . $content['type'] . "\n";
  }
  if (isset($table)) {
    if (!isset($schema[$table])) {
      // Get the table schema from Drupal
      $schema[$table] = drupal_get_schema($table);
      // On first run, truncate the table
      $sql = "truncate table {$table}";
      db_query($sql);
      echo "$table truncated.\n";
    }
    // This function does the heavy lifting, creating the $obj object from the XML data
    $obj = xml2object($content, $schema[$table], (string)$content['type']);
    // There are some cases where $obj is intentionally null, only write to the db if not null
    if ($obj) {
      $ret = drupal_write_record($table, $obj);
      if ($ret) {
        $rowcount++;
      }
    }
  }
}
echo "Inserted $rowcount rows.\n";

function xml2object($xml, $tableschema, $content_type) {
  global $media_dir;
  $obj = new stdClass;

  // Our main iterator is the column names in the table
  foreach (array_keys($tableschema['fields']) as $field) {
    switch ($field) {
      case 'master_image_byline_title':
        // This field is populated when we work with the images later on
        break;
      case 'id':
        $obj->$field = (string)$xml[$field];
        break;
      case 'status':
        $obj->$field = (string)$xml->$field;
        break;

      // A Clickability placement roughly corresponds to a Drupal term
      case 'placement':
        $element = array_pop($xml->xpath("//field[@name='$field']"));
        $obj->$field = (string)$element->row->value;
        $obj->$field = map_taxonomy($obj->$field, $content_type);
        break;

      case 'author':
        $element = array_pop($xml->xpath("//field[@name='$field']"));
        $obj->$field = (string)$element->value;
        break;

      case 'image2':
      case 'image3':
        // Combine image2 and image3 elements in Clickability into our multivalue filefield as csv
        if ($content_type == "Article") {
          $mediaplacement = array_pop($xml->xpath("//mediaPlacement[@name='$field']"));
          // migrate module requires full path to filefield source
          $obj->$field = getcwd() . '/' . $media_dir . '/' . (string)$mediaplacement->media->path;
          if (substr($obj->$field, -1, 1) == '/') {
            $obj->$field = NULL;
          }
          else {
            if (!empty($obj->image)) {
              $obj->image .= ",";
            }
            $obj->image .= $obj->$field;
          }
        }
        break;

      case 'thumbnail':
      case 'image':
        $mediaplacement = array_pop($xml->xpath("//mediaPlacement[@name='$field']"));
        // migrate module requires full path to filefield source
        $obj->$field = getcwd() . '/' . $media_dir . '/' . (string)$mediaplacement->media->path;
        // Check the schema. If the field is required, then fill in a default, otherwise wipe it
        $required = $tableschema['fields'][$field]['not null'];
        // If the file path ends in a /, then the XML did not have an image for this article
        // -- if we require one, make a default
        if (substr($obj->$field, -1, 1) == '/') {
          if ($required) {
            echo "$content_type with ID of " . $obj->id . " does not have a $field. Adding test.gif.\n";
            $obj->$field .= "test.gif";
            touch($obj->$field);
          }
          else {
            // NOTE: We need this patch for this to work: http://drupal.org/node/780920
            $obj->$field = NULL;
          }
        }
        else {
          // Transfer the caption on the image in the XML to the CCK byline accreditation
          $obj->master_image_byline_title = (string)$mediaplacement->caption;
          // See if the file exists on the filesystem
          if (!file_exists($obj->$field)) {
            // Nope, let's fill it in with our default image
            echo $obj->$field . " does not exist, replacing with test.gif.\n";
            $obj->$field = preg_replace('#^(.*)/(.*)$#', '\1/test.gif', $obj->$field);
          }
          // Replace .bmp with .jpg
          $jpg = preg_replace('/\.bmp$/', '.jpg', $obj->$field);
          if ($jpg != $obj->$field) {
            if (file_exists($jpg)) {
              $obj->$field = $jpg;
            }
            else {
              // Tell the user what to do to create the image and exit.
              echo "ID " . $obj->id . " has a image of type bmp, and no jpg found on the file system.\n";
              echo "Create them by running 'mogrify -format jpg /path/to/*.bmp' and re-run this script.\n";
              exit(1);
            }
          }
        }
        break;

      // Any DB column not explicitly defined above maps cleanly with the code below
      default:
        $obj->$field = (string)array_pop($xml->xpath("//field[@name='$field']"));
        break;
    }
  }

  // We assume it does not need imported until we prove otherwise
  $needs_imported = FALSE;
  $tags = array();
  $websitePlacements = array();

  foreach ($xml->xpath("//websitePlacement") as $websitePlacement) {
    // Only if the XML says the domain is www.newdomain.com do we need to import
    if ($websitePlacement->domain == 'www.newdomain.com') {
      $needs_imported = TRUE;

      // Convert the old "sections" into tag taxonomy
      $section = (string)$websitePlacement->section;
      $tags[] = substr($section, 1, strlen($section));
      // Grab the old URLs from websitePlacement, and place them on an array
      $oldurl = $section . '/' . $obj->id . '.html';
      $websitePlacements[] = $oldurl;

      // If we do not have a placement yet, we try to set some form of taxonomy
      if (!isset($obj->placement)) {
        $taxo = map_taxonomy($section, $content_type);
        // NOTE: We need this patch for this to work: http://drupal.org/node/780920
        $obj->placement = $taxo;
      }

      // If the XML did not explicitly tell us the createDate, we use the start date from the webSitePlacement
      if (empty($obj->createDate)) {
        $date = (string)$websitePlacement->startDate;
        $obj->createDate = substr($date, 0, strlen($date) - 4);
        $obj->editDate = $obj->createDate;
      }
    }
  }

  $obj->websitePlacements = implode(',', $websitePlacements);
  $obj->tags = implode(',', $tags);

  // Return the object only if we need it imported
  return $needs_imported ? $obj : NULL;
}

function map_taxonomy($oldtext, $content_type) {
  // Simple maps of Clickability placements to Drupal terms
  if ($content_type == 'Job') {
    return NULL;
  }
  if (preg_match('/building/i', $oldtext)) {
    return "Green Building";
  }
  if (preg_match('/(clean|energy)/i', $oldtext)) {
    return "Clean Energy";
  }
  if (preg_match('/financ/i', $oldtext)) {
    return "Finance";
  }
  if (preg_match('/food/i', $oldtext)) {
    return "Food & Farms";
  }
  if (preg_match('/marketing/i', $oldtext)) {
    return "Green Marketing";
  }
  if (preg_match('/recycled/i', $oldtext)) {
    return "Recycled Markets";
  }
  if (preg_match('/technol/i', $oldtext)) {
    return "Technology";
  }
  if (preg_match('/leaders/i', $oldtext)) {
    return "Business Leaders";
  }
  if (preg_match('/transportation/i', $oldtext)) {
    return "Transportation";
  }
  return NULL;
}
?>
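
The “dereference” step in the main loop deserves a closer look. An XPath expression that starts with `//` is evaluated from the root of the document the element belongs to, so without re-parsing each child, the lookups would search (and match) every record in the file. A small standalone sketch of that behavior, using made-up element names and data:

```php
<?php
// Two records in one document; element and attribute names are illustrative.
$xml = simplexml_load_string(
  '<export>' .
  '<content id="1"><field name="author">Alice</field></content>' .
  '<content id="2"><field name="author">Bob</field></content>' .
  '</export>'
);

$first = $xml->content[0];

// "//" starts at the document root, so this sees the fields of both records.
$whole_doc = $first->xpath("//field[@name='author']");
echo count($whole_doc) . "\n";

// Re-parsing the child makes it the root of its own small document.
$detached = simplexml_load_string($first->asXML());
$own_doc = $detached->xpath("//field[@name='author']");
echo count($own_doc) . ' ' . (string) $own_doc[0] . "\n";

// A relative path (no leading "//") is another way to stay inside the record.
$relative = $first->xpath("field[@name='author']");
echo count($relative) . "\n";
```

Either approach keeps a lookup inside its own record; the script above uses re-parsing, which also keeps each search small.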

Lines 1-26: Nothing too fancy here. I should note that the script expects to be executed from your Drupal root directory. It grabs the path to the XML file from the command line and does some sanity checking.

Lines 28-30: Here we use PHP’s SimpleXML API to load the entire XML file into an object. If you have a huge XML file and/or small PHP memory limits, you will likely have to use XML Parser or another library. The power and convenience of SimpleXML is a pretty convincing argument to temporarily upping your memory limits in this case.

Lines 34-82: This is the main loop which iterates over each Article in the XML file. By looking at the content type in the XML record, we determine what table and content type to use for Drupal. The first time a schema is loaded, we truncate the table in the database. Once we determine some metadata about the record, we call xml2object() on line 72 which does most of the work for us. Once we have an object, we store it to the database.

Lines 84-222: Here we have the xml2object() function, and yes, it’s way too long and should be broken up. But hey, it’s migration code, who else will ever see it??? We’ll break it down more below.

Lines 89-182: This code runs a for loop around each column in the table. Since we’re using the Schema API here, we can safely assume that the column order specified in our install file will be duplicated when we fetch it in our script. For each column type in the table, it attempts to pull the data needed from the XML record, transform it if necessary, and store it in our $obj object. Read the code for details on what is happening to each field on the way in.

Lines 184-218: Now that we have iterated over all the fields in the schema, we can use the data stored in $obj to calculate other fields we need. Again, read the code for details, but here we are setting taxonomy terms, URLs for use with path_redirect, and filling in other data that may have been missing from the XML.

Lines 224-257: A simple example of how to statically map some data in the XML to taxonomy terms in Drupal.

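If raising the memory limit is not an option, one alternative to loading everything with SimpleXML is to stream the file and only expand one record at a time. The sketch below uses PHP's XMLReader for that, which is my substitution here and not what the script above does. The function name, the `content` element name, and the sample data are all assumptions for illustration.

```php
<?php
// Stream through an XML string one record at a time instead of loading it all.
// $element_name is the (assumed) name of the repeating record element.
function stream_record_ids($xml_string, $element_name) {
  $reader = new XMLReader();
  $reader->XML($xml_string); // use $reader->open($path) for a file on disk
  $ids = array();

  // Skip ahead to the first record element.
  while ($reader->read() && $reader->name !== $element_name) {
    // Intentionally empty.
  }

  while ($reader->name === $element_name) {
    // Only this one record is expanded into a SimpleXML object.
    $record = simplexml_load_string($reader->readOuterXml());
    $ids[] = (string) $record['id'];
    // Jump to the next sibling with the same name, skipping the subtree.
    $reader->next($element_name);
  }

  $reader->close();
  return $ids;
}

$sample = '<export><content id="1"><status>live</status></content>'
        . '<content id="2"><status>draft</status></content></export>';
print_r(stream_record_ids($sample, 'content'));
```

Each expanded record is a standalone SimpleXML document, so the same per-record XPath lookups would still work on it.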
Now that we’ve got that out of the way, let’s create our module file.

Create my_import.module

Now, create a file in your module directory named my_import.module. This file
will contain the actual module used by Drupal and will implement some of the
migrate module’s hooks. You might ask, why not deal with everything in the
command line script? There are two primary reasons why:

You may come across a condition where you need the nid of the node (i.e. create path redirects), or otherwise interact with the $node object. You can only get this information from implementing migrate’s hooks.

While I personally found it easier to manipulate taxonomy terms via the command line script and then rely upon the out-of-the-box code supplied with migrate to setup the node for me, this has a drawback. Any time you change the command line script, you must “clear” your imported data, re-run the command line script, and re-import your data using the migrate module. If you make changes to your module, you only have two steps to test (clear and re-import).

Paste this code into my_import.module:

<?php

define('NUM_PARAGRAPHS_PER_PAGE', 6);

function my_import_migrate_prepare_user(&$user, $tblinfo, $row) {
  // Randomly assign passwords to users, forcing them to reset their password
  $errors = array();
  $user['pass'] = preg_replace("/([0-9])/e", "chr((\\1+112))", rand(100000, 999999));
  return $errors;
}

function my_import_migrate_prepare_node(&$node, $tblinfo, $row) {
  $errors = array();
  // In Clickability, there were multiple states that represented "Published", here we map them.
  $status = $tblinfo->view_name . '_status';
  switch ($row->$status) {
    case 'live':
    case 'APPROVED':
      $node->status = 1;
      break;
    default:
      $node->status = 0;
      break;
  }

  if ($node->type == 'article') {
    // Paginate articles by inserting a pagebreak tag every 6th paragraph to emulate Clickability's pagination
    $paragraphs = preg_split('#<br />\s*<br />#s', $node->body);
    if (count($paragraphs) > NUM_PARAGRAPHS_PER_PAGE) {
      $node->body = '';
      $i = 1;
      foreach ($paragraphs as $paragraph) {
        if (($i % NUM_PARAGRAPHS_PER_PAGE) == 0) {
          $node->body .= $paragraph . "\n[pagebreak]\n";
        }
        else {
          $node->body .= $paragraph . "<br />\n<br />\n";
        }
        $i++;
      }
    }
  }

  return $errors;
}

function my_import_migrate_complete_node(&$node, $tblinfo, $row) {
  $errors = array();
  // Create redirects for old URLs
  $field = $tblinfo->view_name . '_websitePlacements';
  foreach (explode(',', $row->$field) as $oldurl) {
    // Delete any old redirects
    if (substr($oldurl, 0, 1) == '/') {
      $oldurl = substr($oldurl, 1);
    }
    path_redirect_delete(array('source' => $oldurl));
    $redirect = array(
      'source' => $oldurl,
      'redirect' => '/node/' . $node->nid,
      'type' => 301,
    );
    path_redirect_save($redirect);
  }
  return $errors;
}

Here’s the high-level breakdown, check the code+comments for
the details.

Lines 5-10: Just a quick example of how to set a random password on any user that is imported.

Lines 12-24: hook_migrate_prepare_node() is executed before the node has been saved to the database, and is where the majority of your code should live. These lines set any article with a status of ‘live’ or ‘APPROVED’ to published in Drupal.

Lines 26-41: This code uses some regex magic to create a pagebreak every 6th paragraph. This is what Clickability did, and the client wanted to keep this on their migrated articles.

Lines 47-65: hook_migrate_complete_node() is called after the node has been saved to the database, and it has a nid at this point. The client wished to migrate their old URLs to Drupal – in order to do that we must have the nid to know where to redirect to.
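
If the pagination logic is hard to picture, here it is pulled out into a standalone function you can run outside of Drupal. This is just a sketch: the function name `paginate_body()` and the sample text are made up, and the logic mirrors the hook above rather than adding anything new.

```php
<?php
// Hypothetical standalone version of the pagination step: split the body on
// double <br /> tags and put a [pagebreak] marker after every Nth paragraph.
function paginate_body($body, $per_page = 6) {
  $paragraphs = preg_split('#<br />\s*<br />#s', $body);
  if (count($paragraphs) <= $per_page) {
    // Short articles are left untouched.
    return $body;
  }
  $out = '';
  $i = 1;
  foreach ($paragraphs as $paragraph) {
    if (($i % $per_page) == 0) {
      $out .= $paragraph . "\n[pagebreak]\n";
    }
    else {
      $out .= $paragraph . "<br />\n<br />\n";
    }
    $i++;
  }
  return $out;
}

// Seven short paragraphs, so exactly one page break is expected after "p6".
$body = implode("<br />\n<br />", array('p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7'));
echo paginate_body($body);
```

Note that, as in the hook, the last paragraph also gets trailing `<br />` tags; that is harmless for display but worth knowing if you compare output byte for byte.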

Create sample XML

Finally, let’s create some sample data so we can see how this all meshes
together. Create the file content.xml in your module directory, and paste this
in it:

Enable the my_import Module and Run the Command Line Script

Now (finally), it’s time for some action. Enable your newly created my_import
module, and jump out to the shell. Assuming your Drupal root is at
/var/www/drupal, cd into that directory. Create the new directory
sites/default/files/migrated/images, and place a jpg named 1.jpg in that
directory. Now run the import script:

With luck, the script will succeed, and you will have 1 row of data in your
clickability_articles table! If not, fix the error (if you’re using the sample
data, let me know what went wrong and I’ll fix it). Next up, Table Wizard
configuration.

Expose the Table to Table Wizard

All the hard work is done now - we can use a web UI from here on out. Visit
/admin/content/tw in your browser and, under the “Add Existing Tables”
fieldset, select the tables you imported with myimport.php. If your tables
are huge (50K+ rows), you may want to select “Skip full analysis”. Click the
“Add tables” button. At this point, that’s all we need from Table Wizard, but
I strongly encourage you to click around a bit. The table analysis can tell you
some handy things about your data.

Create the Migrate Content Set

In the previous step, we essentially built a view that we can provide to the
Migrate module. Now we need to tell Migrate how to use the view. Visit the
Migrate settings at /admin/content/migrate/settings. If you can, implement the
changes it recommends to .htaccess as it will speed up the import
considerably. Also, make sure to expand the “Migration support implemented in
the XYZ module” fieldsets and enable the support you need for your import.

Now, visit the dashboard at /admin/content/migrate. Expand the “Add a content
set” fieldset, and fill in the values. When choosing the value for “Source
view from which to import content”, scroll down towards the bottom of the
list. All Table Wizard views are prefixed with “tw: ”, so the one we’re
looking for here is “tw: clickability_articles (clickability_articles)”. You can
leave “View arguments” and “Weight” at their defaults.

The next screen is where the
real magic happens. By interrogating the view, Migrate presents you with a
field-mapping form that allows you to select a source column from a dropdown to
assign to various node elements. If you have a setting that should remain
constant across all imported records (“Node: Input format” is usually a good
example), you can type in a default value here. The rest should be fairly
self-explanatory. Click “Submit changes”, and you’ll be taken back to the
dashboard.

Run the Import, Clear the Import, Wash, Rinse, Repeat

Now, the way I did my testing was to choose one row from the source table to
import. Grab its primary key and copy it to the clipboard. Check the box under
“Import” for our content set, then expand the “Execute” fieldset. Paste the ID
into the “Source IDs:” text field, and click the Run button. With any luck,
you will be returned to the dashboard, and the content set will show 1
imported. Hopefully there will be no errors, but if there are, find and fix
the problem.

You can view the old primary key mapping to the node ID by going
back to /admin/content/tw and looking for a view named
migrate_map_si_articles. This table is created by the Migrate module – it
uses this table to track what has been imported, and what NID the imported
nodes have. Grab that nid, and load up /node/[nid]. If it looks good, then we
can do a bigger import.

Go back to the Migrate dashboard, and this time click
the “Clear” checkbox next to the content set. Expand “Execute”, make sure all
fields are blank, and click the Run button. This process will “unimport” the
row we just imported. Now, depending on your row count, you may want to import
all rows and see what happens. Since I was dealing with thousands of nodes, I
did an import of just 100 nodes to make sure things were okay. To do this,
instead of specifying “Source IDs”, place the number 100 in “Sample Size”, and
click Run. To import everything, leave all fields blank.

The power to quickly
and easily remove all changes made by the migrate module is huge. Because of
this “safety net”, you can work on the import within the same development
sandbox as your designers and themers. They’ll appreciate having something
other than “Lorem Ipsum” to look at!

Run to the Nearest Pub and Celebrate the Completion of Your Migration

If I have to explain this to you, you’re in the wrong field of work!

Summary

This post is my longest to date, and there’s a good chance I missed some
things. By all means, let me know in the comments if you find any holes and
I’ll get it corrected. I hope this case study helps some other Drupalers out
there - when I first started this project I couldn’t find any examples on how
to get XML into Drupal using the Migrate module. Now Google has some spider
food :)

I had the great pleasure of attending my first DrupalCon this week. Held in
downtown San Francisco at the Moscone Center, it was, in my opinion, Drupal’s
“homecoming”. While Drupal wasn’t “born” in San Francisco, it seems to be the
city that has the strongest following. The attendance numbers didn’t lie - I’m
pretty sure they broke 3,000 attendees.

I made this trip solo
– I only knew three people that were going, and those three were only
acquaintances I’d met via email/IM a few months before. When I left, I didn’t
come home with “leads” or “contacts”, I came home with friends and role
models, many of whom I plan on staying in touch with. I met most of the
authors of the Drupal books I’ve read, associated faces to the podcasts and
RSS feeds I subscribe to, and I even had the opportunity to quickly say thanks
to Dries and shake his hand.

For those who didn’t know, archive.org has made the sessions available for
download, so be sure to check those out. Read on for my “takeaways” from
DCSF2010.

Please note that these are just what come to my mind, I’m sure I’m forgetting
huge topics. Please forgive me in advance for those!

Larry Garfield is my favorite presenter of the conference

Larry Garfield works for Palantir.net, and is one of the few people I’ve
listened to who is immensely intelligent, yet speaks well and can even make a
crowd genuinely laugh out loud. I attended his “Objectifying PHP” and “Views
for Developers” sessions, and left feeling motivated and enlightened. My
thanks go out to him, as he very obviously put a lot of preparation time into
his presentations.

Drupal is methodically (pun intended) implementing OO

As evidenced by Larry Garfield’s “Objectifying PHP” and John VanDyk’s “Batch vs
Queue” session, Drupal is refactoring portions of core into classes and
methods where it fits. I’m part of the camp that welcomes the change, and
can’t wait. I can’t help but wonder if we’ll alienate some contrib module
authors in the process, but I’m sure that it will bring the overall quality of
contrib modules up a few notches.

David Strauss knows what he’s talking about

I’ve been in IT/Networking/Programming/etc for about 20 years now. While I
don’t claim to be the smartest person in the group at any point in time, I
consider myself pretty well rounded. It’s been a long time since someone was
able to truly talk so far over my head that I couldn’t keep up, but David
Strauss of Four
Kitchens did just that at the Chapter Three
open house party. We discussed HipHop PHP, operating
systems, configuration management, and god knows what else. I must have looked
like a deer in headlights!

HipHop PHP will eventually run Drupal

I can say this because David Strauss is the one working on it. Enough said.

Microsoft is playing it smart

Instead of trying to compete with Drupal, they’re finally trying to help
Drupal. I’m a hardcore anything-but-Microsoft OS kinda guy, but I can’t
dispute that there are a lot of shops out there that already have well-versed
SQL Server and IIS admins. Microsoft announced that they now have a native SQL
Server driver for PHP, and that Drupal can now run on MS SQL Server. This will
be a huge boon for getting Drupal into the Microsoft-centric enterprises -
there’s no longer a need to have a MySQL guy. Oh, and giving away free alcohol
never hurt either :)

MongoDB will have a large impact on Drupal 7

Chx gave an excellent presentation - “MongoDB:
Humongous Drupal”. He covered a lot about SQL, and how over the years it’s
become “best practice” to de-normalize tables to improve performance. We’ve
all done that, but have you ever pondered that you’re breaking one of the
fundamental rules of relational databases when you’ve done that? While MongoDB
is perfectly suited for logging and caching in Drupal, the biggest win is with
Fields in Drupal 7. Each field you create results in a new table that must be
added to a JOIN when building a node. Shops with a lot of fields on their
nodes will likely see huge gains in performance by moving to MongoDB for those
tables.

Big Drupal is Big

Hey, did you hear that Drupal powers whitehouse.gov? Seriously, there’s been a
lot of progress in the past year with regards to making Drupal scale. Project
Mercury from the great folks
at Chapter Three makes Big Drupal easy, and is
now supported on Amazon’s EC2, Rackspace, and
Linode thanks to my
stackscript. There was a huge amount of interest in Mercury and how
it all works at the conference. The BOF session was great - unfortunately I
missed the sessions where it was discussed in more detail.

Chapter Three rocks

Two out of the three people I knew coming into DCSF currently work for Chapter
Three, and the third person used to work for them. Special thanks to Greg
Coit and Kevin
Montgomery for taking me under their
wing and introducing me to all their colleagues. I also had the pleasure to
meet Josh Koenig, albeit
briefly. Seems the partner/CTO of one of the leading Drupal shops is a little
busy at a DrupalCon. I ended up meeting a few other guys that I clicked really
well with and hope to keep tabs on: Jeff Graham of
FunnyMonkey, Rob Wohleb of
Xomba.com, and Aaron Levy of Chapter
Three - thanks for the beer and discussion!

Git will change Drupal.org

The migration to Git can’t happen fast enough for me. Aside from the ability
to commit code on a plane, contrib modules will benefit greatly. When all is
said and done, every new issue on drupal.org will have its own repository
that any user will be able to commit to. Once the issue is resolved, the fix
will be merged back into the main module repo. That should break down even
more barriers for new contrib authors getting into Drupal development.

Dries Buytaert is really tall

Yes, Dries is very tall - at least 6'4" if I had to guess, but this is
actually just a way for me to remind you that I shook Dries' hand :) I was
more than a little starstruck!

Overall, I had a blast, and can’t wait for the next DrupalCon in the states. I
heard it’s in Chicago – count me in! If you ever get the chance, I absolutely
recommend that you attend.

I would take personal time off from work to attend DrupalCon (and not regret it)

I would attend a Microsoft party

I would have told you where to shove it. Well:
For all those wondering, I went
to the party as a saboteur on a mission: to drink as much as I could to try
and directly impact their bottom line. Mission accomplished :) Good times @
DCSF2010!