Micro-Project - A HubSpot friend asked me for a script to parse XML sitemaps, so I built him a web app...

A good friend of mine, Craig Ellis, is a Sales Engineer at HubSpot. I recently built him a project that helped showcase the power of HubSpot’s API to potential customers, so when he noticed that he and his team were spending quite a bit of time manually retrieving URLs from XML sitemaps, he came to me to see if I could help.

I love automation, and I love distractions even more. Naturally, I said I’d give it a go.

To Python or not to Python

My mind immediately went to Python, and within minutes I had found a Stack Overflow question and answer with a few quick lines of Python code that would take in a file and spit out the URLs within it.
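To give a rough idea of what such a script might look like, here is a minimal sketch using only Python's standard library. It is not the Stack Overflow answer itself. It assumes the file is a standard sitemap using the sitemaps.org namespace, and the function names `extract_urls` and `print_urls_from_file` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol (assumed to be what the file uses)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(xml_text):
    """Return the text of every sitemap <loc> element found in the XML."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


def print_urls_from_file(path):
    """Hypothetical helper: read a sitemap file and print one URL per line."""
    with open(path, "rb") as sitemap_file:
        for url in extract_urls(sitemap_file.read()):
            print(url)
```

Calling `print_urls_from_file("sitemap.xml")` from a terminal session would then list the URLs, which is fine for Craig but perhaps less so for the rest of his team.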

OK, cool. Craig is fairly technical, so he can probably pop a terminal open and run the script manually. But what if his team isn’t as technical? Also, wouldn’t this be a little more impressive if it were web-based… Yes, the answer to that is yes. 🙂

Do I need Node.js?

My mind shifted to Node. Maybe I could spin up a simple server on Heroku that made requests to site URLs and processed the responses. I’d already used different parsing libraries for Jobb.ie, so it wasn’t totally new to me. With XML being a staple language of the internet for decades, I was confident there’d be a suitable parsing option.

However, I didn’t particularly want to build a backend and frontend. This is an unpaid gig, so efficiency is key. Also, wouldn’t handling it all on the frontend be more impressive? Yes, yes it would. 🙂

Can this actually be *entirely* client-side??

I wasn’t sure. I mean, it’s just a request. We send an AJAX request to the appropriate URL, we get back the XML, and we parse out the URLs. It seemed simple, a little *too* simple. I was sceptical. A proof of concept was in order.

jsFiddle to the rescue

jsFiddle is magnificent, and the perfect place to experiment and prove or disprove my thought process. I could even potentially share it with Craig for real-time feedback. I added jQuery as a resource via CDN and got to work.

Blast! CORS is thwarting my efforts. When will we learn that internet security just doesn’t work! I joke; always be secure in your online endeavours, kids. But wait, maybe we can get around this with some JSONP… nope! The browser won’t let us load non-HTTPS content on an HTTPS page. This is for security reasons and, although frustrating now, I’m glad it’s a thing. Good browser.

YO! Build me a server

Yeoman is a generator I use to quickly spin up the necessary files and folder structure for different types of projects: anything from a Chrome extension to an Express.js server in Node (which is what I’m doing now). It can all be done from the command line. It’s a pretty nifty piece of kit. I pop it open and run a few commands, and voila, a server.

For quick and dirty projects like this, I love using the Skeleton CSS boilerplate code, from getskeleton.com, a tiny, responsive CSS framework that looks really fresh and simple.

Wrangling with Cheerio.js

There was some… back and forth, let’s say. It’s been a couple of months since I worked in Node, and about eight months since I worked with Cheerio, so it took a little longer than it should have to wrestle it into working. But, eventually, I got it working, veeery basically. But hey.