Finally we’re getting to the uploading part. Handling uploads (at least small ones) is no biggie in Ruby on Rails. No, most of the code in the listing below has to do with inserting the data we’ve uploaded.

As noted in the previous article, we will upload a zip containing a bunch of HTML documents. We will use the document name as the title for the inserted article, the document contents will of course become the article body, and the article language will be chosen in the upload form.
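To make the file name to title idea concrete (it is step 6 in the walkthrough further down), here is a minimal sketch. The helper name title_from_filename is made up for illustration, and I'm assuming the prefix is a run of digits followed by an underscore or hyphen, and that underscores and hyphens should turn into spaces. Adjust the pattern to whatever your files actually look like.

```ruby
# Hypothetical helper, not the method from the listing.
# Assumption: file names look like "03_best-poker_tips.html".
def title_from_filename(path)
  name = File.basename(path, ".*")       # drop the directory and the extension
  name = name.sub(/\A\d+[_-]/, "")       # drop the assumed numeric prefix
  name.tr("_-", " ").squeeze(" ").strip  # underscores and hyphens become single spaces
end

puts title_from_filename("/tmp/uploads/03_best-poker_tips.html")
```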

With the help of the chosen language we will retrieve all blogs in that same language and portion out all articles evenly but randomly between the blogs, more on that below. We will also use a predefined set of words (tags and category words), which are likewise retrieved based on the chosen language. The article bodies will be searched for these predefined words, the occurrences will be counted, and the words that occur most often will be selected to become the tags and categories of the future WordPress posts.
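The counting part can be sketched roughly like this. It is not the get_cats_or_tags method from the listing, just a stand-alone illustration; the helper name top_words and its parameters are made up. It counts matches with String#scan and an escaped pattern, throws away the words that never occur, and uses first(limit), which simply returns fewer items when the array is shorter than the limit.

```ruby
# Hypothetical helper: return the `limit` candidate words that occur
# most often in the article body.
def top_words(body, candidates, limit)
  text = body.downcase
  counts = candidates.map do |word|
    # scan returns every match of the pattern, so length is the number of occurrences
    [word, text.scan(/#{Regexp.escape(word.downcase)}/).length]
  end
  found = counts.reject { |_, n| n.zero? }  # drop words that never occur
  # most frequent first, ties broken alphabetically
  found.sort_by { |word, n| [-n, word] }.first(limit).map { |word, _| word }
end

body = "Poker and casino. Casino bonus, casino games."
p top_words(body, %w[casino poker bingo bonus], 2)
```

Note that this matches substrings, so "casino" would also be counted inside "casinos". Wrap the pattern in \b on both sides if you only want whole words.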

Update: Note that I’m using str.chars here and there, for instance together with tidy_bytes when creating a new article. This is to make sure that I don’t have any issues with UTF-8 characters.

Big method here, no doubt purists would cry “Shorten it damn you!”. I would, I really would, if parts of it were used elsewhere…

So this is the method that gets run after the form has been submitted.

1.) First of all note that we require find and fileutils. That might or might not be overkill, but I’ve become accustomed to them after creating that file renamer thing.

2.) If we have a proper country code we proceed with assigning the contents of the zip_file parameter to zipf. We’re working with ActionController::UploadedFile here, and we use the original_filename method to get at the name of the file.

3.) We then copy the file to /tmp/uploads and unpack it there after which we delete it, end of story for the upload there. We now have a lot of HTML documents in /tmp/uploads.

4.) We get all tags, categories and blogs associated with the selected country code by way of Active Record’s find.

5.) The file reading can begin by looping through /tmp/uploads. Note that we begin with a check to see if cur_sites is empty. The logic here is that if we left the article-to-blog assignment completely up to chance we might end up with blog A getting 10 articles and blog B getting none. That won’t do, so we assign one article to a random blog in cur_sites, after which we delete that blog from the cur_sites array. When there are no blogs left in cur_sites we populate it with all blogs again, which allows for an even (but random) spread.

6.) Next we strip the file name of the prefix and all underscores and hyphens to create the article title.

7.) Next we downcase the whole article body and loop through it using get_cats_or_tags. Note how we keep track of how many occurrences we have of each string using a regex. Don’t try the string count method, I made that mistake before reading up on it more closely. It counts how many characters of the string belong to the given set of characters, not how many times the substring occurs, so it won’t do what we want here. We finish off by getting rid of all words that didn’t exist in the content and making sure we don’t try to slice by a bigger length than the array we’re slicing.

8.) Finally we create a new article, save it and output some feedback to the SEO guy.
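The even but random spread from step 5 can be illustrated in isolation like this. It is a sketch under assumptions, not the code from the listing: the helper name assign_evenly is made up, articles and blogs are plain strings instead of Active Record objects, and instead of picking a random blog and deleting it from the array I shuffle the pool and pop from it, which has the same effect.

```ruby
# Hypothetical helper: pair every article with a blog so that no blog gets
# a second article before all blogs have received their first.
def assign_evenly(articles, blogs, rng = Random.new)
  raise ArgumentError, "need at least one blog" if blogs.empty?
  pool = []
  articles.map do |article|
    # refill the pool in a fresh random order once it has been used up
    pool = blogs.shuffle(random: rng) if pool.empty?
    [article, pool.pop]
  end
end

articles = (1..10).map { |i| "article_#{i}.html" }
assign_evenly(articles, %w[blog_a blog_b blog_c]).each do |article, blog|
  puts "#{article} -> #{blog}"
end
```

With 10 articles and 3 blogs one blog ends up with four articles and the other two with three each. Which blog gets the extra one is random.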