Scraping

I already did some research on the subject when I was playing around with my Raspberry Pi. There is a lot out there, especially for Python.

But what about my favourite programming language, Haxe?

Again, this was just a quick search, and this is what I found.

A (very?) old project from Jonas Malaco Filho on GitHub. Check out this code: jonas-haxe, and specifically the scraper part of it. It is written for Neko, with primarily undocumented classes like neko.vm.Mutex. Once you have the HTML page you can start getting the data from it!

Update #2

The htmlparser doesn’t work with the HTML code I am scraping, so I need to focus on the parts I want to use. Regular expressions are the way to go, and I suck at them. Luckily I found an online tool that helps with testing regexes: http://www.regexr.com/ from an old Flash hero, gskinner.
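To make the regex route a bit more concrete, here is a minimal sketch of pulling link URLs out of a chunk of HTML with Haxe's built-in EReg. The class name Main, the helper extractLinks and the sample HTML string are all made up for illustration; the pattern only handles double-quoted href attributes and is not what I ended up using on the real pages.

```haxe
class Main {
    // Hypothetical helper: collect every href value found in <a> tags.
    static function extractLinks(html:String):Array<String> {
        // "i" ignores case; group 1 captures the URL between the double quotes.
        var linkPattern = ~/<a[^>]+href="([^"]*)"/i;
        var links:Array<String> = [];
        var rest = html;
        // match() only finds the first hit, so keep searching in the text after it.
        while (linkPattern.match(rest)) {
            links.push(linkPattern.matched(1));
            rest = linkPattern.matchedRight();
        }
        return links;
    }

    static function main() {
        // Stand-in for a downloaded page (assumption, not real scraped data).
        var html = '<p><a href="http://haxe.org">Haxe</a> and <a class="x" href="http://www.openfl.org">OpenFL</a></p>';
        for (link in extractLinks(html)) trace(link);
    }
}
```

Once a pattern works in regexr, it can usually be pasted between the ~/ and / of the Haxe literal; just remember to escape any forward slashes in it.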

Another thing I ran into was getting data from https sites. You need something “extra” to download HTML files from there: install hxssl via haxelib (haxelib install hxssl) and add it to your build.hxml (-lib hxssl).
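For reference, this is roughly where that line would sit in a build file. It is only a sketch: the main class name Main and the Neko output file scraper.n are assumptions, and only the -lib hxssl line comes from the step described above.

```hxml
# Entry point class (hypothetical name)
-main Main
# Pull in the SSL library installed with: haxelib install hxssl
-lib hxssl
# Compile to Neko bytecode (assumed target and file name)
-neko scraper.n
```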

Update #1

I am coding this with OpenFL and regular expressions, but perhaps a better way to go is node.js! And you can use node.js with Haxe (hxnodejs is perhaps not completely ready, but probably good enough for the examples below).

I love to create projects where you take stuff from the digital world (temporary, intangible) and drag it into the “real” world. This project is a good example of that: I create a papertoy that is generated in code and cut by a machine.

Besides filling my blog with new content, I have two other reasons to write this post:

You can use Haxe/OpenFL for something other than game development! I know I am not the only one, but this group of developers is not as visible as the game devs.

It’s a long and complicated process to get to the end result: it’s difficult to explain this in detail to others, so I wrote down the whole story for interested friends/family/colleagues/fans???

If I ever grow a pair, this post would be one of the two talks I would give during wwx2015, just to balance out all the tech talks during the event.
But nothing is growing besides my hair, so instead I will write about the process and the end result.

When I started programming in AS3, I started to collect little snippets of code. Some to explain the transition from AS2 to AS3, and some just to have one place to go to when I forgot how something worked… I decided to do the same for Haxe and OpenFL. For some reason I forget some stuff easily. This is one I need often and forget easily: