A Blog about PHP, Javascript, node, Spring, Startups and Stumbles


I just pulled my hair out for a day or two over a ridiculously simple problem: we’re converting PDF files to a series of images using Imagick. At first, we initiated the conversion as soon as a user had uploaded a file, so the conversion ran within the preforked Apache child. That’s obviously not a good idea, because those processes can run for quite a while; PHP will terminate them sooner or later, and in the meantime they consume quite a bit of our web server’s precious resources.

So we chose to decouple the processing from the upload action. I’m simplifying the outline quite a bit, but it’s good enough to give you an idea of what went wrong. We store the original file in some location and have a job running that converts open processing requests. This solution worked fine on our dev boxes and on our staging server, which resembles our prod environment quite well. Without further ado we deployed the new solution, only to find out soon after that it was breaking.

When run on the command line, this works great. When run from cron, it fails, giving only one useless exception message: “Unable to read file …“. We compared pecl-imagick versions and supported file formats, but everything seemed legit; we fiddled with ImageMagick’s resource settings and played with the command-line tools, until we finally noticed that cron’s PATH differs from our staging environment’s PATH.

Imagick seems to execute the Ghostscript binary for PDF conversion instead of using it as a library, so Ghostscript has to be on the PATH, which wasn’t the case for cron on our live box. Adding PATH=$PATH:/usr/local/bin to our crontab solved the issue.
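Note that many cron implementations (e.g. Vixie cron) do not expand variables like $PATH in crontab environment lines, so spelling the full path out is the safer variant. A minimal crontab sketch (the job path is made up for illustration):

```shell
# Environment lines in a crontab are not shell-expanded by most crons,
# so list the directories explicitly instead of appending to $PATH.
PATH=/usr/local/bin:/usr/bin:/bin

# hypothetical conversion job, every five minutes
*/5 * * * * /usr/bin/php /var/www/jobs/convert.php
```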


This morning I opened my Chrome browser on Ubuntu 14.04 the way I do every morning. But all saved passwords were seemingly lost. Not that I couldn’t have recovered them from various lists, reminders, and pure memory and logic, but I still consider this the worst nightmare an ordinary computer user can be confronted with these days.

So I started googling the problem, and within a blink I found this horrifying error report from 2012 that describes how Google Chrome on iOS syncs an empty profile over your good one. Since I had opened Chrome on my iPhone after quite a long absence, my first thought was “you can’t be serious”, and I started to get angry.

My searches went on. For those of you who come across a similar problem and land here: make sure to read the last paragraph first (as in: now)! Here are some more of the obvious search hits:

And while I was reading all that stuff and crying and pulling my hair, I thought: hey! What if the sync is currently running and caught up while I was trying to find an obscure solution for a problem that doesn’t actually exist? I went back to my Chrome profile settings, searched for Passwords and… voilà. While I was searching for a solution, Google had recovered all of them in the background; it just took a while.

Phew.



Quite a lot of developer friends making their first baby steps with node.js keep asking me about the “right” way to share database resources, or dependencies in general, among controllers, business logic or service objects. Now, there’s actually one universal wisdom in the world of node.js: there is no “right” way of doing things (have a look at the controller samples from express.js).

There are two good reasons why. First, most node.js frameworks, including the core itself, are very basic, offering modular, highly simplistic, fundamental but reusable code. They don’t make any assumptions about how you will use them, leaving it up to you to wire everything together as you see fit. Second, JavaScript as a functional language offers many more options to shoot yourself in the foot than class-oriented / C-like languages. What’s missing (and maybe even unwanted) in the node.js universe is a sophisticated framework like Spring, Rails or Symfony that offers best practices under the hood you could rely on.

In this article I’d like to tell the story of how I first tried to transfer my knowledge from these “sophisticated” frameworks (usually IoC containers) to express.js, only to recognize the far simpler, node-“native” approach that achieves the same results. The first three examples I present are just evolutions of ideas I had until I noticed that node.js offers much simpler ways – so be warned that some of the upcoming code might seem unnecessarily bloated: the resolution you might want to comment on can be found at the very end.

The old world

Connecting to some kind of server-side database layer is a pretty straightforward task in your favorite non-JS language. Have a look at some pseudo-code:
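A minimal sketch of such blocking pseudo-code (names and the API are made up):

```
con  = db.connect("localhost:3306", "user", "secret")
rows = con.query("SELECT * FROM users")   // blocks until the rows are there
render(rows)
```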

If you’re a real node.js greenhorn, let me quickly explain why this kind of code will never work on node. Since JavaScript (really!) executes in one thread at a time, lengthy operations like connecting to or querying a database must not block the main execution loop – otherwise the application won’t be able to respond to other incoming requests. Instead of waiting for the database to return a connection (like e.g. Java does), V8 immediately executes the subsequent code. In the example above, con might not have been initialized yet when con.query is executed. In JavaScript you handle this kind of asynchronous event using callbacks, so in node.js the above example could be pseudo-coded as:
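A runnable sketch of that callback style; the driver is stubbed here, but real ones like node-sqlite3 or node-mysql have a similar shape:

```javascript
// Stub standing in for a real asynchronous database driver.
var db = {
  connect: function (dsn, cb) {
    setImmediate(function () {
      cb(null, {
        query: function (sql, cb2) {
          setImmediate(function () { cb2(null, [{ id: 1 }]); });
        }
      });
    });
  }
};

var fetchedRows;

// The root of the "pyramid": everything else nests inside this callback.
db.connect('mysql://localhost/test', function (err, con) {
  if (err) throw err;
  con.query('SELECT * FROM users', function (err, rows) {
    if (err) throw err;
    fetchedRows = rows;
    console.log(rows.length + ' user(s)'); // prints "1 user(s)"
  });
});
```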

You immediately notice the main problem in functional code: some call it the “JavaScript pyramid of death”. It’s built on subsequently registered callbacks. I won’t dig deep into solutions to that issue here (promises are currently the best solution, and they’re widely adopted), but I want you to have a look at the first line that connects to the database and serves as the root of our pyramid. On platforms like PHP or Java you would have a single place where you connect to your database. Then you would either wire that connection to clients that want to use it, or ask some container to hand it back to you (and initialize it if it wasn’t before). Let’s have a look at that pattern in a container-managed environment (far from being exactly IoC, but you should get the idea):
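A deliberately tiny, JS-flavored illustration of the container idea (a real IoC container does far more; all names here are invented):

```javascript
// A toy registry: factories are registered once, services are built lazily
// on first request and cached, so every client gets the same instance.
var container = {
  factories: {},
  services: {},
  register: function (name, factory) { this.factories[name] = factory; },
  get: function (name) {
    if (!this.services[name]) {
      this.services[name] = this.factories[name](this);
    }
    return this.services[name];
  }
};

container.register('config', function () {
  return { dsn: 'sqlite://app.db' };
});
container.register('db', function (c) {
  // the container wires the config dependency in for us
  return { dsn: c.get('config').dsn };
});

console.log(container.get('db').dsn); // prints "sqlite://app.db"
```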

The idea behind this pattern is: there’s some godlike mega-registry (the IoC container) that knows, configures and instantiates all your dependencies. If you need something, you either annotate your dependencies and let Mr Registry inject them at startup time, or you call Mr Registry and ask for a fully configured and initialized resource (service, bean, you name it).

IoC-like coding in express.js / node.js

Adopting this pattern in node.js leads to rather uncomfortable code. Let’s start with an app.js to illustrate that (again: beware that I wouldn’t recommend using this kind of code, but you definitely could).
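A dependency-free reconstruction of that app.js (the original used express and sqlite3; here the database and routing are stubbed so only the timing problem remains, and all names are illustrative):

```javascript
var tables = {};
var lastError;

function initializeDb(db) {
  // stands in for db.run('CREATE TABLE testing ...') — it finishes later
  setTimeout(function () { tables.testing = []; }, 10);
}

var container = {
  db: null,
  getDb: function () {
    if (!this.db) {
      this.db = {
        query: function (sql, cb) {
          setImmediate(function () {
            if (!tables.testing) {
              return cb(new Error('SQLITE_ERROR: no such table: testing'));
            }
            cb(null, tables.testing);
          });
        }
      };
      initializeDb(this.db);
    }
    return this.db;
  }
};

// the index action for GET /foo — it runs before the CREATE has finished
container.getDb().query('SELECT * FROM testing', function (err, rows) {
  lastError = err;
  console.log(err ? err.message : rows); // prints "SQLITE_ERROR: no such table: testing"
});
```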

If you run this example and access GET /foo, it’ll be called back with the error message “Error: SQLITE_ERROR: no such table: testing“. Notice that initializeDb has been called by getDb, but while it was trying to execute the initial CREATE statement, the single thread already went on and executed the index action. Since the initializeDb callback has not been handled yet (it’s going to be handled right after the index action has finished), the SELECT statement cannot find the table.

To remedy that situation, we could use callbacks in our client code, like this:
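A sketch of that variant, with the database stubbed again (illustrative names): getDb hands the connection over only once the setup has run.

```javascript
var container = {
  db: null,
  getDb: function (cb) {
    var self = this;
    if (self.db) return cb(null, self.db);   // already initialized
    setTimeout(function () {                 // stands in for CREATE TABLE testing ...
      self.db = {
        query: function (sql, cb2) {
          setImmediate(function () { cb2(null, []); });
        }
      };
      cb(null, self.db);
    }, 10);
  }
};

var fetched;

// client code: the SELECT can only fire after the table exists
container.getDb(function (err, db) {
  if (err) throw err;
  db.query('SELECT * FROM testing', function (err, rows) {
    fetched = rows;
    console.log('got ' + rows.length + ' rows'); // prints "got 0 rows"
  });
});
```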

That way we’re littering the container’s getDb interface with a callback parameter. Instead, people tend to use the so-called promise pattern at this point. There are some libraries out there that get the job done; Q is one of the most powerful of them, and among many other features it offers a deferred interface that deals with exactly this problem.

controller.js
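Sketched here with native promises standing in for Q’s deferred interface (the mechanics are the same); the container part is inlined so the controller snippet is complete, and all names are illustrative:

```javascript
// container (inlined): getDb returns one shared promise for the connection
var container = {
  dbPromise: null,
  getDb: function () {
    if (!this.dbPromise) {
      this.dbPromise = new Promise(function (resolve) {
        setTimeout(function () {            // stands in for the async CREATE
          resolve({
            query: function (sql, cb) {
              setImmediate(function () { cb(null, []); });
            }
          });
        }, 10);
      });
    }
    return this.dbPromise;
  }
};

var fetched;

// controller.js: the action waits on the promise instead of taking a callback arg
container.getDb().then(function (db) {
  db.query('SELECT * FROM testing', function (err, rows) {
    fetched = rows;
    console.log(rows); // the table exists by now — prints []
  });
});
```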

Starting the app after container setup has finished

That looks only a little better, but it’s definitely not really usable yet. So let’s take a last attempt to fix things up: let’s tell the container to initialize everything and start the application once the basic initialization has finished.
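In outline (hypothetical names; the real version would call app.listen inside the callback):

```javascript
var container = {
  db: null,
  init: function () {
    var self = this;
    return new Promise(function (resolve) {
      setTimeout(function () {               // connect, CREATE TABLE, warm caches, ...
        self.db = {
          query: function (sql, cb) { setImmediate(function () { cb(null, []); }); }
        };
        resolve();
      }, 10);
    });
  }
};

var started = false;

function startApp() {
  // in the real app: app.listen(3000) — every action may now rely on container.db
  started = true;
  container.db.query('SELECT * FROM testing', function (err, rows) {
    console.log('serving; first query returned ' + rows.length + ' rows');
  });
}

container.init().then(startApp);
```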

This looks like a good start for configuring action controllers with container-based dependencies on a very low level. The concept can be extended to a configurable container, to loading a dependency tree and preparing singleton services, even to lazy-loading services using proxy objects. Et voilà: welcome back to the good old Spring / Symfony world. You can write your code that way, and it’s going to work pretty much as expected (I worked that way for quite some time).

The “magic” that might seem unusual to mature developers coming from class-oriented environments lies in the “module” concept, one of node’s cornerstones. A module is an encapsulation of state and behavior; it can be used in a service-like fashion, as I did in the last example. Notice that I’m requiring DB.js in app.js without even using it. That way node.js executes the code inside once and keeps the reference in module.exports – the database is therefore prepared by the time a controller uses it. Well, not exactly: if the preparation / setup of resources takes really long, the application is up before initialization has finished (try wrapping the database setup in a timeout; I left the code as a comment in the repo). But what’s more important: taking this approach, you don’t have to take much care of your dependencies, but can access them from any module you want simply by requiring them. The db variable in DB.js always refers to the very same instance.
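In file form, the idea looks roughly like this (a multi-file sketch, not a runnable single block; paths and the sqlite3 usage are illustrative):

```javascript
// DB.js — runs exactly once, on the first require
var sqlite3 = require('sqlite3');
var db = new sqlite3.Database(':memory:');
db.run('CREATE TABLE testing (id INTEGER)');  // setup starts at require time
module.exports = db;

// app.js
require('./DB');                // triggers the one-time setup early

// controller.js
var db = require('./DB');       // the very same, cached instance
```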

tl;dr Lessons learned: node.js and express.js don’t propose a structure for managing dependencies and wiring them up with your clients. While you could write your code in a rather classical way, it’s mostly much simpler to use node’s built-in concepts.

PS: I just found this Stack Overflow question that underlines the concept I tried to explain here. Good to know I’m not alone with my opinion 😉



Spoiler alert. I mean: if you really dare to go to that movie, do it; but be warned: it’s worse than I could ever have imagined.

I bought Max Brooks’ book World War Z one day after seeing the first trailer on iTunes. I had it on my wishlist anyway, and the story looked promising, so I spent the 10€ on an original version (as you might notice, my English is far from native; as you might’ve guessed, I’m German). I read it within 7 days, and my opinion was better than average: a zombie novel, but with a great twist. Not the usual pulp story you might have expected.

World War Z is a story about a worldwide catastrophe, told in short stories as seen by individuals, assembled chronologically, spanning a time frame of around 10 years. It’s told from an outsider’s perspective. There are some classical horror elements in the novel, and some of the old-fashioned zombie clichés, but what makes the book outstanding are the many fine-tuned, innovative twists on the genre. To give you some ideas:

Zombies freeze during winter and thaw in spring. That means winter is a good time to hunt for their heads. Zombies die when their heads are ripped off.

Survivors build floating islands out of rogue ships and every vessel that stays afloat. One of the book’s major turning points is set on one of those islands.

The “Battle of Yonkers”, a huge military confrontation at the beginning of the zombie war, is a recurring motif in the book. Many individuals refer to it as the turning point of human hope: after the army gets overrun, they burn the place down with a thermobaric weapon, and afterwards humanity has all but given up on itself. The book mainly evolves around the aftermath of the Yonkers confrontation: if people stand together, they find their way out. As it turns out, the solution is simpler and much more obvious than in any zombie story ever: go from house to house, rip Zack’s head off, and proceed to the next.

What Mr Pitt and his bunch of no-brainer producers / storywriters / directors have made out of the original is a ridiculous attempt to squeeze that story into a Hollywood movie. I mean, I’ve seen many bad and even worse movies over the last decade. And don’t get me wrong: Brad Pitt is an outstanding actor, as he made clear in Fight Club and Snatch. If World War Z were just a bad movie, I’d be okay with it. But it’s worse. In the beginning, four studios present their logos: a great indicator that there were lots of interests involved. Then the story jumps right into the outbreak in Philadelphia. From one second to the next, Gerry (the name the studio gave the main character, who remains completely unmentioned in the written story) is confronted with hordes of rogue zombies. So far so good, I thought – predictable, but the way such a movie could reasonably start.

What follows are two hours of an ongoing hide-and-seek slaughter mess that takes some of the fighting places from the novel (esp. Jerusalem) and mixes them with a few pictures borrowed from the original (some marine ships forming the military headquarters of the world’s last line of defense). Gerry gets ripped away from his family – and Pitt makes clear that he’s the awesome family father he is in real life (I hope he is, and I want to believe so). Hug your daughter, hug her again, tell my family I love them, call your darling every day.

C’mon. This is a zombie story, the world is coming to an end, and the original is written in a documentary style, just reporting. We’ve seen love-and-lost-love stories over the past six decades, and I can’t tell you how fed up I am with the cliché of the neat American family of four. We know that America, and Mr Pitt, wants to tell us that family is the center of your life. What’s worse: you simply won’t buy it from these actors. And actually Gerry’s wife heats up the situation by not wanting her own husband to save the WORLD but to protect their two little “babies”, one badly performing an asthma attack. And for the tale being told, the family background is absolutely unnecessary; it just had to be scripted to follow the typical story line of a movie made and produced in bloody California! Just to remind you: the character of Gerry is a bare addition to the original material – there is absolutely no hint of his family bonds, and therefore no believable background can be seen behind Pitt’s acting. I simply couldn’t care about either of the daughters: if you cut them out, the story wouldn’t have lost anything!

From minute 30 onwards, the movie gets worse every second. Jerusalem gets overrun because a pile of zombies floods over the 20m-high protective walls after hearing a loud noise from the inside. Check, effects routine done, money spent on CGI, camera runs like in Black Hawk Down (“Why are they burning tyres?”). Before that: Pitt visits South Korea and gets caught by a rogue military squad that refills his “personal” plane with gasoline (I didn’t get what’s so special about that Gerry character that he’s sent around the world, alone with a couple of shitheads in uniform plus the youngest virus professor alive, to find the root of all evil, when reports just said that the president and the whole administration are dead and Washington is lost), after the last hope for humanity – a 23-year-old medical virus specialist – is lost when he shoots himself in the head (no word of that situation in the original, of course, even though it’s one of the few memorable moments in the movie). Pitt seems to get a hint of a solution in the Jerusalem scene (oh gosh, no, please don’t give us a “magic” solution, I thought, sinking deeper into my seat), but then he has to crash the Belarusian plane he got on at the last minute, after a flight attendant unleashes a single zombie from the plane’s food elevator with an Israeli hand grenade, right after Pitt tears off his female sidekick’s arm (nominated for worst actress in a non-speaking role) with something that looks like a dagger from the Hitlerjugend (sorry, that was tasteless, I’ll delete it if you urge me to).

At that point another humiliating fact shows up very clearly: the PG-13 rating (in Germany it got an FSK 16; I still wonder why not 12). There’s nearly no blood spilled in the movie. There are lots of corpses flying through the air, there are great makeup effects and there are shootouts, but besides some drops from a scatter wound on Pitt’s sleeve, no blood at all. While I quite like that (gore nowadays leads to clear B-movie ratings, and I’m absolutely fed up with it), it feels unrealistic in the scenes where it could’ve been helpful.

The “grand finale” ends up fulfilling the very zombie-genre trope that Max Brooks ingeniously avoided: the unavoidable “healing” theory of a zombie “virus” – if you infect yourself with another deadly virus, the zombies will simply not be able to “see” you anymore. Guys, come on. This is even greater bullshit than any badass conspiracy theory about a super-powered Umbrella Corporation arising from “milestones” of B-movies called Resident Evil. Max Brooks’ zombie contagion is meant to be un-heal-able, spelt as in un-avoid-able. There is no remedy besides (I quoted that already) going from house to house and ripping Zack’s head off (in the book some guerrilla platoon invents the “Lobotomizer“, a motif the movie tries to pick up when Pitt arms himself with a gun, a knife and a newspaper). No need to fall back on that old explanation pattern.

The reason I’m writing all this is not so much the disappointing 110 minutes starting from minute 10, where it all begins to drift off, but mainly the last two. Pitt returns to his family, which has found shelter in “Nova Scotia”. If you read the book carefully, you know: this place would have been badly devastated. Not by Zack – zombies might freeze in that area – but by the people living there, cooking and feeding their children American stew prepared from their grandparents’ remains.

Instead, in the last minutes Brad “Gerry” Pitt talks about “hope” and about “fighting” and, worst of all, about how “the war goes on“. If that means you’re planning to create a trilogy out of the bullshit your highly paid script writers made of the novel, be sure: you will not be able to lure a single additional penny out of my pocket for one of your movies again! What follows is a flash-cut overview of fighting scenes that could have been shown in the movie but weren’t. The producers obviously didn’t want to put them in, hoping to reuse them in World War Z Part II and III. Hopefully there isn’t a “Hobbit” tale coming. Sigh. It’s only about the money then, isn’t it?

BTW: do WHO clinics in Wales keep chargers for American-made satellite telephones in stock?

So, today I paid €22.70 plus €3 in parking fees for this piece of shit that you made me wait nearly half a year for, and that made me write these words of hate (and believe me: if my English were better, you would understand my point far better). Here are 5+ better ways to spend that money:



I’m using service objects in node.js that are responsible for database operations on business entities and also perform some low-level business logic if needed. Recently, while refactoring my code, I came up with this pattern, which I currently consider a “best practice” of doing things.

Service objects that perform asynchronous actions on remote services, like querying a database, must get their resources at some point. Naively, you could instantiate each service every time you need it and provide it a (fresh) link to your database (which you might store globally or in an application instance that you hand around). Now, in JavaScript, or more precisely in a node.js / CommonJS environment, there’s a better way of doing that: the module. It is not too obvious to developers coming from a Java-like background that modules can (but don’t have to) be used to instantiate “singleton” services and can serve as single activation points to set your service objects up with their resources. So here’s an example (please note that I’m omitting the “real world” db logic; the mongo connection is there only for illustration):

Your “service module”, responsible for getting a user from a database (“UserService.js”):
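A sketch of how that service module could look (the mongo calls are illustrative only, as said; error handling omitted):

```javascript
// UserService.js
var MongoClient = require('mongodb').MongoClient;

function UserService() {
  this.db = null;
}

UserService.prototype.connect = function (url, callback) {
  var self = this;
  MongoClient.connect(url, function (err, db) {
    self.db = db;
    callback(err);
  });
};

UserService.prototype.getUser = function (name, callback) {
  this.db.collection('users').findOne({ name: name }, callback);
};

// the module system caches this single instance across all require() calls
module.exports.userService = new UserService();

// app.js (main module):
// require('./UserService').userService.connect('mongodb://localhost/test', function (err) {
//   /* start the app here */
// });
```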

Notice that you’re initializing (“connecting”, in this case) the single instance of UserService in your main module. That means it is ready to go in any other module where you would like to use it. That’s a good solution for immutable service instances that don’t depend on any state but only on some resources like database connections or global settings.

In the rare case where you’d like to have another user service, you can export the constructor from your service module as well (in UserService.js):
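That could look like this (in addition to the exported instance; names illustrative):

```javascript
// UserService.js
module.exports.userService = new UserService();  // the shared instance
module.exports.UserService = UserService;        // the constructor, for rare cases

// elsewhere:
// var UserService = require('./UserService').UserService;
// var anotherService = new UserService();
```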

There’s one little twist I found while playing around that might be helpful when you try this “pattern” on your own. You might be tempted to omit the service’s name in module.exports, like so (UserService.js):

....
//don't do that
module.exports = new UserService();

because then you could (yetAnotherModule.js):

var userService = require('./UserService');
userService.getUser(...)

That code definitely works. But note that a) you cannot export anything else (e.g. the constructor) now, and b) your IDE might not be able to resolve the methods (getUser) of that instance (e.g. IntelliJ WebStorm cannot).



…and how we achieved it

This was a long weekend – again. Usually it takes working through the night to take the grand prize home at a hackathon – luckily, we made it while still getting some sleep at home.

But there was still quite a lot of work to do. If you’re not familiar with the concept of a hackathon, or if you want to contribute or tell others about the experience you had at EyeEm’s Photo Hack Day in Berlin, feel free to use our storytelling tool Qurate:

The idea

Personally, I went to the Photo Hack Day with an idea in mind that I had registered on Hackersleague as “Tarantimgo”. I wanted to write an open service offering a crowdsourced notion of which sounds and songs could match a set of pictures, so photo platforms could use it to offer ambient sound to their viewers (maybe a good topic for Battle Hack? Let me know if you need it 😉).

Then I met some good old acquaintances – Stefan, Bora, Rob, Gabi – and we fell into a discussion about what to hack this weekend, when Stefan Hoth (a rather prominent face in Berlin who works as a community and technology advocate for Google services) mentioned a guy “running around on site and desperately looking for developers”. Since Robert, Gabriele and I had taken home an award at the last hackathon for a “project with a promising business adaption”, we said: OK, go get him.

Turns out this was Albert. He’s from Armenia – quite an exotic place somewhere between Russia, Turkey and Iran – works in London and currently lives in Berlin. What he pitched to us was:

You find many nice images of nice places on EyeEm. If you find a professional shot of a place, you can adore it, favorite it, like it – but wouldn’t it be nice if you could repeat it with a personal touch? Let’s write an app that displays information about how to take that shot yourself, including EXIF data, time and season of shooting, and the right place.

We thought about it a little (and not everyone was convinced at first). Robert and I (developers) had already worked together at three hackathons, and I knew that Gabriele could give a helping hand when it comes to Bootstrap, so we finally gave in, took the job and started the project Photoration.

Bootstrapping

We came up with some scribbles of the idea that we could agree on. What we wanted to build for the hackathon was an application that

shows you which “sights” are nearby

shows you “professional” photos taken at those sites (taken from EyeEm)

shows the position of the shooting point (the image’s GPS coordinates, actually) and the position of the sightseeing spot / monument on one map

shows additional meta data for the image (ideally EXIF data)

Albert’s job was to create “final” design scribbles of these ideas. Here are his results:

Development

We worked with node.js and a web frontend. That’s actually a preset that I enforce because I work with it all day, and many people can adapt to it pretty fast. I set up a fresh Heroku app, added a remote on GitHub, got Robert in within minutes, and off we went. I concentrated on the structure and prerequisites; Robert’s task was to create a suitable backend.

We quickly found that the EyeEm API doesn’t allow searching for sightseeing places. Of course, you can search for albums that match a certain Foursquare location. You can also search for venues near a location (endpoint), but that didn’t match our criteria.

We decided to get those special locations from the Foursquare API, which doesn’t require a user token for that. Robert manually picked the Foursquare venue category IDs that we could reuse in the EyeEm API calls later. The backend:

loads Foursquare venues of those category IDs near a location the client provides

for each venue found, we fire a query (yes, that’s a lot of simultaneous queries) to EyeEm’s /albums endpoint. You can provide a venue’s Foursquare ID to get matching photo albums.

for each album found, we fire another query to get its best-rated photos, because we’re only interested in the “highlights” that people might find worth copying

That’s nearly everything our backend does. It would be helpful if the API understood arrays of IDs, to reduce the number of requests we have to send.
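In (heavily simplified) code, the fan-out of those three steps looks roughly like this; fetchVenues, fetchAlbums and fetchPhotos are hypothetical helpers wrapping the respective HTTP calls:

```javascript
fetchVenues(lat, lng, CATEGORY_IDS, function (err, venues) {
  venues.forEach(function (venue) {
    // one request per venue against EyeEm's /albums endpoint
    fetchAlbums(venue.foursquareId, function (err, albums) {
      albums.forEach(function (album) {
        // one more request per album for its best rated photos
        fetchPhotos(album.id, function (err, photos) {
          // collect the "highlights" and respond once everything has returned
        });
      });
    });
  });
});
```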

Frontend

I don’t want to go into too much detail here about why I decided this way, but I’m using Backbone.js and a single-page app approach for the frontend. I simply love this framework for its unobtrusiveness, and I know it can be used for mobile applications very well, as long as you know what you’re doing. I also went with Bootstrap 2.3.2, since it comes with responsive features. I created a Backbone router that dynamically creates its routes and points to handler methods in two Backbone.Views that I consider separate frontend classes. Using that approach I could delegate some frontend work to Robert (he implemented parts of the single image view) when I was getting tired around noon on the second day. In the meantime, Gabriele updated the Bootstrap CSS with custom fonts and colors.
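The router part, roughly (view and route names are invented; the real routes were created dynamically):

```javascript
var AppRouter = Backbone.Router.extend({
  routes: {
    '': 'list',             // album overview
    'photo/:id': 'detail'   // single image view
  },
  list: function () { listView.render(); },
  detail: function (id) { detailView.render(id); }
});

new AppRouter();
Backbone.history.start();
```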

If you want to read through the code, go ahead, but please don’t blame me for the many bad styles and ignorances (you don’t put API keys in a public repo, do you? No, you don’t!)

Results

We came up with a fully functional but (you might’ve guessed) rather cheesy web application that runs on mobile clients. You can check it out yourself by pointing your phone to http://photoration.herokuapp.com (give the dyno some time to wake up, and wait for the many API calls to return).

After the initial loading has finished, you can browse through EyeEm albums of nearby sightseeing spots. If you find a photo that looks gorgeous, tap it to get some details and to see a combined map of the spot and the photo.

Thank you

First, I’d like to give my warmest thank you to my team: Robert, Albert, Gabriele. The four of us know that we couldn’t have made this possible without – the four of us 🙂 Thanks a lot to the EyeEm team: you do an incredible job, and you did an amazing one on that hack day. The spirit you get at that location is far better than at any other hackathon I’ve ever attended. Thanks a lot to GitHub, node.js, Backbone and Bootstrap for being there and making bootstrapping applications on a weekend dead simple (if you know what you’re doing).

Why did we win that event? A question you’d have to ask the jury. But personally, I have the feeling that this app (even though there’s absolutely no rocket science involved technically) really does (or tries to do) something valuable: it wants to help you take better pictures. An iPad application that automatically takes pictures of a cat, or a Christmas tree that blinks in the light of certain images, is funny, but such things don’t bring much value to the community (but rather to the world, especially for cat lovers 😉).

Where to go from here?

There are lots of improvements due. Massive caching of results would be a brilliant idea. A swipe-capable frontend (as scribbled by Albert) would be really nice. A photo overlay would be even nicer: let’s show the chosen picture transparently over your camera preview. A selection of really good pictures would be very helpful (plus a connection to professional photo services like 500px, Fotolia et al.). Maybe this could be achieved by rating pictures in terms of “pro shot” instead of “nice one”.

If you think this stuff has potential, feel free to contact our team members. You can find us here:



This weekend I attended AngelHack Berlin, which took place at the ImmobilienScout campus and was part of a worldwide event series leading to the “crown” of hackers.

(Promotion) Personally, I’m absolutely not into the “location” world; I work on a project for social content curation called Qurate, which is pretty close to the value proposition of the winning team (Edgar tells…) that won the two-week trip to San Francisco / Silicon Valley. Thanks to Qurate you can relive the event, based on tweets and photos taken by the crowd. Here’s my personal “story” in pictures. If you want to build your own, feel free to do so using our media table at https://www.qurate.de/angelhackber . I’ll ask the YouIsNow team to contribute their photos as well 😉

This time I wanted to do something visual, something I could show off, and something that does some good for those who need it. Two weeks ago the news broke that the Verkehrsverbund Berlin Brandenburg (VBB, hence the project name 😉) has opened up its data (article on Golem), so just before I went to sleep on May 3rd I had the idea to integrate that data somehow. Here’s what we came up with: http://veebibi.herokuapp.com

The story behind Veebibi

I was born in Berlin, and even as a child I often found myself asking, “Where is this bus going?” (I mean the route, not the destination, obviously). Most of us who live in Berlin’s inner circle use the Metro or the S-Bahn for transit. It’s rather comfortable, and – at least the metro – is mostly on time and runs very frequently, at least during the day. But I found that I had no clue where to find a good overview of bus routes (obvious solution: look at the wall map, I know; I mean: on my smartphone). Actually, that wouldn’t have solved my problem anyway: as I said, “I want to know which route this bus is taking!“ Another example: once in January (it was very cold) I waited at the main station for the S-Bahn; suddenly the speaker said: “unfortunately your train will be delayed for an unknown time“. I only had to go three stations to Hackescher Markt, so I took the U55 to Brandenburger Tor instead; from there I walked to work and nearly froze an ear and two toes off. I could surely have taken some bus (the TXL maybe?), but finding out on a smartphone is not as easy as it might sound, especially if all you have is O2 reception. You usually visit http://www.fahrinfo-berlin.de/, enter where you are and where you want to go, and pick a line from the search results. Google Maps doesn’t show VBB transit lines (except the trains) at all.

Whether that kind of app already exists wasn’t important to us anyway. We thought: let’s honour the effort of Berlin Brandenburg in finally opening up their data set, and utilize that data to draw transit lines on a Google map. Actually, there’s a little political background to the data, too: the VBB would never have opened up without the pressure of projects like OpenPlanB. Apps like Öffi (everyone loves you for that one, Andreas!!) had to use unofficial data sets to get transit information, and from what I heard, Deutsche Bahn is far from happy that suddenly thousands of hackers can write apps for their unsatisfied customers (some details at the end of the Golem article mentioned above).

How we did it

We fetched the open data set from the official web site. It happens to be in a standard format called GTFS that has been adopted by public transit authorities around the globe. Now, we’re hackers, we wanted to learn and didn’t give a shoot about what the specification says, so we tried to import everything on our own. The data delivered by the VBB is split into 8 CSV files that make up a relational data structure. Relational? Come on, it’s 2013, NoSQL is the big buzz (well, NewSQL is, but that’s another topic), so we wanted the stuff in MongoDB. Don’t shake your head before you’ve seen the results🙂

Our team member Robert tried to import the data into Mongo directly, but as you can imagine, that was a disaster. Lesson one (for the beginners): don’t import relational data into a document database as-is! You can’t join it anyway. So I suggested a “workaround”: first import the data into a relational system (MySQL is always a good choice), transform it into a document-like representation, and import that into Mongo. At that point Robert decided to skip the timetable data from the set because it would’ve blown the overall result up to millions of rows (it’s basically a cross join of 7 tables, so we reduced it to 5). He exported the result set to CSV and it looks like this:

The first id marks the stop, the next one the position of that stop in the route. Skipping some columns, we find the line’s destination, the line name (“Bus TXL”), the name of the company responsible for it (BVG), the stop’s name and its geocoordinates. Next we needed to transform those lines into JSON documents that fit into MongoDB. With one eye open and half of his brain already shut down at 2am, Robert hacked together a PHP script that did the job pretty well. I don’t know how he made it home alive after that (he went by bike), but I’m glad he did! I spent an hour fixing the bugs he left behind and came up with JSON data compatible with mongoimport. Here’s an example document:
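The original transform was a PHP script we no longer have at hand; a minimal Node-style sketch of the same idea looks like this (the column positions and field names are assumptions based on the description above, not the real export, and the CSV split is naive — no quoted fields):

```javascript
// Parse one CSV row of the MySQL export into a stop object.
// Column layout is hypothetical: stop id, sequence, headsign, line, agency, name, lat, lng.
function rowToStop(line) {
  const cols = line.split(',');
  return {
    stop_id: cols[0],
    stop_sequence: parseInt(cols[1], 10),
    headsign: cols[2],   // e.g. "S+U Alexanderplatz"
    line: cols[3],       // e.g. "Bus TXL"
    agency: cols[4],     // e.g. "BVG"
    name: cols[5],
    loc: [parseFloat(cols[7]), parseFloat(cols[6])] // [lng, lat], the order Mongo geo indexes expect
  };
}

// Group the flat rows into one document per route variant,
// ready to be written out one JSON object per line for mongoimport.
function buildDocs(csvLines) {
  const routes = {};
  for (const line of csvLines) {
    const stop = rowToStop(line);
    const key = stop.line + '|' + stop.headsign;
    (routes[key] = routes[key] || []).push(stop);
  }
  return Object.keys(routes).map(key => {
    const stops = routes[key].sort((a, b) => a.stop_sequence - b.stop_sequence);
    return { line: stops[0].line, agency: stops[0].agency, headsign: stops[0].headsign, stops };
  });
}
```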

It was already 5am, but from then on everything went smoothly. I imported the JSON into a Mongo instance hosted at MongoLab (mongoimport -d mongo -c veebibi converted.json), an addon you can get from Heroku, and put an index on the stations’ loc fields:
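In the Mongo shell that boils down to something like the following (a sketch, not the exact production code — the collection name matches the mongoimport command above, the nested field path and the query shape are assumptions):

```javascript
// Geospatial index on the embedded stop coordinates (legacy "2d" index,
// which is what the 2013-era driver and $near with coordinate pairs use):
db.veebibi.ensureIndex({ "stops.loc": "2d" });

// Find up to 100 lines that have a stop near the user's position [lng, lat]:
db.veebibi.find({
  "stops.loc": { $near: [13.3777, 52.5162], $maxDistance: 0.01 }
}).limit(100);
```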

yields an array of up to 100 lines including all their stops. The perfect foundation for Veebibi, since it’s exactly what we want. Since routes are stored more than once (a bus line might fork depending on the time of day, and it runs in both directions), I “consolidated” the response data by picking the variant of each line with the most stops.
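That consolidation step can be sketched in a few lines (field names follow the document shape assumed above):

```javascript
// For each line name, keep only the variant with the most stops —
// a line forks and runs in both directions, so it appears several times.
function consolidate(routes) {
  const best = {};
  for (const route of routes) {
    const current = best[route.line];
    if (!current || route.stops.length > current.stops.length) {
      best[route.line] = route;
    }
  }
  return Object.values(best);
}
```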

The frontend is a piece of cake since everything’s just JSON. We let your browser acquire your current position (navigator.geolocation.getCurrentPosition), send it to our backend and transform the resulting coordinates into Google Maps polylines, one for each returned line:
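A sketch of that glue code, assuming the document shape from above (the coordinate flip is the only real logic; the Maps part is the standard V3 API and only runs in a browser, so it is shown as a comment):

```javascript
// Our loc fields are [lng, lat]; Google Maps wants {lat, lng} literals.
function toPath(route) {
  return route.stops.map(stop => ({ lat: stop.loc[1], lng: stop.loc[0] }));
}

// In the browser (Google Maps V3), one polyline per returned line:
//   navigator.geolocation.getCurrentPosition(pos => {
//     /* POST pos to the backend, then for each returned route: */
//     lines.forEach(route => new google.maps.Polyline({
//       path: toPath(route), map: map, strokeColor: '#3366cc'
//     }));
//   });
```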

And that’s what you see when you click on the map on Veebibi. Interested readers will notice the use of Underscore iterators and the Google Maps V3 API.

While I was hacking the core of all that stuff, our team member Gabriel (I never remembered his name on location, now I can) spent some hours writing most of the “frontend” you see when visiting the page for the first time. He used Backbone.js for many elements and tried to make everything normalized and responsive. Here are some lessons he learned while working with me:

1. You should not do git push origin master if your code isn’t working well. Instead, push a branch that the maintainer can merge. The “real way” is actually: fork the project, push to your fork’s master and create a pull request for the maintainer on the root project.

2. The JavaScript mongodb-native driver doesn’t compile on Windows. At least not at 3am.

3. You should configure your git to not ask for username and password every time. If you reject that advice, be sure you don’t accidentally push a new publicly visible branch by typing: git push origin my-branchgabi@somedomain.com-pA22w0rD . It’s very easy to forget to press enter at 6am with no sleep.

Meanwhile, our fourth colleague Alexander did research on the Google Maps API. Unfortunately the results he came up with didn’t make it into the final code, but he found that it’s pretty simple to make polylines follow actual streets. If you have a look at the Veebibi output you’ll notice that bus routes are assembled out of straight lines. Buses usually don’t go right across the Tiergarten lawn, so this can obviously be improved. He sent me this GIST around midnight. It describes how you can utilize the waypoints option to let Google Maps render a correct route along streets. For buses that might not be 100% exact, but it’s totally sufficient to render a nice view.
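The idea boils down to feeding the intermediate stops into a DirectionsService request as waypoints instead of drawing a raw polyline. A hedged sketch (not Alexander’s actual gist; the request-building part is pure data and shown as a function, the service call itself only runs in a browser):

```javascript
// Build a google.maps.DirectionsService request from an ordered stop path
// of {lat, lng} literals. Note: Google caps the number of waypoints
// (8 on the free tier back then), so long routes may need thinning.
function toDirectionsRequest(path) {
  return {
    origin: path[0],
    destination: path[path.length - 1],
    waypoints: path.slice(1, -1).map(location => ({ location, stopover: false })),
    travelMode: 'DRIVING'
  };
}

// In the browser:
//   new google.maps.DirectionsService().route(toDirectionsRequest(path),
//     (result, status) => { /* render result with a DirectionsRenderer */ });
```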

Our fifth colleague Zachary, who came a long way from Ohio to join us at AngelHack BER (just kidding, he’s in the city for his studies), took care of an idea Gabriel came up with: while the bus lines are pretty uninteresting at first glance, why not pepper the view up with a heatmap rendered from a Twitter search for popular / trending hashtags (e.g. “#party” or “#bbq”), so you know where to go once you’ve figured out how to get there? We actually call that the “party mode” component of Veebibi: buy a beer, get on a bus and head to a party. In Berlin that can be really fun:)

Unfortunately we never integrated that stuff, but Zachary did an amazing job analysing Google’s Fusion Tables concept, which can be used to generate data sources for map overlays with huge amounts of location data. In our case we could simply have used the standard way of doing things (for a limited set of data, the Google Maps API alone is sufficient to render heatmaps).

The Pitch

I was the lucky one to pitch the project on stage, using a presentation assembled by Alexander, and I made clear from the first second that this wasn’t going to be the next “We have a brilliant business concept and here is how to make money with it” pitch. It’s simply the product of some productive minds that used a day and a night to hack the shoot out of their brains. The audience cheered when they saw that you can actually travel from Berlin to Stralsund using nothing but public transit lines, so I’m glad we achieved our goal: we made something that makes people cheer!

[tweet https://twitter.com/picsoung/status/331035922592301057 ]

Thank You All!

So I can only finish this article with an especially grateful “Thank You!” to Alexander from Westech Ventures, who honored our team’s effort with a special prize for “an idea that could possibly grow into a business”. The core idea of utilizing GTFS data to build a global transit information system is definitely not unique, but it could lead to a B2B approach that works worldwide. The way we utilized the data is far from production-ready, but we showed that it’s absolutely possible. So we got away with 4 Chinese Android tablets; imho more than we could’ve expected.

I’d like to thank Robert, Gabriel, Alexander (and your girlfriend: thanks for the logo😉 ) and Zachary for making this possible! Not to forget the orga team of ImmobilienScout24 / You Is Now, who offered a brilliant location for the hackathon and did a great job feeding us the whole time (I won’t have donuts for the next couple of months!).

PS: don’t forget to visit Qurate and contribute your impressions. And tell your own story if you want to🙂