Recently our teammate and Bitbucket engineer Erik van Zijst had the opportunity to present at EuroPython 2014 in Berlin. Check out this video of his session on the Inner Guts of Bitbucket for a detailed overview of our current architecture at all layers, from Gunicorn and Django to Celery and HAProxy to NFS.

In addition to the inside scoop into Bitbucket’s inner workings, this video covers some war stories and shows how we too have to learn things the hard way sometimes.

“Sometimes we too have to learn the hard way”…? What, you mean even brilliant, world-class and modest geniuses also have to learn the hard way?!

Radek

I think he means you learn from your mistakes

http://www.versioneye.com/ Robert Reiz

Thanks for the presentation. Very interesting to see how Bitbucket works. What’s the reason for running real hardware instead of virtual machines?

Anonymous

The only reason I can see is that they avoid the virtualization overhead to make things as fast as possible. But that is just as far as I can see.

http://www.versioneye.com/ Robert Reiz

I’m aware of the virtualisation overhead. On the other hand, virtualisation makes managing the infrastructure much easier.

Anonymous

There are times when you have to choose one over another: “right tool for the job”. That’s what I was trying to suggest.

http://www.versioneye.com/ Robert Reiz

That’s right! And since getting to know Docker I have another point of view. You can run Docker on bare metal. That’s the middle way: it’s dynamic and still faster than regular virtualisation.

Erik van Zijst

Bitbucket was founded in 2008 on EC2 and EBS. The switch to bare metal was made in 2010 when it joined Atlassian.

At Atlassian we managed (and continue to manage) a lot of our own hardware, and moving Bitbucket onto it (without virtualization) gave us a huge performance boost over EC2.

Back then we only ran 4 machines, so managing things was a lot easier.

These events all predate Docker, which we now use throughout Atlassian for all kinds of things, and I could see Bitbucket going virtualized again at some point in the future.

http://www.versioneye.com/ Robert Reiz

Thanks for the insights. I started VersionEye in 2012 on Heroku, and after a couple of months everything had already slowed down and it became really expensive. In 2013 I moved it to bare metal for half the price and the performance improved dramatically. Today (2015) it’s running on EC2 because of the Amazon Activate program. At a certain size it makes sense to own your hardware.

Jardel Weyrich

Great presentation, Erik! I like the simplicity of your architecture.

Your cache solution to minimize the impact of bcrypt-ing passwords all the time seems reasonable, although IMHO it does not tackle the underlying problem: the password shouldn’t be used to authenticate multiple times in a short period, especially when that is an expensive operation. You partially solve that with API rate-limiting (which I think you already have); a decorator backed by NoSQL would do. Sites rely on cookies & sessions, and REST APIs can too. But there’s another (similar) approach: once an account is authenticated, you could hand the client a time-limited authentication token based on HMAC. From this point, the client can rely solely on this token to perform any operation that requires authentication, until it expires. You may implement a renewal process as well. Anyway, if a server gets compromised, the intruder can always modify the authentication API to log/save plaintext passwords. Can’t fight that.
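
To make the token idea concrete, here is a minimal sketch in Python using only the standard library. It is not how Bitbucket does it; the names `SECRET_KEY`, `issue_token` and `verify_token` are hypothetical, and the token layout (base64 payload, a dot, then a hex HMAC-SHA256 signature) is just one assumed format. The expensive bcrypt check would happen once, before `issue_token` is called.

```python
import base64
import hashlib
import hmac
import time

# Hypothetical server-side secret; in practice it would come from secure config.
SECRET_KEY = b"server-side-secret"


def issue_token(username, ttl_seconds=3600, now=None):
    """Return 'payload.signature'; the payload encodes username and expiry time."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    payload = base64.urlsafe_b64encode(f"{username}:{expires}".encode()).decode()
    sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_token(token, now=None):
    """Return the username if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".", 1)
    except ValueError:
        return None  # no separator, so not one of our tokens
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    try:
        decoded = base64.urlsafe_b64decode(payload.encode()).decode()
        username, expires = decoded.rsplit(":", 1)
        expires = int(expires)
    except (ValueError, UnicodeDecodeError):
        return None
    if now >= expires:
        return None  # token has expired; client must re-authenticate or renew
    return username
```

Verifying a token costs one HMAC computation, which is far cheaper than a bcrypt round, and the server needs no per-token storage. The trade-off in this sketch is that a token cannot be revoked before it expires without extra state.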

Erik van Zijst

> Once an account is authenticated, you could hand the client a time-limited authentication token based on HMAC.

For sure!

We’re actually rather behind the times with our reliance on basic auth for things like the API. We’re working on adding OAuth 2-based token authentication as a cheaper and safer alternative to password-based authentication (we currently offer OAuth 1, which can be a little cumbersome for clients to use).