Menu

Tag Archives: heroku

A few things have changed since I last talked about hosting. Tweet Marker passed 200,000 total users to the API. There are more apps and more platforms. And of course Plus launched with new requirements for database and search indexing of tweets.

Here’s a graph showing monthly hosting costs for the last year, stopping just short of $600/month for April:

The first dip was when I moved away from Heroku’s dedicated PostgreSQL database, to Redis on EC2. The more recent increase is when I updated capacity for Tweet Marker Plus.

Tweet Marker currently runs on both Heroku and Amazon EC2. On Heroku, there are 6 dynos: 5 web dynos with 3 Unicorn processes each, and 1 background worker. I also run hourly scheduled tasks that add a small number of extra dyno-hours, and sometimes I’ll fire up additional temporary workers.

For Amazon, I run with 3 medium server instances: 1 for Redis master, 1 for Redis slave, and 1 for search with MySQL and Sphinx. The search server is partitioned across multiple EBS volumes, each one mounted as a separate MySQL database and Sphinx index. It is possible for me to move a database or search index shard to a different EC2 instance if I need to, as well as move customers between shards.

The volumes look like this in Amazon’s admin UI:

I picked 20 GB shards because it seemed like about the point where the database would be too big to be fast, given the modest hardware. It’s enough to hold several months of tweets for all the users in a shard. I estimate how many users should be in each shard, and when it reaches that number I roll new accounts to the next shard, and so on.

Backup dumps for the Redis database and MySQL get sent to S3 every hour and every day, where I keep the last 24 hourly backups and the last 31 daily backups. I also do occasional EBS snapshots.

I don’t currently need a MySQL slave for backups. If I lose a drive, when I restore the last hourly backup, Tweet Marker Plus will automatically add any missing tweets lost during the downtime to bring things back to a current state.

Overall I’m happy with this setup. It’s as simple as I could design it. Hosting is not cheap, but I think I can run for the rest of the year with very few changes and mostly fixed costs. I also plan to switch to reserved instance pricing at Amazon, which should be a significant discount.

If you’ve made it this far, you probably care enough about servers and Twitter that you should consider signing up for Tweet Marker Plus yourself. Check out the details here.

The best critique and praise for Heroku is that it’s opinionated: only deploy with Git, only use PostgreSQL. On the whole this is a good thing, because it simplifies the choices that developers have to make. Fewer choices means more time to spend on things that matter.

But at $200/month for a dedicated database, there’s a very real cost with PostgreSQL on Heroku. For a service like “Tweet Marker”:http://tweetmarker.net/ that is currently free, I wanted to save money on hosting and simplify things down to a single database. I might have preferred to streamline to PostgreSQL-only, since it’s rock solid and Heroku’s tools are excellent for managing backups and moving data around, but Resque (which Tweet Marker uses to process background work) is built on Redis. If I wanted to streamline to a single database, it had to be to Redis.

Tweet Marker as essentially an OAuth-wrapped key/value store anyway, so why burden the app with a full relational database? The handful of extra tables I had for tracking users and sessions could also be migrated to Redis. And even though it’s a memory database at heart, it is persistent, so I can use it for real data.

I conducted the transition in stages:

First, I was using Redis for stats. Because Redis has an atomic increment command, it’s trivial to use it to keep counters on various metrics: total number of reads/writes per hour, day, or month, broken down by different Twitter clients. This was my first use of Redis outside Resque tasks and it performed great.

Next, I moved the marker data. I wrote a script to dump all the data from PostgreSQL to Redis, and at the same time modified Tweet Marker to write data to both locations when requests came in. I let that run for about 2 days to confirm that the data was identical in both places before shutting PostgreSQL down and routing reads to Redis only.

Finally, I finished moving some additional tables to Redis so that I could completely shutdown PostgreSQL. Again, simple is better.

All of this was using Heroku’s “Redis To Go”:https://redistogo.com/ add-on. I love it because there’s no configuration; click a couple buttons and you have a server. But as soon as I was running full time on Redis To Go, I started to be concerned that I had given up a great amount of control over my data. While Redis To Go conveniently offers daily backup snapshots, it doesn’t appear to have any commands for restoring data or otherwise getting access to the server and Redis files. And though I could probably set up a separate Redis To Go instance and manually send it a slaveof command to turn it into a slave, not being able to edit the config file meant that if the server was rebooted I would lose the slave until it could be reset. Not good.

I also expected I would quickly outgrow the smaller Redis To Go plans, eventually forcing the monthly pricing up to the same level as I was paying for PostgreSQL, but with less control. I had to run my own database server instead. And because Heroku is all on Amazon EC2, it also had to be on EC2 for the best performance between web server and database.

I went to work provisioning an EC2 instance. Initially it ran as a Redis slave to the master on Redis To Go. After I felt comfortable with EC2 and had automated backups in place, I added a separate instance on Amazon for the master and switched the app over to it. The cost for 2 “micro” instances is about the same as I was paying for Redis To Go, but with much more available memory, hourly backups to S3, and access to the Redis aof log. And my total monthly hosting costs are about half what they were before.

I’m hoping to keep this setup for at least the rest of the year and well into next. As the Tweet Marker user base grows, I’ll bump up to higher-capacity EC2 instances. Heroku continues to be really great, so I don’t expect to change anything there.

Today was day 7 of the “first app”:http://twitterrific.com/ to ship with Tweet Marker support. Overall, things have been working great. I thought I’d write a little bit about my hosting experience so far.

First, the good news: the service is up and fast. Twitterrific is sending a lot of traffic, even with the preference off by default, but the server app is architected in such a way that things are zipping along nicely. Performance is especially important on the iPhone, where a Twitter client might be saving state on quit and any lag would be pretty obvious. Non-trivial server work, such as hitting the Twitter API, is done in the background using Resque, and the queue size there is usually small too.

There have been two distinct problems, though. Each was a good opportunity to improve the system.

On the first day, the background queue stalled and the requests piled up, so much so that my Redis instance ran out of memory and clients started receiving errors. I upgraded Redis, restarted the app, and bumped up the number of processes to quickly catch up on the backlog. Although it hasn’t resulted in errors since, it has delayed some requests on 2 more occasions after launch. I also adjusted how I’m collecting statistics to further improve performance.

More recently, I’ve noticed some sporadic SSL errors when Tweet Marker tries to verify the user account via Twitter. I added more wait-and-retry logic, and I’m keeping an eye on this to kick the background script if necessary. There have been no errors in the last couple days.

I still think it was the right decision to host on Heroku. Although I am manually monitoring the server for general health (need to automate this badly), everything is less work than I would do with a colocated box or VPS. The biggest cost is that at the last minute I upgraded to Heroku’s dedicated PostgreSQL database. I’m still evaluating that, but I would rather launch with too much horsepower and scale it down later, than not enough.

Again, thanks for your support. It is a little nerve-racking because I know that “other people”:http://www.iconfactory.com/ are depending on Tweet Marker being up and performing well. This is a different experience for me than working at a “larger company”:http://www.vitalsource.com/ with a dedicated system administrator, and also different than server backends for my Riverfold products, where it is only my own customers who will be disappointed if I fail.

So you want to sync the last-read tweet with all your different Twitter apps on iPhone, iPad, and Mac? Yeah, me too. While I hope to build a version of Tweet Library for other platforms, what I’d also love is to be able to switch between clients and know that each one will pick up where I was last reading in the timeline.

That’s why I’m introducing a new service for Twitter developers: Tweet Marker.

I’ve already showed it off to a few developers, and if you’re writing a Twitter app I’d love for you to support it too. It will be baked into the next version of Tweet Library.

There are still some unknowns (especially around whether I will need to ask for help to cover hosting costs), but I wanted to launch it now before WWDC so that other Twitter app developers meeting at the conference can give me feedback on the service. Tweet Marker has actually been running for months, and when an opportunity came along this week for a new logo (thanks Alex!), I knew it was past time to finish documenting the service and get it out.

To be successful it needs at least 2 apps to support it (I’ll supply one of those). I’ve tried to solve all the other basic problems. It’s simple, fast, scalable on Heroku, and protected so that mischief-makers can’t tamper with tweet IDs.

Send me an email or find me in person next week if you have any feedback.

Update: This post has been updated to reflect the service name change from Tweetmarks to Tweet Marker.