Archive

The talk covered best practices, tips on finding problems, and how the panelists worked around them. There is no “magic bullet” of the kind Ruby or RoR itself offers 🙂

Essentials:

Watch out when using ActiveRecord. It makes it too easy to use the DB. It makes it too easy to use the DB. One more time: it makes it too easy to use the DB.

Essentially, ActiveRecord and the DB are not always the right tool. Sometimes another tool works better for a particular problem.

Things mentioned:

Use Redis as a queueing system to buffer writes, which later go to the DB (this is what Blitz and Bleacher Report do to increase their performance).

Use NoSQL (Couchbase, Mongo and Cassandra were mentioned as being used by panelists).

Cache results as much as possible. Don’t hit DB all the time.

Hand-optimizing queries might be needed. ActiveRecord is not the best at generating optimized DB calls.

Cache as much as possible. Bleacher Report puts a caching layer everywhere: memcache, front-end web cache, etc. They also have scripts that pre-warm their cache (“goal is to never have users be the one who triggered a cache request”).

Use the cache in newer RoR (3.2).

Write code in ways that make it easy to update to the latest Ruby and RoR.

Ruby EE has flags that let you use more memory for its internal cache. Sometimes it makes sense to try and test different memory configurations there (based on two panelists’ experiences).

RoR 3.2 has good Rack/Rails caching. Read the docs and use it.
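
The caching and pre-warming advice above can be sketched roughly as follows. This is only an illustration, not what any panelist actually runs: a plain Hash stands in for memcache, and the class name ReadThroughCache and the loader block are made up for the example.

```ruby
# Minimal cache-aside sketch. A Hash stands in for memcache, and the
# loader block is a hypothetical stand-in for an expensive DB query.
class ReadThroughCache
  attr_reader :misses

  def initialize(&loader)
    @store = {}       # stand-in for memcache or another cache store
    @loader = loader  # called only on a cache miss
    @misses = 0
  end

  # Return the cached value, loading and storing it on a miss.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @misses += 1
    @store[key] = @loader.call(key)
  end

  # Pre-warm: load the given keys ahead of time so that users
  # never trigger the slow path themselves.
  def warm(keys)
    keys.each { |key| fetch(key) }
  end
end

cache = ReadThroughCache.new { |id| "article-#{id}" }
cache.warm([1, 2, 3])   # run from a script, for example after a deploy
cache.fetch(2)          # served from the cache, no new miss
```

In a Rails app the same role would normally be played by the framework's own cache store rather than a hand-rolled class.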

Background processes.

Use a bg process whenever possible.

Any time you need to make calls to an external website (external API), use a bg process so you don’t tie up your RoR web process.

Blitz puts jobs into a Redis queue. A bg server then checks the queue for jobs, runs them and puts partial results back into Redis. An Ajax call then checks for results and formats/displays them in the web client.

Bleacher Report and Mixbook also do similar things. They use Redis as a job queueing system, among other things (see the Redis item under ActiveRecord above).

They all mentioned using another web server for production (not WEBrick). The following were mentioned as being used by panelists.
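
Here is a rough sketch of that enqueue / work / poll flow. It is not Blitz's code. FakeRedis is a made-up in-memory stand-in so the example runs without a server; its method names mirror the Redis commands LPUSH, RPOP, SET and GET, and the key names ("jobs", "result:ID") are assumptions.

```ruby
# FakeRedis is a hypothetical in-memory stand-in for a Redis client.
# A real Redis stores strings, so jobs would be serialized (e.g. as JSON).
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
    @values = {}
  end

  def lpush(key, value)
    @lists[key].unshift(value)
  end

  def rpop(key)
    @lists[key].pop
  end

  def set(key, value)
    @values[key] = value
  end

  def get(key)
    @values[key]
  end
end

# Web process: enqueue the job and return immediately.
def enqueue(redis, job_id, url)
  redis.lpush("jobs", [job_id, url])
  redis.set("result:#{job_id}", "queued")
end

# Background worker: take one job, do the slow work, store the result.
def work_one(redis)
  job = redis.rpop("jobs")
  return false if job.nil?
  job_id, url = job
  redis.set("result:#{job_id}", "running")
  # A real worker would call the external API here.
  redis.set("result:#{job_id}", "done: fetched #{url}")
  true
end

# Ajax endpoint: report whatever is in Redis right now.
def poll(redis, job_id)
  redis.get("result:#{job_id}")
end

redis = FakeRedis.new
enqueue(redis, 1, "http://example.com/api")
poll(redis, 1)    # still "queued", the web process was never blocked
work_one(redis)   # runs in the background server
poll(redis, 1)    # now holds the finished result
```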

Passenger

Thin

Unicorn

Related to ActiveRecord above is the N+1 problem, where you add one line of code and the number of DB calls increases manifold.

The advice was essentially to develop and use coding best practices and train developers to look out for it.

There is a possible test that can be used to automate looking out for N+1 issues.

The host of the meeting, Blitz, also did a marketing spiel on their performance-testing tool (it looks really good, and is available as an add-on on Heroku). I am going to test it and see about using it for performance/load testing our site.
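
The talk did not say what that test looks like. One way to do it, sketched below under my own assumptions, is to count the queries a piece of code makes and fail when the count goes over a budget. CountingDB and the helper names are made up for the example. In a real Rails app the counting could be done by subscribing to the sql.active_record notification, and the batched version corresponds to eager loading with includes.

```ruby
# CountingDB is a hypothetical stand-in for the database that counts
# every query it serves.
class CountingDB
  attr_reader :queries

  def initialize(authors)
    @authors = authors  # { id => name }
    @queries = 0
  end

  # One query per author.
  def find_author(id)
    @queries += 1
    @authors[id]
  end

  # One query for a whole set of authors.
  def find_authors(ids)
    @queries += 1
    @authors.select { |id, _name| ids.include?(id) }
  end
end

# N+1 style: one query per post.
def author_names_naive(db, posts)
  posts.map { |post| db.find_author(post[:author_id]) }
end

# Batched: a single query covers all posts.
def author_names_batched(db, posts)
  authors = db.find_authors(posts.map { |post| post[:author_id] }.uniq)
  posts.map { |post| authors[post[:author_id]] }
end

# The "test": run the block, then fail if the query budget was exceeded.
def assert_max_queries(db, max)
  yield
  raise "too many queries: #{db.queries} > #{max}" if db.queries > max
  db.queries
end

posts = [{ author_id: 1 }, { author_id: 2 }, { author_id: 1 }]
db = CountingDB.new({ 1 => "Ann", 2 => "Bob" })
assert_max_queries(db, 1) { author_names_batched(db, posts) }
```

Swapping in author_names_naive makes the same assertion fail, which is the point of the test.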

We just migrated from Bamboo (bamboo-ree-1.8.7) to Cedar (Celadon Cedar, the Rails 3.1 stack) on Heroku. Here are the steps I took to make it work. Note that I didn’t do the code migration; my dev handled that. I did the server and infrastructure part.

Basically this means moving from Rails 2.x (and Ruby 1.8.7) to Rails 3.1 (Ruby 1.9.2). Some of the changes require upgrading gem packages. Things such as images, CSS and JS have to be put into an asset bundle (the Rails 3.1 asset pipeline).

On the site itself, I left the old site running, e.g. oldsite.heroku.com, and created a new site at newsite.herokuapp.com.

We created a new git dev branch and pushed to newsite, e.g.

git push heroku dev:master

Since we use SSL, I have to make sure the custom_domains addon is there. But…

heroku addons:add custom_domains:basic custom_domains:wildcard

I can’t add the ssl addon until I am ready, because Heroku requires that I define the domains for the app first! Chicken and egg, as it means I have to take down the currently running production site.

So, make sure everything is running on the new site first, because the next steps mean the production site will be down during the changes.

0. Make sure your SSL cert is up-to-date and you have both parts, domain.cert and private.key. And most importantly, your key is passphrase-less!

1. Make sure your DNS records are updated and have the shortest possible TTL. You are changing them.

2. Wait till the DNS changes (TTL) have settled, which could take a few hours. Then update your production site CNAME from *.yourdomain.com to newsite.herokuapp.com.
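
For step 0, a quick way to check that the key is passphrase-less is to look at the PEM text itself. This is a small sketch under the assumption that the key is a PEM file; the helper name passphrase_less? is made up. Encrypted PEM keys carry an "ENCRYPTED" marker, either in the BEGIN line or in a "Proc-Type: 4,ENCRYPTED" header.

```ruby
# Returns true if the PEM text looks like a private key that is
# not protected by a passphrase.
def passphrase_less?(pem)
  pem.include?("PRIVATE KEY") && !pem.include?("ENCRYPTED")
end

# Hypothetical usage; private.key is the file name from step 0:
#   puts passphrase_less?(File.read("private.key"))
```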

FAQs

How to fix a corrupted Elasticsearch translog.

In 5.0 there is a tool which can be used to truncate corrupt translog files. This doesn't exist in 2.x but there is a workaround:
POST my_index/_close
PUT my_index/_settings
{ "index.engine.force_new_translog": true }
POST my_index/_open
PUT my_index/_settings
{ "index.engine.force_new_translog": false }
NOTE: Any data in the corrupted translog will be lost.
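
If you would rather script the four requests above than type them into a console, a sketch along these lines should do it. The host and port defaults (localhost, 9200) and the function names are assumptions, and the same warning applies: anything in the corrupted translog is lost.

```ruby
require "json"
require "net/http"

# The four requests of the workaround, in order, as [verb, path, body].
def translog_reset_steps(index)
  setting = "index.engine.force_new_translog"
  [
    [:post, "/#{index}/_close", nil],
    [:put,  "/#{index}/_settings", JSON.generate({ setting => true })],
    [:post, "/#{index}/_open", nil],
    [:put,  "/#{index}/_settings", JSON.generate({ setting => false })]
  ]
end

# Send the steps to a node, stopping at the first failed request.
# The host and port defaults are assumptions; adjust for your cluster.
def run_translog_reset(index, host = "localhost", port = 9200)
  Net::HTTP.start(host, port) do |http|
    translog_reset_steps(index).each do |verb, path, body|
      klass = verb == :put ? Net::HTTP::Put : Net::HTTP::Post
      request = klass.new(path, "Content-Type" => "application/json")
      request.body = body if body
      response = http.request(request)
      unless response.is_a?(Net::HTTPSuccess)
        raise "#{verb} #{path} failed: #{response.code}"
      end
    end
  end
end

# Hypothetical usage:
#   run_translog_reset("my_index")
```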

How to size a cluster?

I want to create a new Elasticsearch cluster. What are the recommended sizing guidelines?
Answer:
The answer depends very much on the use case. The factors that should be taken into consideration are:

How much data do you expect to index?

Frequency of new data. How often is new data to be indexed? Daily? Hourly?