Technoblog

Monday, November 10, 2008

Compiling SWFUpload on Mac OS X

SWFUpload is a great Flash library for dynamic file upload progress meters, but making changes to the SWF can be a pain in the ass if you don't have FlashDevelop, a Windows-only app for compiling ActionScript. Luckily, I figured out how to do it on OS X.

Monday, April 21, 2008

DataMapper Review and Impressions

On the first day, DHH made ActiveRecord, and ActiveRecord was good. Sure it wasn't originally thread safe and it created monster SQL queries updating every single column on a save even if only one value changed. Sure it may have used a ton of RAM and made you write a bit more SQL than you would have liked. But it worked and was good.

Today, there is a lot of buzz around Merb + DataMapper. I will review Merb later but I would like to let people know a few things about DataMapper.

DataMapper has a lot of promise. It is thread safe, smart about SQL updates, and has a better interface for generating conditions for SQL queries. Unfortunately, it is in a very strange place right now.

DataMapper's gem is at 0.3.1. Unfortunately, the entire 0.3 release is all but completely unsupported by any community. Everyone in the DM community is working on the 0.9 release. That would be fine except for the fact that 0.9 is currently so far from working that you couldn't use it in production even if you wanted to.

So DM 0.3 is unsupported... that's OK as long as it works, right? Well, unfortunately, DM 0.3 has several very severe bugs that I can't get anyone in the DM community to care about, even though I have submitted patches to their bug tracking system:

When you declare an :order in a has_many association, it does not work at all... it re-orders the entire collection arbitrarily

The belongs_to association does not work if you have changed your primary key names

External database contexts do not work at all even though there are examples of them on the home page

DM 0.3 claims to be backwards compatible with ActiveRecord, yet you cannot use your :from, :joins, or :group conditions when composing SomeClass.find(...) calls

And these are only the ones I found within 1 week of using DataMapper.

The thing that really scares me is that nobody seems to care. Any time I mention anything like this in the IRC channel, the standard response is: sorry, we are all working on 0.9, we have no idea about 0.3.

So in short, 0.3 is unstable with no supporting community behind it and 0.9 is completely unusable. Unfortunately, that is a bad combination in my book. I have branched 0.3 so that I can keep a stable version for myself, and if anyone wants to contribute you can find it on GitHub. I am using it because I like it and know my way around the 0.3 source code, so I can fix the large issues I encounter. However, I would not recommend DM unless you are a seasoned Ruby developer who is not afraid to roam on your own inside an ORM framework.

Friday, March 07, 2008

Book Review- The Ruby Programming Language

David Flanagan, Matz, and _why teamed up to write The Ruby Programming Language. This book is in kind of an interesting purgatory right now. It covers Ruby 1.8 and 1.9, which is both an asset and a problem. It is hard to recommend this book to a beginner because Ruby 1.9 will not be in mainstream use for quite some time and learning about the new features can be more confusing than useful, but it is also hard to recommend this book to people who know Ruby well since there is no clear differentiation of 1.9 code, so you can't use it as a reference for what's new.

The obvious comparison for this book is the seminal pickaxe v2 book. Although The Ruby Programming Language is well written and enjoyable, it does not cover Ruby as thoroughly as pickaxe. I could see The Ruby Programming Language v2 with a full reference section of 1.9 (or even 1.8+1.9 with some clever way of showing the differences) taking over my recommendation for #1 most important book for any Ruby programmer in a few years, but until then pickaxe is still required reading.

One thing that I found interesting was the decision not to cover many of the common standard libraries like CGI, logger, test/unit, or net/* (there is 1/2 page out of 400 dedicated to net/http). These are some of the oversights I hope will be fixed in v2, though the authors may choose to keep the discussion more pure and keep the book focused on blocks and operators. Personally, I think it would be a shame not to get into more detail.

Overall, I like this book, but feel like the only people who would benefit from it right now are those who already know Ruby at a somewhat higher than beginner level, but want to deepen their knowledge to a more advanced level and stay ahead of the curve with 1.9 at the same time.

Adding a comprehensive reference section with clear differentiation between 1.8 and 1.9, and more _why illustrations (one per chapter is way too meager) could make this book invaluable.

"Building a complete Ruby on Rails business application from start to finish"

This book delivers. One of the reasons I like this book so much is that it talks about using Rails in one of the best types of situations: internal-facing utility apps. These are the type of applications I first started building using Rails professionally and one of the places Rails shines brightest. Rails lets you quickly throw together utility applications that add value to a business, yet maintain a clean codebase, and this book shows you how.

The book covers everything from source control, Rails basics, choosing a CGI implementation, choosing a database, AJAX development, and does a very good job of showing you both the OS X and Windows side of Rails development. It is centered around a story that is familiar to many small businesses: sharing a dynamic contact list within a sales team. It teaches you how to think about the problem in a structured way, including deciding on a database architecture and key format.

The thing I like best about this book is that it is terribly pragmatic. It steps you through a common situation and brings up the complexities just as they would show up in the real world and the best practices to handle those complexities.

I think the target audience for this book is anyone who is tired of spaghetti code but intimidated by seemingly scary terms like MVC. For years I have been begging some of my friends to try Rails instead of PHP, but for various reasons they don't get out of their comfort zone. If a friend has kept begging you to try Rails and it seems like too much effort, pick up this book.

As with most Rails books, this one does suffer from a few deprecations since publication and a few detail oversights. For example, pagination is now a plugin, so a newbie hitting that chapter could get confused quickly. In the book's coverage of various reverse proxies, it does not mention one of the key reverse proxies for Rails environments under high load: HAProxy, which can hide concurrent requests from Mongrel. A finer point is that in the example on uploading files, there is no note that when the file being uploaded is a certain small size, it is instantiated as a StringIO object, not a Tempfile object, which means that the File.basename will raise an error.
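The StringIO pitfall can be sketched in a few lines. This is a minimal illustration, assuming the upload handler derives the file name from the uploaded object's path on disk; `safe_basename` and its fallback name are hypothetical helpers, not code from the book.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: return a file name for an uploaded object.
# A Tempfile lives on disk and has a path; a StringIO lives in memory
# and does not respond to `path` at all, so we fall back to a default.
def safe_basename(upload, fallback: "unnamed-upload")
  return fallback unless upload.respond_to?(:path) && upload.path
  File.basename(upload.path)
end

small_upload = StringIO.new("tiny file body") # what a small upload may arrive as
large_upload = Tempfile.new("upload")         # what a larger upload may arrive as

small_name = safe_basename(small_upload)
large_name = safe_basename(large_upload)
large_upload.close!
```

Checking `respond_to?(:path)` before calling `File.basename` keeps the handler working for both kinds of object instead of raising on the small ones.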

Overall, I would definitely recommend this book for people new to Rails.

This is from the standard documentation for the Ruby Mysql lib. What does it do? It grabs all items from tbl, and sets col1 and col2 to the first and second column of the LAST value of the result set. Not very useful unless you add a LIMIT 1 to your SQL statement. Much more useful would be this modified code.

Here we select all users from the table and end up with an array of full name strings. How did including Enumerable help us at all? It defined the map function for us, which natively does not exist for Mysql::Result objects. Why Enumerable is not included in the Mysql::Result class is a mystery to me.
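The effect of mixing in Enumerable can be shown without a database. The sketch below uses a stand-in class, `FakeResult`, which is hypothetical and only mimics a result set that exposes `each`; with the real library, the same `include Enumerable` line would go inside a reopened Mysql::Result class.

```ruby
# Hypothetical stand-in for a result set that only knows how to iterate.
class FakeResult
  def initialize(rows)
    @rows = rows
  end

  # Yield one row (an array of column values) at a time.
  def each
    @rows.each { |row| yield row }
  end
end

# Reopen the class and mix in Enumerable. map, select, inject and friends
# are all built on top of `each`, so they become available immediately.
class FakeResult
  include Enumerable
end

users = FakeResult.new([["Ada", "Lovelace"], ["Alan", "Turing"]])
full_names = users.map { |first, last| "#{first} #{last}" }
# full_names is now ["Ada Lovelace", "Alan Turing"]
```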

Monday, July 09, 2007

Advanced Concepts in Ruby on Rails Hosting Part IV

In these past weeks, we have discussed the transition from a standard reverse proxy (represented by a single manager handing documents to be translated to various translators, one at a time as soon as they came in) to a system I call drproxy with clients running on all application servers buffering requests and handing them to instances of the application as soon as they were ready (represented by office managers buffering documents for the translators).

It seems as though we have created a pretty streamlined system during these weeks, but there is still a bottleneck. Most people will never reach the bottleneck, but many people will feel pain around it. That bottleneck is the request server (represented by the manager who hands documents to office managers). The request server is very fast and can take many requests per second without flinching, however it is a single point of failure. If it goes down, nothing gets through. Not only that, but adding a second request server doubles the potential number of incoming requests. In the analogy, if the main manager stays home sick, no documents get translated that day. Drproxy is built to be able to run various request servers very easily, both for load distribution and redundancy.

Drproxy has been built in the crucible: we have been using it at MOG for over a month now, tweaking and making improvements. I am working on bundling the software, which will be open source, and when we release it, it should be ready to use out of the box for most Rails websites. Thanks for following this series and I really hope you enjoy drproxy.

Monday, July 02, 2007

Advanced Concepts in Ruby on Rails Hosting Part III

In our discussion of distributing web requests to different servers via the analogy of a translation company, we ended up last week with a question. To recap, the analogy compares application serving computers to translation offices and instances of the application to translators. Further, a manager (reverse proxy) sits in front of the offices distributing tasks to each office. Last week we realized that to increase the efficiency of our translators, we could put a manager in each office in order to buffer requests. That way, no individual bottleneck can hold up the rest of the queue. For distributing web requests, I created a reverse proxy called drproxy to do exactly this.

These types of systems can be found over and over again in the real world. Just this weekend I was in line to order a Polish dog from Costco. There was a single line with two servers processing the line. I watched as a father couldn't get decisions from each of his three children, yet the other server's booth moved along smoothly and brought me closer and closer to the Polish dog. You can find similar lines at Nordstrom Rack, Fry's Electronics, any restaurant, and many other locations.

One system that works like the less efficient "round-robin" method described a few weeks ago is the set of lines found in grocery stores. You get in a line praying that the people in front of you don't like to write checks or count out change because if they take their time, they hold you up. How many times did you choose what looked like the fastest checkout line only to watch people go through other lines faster due to one price-check on aisle five? Some grocery stores have started implementing self-checkout systems. I find that whenever given the choice, I tend to go directly to the self-checkout because it is always faster. One of the reasons it is faster is that there is a single line for four checkout machines. You could get one person counting change, one person doing a price-check and still have two machines checking people out smoothly. It is amazing that grocery stores have not realized this and implemented better line processing.

The question I posed at the end of last week was whether there was an even more efficient way to distribute requests. I propose that there is. Here is why: just as any individual translator might get a backed up queue of requests, translation offices could become overwhelmed. Based on pure probability, one office might build up a queue of 100 requests while another might sit there queue-less. There are a few ways to tackle this problem, but whichever way you choose to handle it, you must know the size of the queues in each office at any given time. Given this knowledge, you could choose to only hand requests to the least busy of the offices. I am not a big fan of this approach because large groups of documents could end up going to the same office in a row, and I like to distribute requests randomly to prevent buildups and attacks on the system. The way that I implemented drproxy to distribute requests was by randomly picking any office, except for the busiest office. If one office builds up a backup of requests and the others have no requests, no new requests will be sent to the busy office until it frees up. This load balancing system works very effectively.
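The selection rule described above (random choice, skipping the busiest office) can be sketched in Ruby. This is only an illustration of the idea, not drproxy's code, which is written in Erlang; `pick_office` and the office names are hypothetical, and the tie-breaking and single-office behavior are assumptions of this sketch.

```ruby
# Pick a random office, excluding the one with the longest queue.
# queue_sizes maps an office name to its number of waiting requests.
# Assumptions of this sketch: with a single office there is nothing to
# exclude, and when several offices tie for busiest, one is excluded.
def pick_office(queue_sizes, rng: Random.new)
  return queue_sizes.keys.first if queue_sizes.size == 1

  busiest = queue_sizes.max_by { |_name, size| size }.first
  candidates = queue_sizes.keys - [busiest]
  candidates.sample(random: rng)
end

queues = { "office1" => 100, "office2" => 0, "office3" => 3 }
chosen = pick_office(queues) # "office2" or "office3", never "office1"
```

Because the choice among the remaining offices is random rather than strictly least-busy, a burst of requests is spread out instead of all landing on whichever office happened to be idle.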

I hate to do this to you again, but can you think of any other major bottlenecks in our system? I can.

Monday, June 25, 2007

Advanced Concepts in Ruby on Rails Hosting Part II

Last week, we were discussing the analogy of serving websites being similar to running a translation company. A request would come in as a document, be handed to an application server as a translator, and returned to the client. We left off with a scenario of three translation offices with 10 translators in each office. One of the simplest methods to distribute work among these translators is to hand out documents one at a time in a round-robin way. However, due to the inherent traits of certain documents being longer than others and certain translators being faster than others, backups build up for some of the translators, leading to random lag and customer complaints.

Rather than the brute force method of adding more offices and translators, can you think of a better way to distribute resources?

The bottleneck in the scenario sketched above is management. Our translation company still has only one manager, thus limiting his ability to distribute resources more effectively. If we hire office managers and let the manager hand documents to the office managers, this lets us think of more interesting distribution techniques. For example, instead of overwhelming our translators with a growing pile of documents, and thus a growing pile of responsibilities, the office manager can wait until each translator has finished their job before handing them a new document.

Let us think about the consequences of this change. First some assumptions. Assume John is faster at translating than Susie because he has less on his mind (in computer lingo this would mean that Susie is experiencing a memory leak, possibly due to a bad programming library). Further assume a pile of documents comes in with this order: a 10 page document, a 2 pager, a 20 pager, a 1 pager, a 3 pager. In our original setup we could easily find ourselves in the situation where Susie gets a pile with the 10 pager, the 20 pager, and finally the 3 pager; whereas John only got the 2 pager and 1 pager. You can see that Susie's 3 pager should have been easy and fast, but was stuck behind a few bigger documents and is in the hands of the slower translator.

With the new distribution algorithm, the worst case scenario would be that Susie would be chugging away at the 20 pager, but since John quickly made short work of the other documents, he can turn over the 3 pager before Susie even finishes the 20 pager. This is much more streamlined because the queue is processed as quickly as the resources free themselves up as a group, rather than relying on an individual translator to work through their own pile.
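A small simulation makes the difference concrete. The sketch below uses the document sizes from the example; the speeds (minutes per page) are made-up numbers, and `round_robin` and `buffered` are hypothetical names. Each function returns the minute at which each document is finished, in arrival order.

```ruby
# Minutes per page for each translator (illustrative numbers only).
SPEED = { "Susie" => 3, "John" => 1 }
DOCS = [10, 2, 20, 1, 3] # page counts, in arrival order

# Round-robin: document i goes to translator i mod n, regardless of backlog.
def round_robin(docs, speed)
  free_at = Hash.new(0) # minute at which each translator's pile is done
  names = speed.keys
  docs.each_with_index.map do |pages, i|
    name = names[i % names.size]
    free_at[name] += pages * speed[name]
  end
end

# Buffered: documents wait in one shared queue, and each one goes to
# whichever translator frees up first.
def buffered(docs, speed)
  free_at = {}
  speed.each_key { |name| free_at[name] = 0 }
  docs.map do |pages|
    name, _time = free_at.min_by { |_n, t| t }
    free_at[name] += pages * speed[name]
  end
end

p round_robin(DOCS, SPEED) # => [30, 2, 90, 3, 99]
p buffered(DOCS, SPEED)    # => [30, 2, 22, 23, 26]
```

With these numbers the 3 pager is finished at minute 99 under round-robin, stuck behind Susie's pile, but at minute 26 when documents are handed out only as translators free up.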

The typical Rails setup of a reverse proxy handing requests to mongrel is not the most efficient use of the resources, so I built a load balancer I call drproxy which sits between the reverse proxy and the Rails dispatchers and queues up requests, handing them out in a more efficient way as each resource is freed. Furthermore, I built drproxy in Erlang, a language built from the ground up to excel at concurrency. Ruby is a slug when it comes to handling concurrency and multi-threaded environments. Erlang is like a Porsche.

There are, however, even more ways to make the system more efficient in an algorithmic way. Think about it for a while and I will tell you what I did next week.

Tuesday, June 19, 2007

Advanced Concepts in Ruby on Rails Hosting Part I

Let us imagine how a translation company starts out; let's call this company MOG Translation, Inc. At first, there might be one translator and one manager. The manager receives a document from a client and hands it to his translator. The translator might turn around the document in 1 hour, and it makes its way back to the manager and then into the customer's hands. Those are also the fundamentals behind hosting a website. In the simplest form, a web server acts like the manager: it takes in a request (http://mog.com/ for example) and hands it to the application server. The application server acts like a translator: it receives the request and turns it into HTML code that is passed back to the web server and shows up in your web browser.

Soon, MOG Translation, Inc. gets a good reputation and the translation documents come in faster than one per hour. Suddenly our poor lonely translator can't keep up and the papers keep piling up. We all know what to do: hire more translators. Now MOG Translation, Inc. has 10 translators on staff and the manager gets to pick how to hand out the work. One of the simplest ways to hand out documents is to pass them out one at a time to each translator. "One document for Nancy, one document for Drew ...", then when you get to the end of the line go back and start handing out to Nancy again. Whenever the translators finish, they hand the translation back to the manager who makes sure it goes to the right client. In computer lingo, this system is called round-robin. You have one web server that distributes requests to 10 application servers in a round-robin way.
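Round-robin is simple enough to write down directly. In this minimal sketch, `round_robin_assign` and the document names are hypothetical; the function just cycles through the translators in order, one document each.

```ruby
# Hand out documents one at a time, cycling through the translators.
# Returns a hash mapping each translator to the pile they were given.
def round_robin_assign(documents, translators)
  piles = Hash.new { |hash, name| hash[name] = [] }
  documents.each_with_index do |doc, i|
    # i % size wraps around to the first translator after the last one.
    piles[translators[i % translators.size]] << doc
  end
  piles
end

piles = round_robin_assign(%w[doc1 doc2 doc3 doc4 doc5], %w[Nancy Drew])
# piles is now {"Nancy"=>["doc1", "doc3", "doc5"], "Drew"=>["doc2", "doc4"]}
```

Note that the assignment depends only on arrival order, never on how long each document is or how busy each translator already is.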

MOG Translation, Inc. gets such a good reputation that 10 translators is simply not enough. Unfortunately, the office is already a bit cramped so it is time for MOG to open a new branch, another office with another 10 translators. This is equivalent to adding a new server to handle more load. In order to make this change invisible to the customer, we do not want to change the face of our company, so the manager stays the same. But now when he gets to the end of the line in office 1, he faxes the next documents directly to the desks of the translators at office 2. This system is still round-robin, and works fine when adding even more offices.

This is the standard way that many web sites' infrastructures grow. In order to handle more requests, they get more servers with more instances of the application running to handle more people visiting the site. The vast majority of web sites never need to grow beyond this point; however, popular ones like MOG do. Imagine a talented manager who sends requests to three offices with 10 people in each office in a round-robin way. Problems arise because some documents are longer than others, and some translators are faster than others. This leads to congestion where even though the manager is handing the documents out evenly, some translators' desks build up piles. When a pile builds up, even if a very easy document comes in, it might take a lot longer due to the other documents before it. Sure, you could just get more translators, but can you think of a better way to utilize the resources you already have?

Next week I will tell you what I did to better utilize our resources. For the more technical of my readers, the "manager" web server software we use is nginx, a very fast reverse proxy from Russia, and the "translator" application software is mongrel, which renders Ruby on Rails. Using a reverse proxy and mongrel is the canonical way to serve Rails web sites.

About Me

With almost 8 years of PHP and MySQL development experience, I found Rails in September of 2004 and fell in love. I have experience in a wide range of languages but spend most of my time with Ruby these days.