Posts Tagged macruby

After my recent RubyConf talk and follow up post addressing the Ruby & Python’s Global Interpreter Lock (aka GVL/Global VM Lock). a lot of people asked me to explain what I meant by “data safety”. While my point isn’t to defend one approach or the other, I spent a lot of time explaining why C Ruby and C Python use a GIL and where it matters and where it matters less. As a reminder and as mentioned by Matz himself, the main reason why C Ruby still has a GIL is data safety. But if this point isn’t clear to you, you might be missing the main argument supporting the use of a GIL.

Showing obvious concrete examples of data corruption due to unsafe threaded code isn’t actually as easy at it sounds. First of all, even with a GIL, developers can write unsafe threaded code. So we need to focus only on the safety problems raised by removing the GIL. To demonstrate what I mean, I will try to create some race conditions and show you the unexpected results you might get. Again, before you go crazy on the comments, remember that threaded code is indeterministic and the code below might potentially work on your machine and that’s exactly why it is hard to demonstrate. Race conditions depend on many things, but in this case I will focus on race conditions affecting basic data structures since it might be the most surprising.

Example:

In the above example, I’m creating an instance variable of Array type and I start 4 threads. Each of these threads adds 100,000 items to the array. We then wait for all the threads to be done and check the size of the array.

If you run this code in C Ruby the end result will be as expected:

400000

Now if you switch to JRuby you might be surprised by the output. If you are lucky you will see the following:

ConcurrencyError: Detected invalid array contents due to unsynchronized modifications with concurrent users
<< at org/jruby/RubyArray.java:1147
__file__ at demo.rb:3
each at org/jruby/RubyRange.java:407
__file__ at demo.rb:3
call at org/jruby/RubyProc.java:274
call at org/jruby/RubyProc.java:233

This is actually a good thing. JRuby detects that you are unsafely modifying an instance variable across threads and that data corruption will occur. However, the exception doesn’t always get raised and you will potentially see results such as:

335467
342397
341080

This is a sign that the data was corrupted but that JRuby didn’t catch the unsynchronized modification. On the other hand MacRuby and Rubinius 2 (dev) won’t raise any exceptions and will just corrupt the data, outputting something like:

294278
285755
280704
279865

In other words, if not manually synchronized, shared data can easily be corrupted. You might have two threads modifying the value of the same variable and one of the two threads will step on top of the other leaving you with a race condition. You only need 2 threads accessing the same instance variable at the same time to get a race condition. My example uses more threads and more mutations to make the problem more obvious. Note that TDD wouldn’t catch such an issue and even extensive testing will provide very little guarantee that your code is thread safe.

So what? Thread safety isn’t a new problem.

That’s absolutely correct, ask any decent Java developer out there, he/she will tell how locks are used to “easily” synchronize objects to make your code thread safe. They might also mention the deadlocks and other issues related to that, but that’s a different story. One might also argue that when you write web apps, there is very little shared data and the chances of corrupting data across concurrent requests is very small since most of the data is kept in a shared data store outside of the process.

All these arguments are absolutely valid, the challenge is that you have a large community and a large amount of code out there that expects a certain behavior. And removing the GIL does change this behavior. It might not be a big deal for you because you know how to deal with thread safety, but it might be a big deal for others and C Ruby is by far the most used Ruby implementation. It’s basically like saying that automatic cars shouldn’t be made and sold, and everybody has to switch to stick shifts. They have better gas mileage, I personally enjoy driving then and they are cheaper to build. Removing the GIL is a bit like that. There is a cost associated with this decision and while this cost isn’t insane, the people in charge prefer to not pay it.

Screw that, I’ll switch to Node.js

I heard a lot of people telling me they were looking into using Node.js because it has a better design and no GIL. While I like Node.js and if I were to implement a chat room or an app keeping connections for a long time, I would certainly compare it closely to EventMachine, I also think that this argument related to the GIL is absurd. First, you have other Ruby implementations which don’t have a GIL and are really stable (i.e: JRuby) but then Node basically works the same as Ruby with a GIL. Yes, Node is evented and single threaded but when you think about it, it behaves the same as Ruby 1.9 with its GIL. Many requests come in and they are handled one after the other and because IO requests are non-blocking, multiple requests can be processed concurrently but not in parallel. Well folks, that’s exactly how C Ruby works too, and unlike popular believe, most if not all the popular libraries making IO requests are non blocking (when using 1.9). So, next time you try to justify you wanting to toy with Node, please don’t use the GIL argument.

What should I do?

As always, evaluate your needs and see what makes sense for your project. Start by making sure you are using Ruby 1.9 and your code makes good use of threading. Then look at your app and how it behaves, is it CPU-bound or IO-bound. Most web apps out there are IO-bound (waiting for the DB, redis or API calls), and when doing an IO call, Ruby’s GIL is released allowing another thread to do its work. In that case, not having a GIL in your Ruby implementation won’t help you. However, if your app is CPU-bound, then switching to JRuby or Rubinius might be beneficial. However, don’t assume anything until you proved it and remember that making such a change will more than likely require some architectural redesign, especially if using JRuby. But, hey, it might totally be worth it as many proved it in the past.

I hope I was able to clarify things a bit further. If you wish to dig further, I would highly recommend you read the many discussions the Python community had in the last few years.

During RubyConf 2011, concurrency was a really hot topic. This is not a new issue, and the JRuby team has been talking about true concurrency for quite a while . The Global Interpreter Lock has also been in a subject a lot of discussions in the Python community and it’s not surprising that the Ruby community experiences the same debates since the evolution of their implementations are somewhat similar. (There might also be some tension between EngineYard hiring the JRuby and Rubinius teams and Heroku which recently hired Matz (Ruby’s creator) and Nobu, the #1 C Ruby contributor)

The GIL was probably even more of a hot topic now that Rubinius is about the join JRuby and MacRuby in the realm of GIL-less Ruby implementations.

During my RubyConf talk (slides here), I tried to explain how C Ruby works and why some decisions like having a GIL were made and why the Ruby core team isn’t planning on removing this GIL anytime soon. The GIL is something a lot of Rubyists love to hate, but a lot of people don’t seem to question why it’s here and why Matz doesn’t want to remove it. Defending the C Ruby decision isn’t quite easy for me since I spend my free time working on an alternative Ruby implementation which doesn’t use a GIL (MacRuby). However, I think it’s important that people understand why the MRI team (C Ruby team) and some Pythonistas feels so strongly about the GIL.

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.) [...] The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.

The same basically applies to C Ruby. To illustrate the quote above, here is a diagram representing two threads being executed by C Ruby:

Such a scheduling isn’t a problem at all when you only have 1 cpu, since a cpu can only execute a piece of code at a time and context switching happens all the time to allow the machine to run multiple processes/threads in parallel. The problem is when you have more than 1 CPU because in that case, if you were to only run 1 Ruby process, then you would most of the time only use 1 cpu at a time. If you are running on a 8 cpu box, that’s not cool at all! A lot of people stop at this explanation and imagine that their server can only handle one request at a time and they they rush to sign Greenpeace petitions asking Matz to make Ruby greener by optimizing Ruby and saving CPU cycles. Well, the reality is slightly different, I’ll get back to that in a minute. Before I explain “ways to achieve true concurrency with CRuby, let me explain why C Ruby uses a GIL and why each implementation has to make an important choice and in this case both CPython and C Ruby chose to keep their GIL.

Why a GIL in the first place?

It makes developer’s lives easier (it’s harder to corrupt data)

It avoids race conditions within C extensions

It makes C extensions development easier (no write barriers..)

Most of the C libraries which are wrapped are not thread safe

Parts of Ruby’s implementation aren’t threadsafe (Hash for instance)

As you can see the arguments can be organized in two main categories: data safety and C extensions/implementation. An implementation which doesn’t rely too much on C extensions (because they run a bit slow, or because code written in a different language is preferred) is only faced with one argument: data safety.

Don’t count the amount of pros/cons to jump to the conclusion that removing the GIL is a bad idea. A lot of the arguments for removing the GIL are related. At the end of the day it boils down to data safety. During the Q&A section of my RubyConf talk, Matz came up on stage and said data safety was the main reason why C Ruby still has a GIL. Again, this is a topic which was discussed at length in the Python community and I’d encourage you to read arguments from the Jython (the equivalent of JRuby for Python) developers, the PyPy (the equivalent of Rubinius in the Python community) and CPython developers. (a good collection of arguments are actually available in the comments related to the rubber boots post mentioned earlier)

How can true concurrency be achieved using CRuby?

Run multiple processes (which you probably do if you use Thin, Unicorn or Passenger)

Use event-driven programming with a process per CPU

MultiVMs in a process. Koichi presented his plan to run multiple VMs within a process. Each VM would have its own GIL and inter VM communication would be faster than inter process. This approach would solve most of the concurrency issues but at the cost of memory.

Note: forking a process only saves memory when using REE since it implements a GC patch that makes the forking process Copy on Write friendly. The Ruby core team worked on a patch for Ruby 1.9 to achieve the same result. Nari & Matz are currently working on improving the implementation to make sure overall performance isn’t affected.

Finally, when developing web applications, each thread spend quite a lot of time in IOs which, as mentioned above won’t block the thread scheduler. So if you receive two quasi-concurrent requests you might not even be affected by the GIL as illustrated in this diagram from Yehuda Katz:

This is a simplified diagram but you can see that a good chunk of the request life cycle in a Ruby app doesn’t require the Ruby thread to be active (CPU Idle blocks) and therefore these 2 requests would be processed almost concurrently.

To boil it down to something simplified, when it comes to the GIL, an implementor has to chose between data safety and memory usage. But it is important to note that context switching between threads is faster than context switching between processes and data safety can and is often achieved in environments without a GIL, but it requires more knowledge and work on the developer side.

Conclusion

The decision to keep or remove the GIL is a bit less simple that it is often described. I respect Matz’ decision to keep the GIL even though, I would personally prefer to push the data safety responsibility to the developers. However, I do know that many Ruby developers would end up shooting themselves in the foot and I understand that Matz prefers to avoid that and work on other ways to achieve true concurrency without removing the GIL. What is great with our ecosystem is that we have some diversity, and if you think that a GIL less model is what you need, we have some great alternative implementations that will let you make this choice. I hope that this article will help some Ruby developers understand and appreciate C Ruby’s decision and what this decision means to them on a daily basis.

My name is Matt Aimonetti, and in my free time I work on Apple’s open source Ruby implementation named MacRuby. I’m also the author of O’Reilly’s MacRuby book. As you can imagine, I’m very thankful that Apple initiated the MacRuby project a few years ago and have been an avid supporter. MacRuby is awesome to develop OS X native applications using the Ruby language and even allows you to compile down your apps to machine code. It’s a great alternative to Objective-C.

MacRuby is so awesome that Apple is even using it in its upcoming OS. The only problem is that Apple apparently decided to not share MacRuby with other OS X developers and put MacRuby in the OS private frameworks. While this doesn’t affect the project itself, it does affect OS X developers like myself who can’t link to Lion‘s private MacRuby framework and are forced to embed MacRuby with their applications.

Today I was helping someone write an Objective-C framework around cocos2d.

C/Objective-C code can be called directly from MacRuby. However the Obj-C code you would like to use might be using some ANSI C symbols that are non-object-oriented items such as constants, enumerations, structures, and functions. To make these items available to our MacRuby code, you need to generate a BridgeSupport file as explained in this section of my book.

In our case, we were working on the framework and I didn’t feel like manually having to regenerate the BridgeSupport file every single time I would compile our code. So instead I added a new build phase in our target.

If you already purchased an electronic copy (thank you!), go to your product page and download an update. Note that if you are on an iPhone or iPad and you click on the .epub link, the file will automatically load in iBooks, same thing if you have the kindle app installed and you are clicking on the .mobi file.

If you haven’t purchased an electronic version yet, you can still read the book in HTML format here. If you choose to do so, please still think about leaving a review on the O’Reilly site to show your support to myself and to my publisher.

What’s new in this update?

App built in chapter 2

I refactored the introduction to the book, you know have a new Chapter 2 which introduces you to GUI app development and lets you build a full app right away to give you an idea of what is available.

Chapter 3 covers more advanced concepts and was updated to reflect changes made on trunk.

The chapter on Core Data is now almost done, I will soon push an update with the end of the chapter, but between the chapter content and the example app, you should be good to go today.

GitHub repo with the code used in the book. (work in progress, I need to move more code there)

The book is now divided in two parts, one more theoretical and more practical. Even if part 2 isn’t written yet, the table of contents shows some of the examples I will cover. I hope to have some of these chapters written in a few weeks.

Lots of small fixes and updates thanks to the readers comments and to my editor.

App built in the Core Data chapter

Finally if you are interested in pre-ordering the book, Amazon runs a great deal on the book:

I started thinking about working on “MacRuby: The Definitive Guide” last year when I realized that the project had a great future but there was a serious lack of documentation. With the support of the MacRuby team, I worked on a table of contents and a pitch. The next step was to decide what we wanted to do with the book.

I know a lot of technical book authors and most of them will tell you the same thing: if you think that you are going to make money writing a book, you are wrong. Even if your book sells well, because of the time invested in writing the book, you are probably better off doing consulting work and charging by the hour.

So since day one, I knew that this project would not make me rich. The goal was to share knowledge not to reimburse my mortgage or save California from bankruptcy. While publishing a web book is great, distribution is quite limited, especially if you try to reach people outside of your network. That’s why I decided to start talking to a few publishers. Most publishers I talked to were interested in working on the book, however they were not really keen on publishing a Creative Commons Attribution-Noncommercial-No Derivative licensed book.

Let me explain why I think releasing technical books under a CC license is important. As you might know (or have figured out by now), I am not a native English speaker. I actually learned my first English words thanks to the computer my dad had at home. The problem when you don’t live in an English speaking country and you want to learn about the cutting edge technology is that you have to understand English. Thanks to the Internet, learning and practicing English is now much easier that it used to be. However, if you want to have access to books, most of the time you have to wait until someone translates the book and publishes it in your country or you have to manage to get an English version delivered to your country. This is often a pain because of national credit card limitations, international delivery restrictions etc… If you manage to find a way to get a copy, the book ends up costing a lot of money.

What does that mean in practice? Most of the technical books are first available in the English speaking western world, then slowly translated and/or distributed around the world. By the time you get a legal copy in Bolivia, Algeria or Vietnam, a new edition is probably out in the US probably because the technology evolved. Maybe that explains some of the book piracy worldwide?

Think about it for a minute: knowledge is power and time is money. And what do we do? We delay knowledge distribution. This is why I am a big fan of the Khan Academy and its awesome free online courses.

Turns out O’Reilly shares my vision and has already published a lot of books under various open licenses: http://oreilly.com/openbook/ I was also interested in publishing the content of my book ASAP so people could access it right away even though there would be lots of typos and missing content. This is also something O’Reilly has already done with the CouchDB and the Scala books.

Talking with Jan Lehnardt about his experience working with O’Reilly on the ‘CouchDB: The definitive guide’ book, I realized that we seem to have some shared interests. I contacted Jan’s editor and we decided to start working on the MacRuby book. The book will be available later on in all the usual commercial formats and I hope people will show their support so O’Reilly will be encouraged in their choice to continue publishing CC licensed book. At the end of the day, purchasing a CC licensed book helps supporting the authors, the publishers but also all the people who can’t have access to the latest technical books.

Finally, working on a book is not an easy thing, especially when you have to write it in a language that’s not yours. But I have to say that the community support has been amazing. Even John Gruber sent a fireball my way. And since the announcement was made, I have received a lot of comments, tweets, emails etc… It is very encouraging and it gives me the motivation needed to work on the book after a long work day.

As I’m working on my upcoming O’Reilly MacRuby book, I’m writing quite a lot of example code. I have spent the last few weeks digging through most of the Foundation framework classes to hopefully make Cocoa more accessible to Ruby developers.

In some instances things might look quite weird to someone new to Cocoa in some cases, things seem almost too easy. Here is an example implementing a undo/redo functionality using Foundations’ NSUndoManager.

>> lara = Player.new=><Player:0x200267c80 @y=0@x=0>>> lara.undo_manager.canUndo=>false# normal since we did not do anything yet>> lara.left=>-1>> lara.x# -1=>-1>> lara.undo_manager.canUndo=>true# now we can undo, so let's try>> lara.undo_manager.undo# undo back to initial position=>#<NSUndoManager:0x200257560>>> lara.x=>0>> lara.undo_manager.canUndo=>false# we can't anymore which makes sense >> lara.undo_manager.canRedo=>true# however we can redo what we just undone>> lara.undo_manager.redo# redo to before we called undo=>#<NSUndoManager:0x200257560>>> lara.x=>-1

The above example was tested in macirb but as you can see, actions can be undone and redone very very easily. This is just a quick preview of what you can do using Ruby + Cocoa and hopefully it will give you some cool ideas to implement.