
What are the scalability characteristics of EC2? Solid numbers on the performance of EC2 have been elusive. Max Gorbunov from GridGain executed a 512-node Monte Carlo simulation to find out how well Amazon EC2 performs and shared his results.
The test consisted of a custom setup based on open-source components including GridGain and Open MQ running on the default EC2 Fedora Core 8 distribution and using a custom test harness developed for this project.
The test showed near-linear scalability from 2 to 512 nodes and good performance throughout. Max's article goes on to describe how the software was set up to work within the restrictions of the EC2 environment and how everything executed.

This was interesting. Can you please tell me which AMI you used, or was it your own version? How did you set up the instances? There's still, AFAIK, no way to guarantee instances on the same subnet.
Were the ActiveMQ problems related to it not scaling?
Thanks,
-John-

This was interesting. Can you please tell me which AMI you used, or was it your own version?

We used our private AMI shared only with GridGain. We're going to make it public soon, so you can evaluate it.

How did you set up the instances? There's still, AFAIK, no way to guarantee instances on the same subnet.

There is a way to co-locate instances in a single availability zone. In either case IP multicast is unavailable, so you have to use a non-default DiscoverySPI to perform discovery.
Please read the comments on our blog; you can find some answers there.
Best wishes,
Max

Shame you used such an old version of ActiveMQ - version 5.1 is a lot better - and it's currently being used with 500+ clients per broker in production - using plain old blocking I/O - and we've scaled to thousands - and got better performance using nio.
cheers,
Rob

Shame you used such an old version of ActiveMQ - version 5.1 is a lot better - and it's currently being used with 500+ clients per broker in production - using plain old blocking I/O - and we've scaled to thousands - and got better performance using nio.

Hi Rob,
ActiveMQ is a very nice and easy to use JMS implementation and many of our users do use our JMS Grid Node Discovery implementation with ActiveMQ in production in environments where IP Multicast is not supported.
I will make sure we download the latest ActiveMQ release and give it a shot.
Best,
Dmitriy Setrakyan
GridGain - Grid Computing Made Simple

Okay, I'm confused.
First off, it should be clear to everyone: I'm a GigaSpaces Technologies employee, and as such I'm not an entirely neutral observer.
But... this is meaningless. I think it's good that they, um, "scaled up" but... I don't quite understand what's being illustrated.
If it's something like "Yay, we can use 512 of the EC2 nodes," well, that's good - but since EC2 can go up to 550 nodes, why not use them all? All you're doing is verifying the claim that you can use that many instances.
Farming out the jobs via ActiveMQ is good - but it's also a well-accepted and well-known method for distributing jobs to worker nodes.
Monte Carlo, though, has no transactional interdependencies, so you're not actually doing anything other than spawning tasks to worker nodes. Given that methodology, a degradation of 20% over a 256x growth of the "cluster size" is rather shocking - adding consumers added that much degradation? That'd... worry me more than anything else.
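To make the "no interdependencies" point concrete, here is a minimal Python sketch of that job shape. It is not the code used in the test: the pi-estimation workload, the function names, and the thread pool standing in for worker nodes are all assumptions chosen for illustration. Each job owns its random generator, so the only coordination is handing jobs out and summing the results.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_task(seed: int, samples: int) -> int:
    """One independent job: count random points that land inside the unit circle."""
    rng = random.Random(seed)  # private generator, no state shared between jobs
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks: int, samples_per_task: int, workers: int = 4) -> float:
    """Fan the jobs out to workers, then reduce the partial counts to one estimate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_hits = pool.map(run_task, range(num_tasks), [samples_per_task] * num_tasks)
        total_hits = sum(partial_hits)
    return 4.0 * total_hits / (num_tasks * samples_per_task)


def relative_efficiency(base_nodes: int, base_seconds: float,
                        nodes: int, seconds: float) -> float:
    """Achieved speedup divided by ideal speedup, measured against the base run."""
    achieved_speedup = base_seconds / seconds
    ideal_speedup = nodes / base_nodes
    return achieved_speedup / ideal_speedup
```

The threads only illustrate the structure; they say nothing about real performance. If the 20% degradation is read as a loss of parallel efficiency, `relative_efficiency` would come out at roughly 0.8 when the 2-node and 512-node runs are compared.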
Also: what was gained by the use of GridGain? Why not just set up an HTTP server to farm out requests RESTfully, and accept responses the same way? That way we'd be able to claim that HTTP can support up to 512 clients...
(And yes, that was meant sarcastically. Any HTTP server that can't handle 512 stateless clients needs to be taken out and shot.)

Joseph,
This is the usual problem with benchmarks - somebody will be unhappy. Unfortunately you seem to have lost your objective view on technology after having joined GigaSpaces... which is kind of understandable since you do now work for a competitor company.
Now, if you have to ask what benefits GridGain brought to the picture, you have not visited our website and practically know absolutely nothing about our product. I suggest you do some minimal reading before posting such inflammatory comments. How about these features just to name a few:
- Automatic node discovery
- Transparent grid-enabling of Java code with @Gridify annotation.
- One of the best MapReduce implementations in the industry
- Zero deployment with Peer Class Loading
- Automatic Task Topology Management
- Load Balancing
- Automatic Fail-Over
- Grid job collision resolution
- Job Stealing (from more busy nodes to less busy nodes)
- Over 50 up-to-date metrics for all grid nodes
- Elegance of design and ease of use
- Open Source under LGPL and Apache license
- Many, many more...
Now, as far as the 20% overhead goes... in a grid as big as 512 nodes a lot of factors come into play. Note that the JMS hub needs to manage 512 clients and significant overhead comes from that. I assume that GC comes into play here as well.
In any case, I will let readers form their own opinion rather than listening to a baseless rant from a competitor company.
Best,
Dmitriy Setrakyan
GridGain - Grid Computing Made Simple

This is the usual problem with benchmarks - somebody will be unhappy. Unfortunately you seem to have lost your objective view on technology after having joined GigaSpaces... which is kind of understandable since you do now work for a competitor company.

That's no loss - you guys didn't see me as objective before, no reason for you to see me as objective now.

Now, if you have to ask what benefits GridGain brought to the picture, you have not visited our website and practically know absolutely nothing about our product. I suggest you do some minimal reading before posting such inflammatory comments.

I know absolutely nothing? Nonsense. My point was, and is, that for this test GridGain added... nothing. It's a test that shows that you can use GridGain on 512 nodes... sort of, except that you could have done the same thing without GridGain. I was hoping to see something... more. Just because I work for GigaSpaces doesn't mean I'm not interested in the technology.

Note that the JMS hub needs to manage 512 clients and significant overhead comes from that. I assume that GC comes into play here as well.

This was my point to begin with. You didn't show anything about your technology... the benchmark was just a dog and pony show.

You didn't show anything about your technology... the benchmark was just a dog and pony show.

Joe,
This bizarre overreaction is certainly hurting your employer's image. Think about it... Everyone's got your point, no matter how ridiculous, in my opinion, it is. There's full disclosure and information about this test for everyone to see and draw their own conclusions from.
Grid Dynamics could have performed many other tests with GridGain, of course, including with transactional data grids using JBoss Cache, ehcache, your very own GigaSpaces, or Coherence, to name a few. But I think the choice of test was very correct, as it shows a basic and simple example of how GridGain can be used to achieve massive scalability with literally a few lines of code - in a business case that is used by 100s of businesses around the globe today.
Relax and take a break :)
Nikita Ivanov.
GridGain - Grid Computing Made Simple

You didn't show anything about your technology... the benchmark was just a dog and pony show.

Joe, This bizarre overreaction is certainly hurting your employer's image. Think about it... Everyone's got your point, no matter how ridiculous, in my opinion, it is. There's full disclosure and information about this test for everyone to see and draw their own conclusions from.

Bizarre overreaction? What overreaction? I'm speaking as myself here, not for GigaSpaces; even if that were not so, how is GigaSpaces being affected by my questioning what a test is trying to show me?
I'm actually a little surprised - I didn't say anything negative about GridGain here, at all. Yet you seem to feel attacked. I don't know why.
I still don't see what the test was for. If you'd care to enlighten me instead of being defensive, that'd be great.

Grid Dynamics could have performed many other tests with GridGain, of course, including with transactional data grids using JBoss Cache, ehcache, your very own GigaSpaces, or Coherence, to name a few. But I think the choice of test was very correct, as it shows a basic and simple example of how GridGain can be used to achieve massive scalability with literally a few lines of code - in a business case that is used by 100s of businesses around the globe today.

There's the rub: Monte Carlo is in use by hundreds of businesses, sure. But they don't need GridGain to get the same numbers - or better! - that the test showed. (Nor do they need GigaSpaces to get the same numbers or better... or Coherence... or anything.)
That's why I wondered about the test. You didn't show me anything. I wanted to see something.
This is not an attack. If you want to see it as one, fine, go ahead - it's not like I've ever been able to stop you from deciding that if it's not overwhelmingly positive, it has to be negative.

I also don't think that Grid Dynamics claims anything beyond just this test - you can simply perform computationally intensive tasks with almost linear scalability on 512-node strong Amazon EC2 cloud.

This was my original point, and I regret not seeing you confirm this more clearly when I first read your responses. Should have seen it initially.
But that still goes back to my original question: linear scalability for computationally intensive tasks - especially when they're not interdependent - is not a real accomplishment. If I had 512 (okay, 514, including MQ and database hosts) servers in my own lab, I could do the same thing... with or without GridGain, with or without almost everything mentioned here. The inclusion of GridGain is important because that's what was used -- but I still haven't seen what GridGain added.
Was job stealing included? Were there any node failures? Were transactions a factor at all?

I also don't think that Grid Dynamics claims anything beyond just this test - you can simply perform computationally intensive tasks with almost linear scalability on 512-node strong Amazon EC2 cloud.

This was my original point, and I regret not seeing you confirm this more clearly when I first read your responses. Should have seen it initially.

But that still goes back to my original question: linear scalability for computationally intensive tasks - especially when they're not interdependent - is not a real accomplishment. If I had 512 (okay, 514, including MQ and database hosts) servers in my own lab, I could do the same thing... with or without GridGain, with or without almost everything mentioned here. The inclusion of GridGain is important because that's what was used -- but I still haven't seen what GridGain added.

Was job stealing included? Were there any node failures? Were transactions a factor at all?

I've read the description on the website and I have to say that one of the things that I believe differentiates marketing fluff from a serious study is transparency and methodical reporting of method and results. For example, you claim linear scalability yet you don't offer any goodness-of-fit calculation. Also, without any idea of how the Monte Carlo was set up, there is little anyone can say about the value of GridGain in this message.
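As a rough illustration of the kind of goodness-of-fit calculation meant here, the Python sketch below fits a straight line to throughput versus node count by ordinary least squares and reports R². The function name and the choice of R² as the measure are assumptions for illustration; the measurements themselves would have to come from the test and are not reproduced here.

```python
def linear_fit_r2(nodes, throughput):
    """Least-squares line through (nodes, throughput); returns (slope, intercept, r2)."""
    n = len(nodes)
    if n < 2 or n != len(throughput):
        raise ValueError("need at least two (nodes, throughput) pairs of equal length")

    mean_x = sum(nodes) / n
    mean_y = sum(throughput) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in nodes)
    syy = sum((y - mean_y) ** 2 for y in throughput)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nodes, throughput))
    if sxx == 0 or syy == 0:
        raise ValueError("nodes and throughput must each vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(nodes, throughput))
    r2 = 1.0 - ss_res / syy
    return slope, intercept, r2
```

An R² close to 1 only says that a line fits the points well. Reporting the fitted slope and the residuals alongside it would show whether per-node throughput actually holds up at the high end.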
I'm not trying to side with Joe here. What I am saying is that this has the potential to be a very cool, useful study. The last benchmarking article that I did editorial work on took 4 months to complete. It was also a potentially cool study but it needed work before (IMHO) it could be published. I would like to offer you the same editorial advice that I gave then: please rework it to give us some useful information, code, methodology and statistics.
Regards,
Kirk

What is being demonstrated is something that is used by 100s of financial, banking and insurance companies daily around the globe. And it's used almost in a verbatim scenario.
I also don't think that Grid Dynamics claims anything beyond just this test - you can simply perform computationally intensive tasks with almost linear scalability on 512-node strong Amazon EC2 cloud. What I do like about this test (or benchmark) is that, unlike other "tests", it was:

- independently performed
- simple to understand
- easily verifiable

We are actually actively working on performance improvements for large-scale deployments (1000s of nodes) on clouds like EC2. You'd better download our latest branch from SVN and start looking very carefully if you want to have a chance to catch up - and you'll learn a lot too :)
Best,
Nikita Ivanov.
GridGain - Grid Computing Made Simple
