Tom White added a comment - 09/Dec/10 01:51

Here's an initial attempt at this (for the Java implementation). Configuration is generated by a HadoopConfigurationBuilder and pushed to a file on cluster nodes using jclouds' Statements.createFile call.
HadoopConfigurationBuilder takes care of dynamic properties like fs.default.name and mapred.job.tracker, which depend on the cluster object. It may be extended in the future to set mapred.reduce.tasks according to the number of slots in the cluster, or mapred.tasktracker.{map,reduce}.tasks.maximum according to the number of CPUs on each instance.
Properties may be overridden by specifying them in the Whirr configuration. For example, to override Hadoop's dfs.replication property to 2, you would add

hadoop-hdfs.dfs.replication=2

to your Whirr properties file. The hadoop-hdfs prefix signifies that the property should go in hdfs-site.xml. (This patch also incorporates WHIRR-149.)
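With an override like that in place, the hdfs-site.xml generated on the cluster nodes would contain an entry in the standard Hadoop configuration format, along these lines:

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```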
As a simplification, this patch also removes the webserver running on the namenode, since the URLs for the namenode and jobtracker are now logged explicitly:
Namenode web UI available at http://ec2-184-73-89-144.compute-1.amazonaws.com:50070
Jobtracker web UI available at http://ec2-184-73-89-144.compute-1.amazonaws.com:50030
so you can go directly to the web UIs.
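The prefix-to-file mapping described above could be sketched roughly as follows. Note that the class and method names here are illustrative stand-ins, not the actual HadoopConfigurationBuilder API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: strip a file prefix such as "hadoop-hdfs." from Whirr property
// names and collect the remainder for the corresponding *-site.xml file.
// PrefixMappingSketch and propertiesForFile are hypothetical names.
public class PrefixMappingSketch {

  static Map<String, String> propertiesForFile(Map<String, String> whirrProps,
                                               String prefix) {
    Map<String, String> out = new LinkedHashMap<>();
    String p = prefix + ".";
    for (Map.Entry<String, String> e : whirrProps.entrySet()) {
      if (e.getKey().startsWith(p)) {
        // "hadoop-hdfs.dfs.replication" -> "dfs.replication"
        out.put(e.getKey().substring(p.length()), e.getValue());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> whirr = new LinkedHashMap<>();
    whirr.put("hadoop-hdfs.dfs.replication", "2");
    whirr.put("hadoop-common.fs.default.name", "hdfs://nn:8020/");

    // Only the hadoop-hdfs.* entries end up in hdfs-site.xml.
    System.out.println(propertiesForFile(whirr, "hadoop-hdfs"));
    // prints {dfs.replication=2}
  }
}
```

Each target file (core-site.xml, hdfs-site.xml, mapred-site.xml) would get its own prefix pass before being serialized and pushed via Statements.createFile.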

Andrei Savu added a comment - 07/Jan/11 10:38

Looks great!

One small issue: you should create ClusterSpec instances in tests using the factory methods ClusterSpec.withTemporaryKeys or ClusterSpec.withNoDefaults, to avoid re-adding the dependency on .ssh/id_rsa.

All unit tests are passing for me. Unfortunately, I haven't been able to run the Hadoop integration tests. They are failing with the following errors:
channel 2: open failed: connect failed: Connection refused
-------------------------------------------------------------------------------
Test set: org.apache.whirr.service.hadoop.integration.HadoopServiceTest
-------------------------------------------------------------------------------
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 449.269 sec <<< FAILURE!
org.apache.whirr.service.hadoop.integration.HadoopServiceTest Time elapsed: 0 sec <<< ERROR!
java.io.IOException: Call to ec2-50-16-4-0.compute-1.amazonaws.com/50.16.4.0:8021 failed on local exception: java.net.SocketException: Malformed reply from SOCKS server
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at org.apache.hadoop.mapred.$Proxy76.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
at org.apache.whirr.service.hadoop.integration.HadoopServiceController.startup(HadoopServiceController.java:89)
at org.apache.whirr.service.hadoop.integration.HadoopServiceController.ensureClusterRunning(HadoopServiceController.java:68)
at org.apache.whirr.service.hadoop.integration.HadoopServiceTest.setUp(HadoopServiceTest.java:54)
...
Is this only happening to me? (I've seen integration tests fail due to internet connectivity issues; I have tried multiple times.)
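The point of the factory-method suggestion above is that a test should never read the developer's real ~/.ssh/id_rsa. A minimal sketch of the idea, using a stand-in class rather than the actual org.apache.whirr.service.ClusterSpec:

```java
import java.util.UUID;

// Hypothetical stand-in illustrating the withTemporaryKeys pattern;
// the real ClusterSpec lives in Whirr and has a richer API.
public class ClusterSpecSketch {
  final String privateKeySource;

  private ClusterSpecSketch(String privateKeySource) {
    this.privateKeySource = privateKeySource;
  }

  // Default path: would read the developer's real key file,
  // which is what we want to avoid in unit tests.
  static ClusterSpecSketch withDefaults() {
    return new ClusterSpecSketch(
        System.getProperty("user.home") + "/.ssh/id_rsa");
  }

  // Test-friendly path: generate throwaway key material instead,
  // so tests pass on machines with no ~/.ssh/id_rsa at all.
  static ClusterSpecSketch withTemporaryKeys() {
    return new ClusterSpecSketch("temp-key-" + UUID.randomUUID());
  }

  public static void main(String[] args) {
    ClusterSpecSketch spec = ClusterSpecSketch.withTemporaryKeys();
    System.out.println(spec.privateKeySource.startsWith("temp-key-"));
    // prints true
  }
}
```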

Tom White added a comment - 07/Jan/11 17:20

> you should create ClusterSpec instances by using the factory methods ClusterSpec.withTemporaryKeys or ClusterSpec.withNoDefaults in tests to avoid re-adding the dependency on .ssh/id_rsa.

I'll produce a new patch for this.

I've been running the Hadoop integration tests OK, but I haven't run these yet. The CDH side of this patch still needs doing too. I'm tempted to leave this out of 0.3.0, but I would like to hear thoughts from others.

Tibor Kiss added a comment - 02/Feb/11 13:39

I was just rebasing from trunk to rebuild the WHIRR-167 patch, and when I ran the integration tests for Hadoop, they failed in the same way as they did for Andrei. Has this patch been applied to trunk yet?

Andrei Savu added a comment - 02/Feb/11 13:54

This patch is not applied to trunk. I have tried multiple times (even using different internet connections and cloud providers) to run the integration tests for CDH and Hadoop, and they always fail with the same error message:
-------------------------------------------------------------------------------
Test set: org.apache.whirr.service.cdh.integration.CdhHadoopServiceTest
-------------------------------------------------------------------------------
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 336.63 sec <<< FAILURE!
test(org.apache.whirr.service.cdh.integration.CdhHadoopServiceTest) Time elapsed: 336.53 sec <<< ERROR!
java.io.IOException: Call to ec2-50-16-169-138.compute-1.amazonaws.com/50.16.169.138:8021 failed on local exception: java.net.SocketException: Malformed reply from SOCKS server
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1089)
at org.apache.hadoop.ipc.Client.call(Client.java:1057)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at org.apache.hadoop.mapred.$Proxy76.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:369)
at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:486)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:471)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:456)
at org.apache.whirr.service.cdh.integration.CdhHadoopServiceTest.test(CdhHadoopServiceTest.java:87)
I will try to track this down to one of the recently committed patches. We really need to set up a CI server that can run the whole suite continuously; it's extremely time-consuming to do this on a development machine.

Tibor Kiss added a comment - 02/Feb/11 14:11 - edited

Thanks.

The last time the integration tests passed for me was when I applied my WHIRR-167 patch to revision 1059503 of trunk. We are now at revision 1065812, so the problem was introduced somewhere between those two revisions. I'm sure you or somebody else can narrow down the search interval further.

Even with a CI server, it is sometimes inefficient to run the integration tests on every patch you apply. It may be worthwhile to batch a few patches together; then, if a batch fails, it can be bisected further. The difficulty with automating integration tests during patch merging is that you sometimes want to run the tests before committing. I don't know how this can be solved; perhaps you just commit the patches to trunk and then wait for the results from the CI server one by one? (Sorry for going off topic.)

Andrei Savu added a comment - 10/Feb/11 18:12

+1 The patch looks good. I have been able to run the integration tests for HBase. I'm not able to check CDH because I'm on a crappy internet connection right now. If CDH is working for you, then I believe it's safe to commit.