JClouds is unable to do SSH on automatically selected images

Description

I'm seeing the following exception when trying to start a cluster and when running integration tests without specifying an AMI and an instance type:

org.jclouds.ssh.SshException: ec2-user@184.72.64.23:22: Error connecting to session.
at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:45)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.jcraft.jsch.JSchException: Auth fail
at com.jcraft.jsch.Session.connect(Session.java:461)
at com.jcraft.jsch.Session.connect(Session.java:154)
at org.jclouds.ssh.jsch.JschSshClient.newSession(JschSshClient.java:247)
at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:186)
... 8 more

I was able to run the entire test suite after changing the properties files to specify an image and an instance type (Ubuntu 10.04 on a machine with 2GB or more).

You can reproduce the problem by trying to run the ZooKeeper recipe:

$ ./bin/whirr launch-cluster --config recipes/zookeeper-ec2.properties

I've experienced this problem with the following AMI: us-east-1/ami-8e1fece7, running on a t1.micro instance type.

Adrian Cole (Inactive)
added a comment - 16/Mar/11 18:23
I just ran the Cassandra tests from trunk. The default test uses t1.micro and the same AMI, and it showed no auth problems during bootstrap. However, it does hit auth problems later, during configure.
Did your auth error in ZooKeeper come during bootstrap or configure?

Andrei Savu
added a comment - 16/Mar/11 19:24
It's strange that we are seeing this behavior only for some AMIs. I have been able to run all the integration tests on us-east-1/ami-da0cf8b3 running on an m1.large instance.
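
For reference, pinning the image and instance type in a recipe's properties file might look like the fragment below. This is only a sketch: it assumes the `whirr.image-id` and `whirr.hardware-id` keys are the ones being changed, and the values are the combination reported as working in this comment.

```properties
# Assumed keys; values are the AMI and instance type reported to work above
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.hardware-id=m1.large
```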

Adrian Cole (Inactive)
added a comment - 16/Mar/11 19:37
The problem is that, during configure, we depend on the state of a user we don't define. For example, we modify the authorized keys and private key of the default user, which varies from image to image and from cloud to cloud. This has proven problematic, because an image can change how this user is defined. Bootstrap may work well, yet key installation can still fail because of a subtlety in how that user is configured. The real way out is to stop depending on the preinstalled user and install our own instead.
https://issues.apache.org/jira/browse/WHIRR-158
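
To illustrate what "install our own" user could mean in practice, here is a minimal, hypothetical sketch in plain Java. It is not the WHIRR-158 patch and does not use any jclouds or Whirr API. The class name `DedicatedUserScript`, the method `build`, the user name `whirr`, and the key string are all placeholders invented for this example. It only assembles the shell commands that a bootstrap step would need to run as root to create a dedicated account and install one public key for it.

```java
public class DedicatedUserScript {

    /**
     * Builds the shell commands that create a dedicated login user and
     * install a single public key for it. Hypothetical helper, not part
     * of Whirr or jclouds. The commands assume a Linux image with
     * useradd available and must be run as root.
     */
    static String build(String user, String publicKey) {
        String sshDir = "/home/" + user + "/.ssh";
        return String.join("\n",
            // create the account with a home directory and a bash shell
            "useradd -m -s /bin/bash " + user,
            // create the .ssh directory and write the one key we control
            "mkdir -p " + sshDir,
            "echo '" + publicKey + "' > " + sshDir + "/authorized_keys",
            // sshd typically rejects keys if these permissions are too loose
            "chmod 700 " + sshDir,
            "chmod 600 " + sshDir + "/authorized_keys",
            // hand ownership of the directory to the new user
            "chown -R " + user + " " + sshDir);
    }

    public static void main(String[] args) {
        // placeholder user name and key, for illustration only
        System.out.println(build("whirr", "ssh-rsa PLACEHOLDER_KEY whirr@example"));
    }
}
```

Because the account and its key are created by the tool itself, later steps no longer need to know the image's default login user (for example `ec2-user`, as in the stack trace above).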

Adrian Cole (Inactive)
added a comment - 16/Mar/11 19:48
I've tested the patch in WHIRR-158. I think it would be more sustainable to push this through: not only does it fix this issue, it also makes Whirr easier to troubleshoot (e.g. you don't have to remember the login user of the image).