Description

The Ubuntu images available in Rackspace's new "managed" cloud service do not use a kernel that contains a Rackspace-specific title, and they have a different ARP MAC address. We've seen this across multiple instances, and have added a fix in our fork of Ohai: https://github.com/opscode/ohai/pull/20

Bryan McLellan [Chef]
added a comment - 28/Apr/11 9:57 PM Do they always appear to have this MAC address that is added in the patch?
Can I ask you to please fill out a CLA as described in http://wiki.opscode.com/display/chef/How+to+Contribute?

As a result of this bug, the cloud attributes aren't being set, which is causing the MySQL cookbook to fail. I'm not sure if cloud should always be present or not, so I'm uncertain if this is an Ohai or MySQL cookbook bug.

Bryan McLellan [Chef]
added a comment - 02/Dec/11 10:00 PM I opened a ticket with Rackspace Cloud to see if they can provide any help. So far their only solution is running whois to check the registrar on the netblock.

Bryan McLellan [Chef]
added a comment - 06/Dec/11 1:42 PM There is a fix on OHAI-313 for adding 00:00:0c:07:ac:02 to the list, but we may be chasing a ghost with the use of MAC addresses.
From Rackspace:
I understand the urgency and how critical it is to come up with a good long term solution. There are 2 recent changes that could have caused this:
1) We recently began building all servers on the Xen Server hypervisor, so the kernel is now coming from the guest itself as opposed to being provided by the host. (This would have kicked in on 11/14/11 for Ubuntu 11.04)
2) We began provisioning new servers into our DFW2 Datacenter (This would have occurred on Nov 30)
A whois lookup would be a pretty solid way to determine if it is ours.
Another option that would apply to Linux on Xen Server would be the presence of the nova-agent. The nova-agent would work for the time being, but as Openstack grows in popularity, you can expect to see people running private implementations of openstack running the nova-agent as well.
Another way that comes to mind would be to query /etc/resolv.conf. We have 2 pairs that you should see:
ORD1:
nameserver 173.203.4.8
nameserver 173.203.4.9
DFW1 and DFW2:
nameserver 72.3.128.240
nameserver 72.3.128.241
I hope that this gets you on the right track! Please let me know if there is anything else that I can do!

Bryan McLellan [Chef]
added a comment - 06/Dec/11 2:06 PM I replied to Rackspace indicating that methods like whois are unacceptable because they require network access, which isn't guaranteed on all systems.
ARP isn't the awesome end-all; as we've seen in OHAI-305, these tables do empty out, but most cloud servers should have some traffic to keep them alive, being on the intertubes and all. Still, OHAI-313 should get us going again. I do want to make sure that we add some comments to these MAC addresses though, so we don't end up with a long list of addresses and no way to reconcile them down the road.
The nameservers are fragile to any network design changes in Rackspace; of course, we're fragile to any image and platform changes right now. Our Rackspace Managed hosting systems use the same nameservers, so these are not limited to Rackspace Cloud and are thus ruled out.

Jerry Chen
added a comment - 06/Dec/11 2:32 PM FWIW, I also had opened a ticket with Rackspace and they came back with two options:
detection by IP address – this seems even more brittle and more difficult to maintain than the current list of MAC addresses
add/check metadata via the Rackspace API [1] – this would be dependent on an external API, require credentials, etc
Unfortunately, neither seems like a viable option.
[1] http://docs.rackspace.com/servers/api/v1.0/cs-devguide/content/Create_Server-d1e1937.html

Bryan McLellan [Chef]
added a comment - 08/Dec/11 4:06 PM My pessimism toward Rackspace's offers returned this latest reply:
If you're building these out via the API you may want to also consider injecting a file via the api. When you make a new build via the API you have the option to inject data into the server. This way you always have a consistent file to check on with specific data. Unfortunately there's not much we can push from our provisioning end to provide marker identifiers for our deployments, especially from just within the server.
I'm starting to think we're going to need a spamassassin type scoring system, where no one hit identifies it as a rackspace system, but a handful of them will push it over the line of being beyond reasonable doubt.
I've asked for clarification regarding the hostId field in the server metadata.
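To make the scoring idea concrete, here is a minimal Ruby sketch. The signal names, weights, and threshold are all hypothetical and chosen only for illustration; nothing like this exists in Ohai as far as this ticket shows. The point is simply that no single signal reaches the threshold by itself.

```ruby
# Hypothetical weights: each signal alone stays below the threshold.
SIGNAL_WEIGHTS = {
  hsrp_gateway_mac: 2,     # gateway MAC is on the known list
  rackspace_nameserver: 2, # resolv.conf lists a known Rackspace resolver
  nova_agent_present: 2,   # nova-agent is installed
  xen_guest: 1             # running as a Xen guest
}.freeze

# Hypothetical cut-off for "beyond reasonable doubt".
THRESHOLD = 5

# Sum the weights of the signals that fired; unknown signals count as 0.
def rackspace_score(signals)
  signals.sum { |name| SIGNAL_WEIGHTS.fetch(name, 0) }
end

# True only when enough independent signals agree.
def looks_like_rackspace?(signals)
  rackspace_score(signals) >= THRESHOLD
end
```

With these made-up weights, a matching gateway MAC alone is not enough, but a MAC match plus a nameserver match plus a Xen guest crosses the line.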

Bryan McLellan [Chef]
added a comment - 19/Dec/11 11:27 PM Response from a Rackspace System Administrator:
Thinking on it, with our recent kernel changes, it isn't going to be easy to determine if a server is indeed a Rackspace server without Internet access. I will keep thinking on this to see if there is something obvious we are missing.
I'll see if we can find a better support avenue.

Steven Danna
added a comment - 19/Dec/11 11:53 PM - edited For those experiencing this bug, one workaround is to distribute your own version of the rackspace plugin, using either the fix presented in OHAI-313 (if that particular fix works in your case) or simply removing the rackspace detection altogether if you know you are on Rackspace. You can distribute the plugin to your nodes using the ohai cookbook from the community site. Note: this should be considered a temporary workaround only.
Typically, to use this cookbook, you simply place your custom plugins in files/default/plugins. However, since you will be using this to override the default plugins, you will need to ensure that your custom plugin is used before the default rackspace plugin. One way to do this is by changing the following line:
https://github.com/opscode/cookbooks/blob/master/ohai/recipes/default.rb#L20
to
Ohai::Config[:plugin_path].unshift(node['ohai']['plugin_path'])
and adding a line such as the following to client.rb on the node:
Ohai::Config[:plugin_path].unshift( "/etc/chef/ohai_plugins" )

Bryan McLellan [Chef]
added a comment - 20/Dec/11 10:24 PM The MAC addresses that we've previously used to detect Rackspace Cloud, and the latest one we are seeing are listed here:
00:00:0c:07:ac:01
00:00:0c:9f:f0:01
00:00:0c:07:ac:02
These are Cisco HSRP (Hot Standby Router Protocol) virtual MAC addresses.
RFC 2281 [1] specifies 00-00-0C-07-AC-xx as the prefix for HSRP v1 while 00-00-0C-9F-Fx-xx is used for HSRP v2 [2] .
Consequently by identifying these addresses as Rackspace Cloud without a second factor, we're dooming most networks with HSRP to be falsely detected as Rackspace Cloud.
I have an email out to some corporate contacts in Rackspace in search of a better solution.
[1] http://tools.ietf.org/html/rfc2281
[2] http://www.cisco.com/en/US/docs/ios/12_3t/12_3t4/feature/guide/gthsrpv2.html#wp1027184
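The false-positive risk can be illustrated with a small Ruby sketch that classifies a MAC address using only the two prefixes quoted above. The function name `hsrp_version` is hypothetical and not part of Ohai; all three addresses from the current list match, which is exactly why they say nothing specific about Rackspace.

```ruby
# Prefixes as quoted above: 00-00-0C-07-AC-xx (HSRP v1), 00-00-0C-9F-Fx-xx (HSRP v2).
HSRP_V1 = /\A00:00:0c:07:ac:[0-9a-f]{2}\z/
HSRP_V2 = /\A00:00:0c:9f:f[0-9a-f]:[0-9a-f]{2}\z/

# Returns 1 or 2 for an HSRP virtual MAC, nil for anything else.
# Accepts either ':' or '-' as the separator, in any letter case.
def hsrp_version(mac)
  normalized = mac.downcase.tr('-', ':')
  return 1 if normalized.match?(HSRP_V1)
  return 2 if normalized.match?(HSRP_V2)

  nil
end
```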

Bryan McLellan [Chef]
added a comment - 22/Dec/11 8:53 PM I just got off a conference call with some folks over at Rackspace. They didn't have a solution, but will work on it. I didn't get why, but they said that the Cisco HSRP virtual MACs will disappear soon. I assumed they're moving to some other network level solution or adding something in the middle there. So unfortunately, the number of people exposed to this may increase.

Boyd Hemphill
added a comment - 11/Apr/12 3:41 PM This is for info only:
On 4/10/2012 before noon, I was able to build with image 49 and NOT experience this issue. After that, and up to now, I am officially hosed and seeking a hack to work around the issue.
I hope the specific dates help in identifying any delta on the RS side that might prove useful.

Scott M. Likens
added a comment - 11/Apr/12 5:13 PM I just had the oh so wonderful joy of booting an Ubuntu Oneiric image and having to add 00:15:17:70:1b:1e to the rackspace plugin as that was not listed.
This whole ARP detection scheme is a bad joke. Can we get Rackspace to just deliver the JSON format of the instance creation (blob) in SMBIOS so we can just dmidecode?

Eric Hankins
added a comment - 12/Apr/12 1:57 PM - edited I'm putting this forth as a possible solution:
https://github.com/opscode/ohai/pull/73
I am sure there are probably caveats that I've not thought of, but to me it seems like a straightforward solution to a problem that has lots of complexities.
The idea is that you write a file at /etc/cloud. The contents of this file are a simple string with the name of the cloud - 'rackspace', 'ec2', or 'eucalyptus'. (as of now) The Ohai plugins look for the existence of this file and the contents of it before doing their other detections.
Then inside the knife-rackspace plugin, we simply inject the file at server create time. Not using the knife-rackspace plugin? `echo 'rackspace' > /etc/cloud` somewhere in your bootstrap (whatever it is)
Thoughts? Feelings? Hopes? Dreams?
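For readers who have not opened the pull request, the marker-file check could look roughly like the Ruby sketch below. This is an assumption about the shape of such a helper, not the code from the pull request; only the name `cloud_file?` and the /etc/cloud path come from this thread, and the `path` parameter is added here so the check can be tried against a scratch file.

```ruby
# Returns true when the marker file exists and its contents name the given cloud.
# Surrounding whitespace (such as the newline written by `echo`) is ignored.
def cloud_file?(name, path = '/etc/cloud')
  File.file?(path) && File.read(path).strip == name
end
```

A plugin would then try `cloud_file?('rackspace')` first and fall back to its other detections only when that returns false.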

Michal Semeniuk
added a comment - 12/Apr/12 2:52 PM I would like to confirm the same problem on Rackspace. The new instances don't have "rackspace" and "cloud" mashes, the kernel name doesn't contain 'rscloud', and the MAC addresses of the gateway are different from those listed in the rackspace plugin.

Bryan McLellan [Chef]
added a comment - 12/Apr/12 4:11 PM Eric, a few thoughts...
The copyright date in cloud.rb in your patch is not correct.
We should probably comment around these lines to explain what's happening:
+ cloud_file?('rackspace') || has_rackspace_mac? || has_rackspace_kernel?
I wonder if we should add a comment to /etc/cloud to explain what it is for and how it gets generated.
I like /etc/cloud better than a unique file. I wonder if someone would be tempted to overload this file.
Are there going to be differences between Rackspace today and their Openstack based platform? If so, we probably want to be aware of which one we're on. Is "rackspace-openstack" enough? Can we detect that we're on Openstack some other way and figure this out? Should this file have fields like:
provider: rackspace
platform: openstack
Can anyone think of an existing "wheel" that we are recreating here?

Bryan McLellan [Chef]
added a comment - 12/Apr/12 4:17 PM Certainly in the case of OHAI-310 and Amazon VPC we're going to want a way to differentiate between regular EC2 and VPC because you've got quite a different network.

Bryan McLellan [Chef]
added a comment - 12/Apr/12 4:22 PM We've used node attributes to store this kind of information in the past, but Ohai doesn't know anything about nodes. Are we really looking for a DSL for providing hints to Ohai?

Michal Semeniuk
added a comment - 12/Apr/12 5:09 PM - edited When I saw the rackspace detection code for the first time, I thought that it was a joke to detect the provider based on the gateway MAC address.
I asked the rackspace support team for any idea how to detect that node was created on Rackspace, the answer:
-----------------------
As you are aware the Newer infrastructure that we are employing no longer is using the RSKernel, and as you have stated the MAC address identification is VERY unreliable. If this was a System that I was maintaining I would use an injected file as a point of detection. When you are deploying a server you can inject a small file into the server that you would be able to query as a mode of identification.
## Build Server, Inject a file, JSON
{
  "server" : {
    "name" : "$SERVERNAME",
    "imageId" : $IMAGEIDNUMBER,
    "flavorId" : $FLAVORNUMBER,
    "metadata" : { "My Server Name" : "$SOMEMETADATA" },
    "personality" : [
      {
        "path" : "/root/RSCLOUD.txt",
        "contents" : "HERE IS YOUR BASE64 ENCODED INFORMATION"
      }
    ]
  }
}
The provided JSON file is an example on how to inject the file at build which should allow you to detect the instance.
-----------------------
I think that this file should be injected in the bootstrap phase. Probably there is no way to detect cloud provider from the node level. I was checking the dmesg output and /proc file system and I found only information about virtualization technology (rackspace uses Xen).
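As a rough illustration of the injected-file approach, the Ruby sketch below builds a request body shaped like the Rackspace example above and Base64-encodes the file contents. The function name `server_request` and its keyword arguments are hypothetical; the field names are taken from the quoted JSON. The sketch only builds the body and does not talk to any API.

```ruby
require 'json'

# Build a create-server request body that injects one marker file.
# The "personality" contents must be Base64 encoded, as in the example above.
def server_request(name:, image_id:, flavor_id:, marker_path:, marker_contents:)
  {
    'server' => {
      'name' => name,
      'imageId' => image_id,
      'flavorId' => flavor_id,
      'personality' => [
        {
          'path' => marker_path,
          # pack('m0') produces Base64 without embedded line breaks
          'contents' => [marker_contents].pack('m0')
        }
      ]
    }
  }
end
```

Calling `JSON.generate(server_request(...))` with your own name, image ID, and flavor ID yields the string to send; the values to use depend on your account and are not specified in this thread.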

Bryan McLellan [Chef]
added a comment - 12/Apr/12 5:29 PM Michal Semeniuk we often wonder why Rackspace, Amazon and others don't bother to provide DMI information. This is pretty established and would be pretty awesome.
Part of the problem is finding a solution that is extensible and not a quick hack just for Rackspace. The ticket that Opscode filed with Rackspace Cloud was #256467 if anyone wants to reference it when talking to Rackspace.

Boyd Hemphill
added a comment - 13/Apr/12 4:11 PM I did not read this carefully enough. It was for "managed" servers.
My problem as described above started yesterday and is for unmanaged servers. Roughly 120 of them at this time.

Bryan McLellan [Chef]
added a comment - 13/Apr/12 8:20 PM Boyd Hemphill, none of Rackspace's services provide a reliable method to identify them, so everyone is in the same boat here. A couple of us are working on testing and thinking about the patch Eric Hankins provided above.

Eric Hankins
added a comment - 14/Apr/12 2:05 PM Okay, I've reworked the patch from above to make it a (slightly?) more generic "hint system". Check: https://github.com/opscode/ohai/pull/73
The idea is that we drop JSON files in /etc/chef/ohai/hints (or other paths that you might configure). The file names are keys, the contents of the file are values. You don't have to have anything inside the file - the existence of the file may be enough depending on the hint you need, but maybe you need additional data for the hint (in which case you can populate the file with some JSON). That's really up to the plugin using the hint.
For the current situation, it seemed like enough just to have a rackspace or ec2 file, but I could imagine putting some data inside rackspace that says "oh hey we are on the v2 API so do this, not that", or in the ec2 file that says "this is ec2-vpc, and this additional value will be helpful to you".
Thoughts:
I needed platform_specific_path from Chef::Config so I blindly copied it in. Can we move it to mixlib-config?
I wanted to use deep_merge to merge all possible hints for the same key together (from all the possible paths), but we don't use the deep_merge gem anymore and the deep_merge code is now locked up inside the chef code. But maybe we don't really want to merge the hints together after all.
There are probably some more specs to be written, but someone better at TDD than me will have to give me some hints on what types of specs to write.
Okay, let me know what you think!
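The lookup described above can be sketched in Ruby as follows. This is an illustration under assumptions, not the code in the pull request: the function name `hint_for` is hypothetical, the first matching file wins (no merging across paths, which the comment above leaves as an open question), and the default directory is the /etc/chef/ohai/hints path mentioned in the comment.

```ruby
require 'json'

# Look for <name>.json in each hint directory, in order.
# Returns nil when no file exists, {} for an empty file (existence is the hint),
# and the parsed JSON when the file carries extra data.
def hint_for(name, hint_paths = ['/etc/chef/ohai/hints'])
  hint_paths.each do |dir|
    file = File.join(dir, "#{name}.json")
    next unless File.file?(file)

    contents = File.read(file)
    return contents.strip.empty? ? {} : JSON.parse(contents)
  end
  nil
end
```

A plugin can then treat any non-nil result as "the hint is present" and inspect the returned hash only when it needs additional values.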

Bryan McLellan [Chef]
added a comment - 19/Apr/12 4:50 PM The hint system has been merged to master and will be in Ohai 0.6.14
Now if everyone is happy, we need to modify the cloud plugins to create this file on bootstrap or in the case of Rackspace possibly use the API to create the file.

Bryan McLellan [Chef]
added a comment - 19/Apr/12 4:52 PM Using this patch for rackspace means you need to create the file '/etc/chef/ohai/hints/rackspace.json' to hint to ohai that you're on the rackspace cloud.
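Creating that file during bootstrap could be done with a few lines of Ruby, sketched below. The function name `write_rackspace_hint` is hypothetical; the file is written empty on the assumption, stated earlier in this ticket, that the existence of the file alone is enough for the hint. Writing under /etc normally requires root.

```ruby
require 'fileutils'

# Create an empty rackspace.json hint file and return its path.
# The directory is a parameter so the function can be tried outside /etc.
def write_rackspace_hint(hints_dir = '/etc/chef/ohai/hints')
  FileUtils.mkdir_p(hints_dir)
  path = File.join(hints_dir, 'rackspace.json')
  File.write(path, '')
  path
end
```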