Getting inside Cloud Foundry for debug (and profit?)

I’ve recently started to play with some more of the internals of Cloud Foundry than I’ve been used to. This has been made much easier by the advent of bosh-lite, a system for deploying all of Cloud Foundry’s components using the bosh continuous deployment and configuration tool, into a single virtual machine. bosh-lite achieves this by using containers (Cloud Foundry’s own Warden container technology) to “emulate” the individual VMs where jobs would run in a full distributed topology.

bosh-lite has actually been around for a number of months now, but I’ve not had much of a chance to play with it until recently. This is partly due to other activities, and also that my earlier attempts to get an environment up-and-running were hampered by lack of memory. It should be possible to run bosh-lite with a Cloud Foundry deployment in 8Gb of RAM, but given my laptop’s configuration and the amount of other stuff I’m usually running, that was never comfortable – now I’m rocking 16Gb in a MacBook Pro, things are running more smoothly.

I don’t intend to spend this post documenting how to install bosh-lite and get a running single-node Cloud Foundry system. I followed the instructions in the README and things went well on this occasion. One suggestion that I’d make is if you can, to use VMware Fusion (assuming like me, you’re on OS X) and the Vagrant provider for Fusion, seems quite a lot better than Virtualbox. If you do, don’t forget to pass the --provider=vmware_fusion flag when you bring your Vagrant image up (that’s something I do usually forget). One other little thing to mention is that after I started the bosh deployment, the bosh CLI gem timed out and returned a REST error – but the deployment process itself continued without any issues, and I was able to use bosh tasks to check in on the progress. If you are interested, I used cf-release-157 this time around.

Note: this is not about debugging applications on Cloud Foundry in general – a PaaS is an opinionated system and you generally shouldn’t need to poke around inside it like this. This is for debugging the Cloud Foundry runtime itself, or aspects that might run inside a container. Oh, and I’m sorry about the formatting of some of the shell output examples below!

Peeking at NATS traffic

NATS is the internal, lightweight message bus that Cloud Foundry components use to talk to one another. I’d read blog posts from Cornelia and from Dr Nic about digging into this before.

So now I’m on the NATS host – now what? well, strictly speaking I didn’t need to login to that host / container, since of course, as a messaging system, the other hosts can connect to it anyway. The reason I wanted to login to it was to find out how NATS was configured.

[update – Dr Nic has provided a more convenient method to do this, in the comments below – check out nats-sub – but this works, as well]

$ ./nats-all.sh
Msg received on [router.register] : '{"host":"10.244.0.134","port":8080,"uris":["login.10.244.0.34.xip.io"],"tags":{"component":"login"},"index":0,"private_instance_id":"e6194fe8-4910-4cb1-9f7c-d5ee7ff3f36b"}'
Msg received on [router.register] : '{"host":"10.244.0.130","port":8080,"uris":["uaa.10.244.0.34.xip.io"],"tags":{"component":"uaa"},"index":0,"private_instance_id":"7713dd5b-3613-41a6-9c67-c48f22a769b4"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp.10.244.0.34.xip.io"],"host":"10.244.0.26","port":61021,"tags":{"component":"dea-0"},"private_instance_id":"b52dfd91d68144cabb14b6c7bae77daae8b493acf1354c99941d49772a1f61fb"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp.10.244.0.34.xip.io"],"host":"10.244.0.26","port":61025,"tags":{"component":"dea-0"},"private_instance_id":"090f5c5aeee94fdfb4a4e0f0afde2553480dcd97c018431db37b4dffdc80fde4"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp.10.244.0.34.xip.io"],"host":"10.244.0.26","port":61028,"tags":{"component":"dea-0"},"private_instance_id":"92e10af77b274836a3f54373c9b7feee025c5b72f41a4c4982bde97d241ebd5b"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp.10.244.0.34.xip.io"],"host":"10.244.0.26","port":61039,"tags":{"component":"dea-0"},"private_instance_id":"86edf0c0a7f84f04b52693b489ad93b7f857f77271b84d568d8f5600b34f7054"}'
Msg received on [router.register] : '{"host":"10.244.0.26","port":34567,"uris":["8b24c0a7d28f4e03aa028a3dc89fb8c3.10.244.0.34.xip.io"],"tags":{"component":"directory-server-0"}}'
Msg received on [dea.advertise] : '{"id":"0-1ba3459ea4cd406db833c1d188a78c02","stacks":["lucid64"],"available_memory":23296,"available_disk":22528,"app_id_to_count":{"b8550851-37a0-4bd5-bdce-1d787b087887":10},"placement_properties":{"zone":"default"}}'
Msg received on [staging.advertise] : '{"id":"0-1ba3459ea4cd406db833c1d188a78c02","stacks":["lucid64"],"available_memory":23296}'
Msg received on [dea.heartbeat] : '{"droplets":[{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"d92d3c0c43ce4b6981e443e5c2064580","index":0,"state":"RUNNING","state_timestamp":1392639135.9526377},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"898e632697e246de9cf6b7330444227c","index":1,"state":"RUNNING","state_timestamp":1392639136.3117783},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"56d023e374aa49d88720daabac58e862","index":2,"state":"RUNNING","state_timestamp":1392639135.2225387},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"f11d86a7f4ad47f1ad554ae1b087d5f6","index":3,"state":"RUNNING","state_timestamp":1392639136.1042},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"c9e6de77f0484e6cae47f73ad6ca778a","index":4,"state":"RUNNING","state_timestamp":1392639135.9426212},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"924c387fc33444289b2db2762eefac42","index":5,"state":"RUNNING","state_timestamp":1392639135.940636},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"69866b260b1a49a09c03e178c4add2c5","index":6,"state":"RUNNING","state_timestamp":1392639135.944143},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"94bc605505d94dc1832e55bf2f671a99","index":7,"state":"RUNNING","state_timestamp":1392639135.4456258},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"8420df9bbe64456385dfa91285641ba4","index":8,"state":"RUNNING","state_timestamp":1392639135.9456131},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"ed9ad14f6599494c96f90296c59e6041","index":9,"state":"RUNNING","state_timestamp":1392639135.938359}],"dea":"0-1ba3459ea4cd406db833c1d188a78c02"}'
Msg received on [router.register] : '{"host":"10.244.0.10","port":8080,"uris":["loggregator.10.244.0.34.xip.io"]}'
Msg received on [router.register] : '{"host":"10.244.0.138","port":9022,"uris":["api.10.244.0.34.xip.io"],"tags":{"component":"CloudController"},"index":0,"private_instance_id":null}'
Msg received on [router.register] : '{"host":"10.244.0.134","port":8080,"uris":["login.10.244.0.34.xip.io"],"tags":{"component":"login"},"index":0,"private_instance_id":"e6194fe8-4910-4cb1-9f7c-d5ee7ff3f36b"}'

Warden containers and shells

Cloud Foundry’s native container technology is called Warden. When an application is deployed, Cloud Foundry starts up a Warden container based on the limits assigned in terms of memory etc, and the applications run inside that. How can you get “inside” the container to see what is going on?

Well, there are a couple of techniques. Cloud Foundry Loggregator provides streaming access to the standard application logs (stdout/stderr) via the cf logs command. Another option is James Bayer’s cool websocket-based method for getting access to the container. Yet another option is Warden’s own shell, wsh. This does assume you can access the DEA machine with ssh, however.

1. Login to the DEA VM (called “runner_z1/0″ in the list provided by bosh ssh).

2. Identify your Warden container… there are a lot showing below, but I happen to know that these are several instances of the same app. The important part is the instance-17ij46hadt2 – the second part or that value maps to the location of the container’s private space on disk.

4. Notice that the Warden containers are running as root. If you run wsh now as an unprivileged user, you’ll get a connect: Permission denied error. Time to switch to root, and then run wsh specifying the command to run inside the shell, as a parameter:

nice write-up andy! the instances.json file on the DEA should help you identify which warden container handle belongs to which app easily. you should be able to find that easily once on the dea by using something like “find /var/vcap -name instances.json”.

Hi, I have just a question that if the vmc-IronFoundry can only be inlltsaed on Windows.Now I have made micro cloud foundry and micro iron foundry run on a physical machine with Ubuntu. And I tried to push asp.net applications with the vmc-IronFoundry inlltsaed on Ubuntu. But it didn’t work.

There is a way to identify the exact warden container for a given app instance.

First get the GUID for the app, which can be seen by running “CF_TRACE=true cf app “. One of the requests should be something like “GET /v2/apps//summary”, where “” is the GUID for the app.

Once you have the app’s GUID, you can look in the file “/var/vcap/data/dea_next/db/instances.json” in the “runner” VM. This file contains the mapping of app instances to Warden containers. Look for the “application_id” and “instance_index” fields that match the app and instance you care about, then use the “warden_container_path” value in the same JSON hash.