[TripleO] Scaling node counts with only Ansible (N=1)

There's been a fair amount of recent work around simplifying our Heat
templates and migrating the software configuration part of our
deployment entirely to Ansible.
As part of this effort, it became apparent that we could render much
of the data that we need out of Heat in a way that is generic per
node, and then have Ansible render the node specific data during
config-download runtime.
To illustrate the point, consider what happens when we specify
ComputeCount:10 in our templates: much of the work that Heat does
across those 10 sets of resources, one set per Compute node, is
duplication. However, it's been necessary so that Heat can render data
structures such as lists of IPs, lists of hostnames, contents of
/etc/hosts files, etc. If all of that were driven by Ansible using host
facts, then Heat wouldn't need to create those 10 sets of resources to
begin with.
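As a rough sketch of the idea (plain Python standing in for what an
Ansible template or module would do, not the actual TripleO code), the
aggregate data can be derived from per-host facts alone. The fact keys
("hostname", "ips"), the network names, the addresses, and the domain
below are all assumptions made for illustration:

```python
# Hypothetical per-host facts, as they might be exposed by the inventory.
# Key names, network names, and addresses are illustrative assumptions.
HOST_FACTS = {
    "compute-0": {
        "hostname": "compute-0",
        "ips": {"ctlplane": "192.168.24.10", "internalapi": "172.16.2.10"},
    },
    "compute-1": {
        "hostname": "compute-1",
        "ips": {"ctlplane": "192.168.24.11", "internalapi": "172.16.2.11"},
    },
}


def ip_list(host_facts, network):
    """Return the IPs on one network, ordered by inventory host name."""
    return [
        facts["ips"][network]
        for _, facts in sorted(host_facts.items())
        if network in facts["ips"]
    ]


def hostname_list(host_facts):
    """Return all hostnames, ordered by inventory host name."""
    return [facts["hostname"] for _, facts in sorted(host_facts.items())]


def render_etc_hosts(host_facts, domain="localdomain"):
    """Render one /etc/hosts line per host and network."""
    lines = []
    for _, facts in sorted(host_facts.items()):
        name = facts["hostname"]
        for network, ip in sorted(facts["ips"].items()):
            lines.append(f"{ip} {name}.{network}.{domain} {name}.{network}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ip_list(HOST_FACTS, "ctlplane"))
    print(hostname_list(HOST_FACTS))
    print(render_etc_hosts(HOST_FACTS), end="")
```

Nothing here depends on how many hosts there are, which is the point:
the same rendering works for 2 nodes or 200 without Heat having to
know about each one.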
The goal is to get to a point where we can deploy the Heat stack with
a count of 1 for each role, and then deploy any number of nodes per
role using Ansible. To that end, I've been referring to this effort as
N=1.
The value in this work is that it directly addresses our scaling
issues with Heat (by just deploying a much smaller stack). Obviously
we'd still be relying heavily on Ansible to scale to the required
levels, but I feel that is a much better understood challenge at this
point in the evolution of configuration tools.
With the patches that we've been working on recently, I've got a POC
running where I can deploy additional compute nodes with just Ansible.
This is done by just adding the additional nodes to the Ansible
inventory with a small set of facts: a hostname and an IP address on
each enabled network.
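To give a feel for how small that per-node data is, here is a minimal
Python sketch that adds a node to an inventory held as a dictionary.
This is not the tooling used in the POC. The group layout, the
"<network>_ip" variable names, and the add_node helper are assumptions
for illustration; only ansible_host is a standard Ansible variable:

```python
import json


def add_node(inventory, role, hostname, network_ips, ssh_network="ctlplane"):
    """Add one host to a role group with a hostname and per-network IPs.

    network_ips maps a network name to the node's IP on that network.
    The "<network>_ip" variable names are hypothetical.
    """
    hosts = inventory.setdefault(role, {}).setdefault("hosts", {})
    if hostname in hosts:
        raise ValueError(f"{hostname} is already in group {role}")
    # ansible_host tells Ansible which address to connect to.
    hostvars = {"ansible_host": network_ips[ssh_network]}
    for network, ip in network_ips.items():
        hostvars[f"{network}_ip"] = ip
    hosts[hostname] = hostvars
    return inventory


# Inventory as it might look after a Heat deployment with a count of 1.
inventory = {
    "Compute": {
        "hosts": {
            "compute-0": {
                "ansible_host": "192.168.24.10",
                "ctlplane_ip": "192.168.24.10",
                "internalapi_ip": "172.16.2.10",
            }
        }
    }
}

# Scale out by one node without touching the Heat stack.
add_node(
    inventory,
    "Compute",
    "compute-1",
    {"ctlplane": "192.168.24.11", "internalapi": "172.16.2.11"},
)

if __name__ == "__main__":
    print(json.dumps(inventory, indent=2, sort_keys=True))
```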
These patches are at
https://review.opendev.org/#/q/topic:bp/reduce-deployment-resources
and reviews/feedback are welcome.
Other points:
- Baremetal provisioning and port creation are presently handled by
Heat. With the ongoing efforts to migrate baremetal provisioning out
of Heat (nova-less deploy), I think these efforts are very
complementary. Eventually, we get to a point where Heat is not
actually creating any other OpenStack API resources. For now, the
patches only work when using pre-provisioned nodes.
- We need to consider how we'd manage the Ansible inventory going
forward if we open up an interface for operators to manipulate it
directly. That's something we'd want to manage and preserve (version
control) as it's critical data for the deployment.
Given the progress that we've made with the POC, my sense is that
we'll keep pushing in this overall direction. I'd like to get some
feedback on the approach. We have an etherpad we are using to track
some of the work at a high level:
https://etherpad.openstack.org/p/tripleo-reduce-deployment-resources
I'll be adding some notes on how I set up the POC to that etherpad if
others would like to try it out.
--
-- James Slagle
--