While Ironic today supports Neutron-provisioned network connectivity for
baremetal servers through an ML2 mechanism driver, the existing support
is based largely on configuring top-of-rack (TOR) switches through
vendor-specific mechanism drivers, and its capabilities are limited.

There is a wide range of smart/intelligent NICs emerging on the market.
These NICs generally incorporate one or more general purpose CPU cores along
with data-plane packet processing acceleration, and can efficiently run
virtual switches such as OVS, while maintaining the existing interfaces to the
SDN layer.

The proposal is to extend Ironic to enable the use of smart NICs to implement
generic networking services for bare metal servers. The goal is to run the
standard Neutron Open vSwitch L2 agent, providing a generic, vendor-agnostic
bare metal networking service with feature parity with the virtualization
use case. The Neutron Open vSwitch L2 agent manages the OVS bridges on the
smart NIC.

In this proposal, we address two use cases:

1. The Neutron OVS L2 agent runs locally on the smart NIC.

   This use case requires a smart NIC capable of running OpenStack control
   services such as the Neutron OVS L2 agent. It treats the smart NIC as an
   isolated hypervisor for the baremetal node: the smart NIC provides services
   to the bare metal image running on the host, as a hypervisor provides
   services to a VM. While this spec initially targets the Neutron OVS L2
   agent, the same implementation could be extended to any other ML2 plugin
   as well as to additional agents/services (for example, exposing emulated
   NVMe storage devices backed by a storage initiator on the smart NIC).

2. The Neutron OVS L2 agent(s) run remotely and manage the OVS bridges for
   all the baremetal smart NICs.

The OVS ML2 mechanism driver will determine whether the Neutron OVS agent runs
locally or remotely based on the smart NIC configuration passed from Ironic.
This configuration attribute will be stored in the local_link_information of
the baremetal port.
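A minimal sketch of how the mechanism driver might make this decision, assuming the smart NIC configuration arrives as entries in the local_link_information list of the port's binding profile. The key name smartnic_agent_mode and its values "local"/"remote" are hypothetical, since this spec does not name the attribute.

```python
from typing import Any, Dict

# Hypothetical attribute name; the spec does not define it.
AGENT_MODE_KEY = "smartnic_agent_mode"


def agent_runs_locally(binding_profile: Dict[str, Any]) -> bool:
    """Return True if the OVS agent runs on the smart NIC, False if remote.

    Assumes the smart NIC config is carried in the 'local_link_information'
    entries of the port binding profile, as described in the spec.
    """
    for link in binding_profile.get("local_link_information", []):
        mode = link.get(AGENT_MODE_KEY)
        if mode == "local":
            return True
        if mode == "remote":
            return False
    # No usable smart NIC config found: refuse to guess.
    raise ValueError("no smart NIC agent mode in local_link_information")
```

Raising instead of defaulting to one mode avoids silently binding a port to the wrong agent when the admin has not set the configuration.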

Within the scope of this spec, the smart NIC configuration will be set
manually by the admin.

Deployment Interfaces

Extend the ramdisk, direct, iscsi and ansible deploy interfaces to support the
smart NIC use cases.

The network interface methods that configure and unconfigure node networking
are currently called while the baremetal is powered off, ensuring that the
network is properly configured on the TOR before the bare metal boots.

Smart NICs share the power state with the baremetal, so the baremetal must be
powered up before the network can be configured. This leads to a potential
race in which the baremetal boots and accesses the network before the network
has been properly configured on the OVS within the smart NIC.

To ensure proper network configuration before the baremetal boots, the
deployment interfaces will temporarily boot the baremetal into the BIOS
shell. This provides a state in which the OVS on the smart NIC can be
configured properly before the bare metal is rebooted into the actual guest
image or ramdisk. The OVS on the smart NIC is programmed only after verifying
that the Neutron OVS agent is alive.
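The "verify that the agent is alive" step could be a bounded polling loop such as the sketch below. How liveness is actually checked (for example, by querying Neutron's agent list for the smart NIC host) is left to the hypothetical is_alive callable; the timeout and interval defaults are illustrative assumptions, not values from this spec.

```python
import time
from typing import Callable


def wait_for_agent_alive(is_alive: Callable[[], bool],
                         timeout: float = 60.0,
                         interval: float = 2.0,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_alive() until it returns True or `timeout` seconds elapse.

    Returns True once the agent reports alive, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested.
    """
    deadline = clock() + timeout
    while True:
        if is_alive():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Returning False rather than raising leaves the caller free to fail the deployment step or retry.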

The configure/unconfigure network calls are wrapped with the following logic:

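    from ironic.common import boot_devices
    from ironic.common import states
    from ironic.conductor import utils as manager_utils

    if task.driver.network.need_power_on(task):
        old_power_state = task.driver.power.get_power_state(task)
        if old_power_state == states.POWER_OFF:
            # Set the next boot to BIOS to halt the baremetal boot
            # while OVS on the smart NIC is being configured.
            manager_utils.node_set_boot_device(
                task, boot_devices.BIOS, persistent=False)
            manager_utils.node_power_action(task, states.POWER_ON)

    # ...
    # call task.driver.network method(s)
    # ...

    if task.driver.network.need_power_on(task):
        # Restore the power state the node had before.
        manager_utils.node_power_action(task, old_power_state)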

The following methods in the deployment interfaces call one or more of the
configure/unconfigure network methods and should be updated with the logic
above.

iscsi Deploy Interface

- iscsi_deploy::prepare
- iscsi_deploy::deploy
- iscsi_deploy::tear_down

ansible Deploy Interface

- ansible/deploy::reboot_and_finish_deploy
- ansible/deploy::prepare
- ansible/deploy::tear_down
- ansible/deploy::prepare_cleaning
- ansible/deploy::tear_down_cleaning

direct Deploy Interface

- agent::prepare
- agent::tear_down
- agent::deploy
- agent::rescue
- agent::unrescue
- agent_base_vendor::reboot_and_finish_deploy
- agent_base_vendor::_finalize_rescue

ramdisk Deploy Interface

- pxe::deploy

Common cleaning methods

- deploy_utils::prepare_inband_cleaning
- deploy_utils::tear_down_inband_cleaning
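Since the same wrapping applies to every method listed above, one possible way to avoid duplicating it is a decorator. This is a hypothetical sketch: plain strings stand in for Ironic's power-state and boot-device constants, and task.set_boot_device and task.power_action are hypothetical stand-ins for manager_utils.node_set_boot_device and manager_utils.node_power_action.

```python
import functools

# Plain-string stand-ins for ironic.common.states / boot_devices constants.
POWER_ON = "power on"
POWER_OFF = "power off"
BIOS = "bios"


def with_smartnic_power(func):
    """Wrap a deploy-interface method so smart NICs are powered during it.

    Assumes the wrapped method has the signature method(self, task, ...).
    """
    @functools.wraps(func)
    def wrapper(self, task, *args, **kwargs):
        needs_power = task.driver.network.need_power_on(task)
        old_power_state = None
        if needs_power:
            old_power_state = task.driver.power.get_power_state(task)
            if old_power_state == POWER_OFF:
                # Halt the host in the BIOS so it cannot reach the network
                # before OVS on the smart NIC is configured.
                task.set_boot_device(BIOS, persistent=False)
                task.power_action(POWER_ON)
        try:
            return func(self, task, *args, **kwargs)
        finally:
            # Restore the original power state even if the method fails.
            if needs_power:
                task.power_action(old_power_state)
    return wrapper
```

The try/finally is a design choice: it restores the original power state even when a wrapped network call raises, whereas the inline logic restores it only when the network calls complete normally.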

Network Interface

Extend the base network interface with a need_power_on method that returns
True if any Ironic port attached to the node is a smart NIC.
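A minimal sketch of need_power_on, assuming (as in Ironic) that the task exposes the node's ports as task.ports and that each port carries the new is_smartnic field. The Port and Task dataclasses are simplified stand-ins, not Ironic's objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    uuid: str
    is_smartnic: bool = False


@dataclass
class Task:
    ports: List[Port] = field(default_factory=list)


class NetworkInterfaceSketch:
    def need_power_on(self, task: Task) -> bool:
        """Return True if any port attached to the node is a smart NIC."""
        return any(port.is_smartnic for port in task.ports)
```

A node with no smart NIC ports keeps the existing power-off behavior, since any() over an empty or all-False list returns False.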

An alternative approach would be to delay the Neutron port binding (port
binding means setting all the OVSDB/OpenFlow configuration on the smart NIC)
so that Neutron performs it later, once the bare metal is powered up. The
problem with this approach is that there is no guarantee of if or when the
rules will be programmed, so the baremetal may inadvertently boot while the
smart NIC is still programmed for the old network.

The port REST API will be modified to support the new is_smartnic
field. The field will be readable by users with the baremetal observer role
and writable by users with the baremetal admin role.

Updates to the is_smartnic field of ports will be restricted in the same way
as for other connectivity-related fields (local_link_connection, etc.): they
are permitted only for nodes in the enroll, inspecting and manageable states.
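The read/write and state rules above could be checked as in this sketch. The role names baremetal_admin and baremetal_observer and the UpdateNotAllowed exception are assumptions for illustration; in Ironic these checks would go through its policy and API validation layers.

```python
from typing import Iterable

# Node states in which is_smartnic may be changed, per the spec.
SMARTNIC_UPDATE_STATES = frozenset({"enroll", "inspecting", "manageable"})
ADMIN_ROLE = "baremetal_admin"        # assumed role name
OBSERVER_ROLE = "baremetal_observer"  # assumed role name


class UpdateNotAllowed(Exception):
    """Raised when an is_smartnic update is rejected (hypothetical)."""


def can_read_is_smartnic(roles: Iterable[str]) -> bool:
    """Observers and admins may read the field."""
    return bool({ADMIN_ROLE, OBSERVER_ROLE} & set(roles))


def check_is_smartnic_update(roles: Iterable[str], provision_state: str) -> None:
    """Raise UpdateNotAllowed unless an admin updates a node in an allowed state."""
    if ADMIN_ROLE not in set(roles):
        raise UpdateNotAllowed("is_smartnic is writable only by baremetal admins")
    if provision_state not in SMARTNIC_UPDATE_STATES:
        raise UpdateNotAllowed(
            f"is_smartnic cannot be changed in state {provision_state!r}")
```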

Both use cases run infrastructure functionality on the smart NIC, with
the first use case also running control plane functionality.

This requires proper isolation between the untrusted bare metal host and the
smart NIC, preventing any direct or indirect access from the host, both
through the network interface exposed to the host and through side channels
such as the platform BMC.

Such isolation is implemented by the smart NIC device and/or the hardware
platform vendor. There are multiple approaches for such isolation,
ranging from complete physical disconnection of the smart NIC from the
platform BMC to a platform with a trusted BMC wherein the BMC considers
the baremetal host an untrusted entity and restricts its capabilities/access
to the platform.

In the absence of such isolation, the untrusted baremetal tenant may be able
to gain access to the provisioning network and, in the first use case, may be
able to compromise the control plane running on the smart NIC.

Proper isolation depends on the platform hardware/firmware, and cannot be
directly enforced or guaranteed by Ironic. Users of the smart NIC use cases
should be made well aware of this via explicit documentation, and should be
guided to verify that proper isolation exists on their platform when enabling
these use cases.

Security Groups

This proposal allows use of the Neutron OVS agent pipeline. One of the
features in the pipeline is security groups, which will enhance the security
model when using baremetal in a cloud.

Security credentials

The node running the Neutron OVS agent (smart NIC or remote, according to use
case) should be configured with the message bus credentials for the Neutron
server.

In addition, for the second use case, the SSH public key and OVSDB SSL
certificate should be configured for the smart NIC port.