
The IAllocator protocol has been extended with a new allocate-secondary
request type. Currently, this new request type is only used when a disk
conversion to DRBD does not specify a secondary node. As long as this new
feature is not used, a third-party IAllocator not aware of this extension
can continue to be used.

htools now also take into account N+1 redundancy for plain and shared
storage. To obtain the old behavior, add the --no-capacity-checks option.
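
The capacity reasoning behind such an N+1 check can be sketched in a few
lines. This is a deliberately simplified, hypothetical model
(memory-only accounting, greedy first-fit placement; not htools' actual
algorithm): for every node, check that its instances could be re-placed
on the remaining nodes if that node failed.

```python
def n_plus_one_ok(nodes, instances_of, free_mem, mem_of):
    """Return True if every node's instances could be re-placed
    elsewhere after that node fails (memory only, greedy placement)."""
    for failed in nodes:
        spare = {n: free_mem[n] for n in nodes if n != failed}
        # Try to re-place the biggest instances first.
        for inst in sorted(instances_of[failed], key=mem_of, reverse=True):
            target = next((n for n in sorted(spare, key=spare.get,
                                             reverse=True)
                           if spare[n] >= mem_of(inst)), None)
            if target is None:
                return False  # no remaining node can take this instance
            spare[target] -= mem_of(inst)
    return True
```

With `--no-capacity-checks`, this whole family of checks is skipped, as
in previous versions.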

hail now tries to keep the overall cluster balanced; in particular it
now prefers more empty groups over groups that are internally more balanced.

The option --no-node-setup of ‘gnt-node add’ is disabled.
Instead, the cluster configuration parameter modify_ssh_setup is
used to determine whether or not to manipulate the SSH setup of a new
node.

Timeouts for communication with luxid have been increased. As a consequence,
Ganeti tools communicating (directly or indirectly) with luxid also time out
later. Please increase all timeouts for higher level tools interacting with
Ganeti accordingly.

Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.

Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI’s exposure, along with info
on how RAPI can be set up alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti’s security document: doc/html/security.html

CVE-2015-7945

Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.

Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.

In order to improve allocation efficiency when using DRBD, the cluster
metric now takes the total reserved memory into account. A consequence
of this change is that the best possible cluster metric is no longer 0.
htools(1) interprets minimal cluster scores to be offsets of the theoretical
lower bound, so only users interpreting the cluster score directly should
be affected.

This release contains a fix for the problem that different encodings in
SSL certificates can break RPC communication (issue 1094). The fix makes
it necessary to rerun ‘gnt-cluster renew-crypto --new-node-certificates’
after the cluster is fully upgraded to 2.14.1.

The SSH security changes reduced the number of nodes which can SSH into
other nodes. Unfortunately, the Ganeti implementation of migration
for the xl stack of Xen required SSH to be able to migrate the instance,
leading to a situation where full movement of an instance around the cluster
was not possible. This version fixes the issue by using socat to transfer
instance data. While socat is less secure than SSH, it is about as secure as
xm migrations, and occurs over the secondary network if present. As a
consequence of this change, Xen instance migrations using xl cannot occur
between nodes running 2.14.0 and 2.14.1.

The build system now enforces that external Haskell dependencies lie in
a supported range, as declared by our new ganeti.cabal file.

Basic support for instance reservations has been added. Instance addition
supports a --forthcoming option telling Ganeti to only reserve the resources
but not create the actual instance. The instance can later be created by
passing the --commit option to the instance addition command.

Node tags starting with htools:nlocation: now have a special meaning to htools(1).
They control between which nodes migration is possible, e.g., during hypervisor
upgrades. See hbal(1) for details.

The node-allocation lock has been removed for good, thus speeding up parallel
instance allocation and creation.

The external storage interface has been extended by optional open
and close scripts.

This release contains a fix for the problem that different encodings in
SSL certificates can break RPC communication (issue 1094). The fix makes
it necessary to rerun ‘gnt-cluster renew-crypto --new-node-certificates’
after the cluster is fully upgraded to 2.13.2.

The SSH security changes reduced the number of nodes which can SSH into
other nodes. Unfortunately, the Ganeti implementation of migration
for the xl stack of Xen required SSH to be able to migrate the instance,
leading to a situation where full movement of an instance around the cluster
was not possible. This version fixes the issue by using socat to transfer
instance data. While socat is less secure than SSH, it is about as secure as
xm migrations, and occurs over the secondary network if present. As a
consequence of this change, Xen instance migrations using xl cannot occur
between nodes running 2.13.0 and 2.13.1.

Ganeti now internally retries the instance creation opcode if opportunistic
locking did not acquire nodes with enough free resources. The internal retry
will not use opportunistic locking. In particular, instance creation, even
if opportunistic locking is set, will never fail with ECODE_TEMP_NORES.
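
The retry pattern described here can be sketched as follows (hypothetical
names; ‘TempNoResources’ stands in for ECODE_TEMP_NORES and is not
Ganeti's actual exception class):

```python
class TempNoResources(Exception):
    """Stand-in for ECODE_TEMP_NORES: the opportunistically locked
    nodes had too little free resources for the new instance."""

def create_instance(allocate):
    """allocate(opportunistic=...) performs the actual placement."""
    try:
        # First attempt: only lock nodes that are currently free.
        return allocate(opportunistic=True)
    except TempNoResources:
        # Internal retry without opportunistic locking: wait for the
        # locks of suitable nodes instead of failing the job.
        return allocate(opportunistic=False)
```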

The handling of SSH security has undergone a significant change. From
this version on, each node has an individual SSH key pair instead of
sharing one with all nodes of the cluster. From now on, SSH access is
also restricted to master candidates: only master candidates can SSH
into other cluster nodes, while non-master-candidates cannot. Refer to
the UPGRADE notes for further instructions on the creation and
distribution of the keys.

Ganeti now checks hypervisor version compatibility before trying an instance
migration. It errors out if the versions are not compatible. Add the option
--ignore-hvversions to restore the old behavior of only warning.

Node tags starting with htools:migration: or htools:allowmigration: now have
a special meaning to htools(1). See hbal(1) for details.
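
One plausible reading of these tags, as a sketch only (simplified;
consult hbal(1) for the authoritative semantics): treat the
htools:migration: tag suffixes as a node's migration-relevant versions,
and allow moves between nodes whose versions match, or for which an
htools:allowmigration:x::y cluster tag permits the transition.

```python
PREFIX = "htools:migration:"
ALLOW = "htools:allowmigration:"

def migration_tags(node_tags):
    """Extract the migration-relevant version strings from node tags."""
    return {t[len(PREFIX):] for t in node_tags if t.startswith(PREFIX)}

def allowed_pairs(cluster_tags):
    """Collect (src, dst) transitions permitted by allowmigration tags."""
    pairs = set()
    for t in cluster_tags:
        if t.startswith(ALLOW):
            src, _, dst = t[len(ALLOW):].partition("::")
            pairs.add((src, dst))
    return pairs

def can_migrate(src_tags, dst_tags, cluster_tags):
    src, dst = migration_tags(src_tags), migration_tags(dst_tags)
    if src == dst:
        return True  # identical versions: migration unrestricted
    # Otherwise every source version needs an explicitly allowed target.
    pairs = allowed_pairs(cluster_tags)
    return all(any((s, d) in pairs for d in dst) for s in src)
```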

The LXC hypervisor code has been repaired and improved. Instances cannot be
migrated and cannot have more than one disk, but should otherwise work as with
other hypervisors. OS script changes should not be necessary. LXC version
1.0.0 or higher is required.

A new job filter rules system allows defining iptables-like rules for the
job scheduler, making it easier to (soft-)drain the job queue, perform
maintenance, and rate-limit selected job types. See gnt-filter(8) for
details.
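
The first-match evaluation such a rule chain implies can be sketched
generically (illustrative predicates and action names only; see
gnt-filter(8) for the real predicate and action vocabulary):

```python
def evaluate(rules, job):
    """Apply iptables-style first-match semantics: rules are tried in
    priority order and the first matching predicate decides the action."""
    for predicate, action in rules:
        if predicate(job):
            return action
    return "ACCEPT"  # no rule matched: let the job through

# Example chain: pause new instance creations, reject maintenance jobs.
rules = [
    (lambda job: job.get("opcode") == "OP_INSTANCE_CREATE", "PAUSE"),
    (lambda job: "maintenance" in job.get("reason", ""), "REJECT"),
]
```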

Ganeti jobs can now be ad-hoc rate limited via the reason trail.
For a set of jobs queued with “--reason=rate-limit:n:label”, the job
scheduler ensures that not more than n will be scheduled to run at the same
time. See ganeti(7), section “Options”, for details.
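
The bookkeeping this implies can be sketched as follows (a hypothetical
helper, not the scheduler's actual code): parse the rate-limit:n:label
reason, count running jobs with the same label, and schedule only while
that count stays below n.

```python
import re

def bucket(reason):
    """Return (label, limit) for a rate-limit reason, else None."""
    m = re.match(r"rate-limit:(\d+):(.+)", reason or "")
    return (m.group(2), int(m.group(1))) if m else None

def can_schedule(job_reason, running_reasons):
    b = bucket(job_reason)
    if b is None:
        return True  # no rate-limit reason attached: unlimited
    label, limit = b
    running = sum(1 for r in running_reasons
                  if bucket(r) and bucket(r)[0] == label)
    return running < limit
```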

The monitoring daemon now has variable sleep times for the data
collectors. Currently this means that the granularity of cpu-avg-load
can be configured.

The ‘gnt-cluster verify’ command now has the option
‘--verify-ssh-clutter’, which verifies whether Ganeti (accidentally)
cluttered up the ‘authorized_keys’ file.

Instance disks can now be converted from one disk template to another for many
different template combinations. When available, more efficient conversions
will be used, otherwise the disks are simply copied over.

This release contains a fix for the problem that different encodings in
SSL certificates can break RPC communication (issue 1094). The fix makes
it necessary to rerun ‘gnt-cluster renew-crypto --new-node-certificates’
after the cluster is fully upgraded to 2.12.5.

Fixed Issue 1070: Upgrade of Ganeti 2.5.2 to 2.12.0 fails due to
missing UUIDs for disks

Fixed Issue 1073: ssconf_hvparams_* not distributed with ssconf

Inherited from the 2.11 branch:

Fixed Issue 1032: Renew-crypto --new-node-certificates sometimes does not
complete.
The operation ‘gnt-cluster renew-crypto --new-node-certificates’ is
now more robust against intermittent reachability errors. Nodes that
are temporarily unreachable are contacted with several retries.
Nodes which are marked as offline are omitted right away.

Ganeti is now distributed under the 2-clause BSD license.
See the COPYING file.

Do not use debug mode in production. Certain daemons will issue warnings
when launched in debug mode. Some debug logging violates some of the new
invariants in the system (see “New features”). The logging has been kept as
it aids diagnostics and development.

Ganeti will not log private and secret parameters, unless it is running
in debug mode.

Ganeti will not save secret parameters to configuration. Secret parameters
must be supplied every time you install, or reinstall, an instance.

Attempting to override public parameters with private or secret parameters
results in an error. Similarly, you may not use secret parameters to
override private parameters.

The move-instance tool can now attempt to allocate an instance by using
opportunistic locking when an iallocator is used.

The build system creates sample systemd unit files, available under
doc/examples/systemd. These unit files allow systemd to natively
manage and supervise all Ganeti processes.

Different types of compression can be applied during instance moves, including
user-specified ones.

Ganeti jobs now run as separate processes. The jobs are coordinated by
a new daemon “WConfd” that manages the cluster’s configuration and locks
for individual jobs. A consequence is that more jobs can run in parallel;
the number is run-time configurable, see the “New features” entry
of 2.11.0. To avoid being overloaded with tracking running jobs, luxid
backs off and only occasionally, in a sequential way, checks if jobs have
finished and schedules new ones. In this way, luxid stays responsive under
high cluster load. The limit for when to start backing off is also run-time
configurable.

The metadata daemon is now optionally available, as part of the
partial implementation of the OS-installs design. It allows passing
information to OS install scripts or to instances.
It is also possible to run Ganeti without the daemon, if desired.

Detection of user shutdown of instances has been implemented for Xen
as well.

Wrong UDP checksums in DHCP network packets:
If an instance communicates with the metadata daemon and uses DHCP to
obtain its IP address on the provided virtual network interface,
it can happen that UDP packets have a wrong checksum, due to
a bug in virtio. See for example https://bugs.launchpad.net/bugs/930962

Ganeti works around this bug by disabling the UDP checksums on the way
from a host to instances (only on the special metadata communication
network interface) using the ethtool command. Therefore, if the
metadata daemon is used, the host nodes should have this tool available.

The metadata daemon is run as root in the split-user mode, to be able
to bind to port 80.
This should be improved in future versions, see issue #949.

Important security release. In 2.10.0, the
‘gnt-cluster upgrade’ command was introduced. Before
performing an upgrade, the configuration directory of
the cluster is backed up. Unfortunately, the archive was
written with permissions that make it possible for
non-privileged users to read the archive and thus have
access to cluster and RAPI keys. After this release,
the archive will be created with privileged access only.

We strongly advise you to restrict the permissions of
previously created archives. The archives are found in
/var/lib/ganeti*.tar (unless otherwise configured with
--localstatedir or --with-backup-dir).

If you suspect that non-privileged users have accessed
your archives already, we advise you to renew the
cluster’s crypto keys using ‘gnt-cluster renew-crypto’
and to reset the RAPI credentials by editing
/var/lib/ganeti/rapi_users (respectively under a
different path if configured differently with
--localstatedir).

Improvements to KVM with respect to the kvmd and instance shutdown
behavior.
WARNING: In contrast to our standard policy, this bug fix update
introduces new parameters to the configuration. This means in
particular that after an upgrade from 2.11.0 or 2.11.1, ‘cfgupgrade’
needs to be run, either manually or implicitly by running
‘gnt-cluster upgrade --to 2.11.2’ (which requires the cluster to have
been configured with --enable-versionfull).
This also means that it is not easily possible to downgrade from
2.11.2 to 2.11.1 or 2.11.0. The only way is to go back to 2.10 and
then upgrade again.

‘gnt-node list’ no longer shows disk space information for shared file
disk templates because it is not a node attribute. (For example, if you have
both the file and shared file disk templates enabled, ‘gnt-node list’ now
only shows information about the file disk template.)

The shared file disk template is now in the new ‘sharedfile’ storage type.
As a result, ‘gnt-node list-storage -t file’ now only shows information
about the file disk template and you may use
‘gnt-node list-storage -t sharedfile’ to query storage information for
the shared file disk template.

Over luxi, syntactically incorrect queries are now rejected as a whole;
before, a ‘SubmitManyJobs’ request was partially executed if the outer
structure of the request was syntactically correct. As the luxi protocol
is internal (external applications are expected to use RAPI), the impact
of this incompatible change should be limited.
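
The all-or-nothing behavior can be illustrated with a generic sketch
(hypothetical names, not the luxi implementation): validate the entire
request before submitting any job from it.

```python
import json

def submit_many(payload, submit_one):
    """Parse and validate a whole SubmitManyJobs-style request before
    executing any part of it, instead of running the valid prefix."""
    try:
        jobs = json.loads(payload)
    except ValueError:
        raise ValueError("malformed request rejected as a whole")
    if not isinstance(jobs, list) or not all(isinstance(j, dict)
                                             for j in jobs):
        raise ValueError("malformed request rejected as a whole")
    # Only after the whole request validated do we touch the queue.
    return [submit_one(j) for j in jobs]
```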

Queries for nodes, instances, groups, backups and networks are now
exclusively done via the luxi daemon. Legacy python code was removed,
as well as the --enable-split-queries configuration option.

Orphan volume errors are demoted to warnings and no longer affect the exit
code of ‘gnt-cluster verify’.

RPC security has been enhanced by using different client SSL certificates
for each node. In this context ‘gnt-cluster renew-crypto’ got a new
option ‘--renew-node-certificates’, which renews the client
certificates of all nodes. After a cluster upgrade from pre-2.11, run
this to create client certificates and activate this feature.

Instance moves, backups and imports can now use compression to transfer the
instance data.

Node groups can be configured to use an SSH port different than the
default 22.

Added experimental support for Gluster distributed file storage as the
gluster disk template under the new sharedfile storage type through
automatic management of per-node FUSE mount points. You can configure the
mount point location at ‘gnt-cluster init’ time by using the new
--gluster-storage-dir switch.

Job scheduling is now handled by luxid, and the maximal number of jobs running
in parallel is a run-time parameter of the cluster.

A new tool for planning dynamic power management, called hsqueeze, has
been added. It suggests nodes to power up or down and corresponding instance
moves.

Two new options have been added to gnt-group evacuate.
The ‘sequential’ option forces all the evacuation steps to
be carried out sequentially, thus avoiding congestion on a
slow link between node groups. The ‘force-failover’ option
disallows migrations and forces failovers to be used instead.
In this way evacuation to a group with a vastly different
hypervisor is possible.

In tiered allocation, when looking for ways to shrink an instance,
the canonical path is tried first, i.e., in each step the resource
that most placements are blocked on is reduced. Only if no smaller
fitting instance can be found this way does htools fall back to
shrinking a single resource until the instance fits.

When finding the placement of an instance, duplicate computations in
the various cluster scores are now carried out only once. This
significantly improves the performance of hspace for DRBD
on large clusters; for other clusters, a slight performance decrease
might occur. Moreover, due to the changed order, floating point
number inaccuracies accumulate differently, thus resulting in different
cluster scores. It has been verified that the effect of these different
roundings is less than 1e-12.
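
The scale of such rounding differences is easy to demonstrate in
miniature (a generic Python illustration, unrelated to Ganeti's actual
score computation): summing the same values in two orders may round
differently, yet stays far below the 1e-12 bound.

```python
# Floating point addition is not associative, so accumulating the same
# values forwards and backwards may differ in the last bits only.
vals = [1.0 / k for k in range(1, 11)]
forward = sum(vals)
backward = sum(reversed(vals))
assert abs(forward - backward) < 1e-12  # negligible, as stated above
```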

KVM hypervisors can now access RBD storage directly without having to
go through a block device.

A new command ‘gnt-cluster upgrade’ was added that automates the upgrade
procedure between two Ganeti versions that are both 2.10 or higher.

The move-instance command can now change disk templates when moving
instances, and does not require any node placement options to be
specified if the destination cluster has a default iallocator.

Users can now change the soundhw and cpuid settings for Xen hypervisors.

Hail and hbal now have the (optional) capability of accessing average CPU
load information through the monitoring daemon, and of using it to
dynamically adapt the allocation of instances.

Hotplug support. The new option ‘--hotplug’ to ‘gnt-instance modify’
makes disk and NIC modifications take effect without the need of an
actual reboot. There are currently a few constraints on this feature:

only the KVM hypervisor (versions >= 1.0) supports it,

one cannot (yet) hotplug a disk using userspace access mode for RBD,

in case of a downgrade, instances must be rebooted in order to
be migratable (due to a core change of runtime files).

A new test framework for logical units was introduced and the test
coverage for logical units was improved significantly.

Opcodes are entirely generated from Haskell using the tool ‘hs2py’ and
the module ‘src/Ganeti/OpCodes.hs’.

Constants are also generated from Haskell using the tool
‘hs2py-constants’ and the module ‘src/Ganeti/Constants.hs’, with the
exception of socket related constants, which require changing the
cluster configuration file, and HVS related constants, because they
are part of a port of instance queries to Haskell. As a result, these
changes will be part of the next release of Ganeti.

Ganeti provides a RESTful control interface called the RAPI. Its HTTPS
implementation is vulnerable to DoS attacks via client-initiated SSL
parameter renegotiation. While the interface is not meant to be exposed
publicly, due to the fact that it binds to all interfaces, we believe
some users might be exposing it unintentionally and are vulnerable. A
DoS attack can consume resources meant for Ganeti daemons and instances
running on the master node, making both perform badly.

Fixes are not feasible due to the OpenSSL Python library not exposing
functionality needed to disable client-side renegotiation. Instead, we
offer instructions on how to control RAPI’s exposure, along with info
on how RAPI can be setup alongside an HTTPS proxy in case users still
want or need to expose the RAPI interface. The instructions are
outlined in Ganeti’s security document: doc/html/security.html

CVE-2015-7945

Ganeti leaks the DRBD secret through the RAPI interface. Examining job
results after an instance information job reveals the secret. With the
DRBD secret, access to the local cluster network, and ARP poisoning,
an attacker can impersonate a Ganeti node and clone the disks of a
DRBD-based instance. While an attacker with access to the cluster
network is already capable of accessing any data written as DRBD
traffic is unencrypted, having the secret expedites the process and
allows access to the entire disk.

Fixes contained in this release prevent the secret from being exposed
via the RAPI. The DRBD secret can be changed by converting an instance
to plain and back to DRBD, generating a new secret, but redundancy will
be lost until the process completes.
Since attackers with node access are capable of accessing some and
potentially all data even without the secret, we do not recommend that
the secret be changed for existing instances.

hroller now also plans for capacity to move non-redundant instances off
any node to be rebooted; the old behavior of completely ignoring any
non-redundant instances can be restored by adding the --ignore-non-redundant
option.

The cluster option ‘--no-lvm-storage’ was removed in favor of the new option
‘--enabled-disk-templates’.

On instance creation, disk templates no longer need to be specified
with ‘-t’. The default disk template will be taken from the list of
enabled disk templates.

The monitoring daemon is now running as root, in order to be able to collect
information only available to root (such as the state of Xen instances).

The ConfD client is now IPv6 compatible.

File and shared file storage is no longer enabled or disabled at configure
time, but via the option ‘--enabled-disk-templates’ at cluster
initialization and modification.

The default directories for file and shared file storage are no longer
specified at configure time, but taken from the cluster's configuration.
They can be set at cluster initialization and modification with
--file-storage-dir and --shared-file-storage-dir.

Cluster verification now includes stricter checks regarding the
default file and shared file storage directories. It now checks that
the directories are explicitly allowed in the 'file-storage-paths' file and
that the directories exist on all nodes.

The list of allowed disk templates in the instance policy and the list
of cluster-wide enabled disk templates are now checked for consistency
on cluster or group modification. On cluster initialization, the ipolicy
disk templates are ensured to be a subset of the cluster-wide enabled
disk templates.

DRBD 8.4 support. Depending on the installed DRBD version, Ganeti now uses
the correct command syntax. It is possible to use different DRBD versions
on different nodes as long as they are compatible with each other. This
enables rolling upgrades of DRBD with no downtime. As permanent operation
of different DRBD versions within a node group is discouraged,
gnt-cluster verify will emit a warning if it detects such a situation.

New “inst-status-xen” data collector for the monitoring daemon, providing
information about the state of the xen instances on the nodes.

New “lv” data collector for the monitoring daemon, collecting data about the
logical volumes on the nodes, and pairing them with the name of the instances
they belong to.

New “diskstats” data collector, collecting the data from /proc/diskstats and
presenting them over the monitoring daemon interface.
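The /proc/diskstats format that this collector reads is a fixed whitespace-separated layout (major, minor, device name, then eleven statistics counters). A minimal parsing sketch, not the collector's actual code:

```python
# Sketch of parsing /proc/diskstats lines, as a data collector might.
# Field layout (kernel iostats documentation): major, minor, device
# name, then 11 statistics counters (reads completed, reads merged, ...).
def parse_diskstats(text):
    """Return {device_name: [int counters]} from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip malformed or truncated lines
        name = fields[2]
        stats[name] = [int(f) for f in fields[3:14]]
    return stats

sample = "   8       0 sda 1000 20 53506 4414 300 10 2400 1800 0 5000 6200\n"
print(parse_diskstats(sample)["sda"][0])  # reads completed: 1000
```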

Instance policy can contain multiple instance specs, as described in
the "Constrained instance sizes" section of Partitioned Ganeti. As a
consequence, it is not possible to partially change or override instance
specs. Bounding specs (min and max) can be specified as a whole using the
new option --ipolicy-bounds-specs, while standard specs use the new
option --ipolicy-std-specs.

The output of the info command of gnt-cluster, gnt-group, gnt-node,
and gnt-instance is now a valid YAML object.

hail now honors network restrictions when allocating nodes. This led to an
update of the IAllocator protocol. See the IAllocator documentation for
details.

confd now only answers static configuration requests over the network. luxid
was extracted from it; it listens on the local LUXI socket and responds to
live queries. This allows finer-grained permissions if separate users are used.

The Remote API daemon now supports a command line flag
to always require authentication, --require-authentication. It can
be specified in $sysconfdir/default/ganeti.

A new cluster attribute 'enabled_disk_templates' is introduced. It will
be used to manage the disk templates used by instances in the cluster.
Initially, it will be set to a list that includes plain and drbd (if they
were enabled by specifying a volume group name) and file and sharedfile
(if those were enabled at configure time). Additionally, it will include
all disk templates currently used by instances. The order of disk templates
will be based on Ganeti's history of supporting them. In the future, the
first entry of the list will be used as the default disk template on instance
creation.

cfgupgrade now supports a --downgrade option to bring the
configuration back to the previous stable version.

Disk templates in group ipolicy can be restored to the default value.

Initial support for diskless instances and virtual clusters in QA.

More QA and unit tests for instance policies.

Every opcode now contains a reason trail (visible through gnt-job info)
describing why the opcode itself was executed.

The monitoring daemon is now available. It allows users to query the cluster
for obtaining information about the status of the system. The daemon is only
responsible for providing the information over the network: the actual data
gathering is performed by data collectors (currently, only the DRBD status
collector is available).

In order to help developers work on Ganeti, a new script
(devel/build_chroot) is provided, for building a chroot that contains all
the required development libraries and tools for compiling Ganeti on a Debian
Squeeze system.

A new tool, harep, for performing self-repair and recreation of instances
in Ganeti has been added.

New command show-ispecs-cmd for gnt-cluster and gnt-group.
It prints the command line to set the current policies, to ease
changing them.

Add the vnet_hdr HV parameter for KVM, to control whether the tap
devices for KVM virtio-net interfaces will get created with VNET_HDR
(IFF_VNET_HDR) support. If set to false, it disables offloading on the
virtio-net interfaces, which prevents host kernel tainting and log
flooding, when dealing with broken or malicious virtio-net drivers.
It’s set to true by default.

To simplify the work of packaging frameworks that want to add the needed users
and groups in a split-user setup themselves, three files are generated in
doc/users at build time. The groups file contains, one per line, the
groups to be created; the users file contains, one per line, the
users to be created, optionally followed by their primary group, where
important; and the groupmemberships file contains, one per line, additional
user-group membership relations that need to be established. The syntax of
these files will remain stable in all future versions.
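The one-entry-per-line formats described above are simple to consume; a hypothetical parsing sketch (the file layout is from the text, the helper name and example user names are our own):

```python
# Sketch: parse the doc/users "users" file described above, where each
# line is a user name optionally followed by its primary group.
def parse_users(text):
    """Return [(user, primary_group_or_None)] from the users file."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        user = parts[0]
        group = parts[1] if len(parts) > 1 else None
        entries.append((user, group))
    return entries

users_file = "gnt-masterd gnt-daemons\ngnt-confd\n"
print(parse_users(users_file))
# [('gnt-masterd', 'gnt-daemons'), ('gnt-confd', None)]
```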

Add a default to file-driver when unspecified over RAPI (Issue 571)

Mark the DSA host pubkey as optional, and remove it during config downgrade
(Issue 560)

Instance policies for disk size were documented to be on a per-disk
basis, but hail applied them to the sum of all disks. This has been
fixed.

hbal will now exit with status 0 if, during job execution over
LUXI, early exit has been requested and all jobs are successful;
before, exit status 1 was used, which could not be differentiated from
the "job error" case.

Compatibility with newer versions of rbd has been fixed.

gnt-instance batch-create has been changed to use the bulk create
opcode from Ganeti. This led to incompatible changes in the format of
the JSON file. It is no longer a custom dict, but a dict
compatible with the OpInstanceCreate opcode.

Parent directories for file storage need to be listed in
$sysconfdir/ganeti/file-storage-paths now. cfgupgrade will
write the file automatically based on old configuration values, but it
cannot distribute it across all nodes, and the file contents should be
verified. Use gnt-cluster copyfile $sysconfdir/ganeti/file-storage-paths
once the cluster has been upgraded. The reason for requiring this list of
paths now is that
before it would have been possible to inject new paths via RPC,
allowing files to be created in arbitrary locations. The RPC protocol
is protected using SSL/X.509 certificates, but as a design principle
Ganeti does not permit arbitrary paths to be passed.

The parsing of the variants file for OSes (see
ganeti-os-interface(7)) has been slightly changed: now empty
lines and comment lines (starting with #) are ignored for better
readability.
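The new parsing rule can be sketched in a few lines (illustrative only, not the actual Ganeti code):

```python
# Sketch of the described behaviour: empty lines and '#' comment lines
# in an OS variants file are now ignored.
def parse_variants(text):
    variants = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments, per the new behaviour
        variants.append(line)
    return variants

content = "# supported variants\n\ndefault\nminimal\n"
print(parse_variants(content))  # ['default', 'minimal']
```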

The setup-ssh tool added in Ganeti 2.2 has been replaced and is no
longer available. gnt-node add now invokes a new tool on the
destination node, named prepare-node-join, to configure the SSH
daemon. Paramiko is no longer necessary to configure nodes' SSH
daemons via gnt-node add.

Draining (gnt-cluster queue drain) and un-draining the job queue
(gnt-cluster queue undrain) now affect all nodes in a cluster, and
the flag is not reset after a master failover.

Python 2.4 has not been tested with this release. Using 2.6 or above
is recommended. 2.6 will be mandatory from the 2.8 series.

New exclusive-storage node parameter added, restricted to
nodegroup level. When it’s set to true, physical disks are assigned in
an exclusive fashion to instances, as documented in Partitioned
Ganeti. Currently, only instances using the
plain disk template are supported.

The KVM hypervisor has been updated with many new hypervisor
parameters, including a generic one for passing arbitrary command line
values. See a complete list in gnt-instance(8). It is now
compatible up to qemu 1.4.

A new tool, called mon-collector, is the stand-alone executor of
the data collectors for a monitoring system. As of this version, it
just includes the DRBD data collector, that can be executed by calling
mon-collector using the drbd parameter. See
mon-collector(7).

A new user option, read, has been added for RAPI users. It allows
granting a specific user permission to query for information without
giving write permissions.

A new tool named node-cleanup has been added. It cleans remains of
a cluster from a machine by stopping all daemons, removing
certificates and ssconf files. Unless the --no-backup option is
given, copies of the certificates are made.

Instance creations now support the use of opportunistic locking,
potentially speeding up the (parallel) creation of multiple instances.
This feature is currently only available via the RAPI interface and
when an instance allocator is used. If the
opportunistic_locking parameter is set the opcode will try to
acquire as many locks as possible, but will not wait for any locks
held by other opcodes. If not enough resources can be found to
allocate the instance, the temporary error code
temp_insufficient_resources is returned. The operation can be
retried thereafter, with or without opportunistic locking.
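The retry pattern this enables can be sketched as follows; only the temp_insufficient_resources error code comes from the text, while the submit function and return shape are stand-ins for an actual RAPI call:

```python
# Sketch of retrying an instance creation that uses opportunistic
# locking. submit_create is a stand-in for a real RAPI submission;
# the error-code name is from the release notes.
def create_with_retries(submit_create, max_tries=3):
    for attempt in range(max_tries):
        # Fall back to normal (blocking) locking on the final attempt.
        status, err = submit_create(
            opportunistic_locking=(attempt < max_tries - 1))
        if status == "success":
            return attempt + 1  # number of attempts used
        if err != "temp_insufficient_resources":
            raise RuntimeError(err)  # a real error: do not retry
    raise RuntimeError("no resources after %d tries" % max_tries)

# Stub that fails once with the temporary error, then succeeds.
calls = []
def fake_submit(opportunistic_locking):
    calls.append(opportunistic_locking)
    if len(calls) == 1:
        return ("error", "temp_insufficient_resources")
    return ("success", None)

print(create_with_retries(fake_submit))  # 2
```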

New experimental linux-ha resource scripts.

Restricted-commands support: Ganeti can now be asked (via command line
or RAPI) to perform commands on a node. These are passed via Ganeti
RPC rather than SSH. For security reasons, this functionality is
restricted to commands specified in $sysconfdir/ganeti/restricted-commands.
The file is not copied automatically.

Important behaviour change: hbal will no longer rebalance instances that
have the auto_balance attribute set to false. This was the intention
all along, but until now it only excluded them from the N+1 memory
reservation (DRBD-specific).

Fixed the dry-run mode for many operations: verification of
results was over-zealous but didn’t take into account the dry-run
operation, resulting in “wrong” failures.

Fixed bash completion in gnt-job list when the job queue has
hundreds of entries; especially with older bash versions, this
resulted in significant CPU usage.

And lastly, a few other improvements have been made:

Added option to force master-failover without voting (issue 282).

Clarified error message on lock conflict (issue 287).

Logging of newly submitted jobs has been improved (issue 290).

Hostname checks have been made uniform between instance rename and
create (issue 291).

The --submit option is now supported by gnt-debugdelay.

Shutting down the master daemon by sending SIGTERM now stops it from
processing jobs waiting for locks; instead, those jobs will be started
once again after the master daemon is started the next time (issue
296).

Support for Xen’s xl program has been improved (besides the fixes
above).

Reduced logging noise in the Haskell confd daemon (only show one log
entry for each config reload, instead of two).

The LUXI protocol has been made more consistent
regarding its handling of command arguments. This, however, leads to
incompatibility issues with previous versions. Please ensure that you
restart Ganeti daemons soon after the upgrade, otherwise most
LUXI calls (job submission, setting/resetting the drain flag,
pausing/resuming the watcher, cancelling and archiving jobs, querying
the cluster configuration) will fail.

The current admin_up field, which used to denote whether an instance
should be running or not, has been removed. Instead, admin_state is
introduced, with three possible values: up, down and offline.

The rationale behind this is that an instance being "down" can have
different meanings:

it could be down during a reboot

it could temporarily be down for a reinstall

or it could be down because it is deprecated and kept just for its
disk

The previous Boolean state was making it difficult to do capacity
calculations: should Ganeti reserve memory for a down instance? Now, the
tri-state field makes it clear:

in up and down state, all resources are reserved for the
instance, and it can be brought up at any time if it is down

in offline state, only disk space is reserved for it, but not
memory or CPUs
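The reservation rule above can be sketched as a small helper (purely illustrative; the state names come from the text, the function name is our own):

```python
# Illustrative sketch of the tri-state reservation rule: 'up' and
# 'down' reserve all resources; 'offline' reserves only disk space.
def reserved_resources(admin_state, memory, cpus, disk):
    if admin_state in ("up", "down"):
        return {"memory": memory, "cpus": cpus, "disk": disk}
    if admin_state == "offline":
        return {"memory": 0, "cpus": 0, "disk": disk}
    raise ValueError("unknown admin_state: %s" % admin_state)

print(reserved_resources("offline", memory=4096, cpus=2, disk=20480))
# {'memory': 0, 'cpus': 0, 'disk': 20480}
```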

The field can have an extra use: since the transition between up and
down and vice versa is done via gnt-instance start/stop, but the
transition between offline and down is done via gnt-instance modify,
it is possible to grant different rights to users. For
example, owners of an instance could be allowed to start/stop it, but
not transition it out of the offline state.

In previous Ganeti versions, an instance creation request was limited
in its minimum and maximum size only by the cluster
resources. As such, any policy could be implemented only in third-party
clients (RAPI clients, or shell wrappers over gnt-*
tools). Furthermore, calculating cluster capacity via hspace again
required external input with regards to instance sizes.

In order to improve these workflows and to allow for example better
per-node group differentiation, we introduced instance specs, which
allow declaring:

minimum instance disk size, disk count, memory size, cpu count

maximum values for the above metrics

and “standard” values (used in hspace to calculate the standard
sized instances)

The minimum/maximum values can be also customised at node-group level,
for example allowing more powerful hardware to support bigger instance
memory sizes.

Beside the instance specs, there are a few other settings belonging to
the instance policy framework. It is possible now to customise, per
cluster and node-group:

the list of allowed disk templates

the maximum ratio of VCPUs per PCPUs (to control CPU oversubscription)

the maximum ratio of instance to spindles (see below for more
information) for local storage

All these together should allow all tools that talk to Ganeti to know
the ranges of allowed values for instances and the
over-subscription that is allowed.

For the VCPU/PCPU ratio, we already have the VCPU configuration from the
instance configuration, and the physical CPU configuration from the
node. For the spindle ratios, however, we did not previously track these
values, so new parameters have been added:

a new node parameter spindle_count, defaulting to 1, customisable at
node group or node level

a new backend parameter (for instances), spindle_use, defaulting to 1

Note that spindles in this context doesn’t need to mean actual
mechanical hard-drives; it’s just a relative number for both the node
I/O capacity and instance I/O consumption.
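A capacity check based on these parameters might look like this sketch (the parameter names spindle_count and spindle_use come from the text; the aggregation logic is an assumption, not Ganeti's actual code):

```python
# Sketch: check the summed spindle_use of a node's instances against
# the node's spindle_count, honouring a maximum instance-to-spindle
# ratio from the instance policy.
def spindle_ratio_ok(instance_spindle_uses, node_spindle_count, max_ratio):
    used = sum(instance_spindle_uses)  # each instance defaults to 1
    return used <= max_ratio * node_spindle_count

# A node with 2 spindles and a max ratio of 4.0 allows 8 "spindle units".
print(spindle_ratio_ok([1, 1, 2], node_spindle_count=2, max_ratio=4.0))  # True
print(spindle_ratio_ok([4, 4, 2], node_spindle_count=2, max_ratio=4.0))  # False
```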

While live-migration is in general desirable over failover, it is
possible that for some workloads it is actually worse, due to the
variable time of the “suspend” phase during live migration.

To allow the tools to work consistently over such instances (without
having to hard-code instance names), a new backend parameter
always_failover has been added to control the migration/failover
behaviour. When set to True, all migration requests for an instance will
instead fall back to failover.

Initial support for memory ballooning has been added. The memory for an
instance is no longer fixed (backend parameter memory), but instead
can vary between minimum and maximum values (backend parameters
minmem and maxmem). Currently we only change an instance’s
memory when:

live migrating or failing over an instance when the target node
doesn't have enough memory

In order to control the use of specific CPUs by instances, support for
controlling CPU pinning has been added for the Xen, HVM and LXC
hypervisors. This is controlled by a new hypervisor parameter
cpu_mask; details about possible values are in the
gnt-instance(8) man page. Note that use of the most specific (precise
VCPU-to-CPU mapping) form will work well only when all nodes in your
cluster have the same number of CPUs.

Another area in which Ganeti was not customisable was the parameters
used for storage configuration, e.g. how many stripes to use for LVM,
DRBD resync configuration, etc.

To improve this area, we've added disk parameters, which are
customisable at cluster and node group level, and which allow specifying
various parameters for disks (DRBD has the most parameters
currently), for example:

DRBD resync algorithm and parameters (e.g. speed)

the default VG for meta-data volumes for DRBD

number of stripes for LVM (plain disk template)

the RBD pool

These parameters can be modified via gnt-cluster modify -D … and
gnt-group modify -D …, and are used at either instance creation (in
case of LVM stripes, for example) or at disk "activation" time
(e.g. resync speed).

The existing master IP functionality works well only in simple setups (a
single network shared by all nodes); however, if nodes belong to
different networks, then the /32 setup and lack of routing
information is not enough.

To allow the master IP to function well in more complex cases, the
system was reworked as follows:

a master IP netmask setting has been added

the master IP activation/turn-down code was moved from the node daemon
to a separate script

whether to run the Ganeti-supplied master IP script or a user-supplied
one is a gnt-cluster init setting

Details about the location of the standard and custom setup scripts are
in the man page gnt-cluster(8); for information about the
setup script protocol, look at the Ganeti-supplied script.

It is now possible to use TLS-protected connections, and when renewing
or changing the cluster certificates (via gnt-cluster renew-crypto),
it is now possible to specify SPICE or SPICE CA certificates. Also, it
is possible to configure a password for SPICE sessions via the
hypervisor parameter spice_password_file.

There are also new parameters to control the compression and streaming
options (e.g. spice_image_compression, spice_streaming_video,
etc.). For details, see the man page gnt-instance(8) and look
for the spice parameters.

Lastly, it is now possible to see the SPICE connection information via
gnt-instance console.

The configuration query daemon (ganeti-confd) is now optional, and
has been rewritten in Haskell; whether to use the daemon at all, use the
Python (default) or the Haskell version is selectable at configure time
via the --enable-confd parameter, which can take one of the
haskell, python or no values. Disabling the
daemon results in a smaller footprint; for larger systems, we
welcome feedback on the Haskell version which might become the default
in future versions.

If you want to use gnt-node list-drbd you need to have the Haskell
daemon running. The Python version doesn't implement the new call.

We have replaced the --disks option of gnt-instance replace-disks
with a more flexible --disk option, which allows
adding and removing disks at arbitrary indices (Issue 188). Furthermore,
disk size and mode can be changed upon recreation (via gnt-instance
recreate-disks, which accepts the same --disk option).

As many people are used to a show command, we have added that as an
alias to info on all gnt-* commands.

The gnt-instance grow-disk command has a new mode in which it can
accept the target size of the disk instead of the delta; this can be
safer, since two runs in absolute mode will be idempotent, and
sometimes it's also easier to specify the desired size directly.
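The idempotency argument can be seen in a tiny sketch (new_size is a hypothetical helper, not Ganeti code):

```python
# Sketch of delta vs. absolute sizing for grow-disk. In absolute mode,
# applying the same target twice leaves the size unchanged (idempotent);
# in delta mode, each run grows the disk again.
def new_size(current, value, absolute):
    return value if absolute else current + value

size = 10240
size = new_size(size, 20480, absolute=True)
size = new_size(size, 20480, absolute=True)
print(size)  # 20480: the second absolute run is a no-op

size2 = 10240
size2 = new_size(size2, 1024, absolute=False)
size2 = new_size(size2, 1024, absolute=False)
print(size2)  # 12288: delta mode grows the disk twice
```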

Also, the handling of instances with regard to offline secondaries has
been improved. Instance operations no longer fail just because one of the
instance's secondary nodes is offline, when it is safe to proceed.

A new command list-drbd has been added to the gnt-node script to
support debugging of DRBD issues on nodes. It provides a mapping of DRBD
minors to instance names.

The deprecated QueryLocks LUXI request has been removed. Use
Query(what=QR_LOCK,...) instead.

The LUXI requests QueryJobs,
QueryInstances, QueryNodes,
QueryGroups, QueryExports and
QueryTags are deprecated and will be removed in a
future version. Query should be used instead.

RAPI client: CertificateError now derives from
GanetiApiError. This should make it easier to handle Ganeti
errors.

Deprecation warnings due to PyCrypto/paramiko import in
tools/setup-ssh have been silenced, as usually they are safe; please
make sure to run an up-to-date paramiko version, if you use this tool.

The QA scripts now depend on Python 2.5 or above (the main code base
still works with Python 2.4).

The configuration file (config.data) is now written without
indentation for performance reasons; if you want to edit it, it can be
re-formatted via tools/fmtjson.

A number of bugs have been fixed in the cluster merge tool.

X.509 certificate verification (used in import-export) has been
changed to allow the same clock skew as permitted by cluster
verification. This removes some rare but hard-to-diagnose errors in
import-export.

Added the possibility to run activate-disks even though secondaries are
offline. This change also relaxes the strictness of some other
commands which use activate-disks internally:
* gnt-instance start|reboot|rename|backup|export

Made it possible to safely remove an instance if its secondaries are
offline.

Additionally, a few fixes were done to the build system (fixed parallel
build failures) and to the unittests (fixed race condition in test for
FileID functions, and the default enable/disable mode for QA test is now
customisable).

The main issues solved are on the topic of compatibility with newer LVM
releases:

fixed parsing of lv_attr field

adapted to the new vgreduce --removemissing behaviour, where sometimes
the --force flag is needed

Also on the topic of compatibility, tools/lvmstrap has been changed
to accept kernel 3.x too (was hardcoded to 2.6.*).

A regression present in 2.5.0 that broke handling (in the gnt-* scripts)
of hook results and that also made display of other errors suboptimal
was fixed; the code behaves now like 2.4 and earlier.

Another change in 2.5, the cleanup of the OS scripts environment, was too
aggressive: it removed even the PATH variable, which forced the OS
scripts to always export it themselves. Since this is a bit too strict,
we now export a minimal PATH, the same that we export for hooks.
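The minimal-environment approach can be illustrated with a short sketch (the exact PATH value below is an assumption for illustration, not the one Ganeti exports):

```python
# Sketch of building a minimal environment for OS scripts/hooks,
# exporting only PATH plus explicitly whitelisted variables.
# The PATH value here is an assumed example.
def minimal_env(extra=None):
    env = {"PATH": "/sbin:/bin:/usr/sbin:/usr/bin"}  # assumed minimal PATH
    if extra:
        env.update(extra)  # script-specific variables, e.g. OS parameters
    return env

print(minimal_env({"OS_VARIANT": "default"})["OS_VARIANT"])  # default
```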

The fix for issue 201 (Preserve bridge MTU in KVM ifup script) was
integrated into this release.

Finally, a few other miscellaneous changes were done (no new features,
just small improvements):

Fix gnt-group --help display

Fix hardcoded Xen kernel path

Fix grow-disk handling of invalid units

Update synopsis for gnt-cluster repair-disk-sizes

Accept both PUT and POST in noded (makes future upgrade to 2.6 easier)

The default of the /2/instances/[instance_name]/rename RAPI
resource’s ip_check parameter changed from True to False
to match the underlying LUXI interface.

The /2/nodes/[node_name]/evacuate RAPI resource was changed to use
body parameters, see RAPI documentation. The server does
not maintain backwards-compatibility as the underlying operation
changed in an incompatible way. The RAPI client can talk to old
servers, but it needs to be told so as the return value changed.

When creating file-based instances via RAPI, the file_driver
parameter no longer defaults to loop and must be specified.

The deprecated bridge NIC parameter is no longer supported. Use
link instead.

Support for the undocumented and deprecated RAPI instance creation
request format version 0 has been dropped. Use version 1, supported
since Ganeti 2.1.3 and documented, instead.

On the user-visible side, the gnt-* list command output has changed
with respect to "special" field states. The current rc1 style of display
can be re-enabled by passing a new --verbose (-v) flag, but in
the default output mode special field states are displayed as follows:

Offline resource: *

Unavailable/not applicable: -

Data missing (RPC failure): ?

Unknown field: ??
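A renderer following the table above could look like this sketch (the marker strings are from the text; the state names and function are our own invention):

```python
# Sketch: map the special field states listed above to their display
# markers in the default (non --verbose) output mode.
MARKERS = {
    "offline": "*",   # offline resource
    "unavail": "-",   # unavailable / not applicable
    "nodata": "?",    # data missing (RPC failure)
    "unknown": "??",  # unknown field
}

def render_field(state, value=None):
    """Return the marker for a special state, or the value itself."""
    return MARKERS.get(state, str(value))

print(render_field("nodata"))       # ?
print(render_field("ok", "node1"))  # node1
```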

Another user-visible change is the addition of --force-join to
gnt-node add.

As for bug fixes:

tools/cluster-merge has seen many fixes and is now enabled again

Fixed regression in RAPI/instance reinstall where all parameters were
required (instead of optional)

Changed query operations to return more detailed information, e.g.
whether a piece of information is unavailable due to an offline node. To use
this new functionality, the LUXI call Query must be used. Field
information is now stored by the master daemon and can be retrieved
using QueryFields. Instances, nodes and groups can also be queried
using the new opcodes OpQuery and OpQueryFields (not yet
exposed via RAPI). A number of commands make use of this
infrastructure change.

Important change: the internal RPC mechanism between Ganeti nodes has
changed from using a home-grown HTTP library (based on the Python base
libraries) to use the PycURL library. This requires that PycURL is
installed on nodes. Please note that on Debian/Ubuntu, PycURL is linked
against GnuTLS by default. cURL’s support for GnuTLS had known issues
before cURL 7.21.0 and we recommend using the latest cURL release or
linking against OpenSSL. Most other distributions already link PycURL
and cURL against OpenSSL. The command:

python -c 'import pycurl; print pycurl.version'

can be used to determine the libraries PycURL and cURL are linked
against.

Other significant changes:

Rewrote much of the internals of the job queue in order to achieve
better parallelism; this decouples job query operations from job
processing, should allow much nicer behaviour of the master
daemon under load, and has also uncovered some long-standing bugs
related to job serialisation (now fixed)

Added a default iallocator setting to the cluster parameters,
eliminating the need to always pass nodes or an iallocator for
operations that require selection of new node(s)

Added experimental support for the LXC virtualization method

Added support for OS parameters, which allow the installation of
instances to pass parameters to OS scripts in order to customise the
instance

Added a hypervisor parameter controlling the migration type (live or
non-live), since hypervisors have various levels of reliability; this
has renamed the ‘live’ parameter to ‘mode’

Added a cluster parameter reserved_lvs that denotes reserved
logical volumes, meaning that cluster verify will ignore them and not
flag their presence as errors

The watcher will now reset the error count for failed instances after
8 hours, thus allowing self-healing if the problem that caused the
instances to be down/fail to start has cleared in the meantime

Added a cluster parameter drbd_usermode_helper that makes Ganeti
check for, and warn about, a drbd module parameter usermode_helper
that is not consistent with the cluster-wide setting; this is needed to
make diagnosing failed DRBD creations easier

Started adding base IPv6 support, but this is not yet
enabled/available for use

Rename operations (cluster, instance) will now return the new name,
which is especially useful if a short name was passed in

Added support for instance migration in RAPI

Added a tool to pre-configure nodes for the SSH setup, before joining
them to the cluster; this will allow in the future a simplified model
for node joining (but not yet fully enabled in 2.2); this needs the
paramiko python library

Fixed handling of name-resolving errors

Fixed consistency of job results on the error path

Fixed master-failover race condition when executed multiple times in
sequence

Fixed many bugs related to the job queue (mostly introduced during the
2.2 development cycle, so not all are impacting 2.1)

Fixed instance migration with missing disk symlinks

Fixed handling of unknown jobs in gnt-jobarchive

And many other small fixes/improvements

Internal changes:

Enhanced both the unittest and the QA coverage

Switched the opcode validation to a generic model, and extended the
validation to all opcode parameters

Changed more parts of the code that write shell scripts to use the
same class for this

Switched the master daemon to use the asyncore library for the Luxi
server endpoint

The node daemon now tries to mlock itself into memory, unless the
--no-mlock flag is passed. It also doesn't fail if it can't write
its logs, and falls back to console logging. This allows emergency
features such as gnt-node powercycle to work even in the event of a
broken node disk (tested by offlining the disk hosting the node's
filesystem and dropping its memory caches; don't try this at home)

KVM: add vhost-net acceleration support. It can be tested with a new
enough version of the kernel and of qemu-kvm.

KVM: Add instance chrooting feature. If you use privilege dropping for
your VMs you can also now force them to chroot to an empty directory,
before starting the emulated guest.

KVM: Add maximum migration bandwidth and maximum downtime tweaking
support (requires a new-enough version of qemu-kvm).

Cluster verify will now warn if the master node doesn't have the master
IP configured on it.

Add a new (incompatible) instance creation request format to RAPI which
supports all parameters (previously only a subset was supported, and it
wasn't possible to extend the old format to accommodate all the new
features). The old format is still supported, and a client can check for
this feature, before using it, by checking for its presence in the
features RAPI resource.

Now with ancient Latin support. Try it by passing the --roman option to
gnt-instance info, gnt-cluster info or gnt-node list
(requires the python-roman module to be installed in order to work).

The KVM hypervisor now can run the individual instances as non-root, to
reduce the impact of a VM being hijacked due to bugs in the
hypervisor. It is possible to run all instances as a single (non-root)
user, to manually specify a user for each instance, or to dynamically
allocate a user out of a cluster-wide pool to each instance, with the
guarantee that no two instances will run under the same user ID on any
given node.

An experimental RAPI client library, that can be used standalone
(without the other Ganeti libraries), is provided in the source tree as
lib/rapi/client.py. Note this client might change its interface in
the future, as we iterate on its capabilities.

A new command, gnt-cluster renew-crypto, has been added to easily
replace the cluster’s certificates and crypto keys. This might help in
case they have been compromised, or have simply expired.

A new disk option for instance creation has been added that allows one
to “adopt” currently existing logical volumes, with data
preservation. This should allow easier migration to Ganeti from
unmanaged (or managed via other software) instances.

Another disk improvement is the possibility to convert between redundant
(DRBD) and plain (LVM) disk configuration for an instance. This should
allow better scalability (starting with one node and growing the
cluster, or shrinking a two-node cluster to one node).

A new feature that could help with automated node failovers has been
implemented: if a node sees itself as offline (by querying the master
candidates), it will try to shutdown (hard) all instances and any active
DRBD devices. This reduces the risk of duplicate instances if an
external script automatically fails over the instances on such nodes. To
enable this, the cluster parameter maintain_node_health should be
enabled; in the future this option (per the name) will enable other
automatic maintenance features.

Instance export/import now will reuse the original instance
specifications for all parameters; that means exporting an instance,
deleting it and then importing it back should give an almost identical
instance. Note that the default import behaviour has changed from
before, where it created only one NIC; now it recreates the original
number of NICs.

Cluster verify has added a few new checks: SSL certificates validity,
/etc/hosts consistency across the cluster, etc.

The node evacuate command (gnt-node evacuate) was significantly
rewritten, and as such the IAllocator protocol was changed - a new
request type has been added. This unfortunate change during a stable
series is designed to improve performance of node evacuations; on
clusters with more than about five nodes and which are well-balanced,
evacuation should proceed in parallel for all instances of the node
being evacuated. As such, any existing IAllocator scripts need to be
updated, otherwise the above command will fail due to the unknown
request. The provided “dumb” allocator has not been updated; but the
ganeti-htools package supports the new protocol since version 0.2.4.
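For third-party allocators that need updating, the general shape of an IAllocator exchange is a JSON request answered by a JSON reply with "success", "info" and "result" fields. The sketch below is a simplified illustration; the exact request layout (and how the node data is structured) is defined in the iallocator document, and the "nodes" list used here is a placeholder:

```python
# Simplified sketch of an IAllocator-style script. A real allocator
# reads the request from a JSON file and prints the reply on stdout;
# the field layout here is illustrative, not the full protocol.
import json

def answer(request):
    if request["request"]["type"] != "allocate":
        return {"success": False, "info": "unsupported request", "result": []}
    # Naive placement: pick the first candidate node. A real allocator
    # would weigh memory, disk and N+1 constraints.
    nodes = request["request"].get("nodes", [])
    return {"success": True, "info": "ok", "result": nodes[:1]}

req = {"request": {"type": "allocate", "nodes": ["node1", "node2"]}}
print(json.dumps(answer(req)))
```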

Another important change is increased validation of node and instance
names. This might create problems in special cases, if invalid host
names are being used.

Also, a new layer of hypervisor parameters has been added, that sits at
OS level between the cluster defaults and the instance ones. This allows
customisation of virtualization parameters depending on the installed
OS. For example instances with OS ‘X’ may have a different KVM kernel
(or any other parameter) than the cluster defaults. This is intended to
help manage multiple OSes on the same cluster, without manual
modification of each instance’s parameters.

A tool for merging clusters, cluster-merge, has been added in the
tools sub-directory.

Added a generic debug level for many operations; while this is not
used widely yet, it allows one to pass the debug value all the way to
the OS scripts

Enhanced the hooks environment for instance moves (failovers,
migrations) where the primary/secondary nodes changed during the
operation, by adding {NEW,OLD}_{PRIMARY,SECONDARY} vars

Enhanced data validations for many user-supplied values; one important
item is the restrictions imposed on instance and node names, which
might reject some (invalid) host names

Add a configure-time option to disable file-based storage, if it’s not
needed; this allows greater security separation between the master
node and the other nodes from the point of view of the inter-node RPC
protocol

Added user notification in interactive tools if job is waiting in the
job queue or trying to acquire locks

Added support for hashed passwords in the Ganeti remote API users file
(rapi_users)
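A hashed entry can be generated with standard tools. The sketch below assumes the HA1 scheme (MD5 of "user:realm:password", with the realm "Ganeti Remote API") described in the RAPI documentation; verify the realm string and the {HA1} marker against your version's docs before relying on them:

```python
# Generate a hashed rapi_users line instead of storing a clear-text
# password. Assumption: the HA1 digest scheme with realm
# "Ganeti Remote API" and a "{HA1}" prefix, per the RAPI docs.
import hashlib

user, realm, password = "rapiuser", "Ganeti Remote API", "s3cret"
ha1 = hashlib.md5(f"{user}:{realm}:{password}".encode()).hexdigest()
print(user + " {HA1}" + ha1 + " write")
```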

Added option to specify maximum timeout on instance shutdown

Added --no-ssh-init option to gnt-cluster init

Added new helper script to start and stop Ganeti daemons
(daemon-util), with the intent to reduce the work necessary to
adjust Ganeti for non-Debian distributions and to start/stop daemons
from one place

Added more unittests

Fixed critical bug in ganeti-masterd startup

Removed the configure-time kvm-migration-port parameter, this is
now customisable at the cluster level for both the KVM and Xen
hypervisors using the new migration_port parameter

Added experimental support for striped logical volumes; this should
enhance performance but comes with a higher complexity in the block
device handling; striping is only enabled when passing
--with-lvm-stripecount=N to configure, but codepaths are
affected even in the non-striped mode

Improved resiliency against transient failures at the end of DRBD
resyncs, and in general of DRBD resync checks

Fixed a couple of issues with exports and snapshot errors

Fixed a couple of issues in instance listing

Added display of the disk size in gnt-instance info

Fixed checking for valid OSes in instance creation

Fixed handling of the “vcpus” parameter in instance listing and in
general of invalid parameters

Fixed http server library, and thus RAPI, to handle invalid
username/password combinations correctly; this means that now they
report unauthorized for queries too, not only for modifications,
allowing earlier detection of configuration problems

added -H/-B startup parameters to gnt-instance, which will
allow re-adding the start-in-single-user option (regression from 1.2)

the watcher writes the instance status to a file, to allow monitoring
to report the instance status (from the master) based on cached
results of the watcher’s queries; while this can get stale if the
watcher is being locked due to other work on the cluster, this is
still an improvement

the watcher now also restarts the node daemon and the rapi daemon if
they died

fixed the watcher to handle full and drained queue cases

hooks export more instance data in the environment, which helps if
hook scripts need to take action based on the instance’s properties
(no longer need to query back into ganeti)

instance failovers when the instance is stopped do not check for free
RAM, so that failing over a stopped instance is possible in low memory
situations

rapi uses queries for tags instead of jobs (for less job traffic), and
for cluster tags it won’t talk to masterd at all but reads them from
ssconf

a couple of error handling fixes in RAPI

drbd handling: improved the error handling of inconsistent disks after
resync to reduce the frequency of “there are some degraded disks for
this instance” messages

fixed a bug in live migration when DRBD doesn’t want to reconnect (the
error handling path called a wrong function name)

fix gnt-cluster verify and gnt-cluster verify-disks when the
volume group is broken

gnt-instance info, without any arguments, doesn’t run for all
instances anymore; either pass --all or pass the desired
instances; this helps avoid mistakes on big clusters where listing
the information for all instances takes a long time

Xen PVM and KVM have switched the default value for the instance root
disk to the first partition on the first drive, instead of the whole
drive; this means that the OS installation scripts must be changed
accordingly

Man pages have been updated

RAPI has been switched by default to HTTPS, and the exported functions
should all work correctly

Version 2 is a general rewrite of the code and therefore the
differences are too many to list, see the design document for 2.0 in
the doc/ subdirectory for more details

In this beta version there is not yet a migration path from 1.2 (there
will be one in the final 2.0 release)

A few significant changes are:

all commands are executed by a daemon (ganeti-masterd) and the
various gnt-* commands are just front-ends to it

all the commands are entered into, and executed from a job queue,
see the gnt-job(8) manpage

the RAPI daemon supports read-write operations, secured by basic
HTTP authentication on top of HTTPS

DRBD version 0.7 support has been removed, DRBD 8 is the only
supported version (when migrating from Ganeti 1.2 to 2.0, you need
to migrate to DRBD 8 first while still running Ganeti 1.2)

DRBD devices are using statically allocated minor numbers, which
will be assigned to existing instances during the migration process

there is support for both Xen PVM and Xen HVM instances running on
the same cluster

KVM virtualization is supported too

file-based storage has been implemented, which means that it is
possible to run the cluster without LVM and DRBD storage, for
example using a shared filesystem exported from shared storage (and
still have live migration)

new --hvm-nic-type and --hvm-disk-type flags to control the
type of NIC and disk exported to fully virtualized instances.

provide access to the serial console of HVM instances

instance auto_balance flag, set by default. If turned off, it
suppresses warnings on cluster verify when there is not enough memory
to fail over an instance. In the future it will also prevent the
instance from being failed over automatically, once that is supported.

batcher tool for instance creation, see tools/README.batcher

gnt-instance reinstall --select-os to interactively select a new
operating system when reinstalling an instance.

when changing the memory amount on instance modify, a check has been
added that the instance will still be able to start. Also, warnings
are emitted if the instance will not be able to fail over, if
auto_balance is true.

documentation fixes

sync fields between gnt-instance list/modify/add/import

fix a race condition in DRBD when the sync speed was set after giving
the device a remote peer.

Instance allocator support. Add and import instance accept a
--iallocator parameter, and call that instance allocator to decide
which node to use for the instance. The iallocator document describes
what’s expected from an allocator script.

gnt-cluster verify N+1 memory redundancy checks: Unless passed the
--no-nplus1-mem option, gnt-cluster verify now checks that if a
node is lost there is still enough memory to fail over the instances
that reside on it.
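The underlying N+1 idea can be illustrated with a simplified model; this is not the actual verification algorithm, which also accounts for DRBD secondary placement, but it shows the spirit of the check:

```python
# Simplified N+1 redundancy model: for every node, check that the
# memory of its instances fits into the spare memory of the remaining
# nodes. Illustrative only, not Ganeti's real check.

def n_plus_1_ok(nodes):
    # nodes: name -> {"free": spare MiB, "instances": [instance MiB, ...]}
    for name, node in nodes.items():
        needed = sum(node["instances"])
        spare = sum(n["free"] for other, n in nodes.items() if other != name)
        if needed > spare:
            return False  # losing this node would strand its instances
    return True

cluster = {
    "node1": {"free": 2048, "instances": [1024, 512]},
    "node2": {"free": 2048, "instances": [512]},
}
print(n_plus_1_ok(cluster))  # True: each node's load fits elsewhere
```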

gnt-cluster verify hooks: it is now possible to add post-hooks to
gnt-cluster verify, to check for site-specific compliance. All the
hooks will run, and their output, if any, will be displayed. Any
failing hook will make the verification return an error value.

gnt-cluster verify now checks that its peers are reachable on the
primary and secondary interfaces

gnt-node add now supports the --readd option, to re-add a node
that is still declared as part of the cluster and has failed.

gnt-* list commands now accept a new -o +field way of
specifying output fields, which just adds the chosen fields to the
default ones.
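The intended semantics can be sketched as follows (illustrative only; the default field list here is made up):

```python
# Sketch of the "-o +field" semantics: a leading "+" appends the listed
# fields to the defaults instead of replacing them.

def select_fields(spec, defaults):
    if spec.startswith("+"):
        return defaults + spec[1:].split(",")
    return spec.split(",")

defaults = ["name", "status"]  # hypothetical default columns
print(select_fields("+os,snodes", defaults))  # ['name', 'status', 'os', 'snodes']
print(select_fields("name", defaults))        # ['name'] - explicit list replaces
```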

gnt-backup now has a new remove command to delete an existing
export from the filesystem.

New per-instance parameters hvm_acpi, hvm_pae and hvm_cdrom_image_path
have been added. Using them you can enable/disable ACPI and PAE
support, and specify a path for a CD image to be exported to the
instance. These parameters, as the names suggest, only work on HVM
clusters.

When upgrading an HVM cluster to Ganeti 1.2.4, the values for ACPI and
PAE support will be set to the previously hardcoded values, but the
(previously hardcoded) path to the CDROM ISO image will be unset and
if required, needs to be set manually with gnt-instance modify
after the upgrade.

The address to which an instance’s VNC console is bound is now
selectable per-instance, rather than being cluster wide. Of course
this only applies to instances controlled via VNC, so currently just
applies to HVM clusters.