Release notes for Gluster 3.10.0

This is a major Gluster release that includes some substantial changes. The
features revolve around better support in container environments, scaling to a
larger number of bricks per node, and a few usability and performance
improvements, among other bug fixes.

The most notable features and changes are documented on this page. A full list
of bugs that have been addressed is included further below.

Major changes and features

Brick multiplexing

Notes for users:
Multiplexing reduces both port and memory usage. It does not improve
performance vs. non-multiplexing except when memory is the limiting factor,
though there are other related changes that improve performance overall (e.g.
compared to 3.9).

Multiplexing is off by default. It can be enabled with

# gluster volume set all cluster.brick-multiplex on
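As an illustration that is not part of the original notes (it assumes
cluster.brick-multiplex behaves like the other global options described in
this release), the current value can presumably be queried, and multiplexing
switched back off, with:

# gluster volume get all cluster.brick-multiplex
# gluster volume set all cluster.brick-multiplex off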

Limitations:
There are currently no tuning options for multiplexing - it's all or nothing.
This will change in the near future.

Known Issues:
The only combination of features known not to work with multiplexing is USS
together with SSL. Anyone using that combination should leave multiplexing off.

Support to display op-version information from clients

Notes for users:
To get information on which op-versions are supported by the clients, users can
invoke the gluster volume status command for clients. Along with information
on hostname, port, bytes read, bytes written and number of clients connected
per brick, we now also get the op-version on which the respective clients
operate. Following is the example usage:

# gluster volume status <VOLNAME|all> clients
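As a concrete illustration (the volume name gv0 is only a placeholder), the
per-client op-version information for a single volume would be requested with:

# gluster volume status gv0 clients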

Limitations:

Known Issues:

Support to get maximum op-version in a heterogeneous cluster

Notes for users:
A heterogeneous cluster operates on a common op-version that can be supported
across all the nodes in the trusted storage pool. Upon upgrade of the nodes in
the cluster, the cluster might support a higher op-version. Users can retrieve
the maximum op-version to which the cluster could be bumped up to by invoking
the gluster volume get command on the newly introduced global option,
cluster.max-op-version. The usage is as follows:

# gluster volume get all cluster.max-op-version
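Once every node in the pool supports it, the cluster can then be bumped to the
reported op-version using the existing cluster.op-version global option; for
example, assuming the command above reported 31000 (the op-version
corresponding to 3.10):

# gluster volume set all cluster.op-version 31000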

Limitations:

Known Issues:

Support for rebalance time to completion estimation

Notes for users:
Users can now see approximately how much time the rebalance
operation will take to complete across all nodes.

The estimated time left for rebalance to complete is displayed
as part of the rebalance status. Use the command:

# gluster volume rebalance <VOLNAME> status

Limitations:
The rebalance process calculates the time left based on the rate at which
files are processed on the node and the total number of files on the brick,
which is determined using statfs (see the illustrative sketch after this
list). The limitations of this are:

A single fs partition must host only one brick. Multiple bricks on
the same fs partition will cause the statfs results to be invalid.

The estimates are dynamic and are recalculated every time the rebalance status
command is invoked. The estimates become more accurate over time, so
short-running rebalance operations may not benefit.
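Purely as an illustration of the approach described above (an assumption about
the general idea, not the actual rebalance code; all numbers are made up), a
rate-based estimate of this kind can be derived as follows:

/* Illustrative sketch: rate-based time-to-completion estimate.
 * All values are invented; the real rebalance process derives the
 * total file count from statfs, as described above. */
#include <stdio.h>

int
main (void)
{
        double files_processed = 200000;    /* files rebalanced so far */
        double elapsed_secs    = 3600;      /* time spent so far */
        double total_files     = 1000000;   /* estimate derived from statfs */

        double rate      = files_processed / elapsed_secs;
        double secs_left = (total_files - files_processed) / rate;

        printf ("estimated time left: %.0f seconds\n", secs_left);
        return 0;
}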

Known Issues:
As glusterfs does not store the number of files on the brick, we use statfs to
guess the number. The .glusterfs directory contents can significantly skew this
number and affect the calculated estimates.

Separation of tier as its own service

Notes for users:
This change moves the management of the tier daemon into the gluster service
framework, thereby improving its stability and manageability.

This does not change any of the tier commands or user-facing interfaces and
operations.

Limitations:

Known Issues:

Statedump support for gfapi based applications

Notes for users:
gfapi based applications can now dump state information for better
troubleshooting of issues. A statedump can be triggered in two ways:

by executing the following on one of the Gluster servers,

# gluster volume statedump <VOLNAME> client <HOST>:<PID>

<VOLNAME> should be replaced by the name of the volume

<HOST> should be replaced by the hostname of the system running the
gfapi application

<PID> should be replaced by the PID of the gfapi application

by calling glfs_sysrq(<FS>, GLFS_SYSRQ_STATEDUMP) within the
application

<FS> should be replaced by a pointer to a glfs_t structure

All statedumps (*.dump.* files) will be located at the usual location,
on most distributions this would be /var/run/gluster/.
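For gfapi application developers, the following is a minimal sketch (not taken
from the release notes) of how glfs_sysrq() might be wired into a program; the
volume name testvol and the server host server1 are placeholders, and error
handling is kept to a minimum:

/* Minimal sketch of triggering a statedump from within a gfapi
 * application. "testvol" and "server1" are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <glusterfs/api/glfs.h>

int
main (void)
{
        glfs_t *fs = glfs_new ("testvol");
        if (!fs)
                return EXIT_FAILURE;

        glfs_set_volfile_server (fs, "tcp", "server1", 24007);
        if (glfs_init (fs) != 0) {
                fprintf (stderr, "glfs_init failed\n");
                glfs_fini (fs);
                return EXIT_FAILURE;
        }

        /* Writes a *.dump.* file to the usual statedump location,
         * typically /var/run/gluster/ */
        if (glfs_sysrq (fs, GLFS_SYSRQ_STATEDUMP) != 0)
                fprintf (stderr, "statedump request failed\n");

        glfs_fini (fs);
        return EXIT_SUCCESS;
}

Such a program would typically be built against the glusterfs-api development
package, e.g. with gcc example.c -lgfapi (the exact include and linker flags
depend on the distribution).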

Limitations:
It is not possible to trigger statedumps from the Gluster CLI when the
gfapi application has lost its management connection to the GlusterD
servers.

GlusterFS 3.10 is the first release that contains support for the new
glfs_sysrq() function. Applications that include features for
debugging will need to be adapted to call this function. At the time of
the release of 3.10, no applications are known to call glfs_sysrq().

Known Issues:

Disabled creation of trash directory by default

Notes for users:
From now onwards the trash directory, namely .trashcan, will not be created by
default when new volumes are created, unless the feature is turned ON; the
associated restrictions apply only as long as features.trash is set for a
particular volume (see the example below).
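To turn the feature on for a volume, and thereby have the .trashcan directory
created and the associated restrictions enforced, set the features.trash
option, for example:

# gluster volume set <VOLNAME> features.trash on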

Limitations:
After an upgrade, the trash directory will still be present at the root of
pre-existing volumes. Those who are not interested in this feature may have to
manually delete the directory from the mount point.

Known Issues:

Implemented parallel readdirp with distribute xlator

Notes for users:
Currently the directory listing gets slower as the number of bricks/nodes in a
volume increases, even though the number of files/directories remains
unchanged. With this feature, the performance of directory listing is made
mostly independent of the number of nodes/bricks in the volume, so scaling out
no longer drastically reduces directory listing performance. (On 2, 5, 10 and
25 brick setups we saw ~5%, 100%, 400% and 450% improvement respectively.)
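These release notes do not name the volume option that controls this
behaviour; assuming it is performance.parallel-readdir (an assumption that
should be verified against the volume set help output for this release), it
would be enabled with:

# gluster volume set <VOLNAME> performance.parallel-readdir on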

md-cache can optionally negatively cache the security.ima xattr

Notes for users:
From kernel version 3.X or greater, creating a file results in a removexattr
call on the security.ima xattr. This xattr is not set on the file unless the
IMA feature is active. With this patch, the removexattr call returns ENODATA
if the xattr is not found in the cache.

The end benefit is faster create operations where IMA is not enabled.

To cache this xattr use,

# gluster volume set <VOLNAME> performance.cache-ima-xattrs on

The above option is on by default.
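Since the option defaults to on, it normally only needs to be set explicitly
to disable the negative caching, e.g.:

# gluster volume set <VOLNAME> performance.cache-ima-xattrs off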

Limitations:

Known Issues:

Added support for CPU extensions in disperse computations

Notes for users:
To improve disperse computations, a new way of generating dynamic code
targeting specific CPU extensions like SSE and AVX on Intel processors is
implemented. The available extensions are detected at run time. This can
roughly double encoding and decoding speeds (or halve CPU usage).

This change is 100% compatible with the old method. No change is needed if
an existing volume is upgraded.

You can control which extensions to use or disable them with the following
command:

# gluster volume set <VOLNAME> disperse.cpu-extensions <type>

Valid values are:

none: Completely disable dynamic code generation

auto: Automatically detect available extensions and use the best one

x64: Use dynamic code generation using standard 64 bits instructions

sse: Use dynamic code generation using SSE extensions (128 bits)

avx: Use dynamic code generation using AVX extensions (256 bits)

The default value is 'auto'. If a value is specified that is not detected at
run time, it will automatically fall back to the next available option.
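For example, to pin a volume to AVX and then confirm the setting (reading it
back with volume get, as with other volume options):

# gluster volume set <VOLNAME> disperse.cpu-extensions avx
# gluster volume get <VOLNAME> disperse.cpu-extensions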

Limitations:

Known Issues:
To solve a conflict between the dynamic code generator and SELinux, it
has been necessary to create a dynamic file at runtime in the directory
/usr/libexec/glusterfs. This directory only exists if the server package
is installed. On nodes with only the client package installed, this directory
won't exist and the dynamic code won't be used.

It also needs root privileges to create the file there, so any gfapi
application not running as root won't be able to use dynamic code generation.

In these cases, disperse volumes will continue working normally but using
the old implementation (equivalent to setting disperse.cpu-extensions to none).

More information and a discussion on how to solve this can be found here:

Bugs addressed

#1388010: [Eventing]: 'VOLUME_REBALANCE' event messages have an incorrect volume name

#1388062: throw warning to show that older tier commands are depricated and will be removed.

#1388292: performance.read-ahead on results in processes on client stuck in IO wait

#1388348: glusterd: Display proper error message and fail the command if S32gluster_enable_shared_storage.sh hook script is not present during gluster volume set all cluster.enable-shared-storage command