Update to the latest version. Ansible 2.0 is slower than Ansible 1.9 because it introduced an important change to the execution engine that lets users choose the execution strategy. In the versions that followed, and mostly in 2.1, significant optimizations were made to increase execution speed, so be sure to run the latest version available to you.

Profiling Tasks

The best way I’ve found to time the execution of Ansible playbooks is to enable the profile_tasks callback. This callback ships with Ansible; all you need to do to enable it is add callback_whitelist = profile_tasks to the [defaults] section of your ansible.cfg.
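As a minimal sketch, the ansible.cfg fragment described above would look like this. Only the setting named in the text is used; newer Ansible releases (2.11 and later, as far as I know) rename the option to callbacks_enabled, so adjust the key if your version prints a deprecation warning.

```ini
[defaults]
# Enable the bundled per-task timing callback.
callback_whitelist = profile_tasks
```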

SSH multiplexing

The first thing to check is whether SSH multiplexing is enabled and used. This gives a tremendous speed boost because Ansible can reuse open SSH sessions instead of negotiating a new one (actually more than one) for every task. Ansible has this setting turned on by default. It can be set in the configuration file as follows:

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

Be careful when overriding ssh_args: if you don’t set ControlMaster and ControlPersist in the overriding value, Ansible will “forget” to use them.

UseDNS

UseDNS is an SSH server setting (in /etc/ssh/sshd_config) that makes the server look up the client’s PTR record upon connection. It may cause connection delays, especially when the DNS servers used by the server are slow. In modern Linux distributions this setting is turned off by default, which is the right choice.

PreferredAuthentications

This is an SSH client setting that tells the server which authentication methods the client prefers and in what order. By default, Ansible lists the GSSAPI methods ahead of public key authentication.

So if GSSAPI authentication is enabled on the server (at the time of writing it is turned on in the RHEL EC2 AMI), it will be tried first, forcing the client and server to make PTR record lookups. But in most cases we want to use only public key authentication. We can force Ansible to do so by changing ssh_args in ansible.cfg.
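A sketch of that change follows. The ControlMaster and ControlPersist options are repeated because overriding ssh_args replaces the defaults, as noted in the multiplexing section; the 60s value simply mirrors the earlier example and is not a recommendation.

```ini
[ssh_connection]
# Keep multiplexing enabled and offer only public key authentication.
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o PreferredAuthentications=publickey
```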

Facts Gathering

At the start of playbook execution, Ansible collects facts about the remote system (this is the default behaviour for ansible-playbook but does not apply to ansible ad-hoc commands). It is similar to calling the “setup” module and thus requires another SSH communication step. If you don’t need any facts in your playbook (e.g. our test playbook), you can disable fact gathering:

gather_facts: no

Fork

So far we have discussed how to speed up playbook execution on a given remote host. But if you run a playbook against tens or hundreds of hosts, Ansible’s internal performance becomes a bottleneck. For example, there is a preconfigured number of forks, which is the number of hosts Ansible interacts with simultaneously. You can change this value in the ansible.cfg file:

[defaults]
forks = 20

The default value is 5, which is quite conservative. You can experiment with this setting depending on your local CPU and network bandwidth resources.

Another thing about forks: if you have a lot of servers to work with and a low number of available forks, your master SSH sessions may expire between tasks. Ansible uses the linear strategy by default, which executes one task on every host and then proceeds to the next task. So if the time between executing a task on the first server and on the last one is greater than ControlPersist, the master socket will have expired by the time Ansible starts the following task on the first server, and a new SSH connection will be required.
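To make the ControlPersist argument concrete, here is a back-of-the-envelope calculation. It is a rough model only: it assumes hosts are handled in batches of exactly "forks" hosts and that every batch takes the same time. All four numbers are illustrative assumptions, not measurements.

```shell
#!/bin/sh
# Rough model: hosts are processed in batches of size "forks",
# and each batch takes about task_seconds to finish one task.
hosts=100            # assumed inventory size
forks=5              # Ansible's default forks value
task_seconds=5       # assumed duration of one task on one batch
control_persist=60   # ControlPersist value in seconds

# Number of batches needed to run one task on every host (rounded up).
batches=$(( (hosts + forks - 1) / forks ))

# Time the first batch sits idle while the remaining batches run the same task.
idle=$(( (batches - 1) * task_seconds ))

echo "batches=$batches idle=${idle}s ControlPersist=${control_persist}s"
if [ "$idle" -gt "$control_persist" ]; then
    echo "master connection to the first hosts has likely expired"
fi
```

With these numbers the first hosts wait 95 seconds between tasks, well past a 60-second ControlPersist, so either a higher forks value or a longer ControlPersist would be needed to keep the master connections alive.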

Poll Interval

When a module is executed on a remote host, Ansible starts polling for its result. The shorter the interval between poll attempts, the higher the CPU load on the Ansible control host. But we want CPU available for a larger number of forks (see above). You can tweak the poll interval in ansible.cfg:

[defaults]
internal_poll_interval = 0.001

If you run “slow” jobs (like backups) on multiple hosts, you may want to increase the interval to 0.05 to use less CPU.

Hope this helps you speed up your setup. There seem to be no more items on the environment checklist, so further speed gains are only possible by optimizing your playbook code.

Asynchronous Actions and Polling

By default tasks in playbooks block, meaning the connections stay open until the task is done on each node. This may not always be desirable, or you may be running operations that take longer than the SSH timeout.

To avoid blocking or timeout issues, you can use asynchronous mode to run all of your tasks at once and then poll until they are done.

The behaviour of asynchronous mode depends on the value of poll.

Avoid connection timeouts: poll > 0

When poll is a positive value, the playbook will still block on the task until it either completes, fails or times out.

In this case, however, async explicitly sets the timeout you wish to apply to this task rather than being limited by the connection method timeout.

To launch a task asynchronously, specify its maximum runtime and how frequently you would like to poll for status. If you do not specify a value for poll, the default is 15 seconds.
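A sketch of such a task, modeled on the fire-and-forget playbook shown later in this section but with a positive poll value. The 5-second poll interval is an assumption chosen for illustration.

```yaml
---
- hosts: all
  remote_user: root

  tasks:
    - name: simulate long running op (15 sec), wait for up to 45 sec, poll every 5 sec
      command: /bin/sleep 15
      async: 45   # maximum runtime in seconds
      poll: 5     # check for completion every 5 seconds
```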

Concurrent tasks: poll = 0

When poll is 0, Ansible will start the task and immediately move on to the next one without waiting for a result.

From the point of view of sequencing this is asynchronous programming: tasks may now run concurrently.

The playbook run will end without checking back on async tasks.

The async tasks will run until they either complete, fail or time out according to their async value.

If you need a synchronization point with a task, register it to obtain its job ID and use the async_status module to observe it.

You may run a task asynchronously by specifying a poll value of 0:

---
- hosts: all
  remote_user: root

  tasks:
  - name: simulate long running op, allow to run for 45 sec, fire and forget
    command: /bin/sleep 15
    async: 45
    poll: 0
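If you do need the synchronization point mentioned above, the pattern could look roughly like this. The variable names sleeper and job_result are hypothetical, and the retries and delay values are assumptions sized to cover the 45-second async window.

```yaml
- name: start a long running op in the background
  command: /bin/sleep 15
  async: 45
  poll: 0
  register: sleeper          # hypothetical name; holds ansible_job_id

- name: wait for the background job to finish
  async_status:
    jid: "{{ sleeper.ansible_job_id }}"
  register: job_result       # hypothetical name
  until: job_result.finished
  retries: 30                # assumed: 30 checks
  delay: 2                   # assumed: 2 seconds apart
```

Any tasks placed between these two run while the background job is still in progress.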

Enable fact_caching

By enabling this setting we’re telling Ansible to keep the facts it gathers in a local file. You can also point it at a Redis cache. See the documentation for details.

Fact gathering is what happens when Ansible prints “Gathering facts” for your target hosts; fact caching stores those results so they don’t have to be collected again on every run. If the hardware (or virtual hardware) of our targets doesn’t change very often, this can be very helpful. Enable it by adding the fact caching settings to your ansible.cfg file.
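A minimal sketch of a local-file cache, assuming the jsonfile cache plugin. The cache directory and the one-day timeout are illustrative assumptions; pick values that match how often your hosts actually change.

```ini
[defaults]
# Only gather facts for hosts that have no valid cached copy.
gathering = smart
# Store facts as JSON files on the control host.
fact_caching = jsonfile
# Assumed directory for the cache files.
fact_caching_connection = /tmp/ansible_facts
# Assumed lifetime of cached facts, in seconds (one day).
fact_caching_timeout = 86400
```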

Enable facts caching mechanism

If you still need some of the fact groups but the gathering process is too slow for you, you could try fact caching.

Caching enables Ansible to cache the facts for a given host in some kind of backend.

Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.

Rsync finds files that need to be transferred using a “quick check” algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file’s data does not need to be updated.

While tar over ssh is ideal for making remote copies of parts of a filesystem, rsync is even better suited for keeping the filesystem in sync between two machines. Typically, tar is used for the initial copy, and rsync is used to pick up whatever has changed since the last copy. This is because tar tends to be faster than rsync when none of the destination files exist, but rsync is much faster than tar when there are only a few differences between the two filesystems.

Note the role of a trailing / on the source specification: it tells rsync to copy the contents of the directory, but not the directory itself. To include the directory as the top level of whatever is being copied, leave off the /:

[root@host]# rsync -ave ssh remote_server:/home/backups .

By default, rsync will only copy files and directories, but not remove them from the destination copy when they are removed from the source. To keep the copies exact, include the --delete flag:

[root@host]# rsync -ave ssh --delete remote_server:~one/reports .

If you run a command like this in cron, leave off the v switch. This will keep the output quiet (unless rsync has a problem running, in which case you’ll receive an email with the error output).

Using ssh as your transport for rsync traffic has the advantage of encrypting the data over the network and also takes advantage of any trust relationships you already have established using ssh client keys. For keeping large, complex directory structures in sync between two machines (especially when there are only a few differences between them), rsync is a very handy (and fast) tool to have at your disposal.

SSH is the most popular and secure method for managing Linux servers remotely. One of the challenges with remote server management is connection speeds, especially when it comes to session creation between the remote and local machines.

There are several bottlenecks in this process. One scenario is connecting to a remote server for the first time: it normally takes a few seconds to establish a session. When you start multiple connections in succession, however, this adds up to real overhead (a combination of excess or indirect computation time, memory, bandwidth, or other resources needed to carry out the operation).

In this article, we will share five useful tips on how to speed up remote SSH connections in Linux.

1. Use Compression Option in SSH

From the ssh man page (type man ssh to see the whole thing):

-C      Requests compression of all data (including stdin, stdout,
        stderr, and data for forwarded X11 and TCP connections). The
        compression algorithm is the same used by gzip(1), and the
        “level” can be controlled by the CompressionLevel option for
        protocol version 1. Compression is desirable on modem lines and
        other slow connections, but will only slow down things on fast
        networks. The default value can be set on a host-by-host basis
        in the configuration files; see the Compression option.

ssh -C username@example.com

2. Force SSH Connection Over IPv4

OpenSSH supports both IPv4 and IPv6, but at times IPv6 connections tend to be slower. So you can consider forcing ssh connections over IPv4 only, using the syntax below:

# ssh -4 username@example.com

Alternatively, use the AddressFamily directive (which specifies the address family to use when connecting) in your ssh configuration file, either /etc/ssh/ssh_config (global configuration) or ~/.ssh/config (user-specific file).

The accepted values are “any”, “inet” for IPv4 only, or “inet6”.

AddressFamily inet

3. Reuse SSH Connection

An ssh client program is used to establish connections to an sshd daemon accepting remote connections. You can reuse an already-established connection when creating a new ssh session and this can significantly speed up subsequent sessions.

Connecting by IP address is recommended, so that the same socket is reused even if you would otherwise reach the host under different hostnames (very useful with Ansible or pdsh).
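A sketch of the client-side settings for connection reuse, to be placed in ~/.ssh/config. The socket directory ~/.ssh/sockets and the 600-second persistence are assumptions chosen for illustration.

```
Host *
    # Share one master connection per user/host/port combination.
    ControlMaster auto
    # Assumed socket location; %r, %h and %p expand to remote user, host and port.
    ControlPath ~/.ssh/sockets/%r@%h-%p
    # Keep the master open for 600 seconds after the last session closes.
    ControlPersist 600
```

The directory named in ControlPath has to exist before the first connection, so create it once with mkdir -p ~/.ssh/sockets.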

4. Use Specific SSH Authentication Method

Another way of speeding up ssh connections is to use a single authentication method for all ssh connections; here we recommend configuring passwordless SSH login with ssh-keygen.

Once that is done, use the PreferredAuthentications directive in the ssh_config files (global or user specific) mentioned above. This directive defines the order in which the client should try authentication methods (you can specify a comma-separated list to use more than one method).

PreferredAuthentications=publickey

If you prefer password authentication, which is considered insecure, use this:

ssh -o "PreferredAuthentications=password" username@example.com

5. Disable DNS Lookup On Remote Machine

By default, the sshd daemon looks up the remote host name and also checks that the resolved host name for the remote IP address maps back to the very same IP address. This can result in delays in connection establishment or session creation.

The UseDNS directive controls this behaviour; to disable it, find the directive in the /etc/ssh/sshd_config file, uncomment it, and set it to no. If it is not present, add it with the value no.
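The resulting line in the server configuration would look like this (a fragment only; the rest of the file stays as it is):

```
# /etc/ssh/sshd_config
# Skip reverse DNS lookups for connecting clients.
UseDNS no
```

sshd has to be reloaded or restarted for the change to take effect; the exact service command depends on your distribution.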