What performance should I expect from your SSH server?

Introduction

We receive many enquiries from users requesting information about the performance of our server API and how it compares to other standard implementations. We decided to set up a laboratory test to provide some basic statistics and advice that would answer the following questions:

What is the maximum throughput I can expect for a single connection?

How does throughput scale with increasing connections?

What resources are required to support this maximum throughput?

Test Environment

Our test environment consisted of a 1U rack server with the following specification:

Server

Operating System: Ubuntu 12.04

Processor: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz

Memory: 16GB

HDD: 2TB, SATA 6Gb/s NCQ

Network: 1 Gbps Interface

Test 1 - Throughput Requirements

The first test set out to determine the maximum throughput of the server over a single network interface, the CPU resources required to support that throughput, and how it scaled with multiple connections. Because performance is known to depend heavily on the cipher used in an SSH connection, we tested two ciphers: AES and Arcfour. We chose AES because it is the default cipher in our own implementation and many others, and Arcfour because it is well known as an efficient, fast cipher that should demonstrate the maximum performance of the API. We do not recommend using Arcfour in production.

We created a client script that let us set the preferred cipher and launch as many concurrent clients as we required, each transferring a 500MB file. We then recorded the throughput achieved by each run of the script, increasing the number of connections with each iteration of the test.

The server was configured to use only a single transfer thread, which restricted it to a single CPU core. Our results therefore show how much throughput a single transfer thread can handle.
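As a rough illustration of the kind of client script described above, the sketch below drives a benchmark from Python by launching several OpenSSH `scp` downloads in parallel with a fixed cipher and MAC. This is not the script we used: the host name, remote file path, the choice of `scp`, and the helper names (`build_command`, `run_transfer`, `run_iteration`) are all hypothetical, and it assumes key-based authentication is already set up.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical test target: replace with your own server and test file.
HOST = "testuser@ssh-server.example"
REMOTE_FILE = "/tmp/test-500MB.bin"


def build_command(cipher: str, host: str = HOST,
                  remote_file: str = REMOTE_FILE) -> List[str]:
    """OpenSSH scp command that downloads the test file with a fixed cipher."""
    # -c pins the cipher, -o MACs=... pins the MAC, -q hides the progress meter.
    # The download goes to /dev/null so local disk writes do not affect timing.
    return ["scp", "-q", "-c", cipher, "-o", "MACs=hmac-sha1",
            f"{host}:{remote_file}", "/dev/null"]


def run_transfer(cmd: Sequence[str]) -> float:
    """Run one transfer and return its duration in seconds."""
    start = time.perf_counter()
    subprocess.run(list(cmd), check=True)
    return time.perf_counter() - start


def run_iteration(cipher: str, connections: int, file_mb: float,
                  runner: Callable[[Sequence[str]], float] = run_transfer
                  ) -> Dict[str, float]:
    """Start `connections` simultaneous transfers and summarise the results."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    cmd = build_command(cipher)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # Each worker performs one complete transfer; list() waits for all.
        durations = list(pool.map(lambda _: runner(cmd), range(connections)))
    wall = time.perf_counter() - start
    total_mbps = file_mb * connections / wall
    return {
        "connections": connections,
        "total_time": wall,
        "slowest_transfer": max(durations),
        "mbps_total": total_mbps,                  # aggregate across all connections
        "mbps_average": total_mbps / connections,  # even share per connection
    }
```

For example, `run_iteration("aes128-ctr", 4, file_mb=500)` would run one iteration with four simultaneous transfers. `aes128-ctr` is the OpenSSH name for AES/128/CTR; recent OpenSSH releases no longer offer the Arcfour ciphers, so an Arcfour run would need an older client. The `runner` parameter makes it easy to substitute a different SSH client.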

Test 1 - The Results

The table below provides the test results and the graph provides a more readable view.

As expected, Arcfour was the better-performing cipher, with a single-connection throughput of 97.2 MB/s, but this comes at the cost of security.

The AES cipher provided a single connection throughput of 53.3 MB/s.

As the number of connections increased, throughput was shared evenly across them, and the server (which, remember, was restricted to a single thread/CPU core) maintained a consistent aggregate throughput across all iterations.

What does this data tell us?

My interpretation of the data is that a single AES transfer thread sustains roughly 60 MB/s, which is about half of what a 1 Gbps interface can carry (125 MB/s before protocol overhead). We could therefore expect a 2-core server with 2 transfer threads to handle the maximum throughput of a 1 Gbps network interface. If we want to scale the server to handle more load, we need to provide one 1 Gbps network interface for every 2 transfer threads (CPU cores).
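The arithmetic behind this sizing can be sketched as follows. The per-thread figures are the 20-connection MB/s Total values from the Test 1 tables. Treating about 110 MB/s (close to the best aggregate observed in these tests) as the usable capacity of a 1 Gbps link, rather than the raw 125 MB/s line rate, is our own assumption.

```python
import math

# Aggregate MB/s of one transfer thread at 20 connections (Test 1 tables).
SINGLE_THREAD_MBPS = {"arcfour": 104.3, "aes128-ctr": 59.7}

# Assumption: ~110 MB/s is the usable payload rate of a 1 Gbps link,
# close to the best aggregate observed in these tests.
USABLE_NIC_MBPS = 110.0


def gbps_to_megabytes_per_s(gbps: float) -> float:
    """Raw line rate in MB/s (1 Gbps = 1000 Mbit/s = 125 MB/s)."""
    return gbps * 1000 / 8


def threads_to_saturate(per_thread_mbps: float,
                        nic_mbps: float = USABLE_NIC_MBPS) -> int:
    """Smallest number of transfer threads whose combined rate fills the NIC."""
    return math.ceil(nic_mbps / per_thread_mbps)


for cipher, rate in SINGLE_THREAD_MBPS.items():
    print(f"{cipher}: {threads_to_saturate(rate)} thread(s) per 1 Gbps NIC")
```

Both ciphers come out at 2 threads per NIC under this assumption. The result depends on the ceiling chosen, so it is worth measuring what your own link actually delivers.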

Arcfour with SHA1 - Not recommended for External Use / Internal Only

Connections  MB/s Average  Average Time  Total Time  MB/s Total
1            97.2          17            17          97.2
2            48.6          17            34          97.2
3            33.0          17            50          99.1
4            24.7          17            67          98.7
5            20.4          16            81          102.0
6            16.7          17            99          100.1
7            14.6          16            113         102.4
8            12.8          16            129         102.5
9            11.5          16            144         103.3
10           10.3          16            161         102.6
11           9.4           16            176         103.3
12           8.7           16            191         103.8
13           8.0           16            206         104.3
14           7.4           16            222         104.2
15           6.9           16            239         103.7
16           6.5           16            255         103.7
17           6.2           16            268         104.8
18           5.8           16            284         104.7
19           5.5           16            300         104.7
20           5.2           16            317         104.3

AES/128/CTR with SHA1 - Recommended for External Use

Connections  MB/s Average  Average Time  Total Time  MB/s Total
1            53.3          31            31          53.3
2            30.0          28            55          60.1
3            20.4          27            81          61.2
4            15.6          27            106         62.4
5            12.2          27            135         61.2
6            10.3          27            160         62.0
7            8.8           27            187         61.9
8            7.7           27            215         61.5
9            6.9           27            239         62.2
10           6.3           26            263         62.8
11           5.7           27            292         62.2
12           5.3           26            312         63.6
13           4.8           26            344         62.4
14           4.4           27            376         61.5
15           4.0           28            414         59.9
16           3.7           28            444         59.5
17           3.5           28            469         59.9
18           3.3           28            503         59.1
19           3.1           28            531         59.1
20           3.0           28            554         59.7

Test 2 - Comparison against OpenSSH Server

Next, using the same client script and process, we ran the test against both our own server and a native OpenSSH server. This provides a benchmark of our performance against the most widely deployed SSH server.

This test repeated the process from Test 1, but we removed the single-thread restriction from our own server and configured it with 4 transfer threads to match the number of cores available to OpenSSH. Transfers were still restricted to the single 1 Gbps network interface, and we used the AES cipher.

Test 2 - Results

The table below provides the test results and the graph provides a more readable view.

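Both servers scaled well, and our own server achieved performance comparable to OpenSSH: from 5 connections onwards, both sustained an aggregate of roughly 106 to 111 MB/s, close to the limit of the 1 Gbps interface.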

Maverick SSHD

Connections  MB/s Average  Average Time  Total Time  MB/s Total
1            51.6          32            32          51.6
2            48.6          17            34          97.2
3            33.7          16            49          101.2
4            25.4          16            65          101.7
5            21.5          15            77          107.3
6            18.0          15            92          107.8
7            15.4          15            107         108.1
8            13.5          15            122         108.4
9            12.1          15            137         108.6
10           10.9          15            151         109.4
11           10.0          15            166         109.5
12           9.2           15            180         110.2
13           8.4           15            196         109.6
14           7.9           15            210         110.2
15           7.2           15            231         107.3
16           6.9           15            240         110.2
17           6.5           15            256         109.7
18           6.1           15            270         110.2
19           5.8           15            285         110.2
20           5.5           15            302         109.4

OpenSSH

Connections  MB/s Average  Average Time  Total Time  MB/s Total
1            55.1          30            30          55.1
2            48.6          17            34          97.2
3            33.0          17            50          99.1
4            25.8          16            64          103.3
5            21.2          16            78          105.9
6            18.0          15            92          107.8
7            15.6          15            106         109.1
8            13.5          15            122         108.4
9            12.2          15            136         109.4
10           10.9          15            151         109.4
11           10.0          15            165         110.2
12           9.2           15            180         110.2
13           8.5           15            195         110.2
14           7.9           15            210         110.2
15           7.4           15            224         110.7
16           6.9           15            240         110.2
17           6.5           15            253         111.0
18           6.1           15            269         110.6
19           5.8           15            283         110.9
20           5.5           15            298         110.9
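One way to put a number on "comparable" is to average each server's MB/s Total once throughput has levelled off. The sketch below does this with the Test 2 values; starting the average at 5 connections is our own choice of cut-off, not something the test defined.

```python
from statistics import mean
from typing import List

# MB/s Total for 1..20 connections, copied from the Test 2 results.
MAVERICK_SSHD = [51.6, 97.2, 101.2, 101.7, 107.3, 107.8, 108.1, 108.4, 108.6, 109.4,
                 109.5, 110.2, 109.6, 110.2, 107.3, 110.2, 109.7, 110.2, 110.2, 109.4]
OPENSSH = [55.1, 97.2, 99.1, 103.3, 105.9, 107.8, 109.1, 108.4, 109.4, 109.4,
           110.2, 110.2, 110.2, 110.2, 110.7, 110.2, 111.0, 110.6, 110.9, 110.9]


def steady_state_mean(totals: List[float], from_connections: int = 5) -> float:
    """Mean aggregate MB/s from `from_connections` onwards.

    The cut-off of 5 connections is an assumption: it skips the ramp-up
    iterations where aggregate throughput is still climbing.
    """
    return mean(totals[from_connections - 1:])


ratio = steady_state_mean(MAVERICK_SSHD) / steady_state_mean(OPENSSH)
print(f"Maverick SSHD / OpenSSH steady-state throughput: {ratio:.3f}")
```

With these figures the ratio is about 0.995, meaning Maverick SSHD's steady-state aggregate throughput was within about half a percent of OpenSSH's in this test.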

Conclusions

Our tests have demonstrated that, under our laboratory conditions, the performance of the Maverick SSHD server is comparable to the native performance of an OpenSSH server. We have also established a rule of thumb for ensuring that a server can make full use of its resources as it scales: provide one 1 Gbps NIC for every two CPU cores, and configure our own server with one transfer thread per core. With fewer resources than this, performance will almost certainly be compromised.
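The rule of thumb above can be expressed as a small sizing helper. This is a hypothetical sketch: `SizingPlan` and `plan_for` are illustrative names, and how the thread count is actually applied depends on how your Maverick SSHD deployment is configured, which is not covered here. Note that `os.cpu_count()` reports logical CPUs, which on hyper-threaded machines exceeds the physical core count the rule refers to.

```python
import math
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SizingPlan:
    cores: int
    transfer_threads: int  # one transfer thread per CPU core
    nics_1gbps: int        # one 1 Gbps NIC for every two cores


def plan_for(cores: int) -> SizingPlan:
    """Apply the rule of thumb from these tests to a given core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return SizingPlan(cores=cores,
                      transfer_threads=cores,
                      nics_1gbps=math.ceil(cores / 2))


# os.cpu_count() can return None, so fall back to a single core.
print(plan_for(os.cpu_count() or 1))
```

For the 4-core test machine used here, `plan_for(4)` suggests 4 transfer threads and two 1 Gbps NICs.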