# Increase the read/write TCP buffers to allow for larger window sizes. This enables more data
# to be in flight before an ACK is required, which in turn increases throughput.
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216

# Make room for more TIME_WAIT sockets due to more clients, and allow them to be reused if we run out of sockets

# Also increase the max packet backlog

# increase the length of the processor input queue
net.core.netdev_max_backlog=50000

net.ipv4.tcp_max_syn_backlog=30000

net.ipv4.tcp_max_tw_buckets=2000000

net.ipv4.tcp_tw_reuse=1

net.ipv4.tcp_fin_timeout=10

# Disable TCP slow start on idle connections

net.ipv4.tcp_slow_start_after_idle=0

# If your servers talk UDP, also up these limits

net.ipv4.udp_rmem_min=8192

net.ipv4.udp_wmem_min=8192

net.core.somaxconn=1000

# recommended default congestion control is htcp

net.ipv4.tcp_congestion_control=htcp

# recommended for hosts with jumbo frames enabled

net.ipv4.tcp_mtu_probing=1

# Disable source routing and redirects

net.ipv4.conf.all.send_redirects=0

net.ipv4.conf.all.accept_redirects=0

net.ipv4.conf.all.accept_source_route=0

# Log packets with impossible addresses for security

net.ipv4.conf.all.log_martians=1
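Once these are in place, each setting can be read back without root through /proc/sys, where dots in the sysctl name become slashes; a quick sanity check:

```shell
# Every sysctl is exposed as a file under /proc/sys, with dots replaced by slashes.
# These reads work without root and show the values currently in effect.
cat /proc/sys/net/ipv4/tcp_fin_timeout
cat /proc/sys/net/ipv4/tcp_congestion_control
```

If the output still shows the defaults (60 and, on most modern distributions, cubic), the settings above have not been applied yet.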

Increase max open files to 100,000 from the default (typically 1024). In Linux, every open network socket requires a file descriptor. Increasing this limit will ensure that lingering TIME_WAIT sockets and other consumers of file descriptors don’t impact our ability to handle lots of concurrent requests.

Decrease the time that sockets stay in the TIME_WAIT state by lowering tcp_fin_timeout from its default of 60 seconds to 10. You can lower this even further, but if it is too low you can run into socket close errors on networks with lots of jitter. We will also set tcp_tw_reuse to tell the kernel it can reuse sockets in the TIME_WAIT state.
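To gauge how many sockets are sitting in TIME_WAIT right now, you can count them straight from /proc/net/tcp, where the state column value 06 means TIME_WAIT (a quick sketch; `ss -tan state time-wait` reports the same information):

```shell
# Column 4 of /proc/net/tcp is the socket state; 06 is TIME_WAIT.
# The first line of the file is a header, so skip it.
awk 'NR > 1 && $4 == "06"' /proc/net/tcp | wc -l
```

If this number regularly approaches tcp_max_tw_buckets, the larger bucket count above is doing real work.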

We won’t tune the total TCP memory (tcp_mem), since this is automatically tuned based on available memory by Linux.

NOTE: Since some of these settings can be cached by networking services, it’s best to reboot to apply them properly (sysctl -p does not work reliably).

Shell Limits

The application may run as a regular user on the host system; if so, you may need to give that user different limits.

/etc/security/limits.conf (File Descriptors and Max # of processes)


# for just the user "nobody"
nobody soft nofile 4096
nobody hard nofile 63536
nobody soft nproc 2047
nobody hard nproc 16384

# for all users
* soft nofile 100000
* hard nofile 100000

Don’t set the hard nofile limit equal to /proc/sys/fs/file-max: if this user consumed all of the system’s file descriptors, the entire system would run out of them.
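A quick sanity check along those lines compares a candidate hard limit against the system-wide maximum (a sketch; the 100000 figure matches the limits above):

```shell
# Compare the proposed per-user hard nofile limit against the system-wide cap.
hard_limit=100000
file_max=$(cat /proc/sys/fs/file-max)
if [ "$hard_limit" -lt "$file_max" ]; then
    echo "ok: hard limit $hard_limit leaves headroom below file-max ($file_max)"
else
    echo "warning: a single user could exhaust all file descriptors"
fi
```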

/etc/pam.d/sshd

sshd needs to load the modified limits.conf via PAM:


# ensure pam includes our limits
session required pam_limits.so

# confirm it by running
$ ulimit -n

TCP Congestion Window

Finally, let’s increase the TCP initial congestion window from 1 to 10 segments. This is done on the interface, which makes it a more manual process than our sysctl settings. First, use ip route to find the default route, shown on the first line of the output below:


$ ip route
default via 10.248.77.193 dev eth0 proto kernel
10.248.77.192/26 dev eth0 proto kernel scope link src 10.248.77.212

Copy that line, and paste it into the ip route change command, adding initcwnd 10 to the end to increase the congestion window:
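With the route line above, the resulting command would look like the following (it must be run as root; adjust the gateway and device to match your host):

```shell
# The copied default route, with initcwnd 10 appended (requires root)
ip route change default via 10.248.77.193 dev eth0 proto kernel initcwnd 10
```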

To make this persistent across reboots, you’ll need to add a few lines of bash like the following to a startup script somewhere. Often the easiest candidate is just pasting these lines into /etc/rc.local:


defrt=`ip route | grep "^default" | head -1`
ip route change $defrt initcwnd 10

Once you’re done with all these changes, you’ll need to either bundle a new machine image, or integrate these changes into a system management package such as Chef or Puppet.

Virtual Memory Tweak

Swap file

Discussed above.

Page Cache

Under Linux, the page cache accelerates access to files on non-volatile storage: when data is first read from or written to media such as hard drives, Linux also stores it in otherwise unused areas of memory, which act as a cache. If the same data is read again later, it can be served quickly from this in-memory cache.


# check memory status; the memory currently used for the page cache is shown in the "cached" column
$ free -m
             total       used       free     shared    buffers     cached
Mem:         15976      15195        781          0        167       9153
-/+ buffers/cache:       5874      10102
Swap:         2000          0       1999

# Writing to disk will first go to the page cache (such pages are marked dirty),
# then be periodically transferred to the underlying storage device; you can also call sync or fsync to flush it.
$ dd if=/dev/zero of=testfile.txt bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0121043 s, 866 MB/s
$ cat /proc/meminfo | grep Dirty
Dirty:     10260 kB
$ sync
$ cat /proc/meminfo | grep Dirty
Dirty:         0 kB

vm.dirty_ratio (default=20)

The percentage of total available memory (free plus reclaimable pages) at which a process that is generating disk writes will itself start writing out dirty data.

## Add this line ##
vm.dirty_ratio=80

vm.dirty_background_ratio (default=10)

This value determines the percentage of memory that can contain dirty pages before the background kernel flusher threads start to write dirty pages to disk. If you have 1GB of RAM and you set this to 10 then it would take 100MB of dirty pages to begin the flush process.
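The same arithmetic applies to the value set below: with dirty_background_ratio=5 on a 16 GB machine (an example figure), background flushing starts at roughly 800 MB of dirty pages:

```shell
# threshold in MB = RAM in MB * ratio / 100
ram_mb=16384
ratio=5
echo $(( ram_mb * ratio / 100 ))   # prints 819
```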

## Add this line ##
vm.dirty_background_ratio=5

vm.dirty_expire_centisecs (default=3000)

The value is expressed in hundredths of a second. It defines the age at which dirty pages become eligible to be written to disk by the kernel flusher threads. The longer this value, the higher the odds of data loss on a crash, but also the longer data stays in memory in case a program needs it again.
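As a concrete check of the unit, the 12000 set below corresponds to two minutes:

```shell
# centiseconds to seconds: 12000 / 100 = 120 seconds (2 minutes)
echo $(( 12000 / 100 ))   # prints 120
```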

## Add this line ##
vm.dirty_expire_centisecs=12000

File System Tweaks


vim /etc/rc.local

## Add this line ##
echo noop > /sys/block/sda/queue/scheduler

Make sure that /etc/rc.local is executable, otherwise the changes will not be applied on reboot; a simple chmod +x /etc/rc.local should do the trick.