Welcome!

Advanced command line

Difficulty: intermediate

Estimated Time: 20 minutes

Welcome to Intermediate Command Line and Bash Scripting

Introduction

Bash is a programming language commonly used as a command-line interpreter, or shell. It is the default shell on many operating systems, including many Linux distributions and macOS (OS X) releases before Catalina, and you have been using it to complete these tutorials.

This tutorial shows a few slightly more advanced bash tips and Unix commands that often come in useful, and introduces some of the built-in bash statements that can automate away much of the tedium of data wrangling.

Steps

01: Introduction to bash

Bash introduction

As a programming language, bash has variables - named references to stored numbers or characters.

We can assign a variable using '=', with the variable name on the left and the value to store on the right. There must be no spaces around the '='.

VAR='Hello World'

The value can then be accessed by prepending a '$' to the variable name:

echo $VAR

or optionally surrounding it in curly braces - {} - which is useful when following the variable directly with other characters, which would otherwise be confused as part of the variable name:

echo ${VAR}s

echo $VARs
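To see why the braces matter, the two echo commands above can be compared side by side. This is a minimal sketch: the names with_braces and without_braces are made up for illustration, and it assumes no variable called VARs has been set in your shell.

```shell
#!/usr/bin/env bash
VAR='Hello World'

# Braces mark where the variable name ends, so the "s" is literal text.
with_braces="${VAR}s"

# Without braces bash looks up a variable called VARs, which is unset here,
# so the expansion is an empty string.
without_braces="$VARs"

echo "with braces:    [$with_braces]"
echo "without braces: [$without_braces]"
```

The second line prints empty brackets because bash treats an unset variable as an empty string rather than raising an error.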

Limited variable manipulation

In some cases text manipulation may be more easily achieved through bash directly, rather than sed or awk. Bash includes a number of inbuilt variable manipulations, including:

Changing case

echo ${VAR,}

echo ${VAR,,}

echo ${VAR^}

echo ${VAR^^}

Sub-string removal - Use # to match a pattern from the beginning, and % to match a pattern from the end.

echo ${VAR#Hel}

echo ${VAR%World}

echo ${VAR#World}

Substitution - Use / to replace the first instance or // to replace all instances of a pattern. This is particularly useful when renaming things.

echo ${VAR/o/e}

echo ${VAR//o/e}
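As a rough illustration of how these manipulations help with renaming, the sketch below applies them to a file name. The file name chrI.fa.gz and the variable names are hypothetical, and the case change assumes bash version 4 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical file name, used only for illustration.
file='chrI.fa.gz'

without_gz="${file%.gz}"      # remove ".gz" from the end
without_chr="${file#chr}"     # remove "chr" from the beginning
renamed="${file/.fa/.fasta}"  # replace the first ".fa" with ".fasta"
upper="${file^^}"             # upper-case every character (bash 4+)

echo "$without_gz"
echo "$without_chr"
echo "$renamed"
echo "$upper"
```

None of these expansions touch the file itself; they only produce a new string, which could then be passed to a command such as mv.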

Environment variables

Your computer will have various environment variables already set. Of particular use are $HOME, which points to the current user's home directory, and $PATH, a colon-separated list of the directories in which bash looks for programs. $PATH often needs to be extended when a user installs a program themselves, e.g.

PATH=$PATH:$HOME/bin

would allow you to run programs from a 'bin' folder in your home directory when you are working in any other directory.
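The effect of extending $PATH can be tried out safely with a sketch like the one below. It assumes a temporary directory (fake_home) standing in for your real home directory and a made-up program called greet, so nothing in your actual $HOME is touched.

```shell
#!/usr/bin/env bash
# A throwaway directory that plays the role of $HOME in this sketch.
fake_home=$(mktemp -d)
mkdir -p "$fake_home/bin"

# A tiny hypothetical program called "greet".
printf '#!/bin/sh\necho "hello from greet"\n' > "$fake_home/bin/greet"
chmod +x "$fake_home/bin/greet"

# Append the directory to PATH for this shell session and its children.
export PATH="$PATH:$fake_home/bin"

# bash can now find greet by name, whatever the current directory is.
greet_output=$(greet)
echo "$greet_output"
```

A change made this way lasts only for the current session; to keep it, the same PATH line is commonly added to a startup file such as ~/.bashrc.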

02: Downloading data

Downloading data

To set us up for the next few sections, let's download some data files.

wget is a tool for downloading files over HTTP or FTP. It can be used to download entire websites or to download multiple files matching a pattern. For now, we will just use it to fetch C. elegans genome FASTA files from the UCSC download server.

Make a directory to download the files into:

mkdir c_elegans

Change to that directory:

cd c_elegans

Download all files (*) in the chromosomes folder of C. elegans on UCSC:

03: Loops

Loops

One of the most useful aspects of bash is its ability to glue together the many useful and varied Unix tools - we've already seen pipes and redirection in the previous scenario. Loops help automate away many of the tedious and repetitive tasks we have to do on a computer!

There are three types of loops:

For loops run commands once for each element of a list or range.

While loops run commands while a condition is true.

Until loops run commands until a condition is met.

For loops

A for loop iterates over a list of words, in the following form:

for var in list
do
    command $var
done

As you type the command, you can press Enter after each line and bash will show a continuation prompt (>) until the loop is closed with done, or you can use ';' instead of new lines to keep everything on a single line.

An example that echoes the file name and first few lines of the given files:

for value in *.fa.gz
do
    # Quoting "$value" keeps file names containing spaces in one piece
    echo "$value"
    zcat "$value" | head
done

zcat is just the same as cat, but works on compressed (.gz) files.
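The single-line form with ';' can be tried on a couple of small test files. The sketch below is only an illustration: the files a.fa.gz and b.fa.gz and their contents are invented, they are created in a temporary directory, and zcat is assumed to behave as on a typical Linux system.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small hypothetical FASTA files, compressed with gzip.
printf '>seq1\nACGT\n' | gzip > a.fa.gz
printf '>seq2\nTTGA\n' | gzip > b.fa.gz

# The same loop as above on a single line; the output of the whole loop
# is redirected into report.txt.
for value in *.fa.gz; do echo "$value"; zcat "$value" | head -n 1; done > report.txt

cat report.txt
```

Redirecting after done captures everything the loop prints, which is often more convenient than redirecting inside the loop body.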

While loops

A while loop takes the following form:

while [ condition ]
do
    command
done

To view the first few lines of each file, similarly to above, we could use ls to list our files, and then pipe it to the while loop, using read to read each line.

ls *.fa.gz | while read -r value
do
    # read -r stops backslashes in file names being interpreted
    echo "$value"
    zcat "$value" | head
done
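The condition form of the while loop, and the until loop mentioned earlier, can be sketched with a simple counter. The variable names count and n are arbitrary choices for this illustration.

```shell
#!/usr/bin/env bash
# while: keep going as long as the condition is true.
count=1
while [ "$count" -le 3 ]
do
    echo "while: $count"
    count=$((count + 1))
done

# until: keep going as long as the condition is false.
n=3
until [ "$n" -eq 0 ]
do
    echo "until: $n"
    n=$((n - 1))
done
```

The two loops are mirror images of each other: until [ condition ] behaves like while with the condition negated.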

04: Expansions

Brace expansion

Two useful features of bash are brace expansion and command substitution. Both evaluate the contents of the braces or parentheses before the rest of the command is run.

Sequences

To get a sequence between two numbers or characters, write the two endpoints within curly braces, separated by two full stops.

E.g.

echo {1..10}

echo {a..z}

By default it will return a sequence with an increment of 1. The increment can be specified by writing a third number:

echo {1..10..4}

echo {a..z..8}

Files are often named or numbered according to some sequence, and in those cases the sequence brace expansion can come in handy.
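As a small sketch of sequences applied to file names, the example below builds a few names without typing each one. The sample_ prefix is hypothetical, and the increment and zero-padding forms assume bash version 4 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical file names numbered 1 to 3.
names=$(echo sample_{1..3}.txt)

# Every fourth number from 1 to 10.
stepped=$(echo {1..10..4})

# Writing the endpoints with leading zeros pads every number to that width.
padded=$(echo {01..03})

echo "$names"
echo "$stepped"
echo "$padded"
```

Brace expansion produces the words whether or not matching files exist, which makes it useful for creating files as well as listing them.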

Lists

Characters separated by commas inside curly braces are expanded into separate words, which is useful when you have a long string with only minor differences.

E.g.

ls /{,usr/,local/usr/}bin

The above command will list the folders /bin, /usr/bin, and /local/usr/bin. The brace expansion concatenates the surrounding text onto each of the inner strings.
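Two common uses of list expansion are creating several sibling directories at once and making a backup copy of a file. The sketch below assumes a temporary directory and made-up names (project, raw, processed, results, notes.txt).

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Expands to three paths: .../project/raw .../project/processed .../project/results
mkdir -p "$workdir"/project/{raw,processed,results}

echo 'data' > "$workdir/project/raw/notes.txt"

# The empty first item means this expands to: cp notes.txt notes.txt.bak
cp "$workdir"/project/raw/notes.txt{,.bak}

ls "$workdir/project" "$workdir/project/raw"
```

An empty item in the list, as in {,.bak}, expands to the surrounding text on its own, the same trick used by /{,usr/,local/usr/}bin above.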

Command substitution

Sometimes we want to use the output of a command in the command we are writing, but piping and redirection are impossible or too unwieldy. Bash evaluates commands written within ` ` or $() before the rest of the command.

echo "The date is: $(date)"

We could rewrite our for loop from the last page as follows:

for value in $(ls *.fa.gz)
do
    echo "$value"
    zcat "$value" | head
done

The true utility of command substitution is in the evaluation of more complex commands, which can't be replaced by a simple glob (*.fa.gz).
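One case a glob cannot handle is choosing files by their contents. The sketch below loops over only those files that grep -l reports as containing a pattern; the three .fa files and the pattern chrI are invented for the example, and it assumes file names without spaces, since the output of $() is split on whitespace.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three small hypothetical FASTA files.
printf '>chrI\nACGT\n' > one.fa
printf '>chrM\nGGCC\n' > two.fa
printf '>chrI_random\nTTAA\n' > three.fa

# grep -l prints the names of the files that contain the pattern;
# the for loop then iterates over just those names.
for value in $(grep -l 'chrI' *.fa)
do
    echo "$value mentions chrI"
done > matches.txt

cat matches.txt
```

Here two.fa is skipped because its contents do not match, even though its name matches the same glob as the others.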

05: Replacing loops with xargs and parallel

xargs and parallel

Often a loop can be replaced with xargs, a Unix tool that reads items from piped input and runs a command with them as arguments.

Using gunzip as an example command, we can uncompress each .gz file in the folder using xargs like this:

ls *.gz | xargs gunzip

Again, a simple glob would suffice here (gunzip *.gz).

Unix tools are often single-threaded, while any computer you are likely to use today will have multiple CPU cores. One of the advantages of xargs over a simple loop is that it can run several jobs at once (with its -P option) and so make use of multiple cores - drastically speeding up many tasks.

Unfortunately the Katacoda learning environment only has one core - but your actual computer will almost certainly have multiple cores.

parallel is a drop-in replacement for xargs, which will run jobs in parallel automatically, and has a number of extra features. For example, it can run jobs on external computers - particularly useful when using high performance clusters - and it makes running multiple commands or nested loops easier than xargs does. It is a program that is worth bearing in mind if you find yourself using time-consuming single-process programs.
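Running several jobs at once with xargs might look like the sketch below. It assumes an xargs that supports -P (the GNU and BSD versions do, although the option is not part of POSIX), and it uses three invented files in a temporary directory.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three small hypothetical compressed files.
for name in a b c
do
    echo "contents of $name" | gzip > "$name.txt.gz"
done

# -n 1 passes one file name to each gunzip call;
# -P 2 allows up to two gunzip processes to run at the same time.
ls *.gz | xargs -n 1 -P 2 gunzip

ls
```

With only three tiny files there is no visible speed-up; the benefit appears when each job takes long enough for the cores to be kept busy.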

06: Process control

Process control

It is not always clear what processes are running. There are a few tools to display processes and resource usage.

List processes

The ps command prints out a list of running processes, by default only those that are run by the current user and controlled by a terminal.

ps

The -A flag will list all processes running on the computer.

ps -A

There are a variety of columns that can be specified in the output format. To see usage of memory and CPU, we can use %mem and %cpu with the -o argument. A full list of possible columns can be seen in the ps man page (man ps).

ps -A -o "pid comm %cpu %mem"

Controlling and stopping processes

Interrupting a foreground process

Here we will use the sleep command as a stand-in for some other long-running command. Ctrl-C will send an interrupt signal (SIGINT) to the foreground process.

sleep 8000

Rather than wait 8000 seconds, press Ctrl-C to interrupt the process.

Send a process to sleep

To pause a process and return to the command prompt, you can press Ctrl-Z. The process will pause and you will be able to type into the prompt again.

sleep 8000

Ctrl-Z

To resume the process and bring it back to the foreground, you can use the fg bash command:

fg

Running a process in the background

Rather than bringing it to the foreground, you can use bg to resume the process in the background:

Ctrl-Z

bg

However, the process is still controlled by the terminal, can be brought back to the foreground using fg, and will stop if the terminal is closed.

disown will separate the process from the terminal. This can be useful if you have started a long-running job on a remote computer, and now wish to disconnect without stopping the job.

Stopping background processes

Now that the sleep command is disowned from the current session, we can no longer stop it with Ctrl-C. Instead we must use the kill <pid> command, which sends a signal to the process with ID <pid>. By default, it sends the 'SIGTERM' signal, which tells the process to terminate.

We can find the pid using ps:

ps -A | grep sleep

and then run kill <pid>

We could combine a few commands we have learnt to do it for us:

kill $(ps -A -o "pid comm" | grep sleep | awk '{print $1}')

This will evaluate the commands inside $( ) first, and then run kill on the output. First the pid and command name of each process are listed in two columns, grep selects only those lines that match 'sleep', and finally awk prints the first (pid) column. awk is used rather than cut because ps pads the pid column with a varying number of spaces, which cut -d ' ' would treat as separate empty fields.

Sometimes programs misbehave and are unresponsive to terminate commands. If that happens, the SIGKILL signal can be sent using '-9' as a flag:

kill -9 <pid>

A few more user-friendly tools have been built on top of kill. For instance, pkill can replace our above command and send a signal based on a process name or user directly:

pkill sleep
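When you start the background process yourself, its pid can be captured directly instead of being searched for. The sketch below relies on the special parameter $! (the pid of the most recent background command), which is not covered above, together with & to start a command in the background; the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Start a long-running stand-in command in the background.
sleep 8000 &
pid=$!                        # pid of the most recent background command

kill "$pid"                   # sends SIGTERM by default
wait "$pid" 2>/dev/null       # collect the exit status of the stopped process
term_status=$?                # 128 + 15 (SIGTERM) = 143

# The same again, but with SIGKILL for a process that will not terminate.
sleep 8000 &
stubborn_pid=$!
kill -9 "$stubborn_pid"
wait "$stubborn_pid" 2>/dev/null
kill_status=$?                # 128 + 9 (SIGKILL) = 137

echo "SIGTERM status: $term_status"
echo "SIGKILL status: $kill_status"
```

An exit status above 128 indicates that the process was ended by a signal; subtracting 128 gives the signal number.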

07: Running programs with nohup

Running in the background using nohup

Rather than having to run a process and then disown it, it can be good practice to use nohup to detach a process from the outset.

nohup sleep 5 runs sleep so that it ignores the hangup signal (SIGHUP) sent when the terminal closes, although the command will still run in the foreground by default. Output that would otherwise go to the terminal is written to nohup.out instead, and if the terminal is closed, the process will continue.

nohup echo "hello" will write "hello" to nohup.out (cat nohup.out)

Any command can be sent to the background immediately by appending '&'. Therefore, it is common to run nohup as nohup <commands> &.

nohup and disown are particularly useful when running commands on a remote computer, where an interruption to the network would otherwise prematurely terminate the process. nohup is more widespread because it is an official POSIX (a set of standards for UNIX systems) command, while disown is specific to bash and a few other shells.
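A common way to combine nohup with & is to also choose the log file yourself, rather than relying on nohup.out. The sketch below assumes a temporary directory, a made-up log name (job.log) and a short stand-in job; wait is only there so the example can show the finished log.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Run a stand-in job immune to hangups, in the background,
# with both normal output and errors collected in job.log.
nohup sh -c 'sleep 1; echo "job finished"' > job.log 2>&1 &
job_pid=$!

echo "started job $job_pid in the background"

# In real use you would log out here; this sketch waits so it can show the log.
wait "$job_pid"
cat job.log
```

Because the output is redirected explicitly, nohup does not create nohup.out in this case. Depending on how the command is run, nohup may add a short message of its own to the log.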

08: tmux

Terminal sessions with tmux

tmux is a terminal multiplexer - it creates a command-line session that you can enter in and out of quickly, and allows you to create 'windows' (similar to browser tabs) and 'panes' (splitting the screen into multiple prompts).

Again, this is particularly useful for remote work. Instead of having a program run detached and in the background, it can be run within tmux. It is then easy to jump in and out of tmux to continue working.

Installation

tmux is typically not installed by default. Unix-like systems normally have a package manager. One of the most common, used by Debian and Ubuntu, is apt. To install tmux with apt, run the following command and press Enter when prompted for confirmation:

sudo apt install tmux

sudo is a tool that allows you to run a command as another user. By default, it will be the administrator (root) user, which is usually necessary to install programs through a package manager.

Starting out

To start a new tmux session, simply run tmux:

tmux

You should be able to tell you are inside a tmux session by the status bar at the bottom of the terminal. Anything run here won't be interrupted if the terminal is closed or network connection is lost. If you want to get out and have only a single pane open, you can type exit.

Controlling tmux

Prefix

Naturally, keys entered go to the command prompt. To control tmux, first enter the 'Prefix' key combination Ctrl-b. The key entered next will be sent to tmux instead. Pressing : will allow you to enter a longer command.

Detaching and attaching

For example, to detach from the tmux session, enter:

Ctrl-b d

Once detached, the tmux session will continue to run along with any commands you ran inside. To view running sessions, run:

tmux ls

You likely have just one session, named 0. Tmux will use incrementing numbers to name sessions if a name is not given.

To jump back in to a particular session, type the following, where 0 is the name of the session:

tmux a -t 0

tmux a will attach to the last created session. ('tmux a' is short for 'tmux attach').

Naming a new session

If you are inside a tmux session, detach from it again using Ctrl-b d.
Numbered sessions are fine, but if we have several running at once it is nicer to have names.

To make a new named tmux session, run the following, replacing [name] with whatever you like:

tmux new -s [name]

tmux ls should now show our new named session.

If you were outside of it, you could reattach to that specific session using:

tmux a -t [name]
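Named sessions can also be created and used without attaching to them, which is handy in scripts. The sketch below is only an illustration: the session name demo is hypothetical and is assumed not to exist already, and the tmux commands run only if tmux is installed. It uses new-session -d (start detached), send-keys (type into the session), and kill-session, none of which are covered above.

```shell
#!/usr/bin/env bash
session='demo'   # hypothetical session name

if command -v tmux >/dev/null 2>&1; then
    # Start a detached session, i.e. without attaching to it.
    tmux new-session -d -s "$session"

    # Type a command into the session, followed by the Enter key.
    tmux send-keys -t "$session" 'sleep 60' Enter

    # The session should now appear in the list.
    tmux ls

    # Tidy up: end the session and everything running inside it.
    tmux kill-session -t "$session"
else
    echo "tmux is not installed; skipping"
fi
```

In normal use you would leave the session running and attach to it later with tmux a -t demo instead of killing it.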

Making windows

Windows in tmux are like tabs. After entering the prefix, the key to open a new one is c:

Ctrl-b c

You may notice the windows are listed on the bottom status bar, with a number and then the name of the command currently running in that window. You should have two windows, likely both running bash (0: bash and 1: bash). The current window is signified by a '*'.

To switch between windows, enter the prefix and then the number of the window.

Ctrl-b 0

Making panes

Windows can be split into panes.

To split a window horizontally (into left and right panes), use %:

Ctrl-b %

To split a window vertically (into top and bottom panes), use ":

Ctrl-b "

To switch between panes, use an arrow key in the direction of the pane you want to switch to, or use o to cycle round.

Ctrl-b o

Cursor control

Mouse support is available if you would prefer to switch between windows/panes and resize panes with a cursor, but only if the terminal you are using also supports it. Most do, but because this is in a web browser we can't try it out here.
