As someone who's beginning to delve into bioinformatics, I'm noticing that, as in biology, there are industry standards here, such as Illumina in genomics and Bowtie for alignment. Similarly, many people use bash as their shell.

I would adjust the examples you provided. Illumina is a standard for short reads, but many genomics labs work mostly with PacBio or Nanopore. Bowtie is hardly a standard; even versions 1 and 2 are very different.
– burger, Jun 1 '17 at 16:41

@burger what do you suggest then?
– EMiller, Jun 1 '17 at 16:47

No suggestion. Although I agree with all the answers so far, bioinformatics is not good with standards. Even SAM/BAM, which is technically a properly defined standard used by almost everyone in genomics, has many fields that are interpreted differently, which causes issues for a lot of tools.
– burger, Jun 1 '17 at 16:58


A statement like "this is not meant to be opinionated" doesn't help much with a question as broad as this one. Do you have a particular application you would like to use a shell for, or an indication of which "industry" you're interested in?
– gringer, Jun 1 '17 at 19:54

5 Answers

Bioinformatics tools written in shell, and other shell scripts, generally specify the shell they want to use (via #!/bin/sh, or e.g. #!/bin/bash if it matters), so they won't be affected by your choice of user shell.

If you are writing significant shell scripts yourself, there are reasons to do it in a Bourne-style shell. See Csh Programming Considered Harmful and other essays/polemics.

A Bourne-style shell is pretty much the industry standard, and if you choose a substantially different shell you'll have to translate some of the documentation of your bioinformatics tools. It's not uncommon for installation instructions to say something like "Set some variables pointing at reference data and add the script to your PATH to run it", followed by the commands to do so.

These commands will typically be shown in Bourne-shell syntax. If you use a different shell, you have to translate the export commands to your local syntax, and PATH munging in particular is somewhat shell-dependent.
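As an illustration (the variable `MYTOOL_REF`, the tool directory and the reference path are hypothetical), such instructions usually look like the Bourne-shell lines below; a csh/tcsh user would need `setenv` and `set path = (...)` instead, as noted in the comments. The `path_prepend` function is a sketch of one common POSIX idiom for PATH munging that avoids adding the same directory twice.

```shell
#!/bin/sh
# Hypothetical install instructions, written in Bourne-shell syntax.
# csh/tcsh equivalents would be:
#   setenv MYTOOL_REF /data/ref/hg38
#   set path = ( /opt/mytool/bin $path )
export MYTOOL_REF=/data/ref/hg38

# POSIX sketch: prepend a directory to PATH only if it is not already there.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                        # already present: leave PATH unchanged
    *) PATH="$1${PATH:+:$PATH}" ;;      # prepend, adding ':' only if PATH was non-empty
  esac
  export PATH
}

path_prepend /opt/mytool/bin
path_prepend /opt/mytool/bin   # second call is a no-op
```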

If you're experienced in Unix, this will be only a minor niggle. If you're a beginner, IMHO this will add a non-negligible amount of friction on top of all the other things you're learning.

Do not use #!/bin/bash in the shebang. Having Bash installed in a nonstandard location is common enough that doing so will break often. Use #!/usr/bin/env bash instead; it should have no disadvantage.
– Konrad Rudolph, Jun 18 '19 at 11:24
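A minimal sketch of the shebang this comment recommends. `env` looks up the first `bash` on the current PATH, so the script also runs where bash lives outside /bin (for example under /usr/local/bin). The script body is only an illustration.

```shell
#!/usr/bin/env bash
# env searches PATH for "bash", so the exact install location does not matter.
# Caveat: on Linux everything after the interpreter path is passed as ONE
# argument, so "#!/usr/bin/env bash -e" is not reliable; use "set -e" in the body.
printf 'running under bash %s\n' "$BASH_VERSION"
```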

Short answer

SH adheres to an official industry standard, but it is not suitable for scientific computing. Bash is considered an informal standard (e.g., by Google). Bash 3 is preferable in most situations in the world of bioinformatics.

Long answer

As already described in other answers, SH (/bin/sh, the plain Bourne shell, the original UNIX shell) should fully adhere to POSIX, which is a real industry standard. However, SH is too limited for scientific computing, since many key features were added only in its successors, especially Bash (/bin/bash, the Bourne Again Shell): set -o pipefail, [[ ... ]], and process substitution <(...), to name just a few.
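To make two of these Bash features concrete, here is a short sketch; the file name and gene IDs are made up for illustration. `[[ ... ]]` does not word-split unquoted variables and supports glob-style pattern matching, while `<(...)` passes a command's output to a program that expects file names, here `comm`, which requires sorted input.

```shell
#!/usr/bin/env bash
# [[ ... ]]: no word splitting of the unquoted variable, glob-style matching.
reads="sample 1.fastq.gz"          # hypothetical file name containing a space
if [[ $reads == *.fastq.gz ]]; then
  echo "gzipped FASTQ: $reads"
fi

# Process substitution: comm(1) reads two "files" that are really sorted
# command outputs; -12 keeps only the lines common to both inputs.
shared=$(comm -12 <(printf '%s\n' geneB geneA geneC | sort) \
                  <(printf '%s\n' geneC geneD geneA | sort))
echo "$shared"   # geneA and geneC, one per line
```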

In practice, it is much more difficult to write "safe" scripts in pure SH, and usually only shell experts are capable of preventing unexpected behavior. For example, it may be hard to ensure that no command in a pipeline failed in the middle of a computation. For Bash, various easy-to-follow defensive-programming recommendations have been developed, and they help prevent these problems. For this reason, many computer scientists, software engineers and companies use Bash as a kind of standard. For instance, Google's internal policy (its Shell Style Guide) allows only Bash for executable shell scripts.
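The pipeline problem can be demonstrated in a few lines of Bash. Here `false` is a stand-in for a failing step, such as an aligner that crashes while its output is piped into another tool; `set -o pipefail` is the Bash option mentioned above.

```shell
#!/usr/bin/env bash
# By default a pipeline's exit status is that of its LAST command,
# so the failure of `false` is hidden behind the success of `cat`.
without=$(false | cat; echo $?)

# With pipefail, the pipeline fails if any of its commands fails.
# $(...) runs in a subshell, so this `set -o pipefail` does not leak out.
with=$(set -o pipefail; false | cat; echo $?)

echo "without pipefail: $without, with pipefail: $with"
# prints: without pipefail: 0, with pipefail: 1

# A common defensive header for Bash scripts ("strict mode"):
#   set -euo pipefail
```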

Even though we cannot expect Bash to be present on every Unix machine (e.g., on mobile devices, as @terdon pointed out), the vast majority of *nix machines used for scientific computation should have it. We should also be aware that Bash can be slower than SH and that it has recently suffered from major security issues (Shellshock in 2014). Moreover, various Bash versions exist, and scripts working on modern Linux machines with Bash 4 might not work on OS X, which still ships Bash 3.
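One concrete Bash 3 vs. Bash 4 difference is associative arrays (`declare -A`), which were introduced in Bash 4.0 and therefore fail under the Bash 3.2 shipped with OS X. A sketch of a Bash 3-compatible alternative using two parallel indexed arrays follows; the sample names, counts and the `count_for` function are hypothetical.

```shell
#!/usr/bin/env bash
# Report whether associative arrays are available in the running Bash.
if [ "${BASH_VERSINFO[0]}" -ge 4 ]; then
  echo "Bash $BASH_VERSION: declare -A is available"
else
  echo "Bash $BASH_VERSION: no associative arrays, using the fallback below"
fi

# Bash 3-compatible lookup table: keys and values in two indexed arrays.
samples=(sampleA sampleB)
counts=(1200 950)

# Print the count for sample $1; return 1 if the sample is unknown.
count_for() {
  local i
  for i in "${!samples[@]}"; do          # ${!arr[@]} lists indices (Bash >= 3.0)
    if [ "${samples[$i]}" = "$1" ]; then
      echo "${counts[$i]}"
      return 0
    fi
  done
  return 1
}

count_for sampleB   # prints 950
```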

To sum up, Bash 3 is probably the most reasonable choice for scientific computing.

Edit

I addressed the comments from @terdon and @John Marshall. In particular, I added an explanation of why Bash is more suitable for scientific computing than SH (in my opinion).

Bash is not present on every Unix machine; sh is, and that is not the same. Yes, Linux tends to have /bin/sh pointing to bash, but Linux is not Unix and, anyway, even on Linux /bin/sh is not always bash (Debian-based systems use dash instead, for example). You can safely expect the Bourne shell (sh) to be present on a POSIX-compliant system, but not necessarily the Bourne again shell (bash).
– terdon, Jun 2 '17 at 14:28

@terdon Could you provide any reference, please? According to wiki.debian.org/Bash, bash is the default shell on Debian. Do you know of any (modern) *nix distro where bash would not be installed?
– Karel Brinda, Jun 2 '17 at 14:47


@Karel: Asking about the "default shell" is ambiguous. As per wiki.debian.org/Shell, these days on Debian the default /bin/sh is dash, while the default login shell (as listed in /etc/passwd) remains /bin/bash. This means that portable shell scripts that identify themselves with #!/bin/sh need to restrict themselves to POSIX shell facilities, while scripts that want to use bash extensions need to use #!/bin/bash. This got tidied up the hard way a few years ago when various distributions switched to dash for /bin/sh…
– John Marshall, Jun 3 '17 at 17:31
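To see this distinction on your own machine, a quick check can help. The output varies by system; on Debian-based systems /bin/sh is typically a symlink to dash, while `$SHELL` normally reflects the login shell recorded for your user.

```shell
#!/bin/sh
# What does /bin/sh point to? (On Debian/Ubuntu, usually a symlink to dash.)
ls -l /bin/sh

# The user's preferred shell, normally set from /etc/passwd at login.
echo "login shell: ${SHELL:-unknown}"
```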


@terdon @John Marshall Thank you for your comments. Compared to bash, I consider "pure" sh very limited and inappropriate for scientific computing, in particular because of some missing but very important features such as set -o pipefail or [[ ... ]]. My experience is that sh scripts can be very susceptible to unexpected behavior (unless the developer is a shell expert, which is usually not the case in bioinformatics). Several good and simple defensive-programming strategies for scientific computing exist for bash.
– Karel Brinda, Jun 5 '17 at 15:26


I wouldn't do "scientific computing" in a shell, no matter which shell it was. The shell should be used for, at most, handling the plumbing between basic utilities and applications. Computing should be handled by utilities and applications designed for those tasks.
– Kusalananda, Jun 5 '17 at 19:33

As far as I know, there is no shell that implements exactly what's specified by the standard, but both bash and ksh93 do a pretty good job of adhering to it, alongside their own, sometimes conflicting, extensions. The ksh93 shell in particular has had a big impact on the past development of the POSIX shell specification, but future POSIX specifications may borrow more from bash due to its wide use on Linux.

The bash shell is pretty much ubiquitous on Linux systems and can be installed on all other Unices too. ksh93 is also available for most Unices but is usually not installed by default on Linux. ksh93 is available by default on at least macOS (as ksh) and Solaris.

If you are concerned about portability when writing a shell script (which is IMHO a good thing to be concerned about), you should make sure that you use only the POSIX utilities and their POSIX command-line flags, as well as only POSIX shell syntax. You should then ensure that your script is executed by /bin/sh, which is supposed to be a shell that understands the POSIX specification. /bin/sh is often implemented by bash running in "POSIX mode", but it may also be dash, ash or pdksh (or something else), depending on which Unix you are using.
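A minimal sketch of what "POSIX syntax and POSIX utility flags only" can look like in practice; the function name `count_fasta` and the `.fa`/`.fasta` extensions are illustrative assumptions. It relies only on `case`, `[ ]`, `$(...)`, `printf` and `grep -c`, all specified by POSIX, so it should run under dash, bash in POSIX mode, or ksh.

```shell
#!/bin/sh
# Count the sequences (header lines starting with '>') in each FASTA file given.
count_fasta() {
  for f in "$@"; do
    case $f in
      *.fa|*.fasta) ;;                                  # looks like FASTA
      *) printf '%s: skipping, not a FASTA file\n' "$f" >&2; continue ;;
    esac
    if [ ! -r "$f" ]; then
      printf '%s: cannot read\n' "$f" >&2
      continue
    fi
    n=$(grep -c '^>' "$f")     # prints 0 (and exits 1) if there are no headers
    printf '%s\t%s\n' "$f" "$n"
  done
}
```

Usage: `count_fasta reads/*.fa` prints one tab-separated line per file.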

For a Linux user, the most difficult part of writing a portable script is often not the shell per se, but the multitude of non-standard command-line flags provided by the GNU implementations of the many shell utilities. That said, the GNU coreutils (basic shell utilities) can, like bash, be installed on all Unices.
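A common example of such a flag is `sed -i` (in-place editing): POSIX `sed` has no `-i` at all, and BSD/macOS `sed` expects a backup-suffix argument after it (`sed -i '' ...`), so a GNU-style `sed -i 's/x/y/' file` breaks there. A portable sketch writes to a temporary file and renames it; the helper `replace_in_file` and the BED example are hypothetical.

```shell
#!/bin/sh
# Portable in-place edit: run sed into a temporary file, then replace the original.
# $1 = sed script, $2 = file to edit (hypothetical helper, not a standard command).
# Note: the rewritten file gets default permissions, not the original's.
replace_in_file() {
  tmp="$2.tmp.$$"              # $$ = PID of the shell, makes the name unique enough
  if sed "$1" "$2" > "$tmp"; then
    mv "$tmp" "$2"
  else
    rm -f "$tmp"               # don't leave a half-written file behind
    return 1
  fi
}

# Example: strip a leading "chr" from chromosome names in a BED file.
#   replace_in_file 's/^chr//' regions.bed
```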

Also note that bash, when running in POSIX mode (either when invoked as /bin/sh or with its --posix command line flag), is not strict about its POSIX conformance and may accept some syntax extensions to the POSIX standard.
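A quick way to see this for yourself, assuming bash is installed (dash is checked only if present): `[[ ... ]]` is not part of POSIX, yet bash accepts it even with `--posix`, whereas dash has no such construct.

```shell
#!/bin/sh
# bash --posix still accepts the non-POSIX [[ ... ]] construct.
bash --posix -c '[[ abc == a* ]] && echo "accepted by bash --posix"'

# dash has no [[ builtin, so the same line fails there (only if dash is installed).
if command -v dash >/dev/null 2>&1; then
  dash -c '[[ abc == a* ]] && echo "accepted by dash"' 2>/dev/null || echo "rejected by dash"
fi
```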

I would not call bash a "standard", but it is indeed likely to be the most widely used Unix shell and is available by default on most modern Unix/Linux distros. There are a few other, more convenient shells like zsh that are broadly compatible with /bin/sh, but they are not as widely available. There is also the C shell, in particular its open-source implementation tcsh. The C shell is quite different from bash. Over ten years ago I saw it used from time to time, but nowadays I rarely see it used, except by programmers from older generations.

The generic command sh is quite literally an industry standard, a POSIX standard to be precise (IEEE 1003.2 and 1003.2a, available for purchase for hundreds of dollars at various websites). In theory, any script that starts with #!/bin/sh should conform to this standard. In practice, most Linux systems have a shell that is close to this standard but has a few quirks and extensions.

Problems crop up when these quirks and extensions become standard practice in shell scripts. The Debian operating system switched to dash as its sh to encourage people to stop using "bashisms" in shell scripts that didn't request a particular shell, i.e. those that begin with #!/bin/sh. The dash shell tries to be as standards-compliant as possible:
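For illustration, a few common bashisms and POSIX equivalents that also run under dash (the variable and function names are arbitrary). Each bashism in the comments can fail or behave differently under dash; for example, dash's `[` typically rejects `==` with an "unexpected operator" error.

```shell
#!/bin/sh
a=sample1

# Bashism: [ "$a" == "sample1" ]      -> POSIX: a single '='
if [ "$a" = "sample1" ]; then echo "match"; fi

# Bashism: function greet { ...; }    -> POSIX: name() { ...; }
greet() { printf 'hello %s\n' "$1"; }
greet "$a"

# Bashism: echo -e 'chr1\t100'        -> POSIX: printf handles escapes portably
printf '%s\t%s\n' chr1 100

# Bashism: source settings.sh         -> POSIX: . ./settings.sh
```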

dash is the standard command interpreter for the system. The current version of dash is in the process of being changed to conform with the POSIX 1003.2 and 1003.2a specifications for the shell. This version has many features which make it appear similar in some respects to the Korn shell, but it is not a Korn shell clone (see ksh(1)). Only features designated by POSIX, plus a few Berkeley extensions, are being incorporated into this shell. This man page is not intended to be a tutorial or a complete specification of the shell.

I'm not familiar with the differences, and generally try to stick to the sh man pages to instruct me with regards to correct standards-compliant shell scripts.

Note that sh is not bash. Even on systems whose /bin/sh points to bash, being invoked as sh changes bash's behavior and causes it to run in POSIX-compliant mode. The "real" sh shell (Bourne shell) is something else again and not the same as bash (Bourne again shell).
– terdon, Jun 2 '17 at 14:31
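A sketch for checking this from a script: bash lists `posix` in `$SHELLOPTS` when POSIX mode is on. `exec -a` (a bash extension) sets the name the new process sees as its argv[0], which mimics invoking bash as `sh`. This assumes a normal environment where `POSIXLY_CORRECT` is not already set.

```shell
#!/usr/bin/env bash
# Single quotes keep $SHELLOPTS from being expanded by the calling shell.
check_mode='case ":$SHELLOPTS:" in *:posix:*) echo posix ;; *) echo default ;; esac'

bash -c "$check_mode"                 # default
bash --posix -c "$check_mode"         # posix
(exec -a sh bash -c "$check_mode")    # posix: invoked under the name "sh"
```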

In Debian the default interactive shell, i.e. the one you'll be using on the command line, is bash (wiki.debian.org/Shell). Yes, /bin/sh will be symlinked to /bin/dash, but the one people will be using live will be bash.
– Matt Bashton, Jun 6 '17 at 17:35