This document is an outline that points users to information on basic Linux/UNIX command line skills and on using the SDSU shared-use servers. A few examples are given of frequently used commands; comprehensive references are given in the links. Pay special attention to the section “Tracking usage–yours and others,” because on a shared server your usage can affect other users.

Other sources of local documentation

To work through the examples here, you need an account on a Linux/UNIX system; the specific output shown here is from blackjack, the scheduler submit node of the main SDSU Linux cluster. Our link on the public SDSU website is http://www.sdstate.edu/technology/UNRC/Cluster/index.cfm. Information is provided there on how to get an account, how to log in once you have your credentials, how to transfer files, etc.

To connect to the cluster and also InsideState, you need an sdstate network account (email ending in sdstate.edu). If you are having trouble accessing InsideState, email brian.moore@sdstate.edu to receive documentation by email.

UNRC has a Listserv mailing list with periodic announcements for SDSU research server Linux users. One way to view the list is to go to http://lists.sdstate.edu/scripts and navigate to the UNRC_RESEARCHCOMPUTING list; you can view previous messages there, as well as subscribe to the list.

Linux command line information–online sources

Much information on UNIX/Linux command line skills is available via web search. Try queries such as “linux command line tutorial.” Some links that have looked good to me are:

Many of the initial commands to learn are related to navigating the file system using the command line: listing, creating, deleting, moving files, moving among subfolders, etc.

passwd

For “password.” One of the first things you should do when you first log in to the system is to change your password. Use the passwd command and it will first prompt you for your old password, then give you a chance to choose a new one.

man

The man command is short for manual. For many important commands, you can find more information by typing man followed by the command name, e.g. man ls .

ls

The ls command will list the files in the directory where your command prompt currently resides. The ls command alone will give a short listing; this command, like many UNIX commands, has many options, which are invoked with a dash (see the man page for ls). Below is an example.

This is the subfolder gsl of my home directory, /home/mooreb. It contains two folders (1.14, and gsl-1.14) and one other file. When the ls command has finished execution, the system prompt reappears.

cd

For “change directory.” When you log in, your system prompt is in your home directory, in my case /home/mooreb. In the example above, after I logged in, I gave the command cd gsl to move into the gsl folder.

pwd

For “print working directory.” It reports where your system prompt is currently located. For example:
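The three navigation commands above can be tried together in a scratch folder. The following is a minimal sketch, not output from blackjack: the folder names project and data and the file notes.txt are made up for illustration, and mktemp -d is used only to get a throwaway folder.

```shell
# Hypothetical sandbox: a throwaway folder, not the author's gsl directory.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"
touch "$sandbox/project/notes.txt"

cd "$sandbox/project"   # move the prompt into the project folder
pwd                     # report where the prompt is now
ls                      # short listing of the current folder
ls -l                   # long listing with permissions, sizes and dates

here=$(pwd)             # keep the working directory for later use
listing=$(ls)           # keep the short listing (one name per line)
cd ..                   # .. means the folder one level up
```

After the final cd .. the prompt is back in the sandbox folder itself, which you can confirm with another pwd.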

Notice that there is more than one way to specify a file’s path and name. If the file is in the folder where your system prompt is located (the location given by pwd), then you need only give the name of the file. If the file is in a different folder, you can specify an absolute path, such as /home/mooreb/testfolder, or a relative path, such as ../testfolder. In the relative path, the .. is an abbreviation for the folder one level up from the current one.

For the cp command, if the last argument is a folder, all the preceding files are copied into the folder.
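To make the path rules concrete, here is a small sketch of cp with bare names, a relative path, an absolute path, and a folder as the last argument. All file and folder names are hypothetical and live in a throwaway folder created with mktemp -d.

```shell
# Hypothetical file and folder names in a throwaway sandbox.
sandbox=$(mktemp -d)
mkdir "$sandbox/work" "$sandbox/testfolder"
cd "$sandbox/work"
echo "hello" > a.txt
echo "world" > b.txt

cp a.txt a_copy.txt                    # both names are in the current folder
cp a.txt ../testfolder                 # relative path: .. is one level up
cp b.txt "$sandbox/testfolder/c.txt"   # absolute path, with a new name for the copy
cp a.txt b.txt ../testfolder           # last argument is a folder: both files go into it

ls ../testfolder                       # a.txt  b.txt  c.txt
```

Note that the last cp silently overwrites the a.txt that was already in testfolder; cp also accepts the -i option if you want to be prompted first.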

scp

For “secure copy,” or more to the point, remote secure copy. It is similar to cp, but either the source or the target is on another system. See the Wikipedia article “Secure Copy.” Example: copying a file from blackjack to another system:

Command line completion

We can quickly type names like gsl-1.14.tar.gz, and much longer ones, with only a few keystrokes. Most Linux systems (including ours) have a command line completion feature, something like what you might do when texting on a cell phone. Type a few characters of the name, then hit the tab key, and the shell will complete the name as far as it can without ambiguity.

mv

For “move.” Syntax is similar to cp , i.e. it takes at least two arguments, but in this case the target (second argument) is replaced by the source (first argument); if the target is a folder, the source file/folder is put in that folder. If the target is a file, it is overwritten. mv is also used to rename files.

rm

For “remove.” The rm command will silently remove (delete) the file given after the command. Be careful! There is no “Trash” area; if you remove a file by mistake, you may not be able to get it back. It is a good idea to get in the habit of using the -i option with rm; then it will prompt you before removing each file.

rmdir

For “remove directory.” The target directory must be empty.
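The mv, rm and rmdir commands can be tried safely in a scratch folder. This is only a sketch with made-up names (report.txt, final.txt, archive), not files from the cluster.

```shell
# Hypothetical names in a throwaway sandbox.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir archive
echo "draft" > report.txt

mv report.txt final.txt     # target is a new name: this renames the file
mv final.txt archive        # target is a folder: the file is moved into it
rm archive/final.txt        # deleted immediately; there is no prompt and no Trash
# rm -i archive/final.txt   # the -i form would ask for confirmation first
rmdir archive               # succeeds only because archive is now empty
```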

wildcards

The wildcard character * can be added to many of the previous commands. For example, my example/sleep folder has many files that begin with the string sleep.pbs.o (these are output files generated each time the program is run with the PBS scheduler).

Wildcards can be used in many different ways, and the case above is just a very simple example. For instance, rm *.o* would probably have accomplished the same thing.

Be careful using wildcards with the rm command without the -i option!
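One way to stay safe is to preview a wildcard with ls before handing the same pattern to rm. The sketch below uses a scratch folder and invented job numbers (101 to 103) on the sleep.pbs.o output files.

```shell
# Hypothetical scheduler output files in a throwaway sandbox.
sandbox=$(mktemp -d)
cd "$sandbox"
touch sleep.pbs sleep.pbs.o101 sleep.pbs.o102 sleep.pbs.o103

ls sleep.pbs.o*      # preview exactly which files the pattern matches
rm sleep.pbs.o*      # remove only those matched output files
remaining=$(ls)      # the script sleep.pbs itself is left untouched
```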

more

Often you will want to view a plain text file. The simplest way is with the cat command (e.g. cat filename). But if the file is too long for the screen it will scroll off the top. Use the more command instead and you can scroll through it slowly. Type the q key to get out of the more pager. There are other pagers besides more. If you want to use a pager where you can scroll both down and up in the file, try the less pager.

Tracking usage–yours and others

When you log into blackjack, you have a session on the current login/submit node of the SDSU cluster. As such we ask users to keep any processes they run on blackjack short (not more than a few minutes) and not too demanding in terms of CPU and RAM. You can test small instances of your jobs on blackjack, but then when it comes time to run them longer/bigger, use the PBS scheduler submit system to deploy the job to a node of the cluster. If you want to run longer interactive jobs, not on a node, you can use our test/development node, flapjack, or we can get you an account on one of the non-cluster servers.

Hence, when running on any of the SDSU Linux servers, it is very important to keep track of both your own and others’ use of shared resources. The main categories to watch are disk, CPU and RAM.

df

For “disk free.” It will report the usage on all mounted file systems. I like using the -h option to get output in “human” terms (MB, GB).

One thing to take special notice of is the usage on /home, as that is where all the users’ home folders are located. If you see /home filling up, send an email to me and/or Bryan Rieger. The scratch area is for short-term storage for jobs about to run, currently running, or recently finished. If you need more space to run a job requiring large disk, email a request and we will create a folder for you on scratch.

du

For “disk usage.” It will report the usage of every subfolder of the current folder, and give a grand total at the end. Keep an eye on your usage. The -h switch gives the output in “human” terms (default is kB).
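The df and du commands can be compared side by side on a small test folder. This is a sketch only: the folder results and the 1 MiB dummy file big.dat are invented, and the numbers you see will depend on the system.

```shell
# Hypothetical test data in a throwaway sandbox.
sandbox=$(mktemp -d)
mkdir "$sandbox/results"
head -c 1048576 /dev/zero > "$sandbox/results/big.dat"   # 1 MiB dummy file

df -h "$sandbox"     # usage of the whole file system that holds this folder
du -h "$sandbox"     # usage of each subfolder, with the total on the last line
du -sh "$sandbox"    # -s prints only the summary total

total_kb=$(du -sk "$sandbox" | cut -f1)   # total in kB, handy for scripts
```

On your own account, running du -sh ~ gives a one-line total for your home folder.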

This is a typical output for blackjack; very little is running that shows up on CPU. If someone is running a process that creates a high CPU load, it will show up near the top of the list. To demonstrate, I ran an example two-core, CPU-intensive process:

In this case, you can see that I am running two cpu intensive processes on blackjack. Since there are a total of 12 processor cores, it is not “full” in terms of cpu load yet. Again, we remind users to only run short test jobs on blackjack itself. More demanding jobs should be run on the nodes using the scheduler.

You can change the sort column in top by using the > (Shift-period) or < (Shift-comma) keys. Remember that top sorts on the CPU column by default. For example, after bringing up top and hitting the > key once, you will be sorting on memory.

Running a command in background, input and output redirect, etc.

Compiling a simple C program

To illustrate running in background, etc., I have provided a very simple single-threaded example program on the cluster called prime_ex that does a prime number search. The files should be located in my home folder underneath the examples subfolder, i.e. in /home/mooreb/examples/C .

copying the example code

To compile the program yourself, you have to first copy the C source code into your home folder or a subfolder you create. For example as below:

compiling the program

For this simple C program, use the gcc compiler:

gcc prime_ex.c

This will create an executable file named a.out . You can rename a.out to whatever you like (using mv ), or you can add an option -o in the compile to explicitly name the executable created by the compile.

This just scratches the surface–this is only the very simplest kind of compile for one example language, C. For other languages, or programs that require linking to libraries, or large complex programs, many more options are available.

Running a program on the command line interactively

We will use the prime_ex program as an example for running interactively. The program computes all the prime numbers less than or equal to a given input number and writes back one line with the number of primes found. So this particular program takes as input interactively one integer number. To run this program for a small argument, from the command line:

mooreb@blackjack:~/testprogram> ./a.out
Enter nmax: 100
Input nmax = 100
The number of primes less than or equal to 100 is 25
mooreb@blackjack:~/testprogram>

The program is written to write the prompt and wait for input. Then when it receives the number and the user hits return, the program runs and the output is written to the screen.

This program runs instantaneously when the input is 100; we can make that number bigger and bigger and eventually it will take a while to compute. So this is a small example of a program that would be useful to run in background (and, eventually, on a worker node, not blackjack itself).

input redirect

Our very simple prime_ex program does not take much in input, just one int, but some programs take very large input sets, so that it is inconvenient to type them all on the command line. You can put your inputs in a file and then redirect the file to the program as input.

To create an input file you can use an editor (see the section on Linux/UNIX editors) or you can just do:

echo "1000" > infile

and it will create a text file called infile with just the given echoed characters in it. Then use this as input redirect:

./a.out < infile

output redirect

You will notice that when we run with input redirect, it still writes the output to the screen. We can make that output go to a file instead with an output redirect.

./a.out < infile &> outfile

Then you can use the more command to check the contents of the output file.

Note that output redirect does not have to be combined with putting the job in background. For example, ls -al /home > lsfile will put what would have been produced by ls -al (a long listing of /home) into the file lsfile; in this case there is no point in putting the job in background since it runs so fast.
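Input and output redirect can be tried without the example program. In this sketch the standard sort -n command stands in for ./a.out, and the file contents are invented. The form > outfile 2>&1 is used because it works in any POSIX shell; in bash it has the same effect as &> outfile.

```shell
# sort -n is only a stand-in for ./a.out; the numbers are made up.
sandbox=$(mktemp -d)
cd "$sandbox"
printf '30\n4\n100\n' > infile       # output redirect creates the input file

sort -n < infile                     # input redirect; result still goes to the screen
sort -n < infile > outfile 2>&1      # result and any error messages go to outfile

first=$(head -n 1 outfile)           # smallest number ends up on the first line
lines=$(wc -l < outfile | tr -d ' ') # number of lines written
```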

Running a program in background

Running from the command line is no problem if the program finishes right away, but what if you want to run a program that takes a long time? You don’t want it to hold up the terminal. For our example program, you can try slowly ramping up the value of your input integer and you will see it start to take time to finish. How long depends on system load. I found somewhere around 500000 it started to have a noticeable delay.

To run the program in background, add an ampersand to the end of the command, i.e. ./a.out < infile &> outfile & . The system responds with a number for the process and puts you back at the system prompt. While the program is running, the terminal is free for you to type new commands. If it is a very long process, you can even log off and it will still run in background. When the program is done, its output will be in the redirected output file, just as before.

Note the response when you put the job in background: two numbers. The first, [1], is the number of the process among the background processes attached to this terminal. The second one, 11220, is the process ID (PID) on the system in general.

Killing a process

You can kill a process with the kill command. The most general way is to give kill the PID number associated with the process (e.g. kill 11220 for the number shown above when the job was put in background). If you don’t know the PID of the process you want to kill, you can use the top command to find it. To limit the output of top, you can use the -u option, which restricts the display to a named user, e.g. top -u mooreb.
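The background and kill steps can be rehearsed with a harmless command. In this sketch sleep 60 stands in for a long-running program; the shell variable $! holds the PID of the most recent background job, so you do not have to copy the number by hand.

```shell
# sleep 60 is only a stand-in for a long-running program.
sleep 60 > /dev/null 2>&1 &     # trailing & puts the job in background
pid=$!                          # PID of the job just started

if kill -0 "$pid" 2>/dev/null; then   # kill -0 only tests whether the process exists
    echo "process $pid is running"
fi

kill "$pid"                     # send the default TERM signal
wait "$pid" 2>/dev/null || true # let the shell collect the finished job

if kill -0 "$pid" 2>/dev/null; then state="running"; else state="stopped"; fi
echo "process $pid is now $state"
```

The jobs command lists the background jobs attached to the current terminal, which is another way to check what is still running.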

Piping command output to another command

You can send the output of any command that produces text into another appropriate command. For example, to get a long listing of all the entries in /home you would use ls -al /home. You will notice that the entries are too many for the screen. So instead use:

ls -al /home | more

and it will display the output through the more pager. This is just one simple example of piping! It can get much more detailed and complicated if necessary.
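Pipes are not limited to pagers. As a small sketch, the block below pipes ls into grep to keep only some of the names and to count them; the file names are invented and live in a throwaway folder.

```shell
# Hypothetical file names in a throwaway sandbox.
sandbox=$(mktemp -d)
touch "$sandbox/a.log" "$sandbox/b.log" "$sandbox/c.txt"

ls "$sandbox" | grep '\.log$'               # show only the names ending in .log
count=$(ls "$sandbox" | grep -c '\.log$')   # -c counts the matching lines instead
echo "$count log files"
```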

Using a simple text editor in the terminal

You will usually need to be able to perform some simple text editing on your input or other files on the system. Here we will take as an example copying a basic simple PBS script and editing it to add the command to our simple, single-core, prime number finder.

Working with this same example, we had the program in a folder beneath the home directory called testprogram. Copy a simple PBS script into that folder:

Now, open the file simple.pbs in a text editor. We chose the joe editor because it is simple, but you can also use vi if you are familiar with it.

mooreb@blackjack:~/testprogram> joe simple.pbs

You should see something like the following:

At this point you can move the cursor around with the arrow keys and type text or delete characters, etc. I don’t use joe frequently, but the only thing I need to know is always right on the screen, top right: Ctrl-K H for help. If you hit those keys it will bring up a few lines of help. Repeating Ctrl-K H will make the help menu disappear.

Here we want to do just a simple edit: add the line to run our program. So, position your cursor on a line below the last line (lines beginning with # are comments) and type:

./a.out < infile

When you are done, the screen should look something like:

Then use Ctrl-K X to save the file. Once you have the editor closed, you can use the more command to check the contents and make sure the edit was done how you wanted.

Why did we not use the exact command we had used on the command line, which was ./a.out < infile &> outfile & ?

First, you don’t want to put your job in background in the scheduler script as it defeats the whole point of the scheduler. When we run on the node, using PBS, you will see the job is automatically put in a batch mode, so there is no reason to put it in background.

Second, the scheduler will automatically produce an output file, so we don’t need to re-direct the output. We could, but for such a small job, here there is no need.

Submitting a PBS script

Note that we have a separate set of documentation with more detail on how to use the Moab/PBS scheduler. The section here is just to get you started on a very simple example.

Now, if we are in the folder that has the submit script, and the program, and the input file, we use the qsub command to submit the job.

mooreb@blackjack:~/testprogram> qsub simple.pbs
32873.bigjack

It responds with a number, which is our job number. Assuming the job does not finish immediately, we can use some helper commands to check on the status. The qstat command will show our currently running (or recently ended) jobs.

This does not seem to tell us much, but the most important thing to note is the “R” underneath the S (for status), which means the job is running. Other commonly seen codes are C (complete), E (exiting) or Q (queued).

Here we see our job running, having been listed as asking for one process (the default). Note the time remaining, just under an hour; that is because our walltime limit was the default of one hour, and we’ve used just a little bit of it.

Now, if all goes well and the job finishes, you should have two new files in your folder: