How to Run Batch Operation on Cluster Computer

If you are pursuing higher studies in machine learning, it's very likely that you will have to “deal” with big data. By “deal”, I mean you might have to extract features or, for example, find the eigenvectors and eigenvalues of large matrices. Simulations like these, on real or synthetic data, may take days to run on a typical desktop or laptop. It's better to run them on your university’s supercomputer or cluster computer. I am assuming that you have access to such a cluster.

These clusters are typically multi-core machines with large amounts of RAM and a lot of processing power. But you cannot just run any code on them directly. You need a batch script for the cluster's job scheduler (SLURM, in this case) to request and use the resources. You log in to a front end node, and through that node you submit your batch job. I will show how to run MATLAB code on the cluster using the sbatch command. I am assuming that you are on Mac OS X/Linux and can use the terminal. If you are on Windows, install PuTTY to get an SSH terminal.

Log in to the front end node. For example, if your username is abc123 and the front end host name is def.edu, type the following command in the terminal:

ssh abc123@def.edu

Press Enter and it will ask for your password. Type the password and press Enter again. You will be taken to your home directory on the remote host.

Create a directory in your home folder (optional).

mkdir TEST_MATLAB

cd TEST_MATLAB

Copy the *.m files you need into the directory you just created. You can do that from the terminal, or you can use a GUI program like Fugu (for Mac OS X) or WinSCP (for Windows).
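If you prefer the terminal, scp is one common way to do the copy. The snippet below is only a sketch: it reuses the example username, host, and directory from the steps above, and it assumes your .m files are in the current directory on your local machine. It builds the command as a string and prints it so you can check it before anything is transferred.

```shell
# Run this on your local machine, not on the cluster.
REMOTE_USER="abc123"      # example username from the ssh step
REMOTE_HOST="def.edu"     # example front end host from the ssh step
REMOTE_DIR="TEST_MATLAB"  # directory created on the remote host (relative to home)

# Build the scp command as a string so it can be inspected before running.
COPY_CMD="scp ./*.m ${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_DIR}/"
echo "$COPY_CMD"

# Once the printed command looks right, run it with:
#   eval "$COPY_CMD"
```

scp will ask for the same password you use for ssh.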

Now create a bash script like the following and save it as matlab_batch.sh. You can change the portions written in BOLD according to your specific situation. Copy this file to your remote directory as well.
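As a rough guide, a minimal SLURM script for a MATLAB job might look like the sketch below. Everything in it is an assumption to adapt: the job name, the resource requests, the file name my_script.m, and especially the "module load matlab" line, since module names differ from cluster to cluster (some clusters do not use modules at all). The snippet writes the script to matlab_batch.sh so you can save it in one step.

```shell
# Write a minimal SLURM batch script to matlab_batch.sh.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > matlab_batch.sh <<'EOF'
#!/bin/bash
# Job name shown in squeue (placeholder).
#SBATCH --job-name=test_matlab
# Log file for the job's output; %j is replaced by the job ID.
#SBATCH --output=test_matlab_%j.out
# One task with four cores (adjust to your code).
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Memory request and wall-clock limit in HH:MM:SS (adjust to your cluster).
#SBATCH --mem=8G
#SBATCH --time=02:00:00

# Assumption: MATLAB is provided through environment modules under this name.
module load matlab

# Run my_script.m (hypothetical file name) without a display, then quit MATLAB.
matlab -nodisplay -nosplash -r "my_script; exit"
EOF

echo "wrote matlab_batch.sh"
```

Once the script and your .m files are in the remote directory, you would submit the job from the front end node with "sbatch matlab_batch.sh". sbatch prints the job ID, which you can use with squeue.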

Your output files should be copied to your remote home directory after the simulation is finished.

However, it might take some time to start the processing on the computing node depending on the current status of the node. You can check whether your job is being processed using the following command:

squeue -u USERNAME

or

squeue -j JOBID

You can also get an estimate of when your job will start:

squeue -u USERNAME --start

or

squeue -j JOBID --start

You can opt for email notifications when your job starts, finishes, or aborts by adding the following lines to your shell script.
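A sketch of the lines in question, using SLURM's --mail-type and --mail-user options. The address is a placeholder built from the example username and host above, and whether mail is actually delivered depends on how your cluster is configured.

```shell
# Send email when the job begins, ends, or fails.
#SBATCH --mail-type=BEGIN,END,FAIL
# Placeholder address; replace with your own.
#SBATCH --mail-user=abc123@def.edu
```

These lines go with the other #SBATCH lines at the top of matlab_batch.sh, before the first command, because sbatch stops reading directives once it reaches the first command in the script.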

One thought on “How to Run Batch Operation on Cluster Computer”

Hello, your post is as close as I’ve found thus far online regarding how to run a specific program through my small, 4-part raspberry pi cluster computer. I do not know much about coding, but I do recognize patterns and have a slight familiarity with entering codes from various experts I rely on.

I’m wondering if you can help with my specific issue, which has more to do with writing a script that also utilizes mpiexec and machinefile to execute my program. I’m trying to run Kodi/XBMC and eventually plan to also use this small cluster computer as a file server or host–I’ll cross that bridge later. The command I’ve used to utilize the cluster is: mpiexec -f machinefile -n 1 kodi-standalone and, while this works, I’d like to simply write a script that automatically does this when my raspberry cluster is rebooted, or, conversely, to simply type at the terminal a one word command that then executes the mpiexec machinefile command.