Welcome to talisman

[Photos: the newer dual-core talisman nodes (d1–d8), and the main talisman nodes (master, amulet, and nodes 1–11).]

The talisman cluster is a collection of 21 Debian and Ubuntu GNU/Linux-based computers. It has a total of 30 processors with an average speed of about 3 GHz, 53 GB of RAM, and 1.6 TB of hard-disk space. Each node can be used individually, and the entire system can also be used as a workstation cluster, with communication handled by MPI.

If you want an account on talisman, or have any questions not answered on this page, please contact the talisman admin.


Rules and regulations

There are certain rules and regulations that we expect users to follow. These are designed to ensure fair usage of the cluster.

  • You must obey the DAMTP computing rules and the rules of the Information Strategy and Services Syndicate.
  • Do not run code on an already-occupied processor core (competing for resources makes everything run slower). If only one core of a dual-core node is occupied, running your code on the free one is fine.
  • Any job longer than 15 minutes, and all jobs on master, should be niced (to at least 10).
  • No user should manually start jobs on more than half the processors. If you want to submit more than this you must use the queueing system.
  • The preceding rules can be broken when necessary, e.g. if you are off to a conference and need results quickly. However, if you do break the rules, you must email the talisman admin to let us know.
  • All code should be compiled with the maximum level of optimisation possible. This also means using machine-optimised libraries (e.g. for linear algebra and fast Fourier transforms). See below for further details.
  • You must not use talisman for projects like distributed.net or SETI@home.

If we detect a breach of these rules, you are highly likely to have your account disabled and/or your running jobs killed.


How to use talisman

You can access the cluster by ssh to talisman.damtp.cam.ac.uk. This will give you a login shell on the master node of the cluster. From there you can reach the other nodes:

Node     Type    Processor(s)                   Memory  Hard-disk space
master   i686    Intel Pentium 4 (3.2 GHz)      2 GB    500 GB
node1    i686    Intel Pentium 4 (2.8 GHz)      1 GB    80 GB
node2    i686    Intel Pentium 4 (2.8 GHz)      1 GB    80 GB
node3    i686    Intel Pentium 4 (2.8 GHz)      1 GB    80 GB   (decommissioned)
node4    i686    Intel Pentium 4 (2.53 GHz)     1 GB    80 GB   (decommissioned)
node5    i686    Intel Pentium 4 (2.53 GHz)     1 GB    80 GB   (decommissioned)
node6    i686    Intel Pentium 4 (2.53 GHz)     1 GB    80 GB
node7    i686    Intel Pentium 4 (3.2 GHz)      2 GB    80 GB
node8    i686    Intel Pentium 4 (3.2 GHz)      2 GB    80 GB
node9    i686    Intel Pentium 4 (3.2 GHz)      2 GB    80 GB   (decommissioned)
node10   i686    Intel Pentium 4 (3.2 GHz)      2 GB    80 GB
node11   i686    Intel Pentium 4 (2.8 GHz)      1 GB    80 GB
amulet   ia64    Intel Itanium 2 (2 x 1.3 GHz)  4 GB    40 GB
noded1   x86_64  Intel Pentium D (2 x 3.2 GHz)  4 GB    80 GB
noded2   x86_64  Intel Pentium D (2 x 3.2 GHz)  4 GB    80 GB
noded3   x86_64  Intel Pentium D (2 x 3.2 GHz)  4 GB    80 GB
noded4   x86_64  Intel Pentium D (2 x 3.2 GHz)  4 GB    80 GB
noded5   x86_64  Intel Pentium D (2 x 3.2 GHz)  4 GB    80 GB
noded6   x86_64  Intel Pentium D (2 x 3.2 GHz)  4 GB    80 GB
noded7   x86_64  Intel Pentium D (2 x 3.2 GHz)  4 GB    80 GB
noded8   x86_64  Intel Pentium D (2 x 3.2 GHz)  4 GB    80 GB

Compiling your code

For code to be run on the cluster, please compile it on the cluster as shown in the table below:

Nodes: master, node1–node11 (compile on master)
  icc options: -O3 -xN -ip
  gcc options: -O3 -ffast-math -fomit-frame-pointer -funroll-loops -march=pentium4 -mfpmath=sse,387 -ftree-vectorize

Nodes: amulet (compile on amulet)
  icc options: -O3 -ip
  gcc options: -O3 -ffast-math -funroll-loops

Nodes: noded1–noded8 (compile on noded1)
  icc options: -O3 -xP -ip
  gcc options: -O3 -ffast-math -funroll-loops -march=nocona -mfpmath=sse,387 -ftree-vectorize

To attempt to automatically parallelize your code, so that it uses both processors of a dual-processor node simultaneously, you could try giving the -parallel option to icc. If you're manually parallelizing using pthreads, you might want the extra option -pthread. If you're using OpenMP, use the -openmp option with icc or the -fopenmp option with gcc. Some tutorials on using either pthreads or OpenMP may be found in the links section below.
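
As a rough illustration, and assuming a hypothetical source file myprog.c, compiling an OpenMP program for master or node1–node11 might look like one of the following (the optimisation flags are those from the table above; adjust them for the node type you are compiling on):

icc -O3 -xN -ip -openmp myprog.c -o myprog_i686
gcc -O3 -ffast-math -fomit-frame-pointer -funroll-loops -march=pentium4 -mfpmath=sse,387 -ftree-vectorize -fopenmp myprog.c -o myprog_i686

For a pthreads program, replace -openmp or -fopenmp with -pthread; for icc's automatic parallelization, replace it with -parallel.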

To run large parallel distributed-memory code, use MPI. See the MPI section below for details.

Data storage

talisman is intended for processing, rather than for storage. We take no backups and make no guarantees of reliability. We do not impose a disk quota system; if disk space fills up, the biggest offenders will be required to free up space.

Your home directory on talisman is stored on master and shared across all nodes via NFS. To transfer data between your DAMTP home directory and your talisman home directory you can use scp or rsync. Local per-node scratch space is available on each of the nodes under /local/uid/, and is quicker and larger than the home directory space.

Although it is not recommended practice, it is possible to directly access the local storage on the nodes from the DAMTP network by using rsync with the option -e "ssh talisman nice ssh". Thanks to Jim McElwaine for suggesting this.
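
For example, run from a DAMTP machine, something along these lines should pull results from a node's scratch space (the node name, username, and directory names here are purely illustrative):

rsync -av -e "ssh talisman nice ssh" noded3:/local/ejb48/results/ ./results/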

Running your code

There are two different ways in which you might want to run code on the cluster. The first is the simplest: choose an unoccupied processor of the correct type (sysload gives you the load averages), then ssh to that node and run your job, preferably niced. The second way is to use the queueing system.
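
For the first approach, a typical manual session might look something like this (node7 and a.out are just placeholders; nohup keeps the job running after you log out):

sysload                       # check the load on each node
ssh node7                     # log in to a node with a free processor
nohup nice -n 10 ./a.out &    # run the job niced, in the background
exit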


Queueing system

We run the Sun GridEngine 6.0 queueing system on talisman. The queueing system takes all the hassle out of finding a vacant node to run code on, especially if you have a large number of jobs (say 100) to submit. GridEngine automatically avoids nodes where there's already code running, and as soon as any node becomes vacant it starts running jobs on it. This can make it difficult to find a free node manually if the queue's in use, in which case use the queue yourself and GridEngine will let everyone's code have a turn.

Queueing serial (non-parallel) jobs

In the simplest form, you submit a job to the queueing system using qsub; for example, qsub $HOME/a.out. This puts the job "run a.out" into the queue. You can see what's in the queue by typing qstat (or qstat -r to also see which queues it will run on), and you can remove an item from the queue by typing qdel jobnum. Please don't be put off if the queue looks full, as GridEngine has an idea of fairness and will push an infrequent user's job to the front of the queue. Once your job gets run, its stdout and stderr will be piped to files in your home directory with suffixes .ojobnum and .ejobnum respectively.
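
Putting those commands together, a minimal session might look like this (the job number 1234 stands for whatever number qsub reports back):

qsub $HOME/a.out    # submit the job; qsub prints its job number
qstat -r            # list queued and running jobs, and which queues they can use
qdel 1234           # remove job 1234 from the queue, if you change your mind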

Now for a more interesting example (this is how I use the queueing system for serial jobs). First, write a shell script that contains the commands you want to run. For example, I use something like:

#!/bin/bash

# Pick the binaries matching this node's architecture (i686, x86_64, or ia64).
bindir="/home/talisman/ejb48/bin_$(uname -m)"

# Run the pipeline, keeping intermediate files in the fast local scratch space.
$bindir/prog1 0.0 10.0 1000 > /local/ejb48/temp1.dat
$bindir/prog2 1000 < /local/ejb48/temp1.dat > /local/ejb48/temp2.dat
$bindir/prog3 0.0 10.0 /local/ejb48/temp1.dat /local/ejb48/temp2.dat > /home/talisman/ejb48/1000.dat

# Tidy up the scratch files.
rm /local/ejb48/temp1.dat /local/ejb48/temp2.dat

(Of course, I have lots of these scripts with slightly different values.) Note that I select my binary directory based on uname -m, so that I run x86_64 binaries on x86_64 machines, and so on, meaning that the same script will work when run on any of talisman's nodes. Note, also, the use of the local scratch space to store temporary files. To submit a job like this to the queue to be run on master or node1–node11, use qsub script (you may need qsub -b y /full/path/of/script in some circumstances). To submit a job to be run on noded1–noded8 in addition to these, use qsub -q noded.q script. To submit a job to run on any node (including amulet), use qsub -q noded.q,amulet.q script. To get a job to run only on (say) noded1–noded8, follow the job submission with qalter -q noded.q jobnum, where jobnum is the number of the job you just submitted. You can also request specific resources; for example, to submit a job that needs 1.5 GB of RAM, use qsub -l mem_free=1.5G. See the qsub man page for further details.
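
If you have a whole family of scripts like the one above, say script_100, script_1000, and script_10000 (hypothetical names) differing only in their parameter values, you can submit them all in one go; for example, to send them all to the noded nodes:

for n in 100 1000 10000; do
    qsub -q noded.q script_$n
done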

Queueing parallel jobs

The above is only for serial (i.e. non-parallel) jobs. To submit a parallel job, you need to use a Parallel Environment, or PE. There are two to choose from on talisman.

The first is for shared-memory parallel programs, such as those using OpenMP or pthreads. For either of these, when submitting the job using qsub you use the command-line option -pe openmp num_threads, where num_threads is either the number of threads you would like, or a range. For example, -pe openmp 2-4 would request a parallel job with between 2 and 4 threads (all running on the same node, of course). The environment variable NSLOTS is set to the actual number of threads allocated to the job, and you must not use more than this. For an OpenMP program, setting and exporting OMP_NUM_THREADS=$NSLOTS in your job's batch file will ensure this, as in the sketch below.
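
A minimal batch file for this parallel environment might look like the following sketch, assuming your binaries are laid out as in the earlier example (myprog is hypothetical); submit it with something like qsub -pe openmp 2-4 script:

#!/bin/bash

# Use exactly the number of threads GridEngine has allocated to this job.
export OMP_NUM_THREADS=$NSLOTS

$HOME/bin_$(uname -m)/myprog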

For real parallel jobs, i.e. distributed-memory jobs using MPI, use the other parallel environment: -pe mpi num_processes. The command to run the code is then simply mpirun program; there is no need to mess around with host files or the like, as this is all done for you by the queueing system.
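
For example, a job script for an MPI program (my_mpi_prog is a placeholder) could be as simple as the sketch below, submitted with something like qsub -pe mpi 4 script, or with a slot range such as -pe mpi 2-8:

#!/bin/bash

# GridEngine allocates the slots and host list; mpirun picks them up automatically.
mpirun $HOME/my_mpi_prog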


MPI on talisman

In this section, only a simple example of how to run an MPI program on talisman is given. For further details, see the links section below.

To compile your code, use mpicc for C, or mpif77 or mpif90 for Fortran. For details of how to use these in Makefiles, or how to change the compiler used, please see the mpicc manpage.

To run your code, use mpirun -hostfile hostfile -np num_processes prog, where hostfile is a list of hostnames, formatted as described in the mpirun manpage, and num_processes is the number of processes to run. This is made far easier by using the queueing system, as described above.
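
As a sketch (the node names and program name are illustrative, and the exact hostfile syntax should be checked against the installed MPI's mpirun manpage), a hostfile for two of the dual-core nodes could contain

noded1 slots=2
noded2 slots=2

and the program would then be launched by hand with

mpirun -hostfile hostfile -np 4 ./my_mpi_prog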

Here is a real-world example, submitted using the queueing system. First, the program, mpi_test.c (note that, for brevity, it omits error checking, which real code should never do):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
  int rank, size, processor_name_len;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  /* Start MPI and find out how many processes there are (size),
     which one we are (rank), and which node we are running on. */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &processor_name_len);

  printf("Message from %.*s (aka %d of %d): you will be assimilated!\n",
         processor_name_len, processor_name, rank+1, size);

  MPI_Finalize();

  return 0;
}

We compile this using icc, so on master we run OMPI_CC=icc mpicc -O3 -ip -xN mpi_test.c -o mpi_test_i686. We then repeat this on noded1 with -xP instead of -xN and x86_64 instead of i686, and finally on amulet with no -x option and ia64 instead of i686. We next write a machine-independent wrapper, mpi_test.sh:

#!/bin/sh
exec "$HOME/mpi_test_$(uname -m)" "$@"

and make this executable by running chmod a+x mpi_test.sh. We now submit this to the queue. For this simple case, we'll do it all in one go: qsub -b y -q noded.q,amulet.q -pe mpi 1- mpirun ~/mpi_test.sh. If all goes well, we get four output files in our home directory, of which three are empty (the errors), and one contains the output from our MPI program.


Software libraries

There are a number of useful libraries installed on the cluster, which you are very strongly encouraged to use. Locally installed libraries live in /opt/. If you want other libraries installed, please let us know.

Intel MKL

The Intel Math Kernel Library (MKL) contains many useful routines, most relevantly machine-optimised versions of LAPACK, BLAS, and FFTs. These are the fastest versions of these libraries available for Intel processors. The libraries live in /opt/intel/mkl/version/lib/arch, and there is extensive documentation in /opt/intel/mkl/version/doc/. The LAPACK User's Guide can be found on netlib.

The structure of the libraries to link has recently changed with MKL version 10. Please refer to the Intel Manual for details. If you're in a hurry, icc -xN -ip -O3 prog.c -o prog -L/opt/intel/mkl/version/lib/arch -lmkl -openmp may work.

FFTW

FFTW is the Fastest Fourier Transform in the West, or nearly. It is slower than the Intel FFTs in quite a few cases, but is still respectable, and is portable to other platforms. The latest version is FFTW3, which can be linked using -lfftw3.
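
For example, a compile-and-link line for a hypothetical fft_prog.c on master might look something like this (add -I and -L paths if the FFTW headers and libraries are installed somewhere non-standard, e.g. under /opt):

icc -O3 -xN -ip fft_prog.c -o fft_prog -lfftw3 -lm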

The Intel MKL provides an FFTW3 interface. To use it, first check the FFTW to MKL wrapper page to make sure the function you want to use is supported, and then just change #include <fftw3.h> to #include <mkl_fftw3.h>.

Other libraries

If you would like any other libraries installed, please contact the talisman admin.


Other Computing Resources

  • Condor: a system for running jobs on idle workstations across PWF machines in the University. Condor is intended to provide a significant computational resource for researchers in the University, particularly those who have a need for high throughput computing.

Useful Links

Parallel programming

POSIX threads (Pthreads)

OpenMP

  • OpenMP, from Lawrence Livermore National Laboratory.

MPI