COSMOS User Guide


Started by Stuart Rankin May 2004
Last updated by Andrey Kaliazin, 3 May 2011

New user application form


Quick start

For security reasons, all COSMOS access is restricted to secure shell (SSH) logins which must recognise both you and the workstation from which you are logging on remotely. The application form has a box in which the (fully qualified) domain names or IP addresses of your workstation(s) can be supplied; additional hosts can be added on request later by contacting cosmos_sys.

New users will be contacted with details of their userid and password. Once these have been received, use e.g. the OpenSSH SSH client to log in to COSMOS via:

ssh -X userid@universe.damtp.cam.ac.uk

Note that from within the DAMTP network, it is usually sufficient to say:

ssh -X universe

All DAMTP systems are pre-registered for access. All COSMOS users, including those from DAMTP, have their accounts preconfigured to make the best use of the COSMOS application stack. Bash is the default shell for all accounts.

All COSMOS accounts share the environment set up using resource files from /home/cosmos/template/.

Individual environments can be customised (shortcuts, aliases, paths, etc) using .bashrc.local. If any changes are made to the existing files, log in again to make them effective.

X-windows applications should work transparently across the (encrypted) SSH connection (if this doesn't seem to work try using ssh -X). The other machines in the facility, e.g. cosmogrid, microcosm or multiverse, are reached in a similar way. See Using SSH for more details.

The basic features of the COSMOS filesystems are as follows:

  1. Each user has a backed-up home directory for long-term storage under /home/cosmos/users (quota 100GB, but please be sensible and clean up your folders regularly).
  2. Each user can write to a directory under /home/cosmos-tmp, which is not backed up and is intended for short-term scratch work.

Interactive use is confined (transparently) to the first 12 cpus and only 30 minutes of cumulative cputime can be accrued by a single job running interactively; there is a queue called express designed to handle larger test jobs.

To submit a batch job, first select the best queue by consulting the table below. List the commands necessary to start the job inside a simple file, called e.g. jobscript, remembering to use dplace if and when appropriate (the next step will print examples of this if the recommendation for the queue is not followed). Then, if, for example, you intend to submit to the small queue, run the command:

msub jobscript

This command steps through each required parameter, supplying the queue defaults and advising as needed. At the last step it generates a submission script containing suitable PBS directives and optionally submits it to the queueing system. Please try to be accurate in making resource requests, feeding the numbers reported by watchdog back into later runs if appropriate.

NB Follow the above procedure at least once to generate the latest form of the submission script (older versions should not be expected to work with the current system).

Job status can be monitored via the command showq.

Jobs can be killed before or during execution via the command canceljob.

During execution, the watchdog program may send advisory or warning messages which should be read.


Secure Shell (SSH)

Logging on

COSMOS

  1. Obtain your user id and register the local machine from which you wish to connect with the Cambridge administration.

  2. Issue the command:

    ssh cosmos_userid@cosmos.damtp.cam.ac.uk

    Enter your password when prompted.

  3. X clients running on COSMOS will normally display without any setenv DISPLAY, xhost or xauth preliminaries.

UNIVERSE

UNIVERSE is the facility's main compute system. COSMOS users can access this system via SSH in the same way as COSMOS itself. The public host keys for COSMOGRID (a.k.a. cosmos2) have fingerprints

 

Why use Secure Shell?

Secure Shell is being actively developed as a secure replacement for the common UNIX commands, rlogin, rsh, rcp, and rdist. It uses strong authentication methods to establish secure communications with a remote computer.

The principal benefits of SSH are:

  1. By default all information transmitted over the network after the initial machine-to-machine connection is strongly encrypted, including the user's password and any data sent back by e.g. an X application.

  2. SSH is easy to use and indeed much more straightforward to use with X windows than rlogin or rsh.

  3. Using rlogin typically requires setting an explicit value of DISPLAY on the remote system so as to point back to the local display; in addition, it is usually necessary to explicitly enable access to the local display either by issuing an xhost command on the local machine (very unsafe) or by passing a so-called "magic cookie" from the local to the remote system.

    In contrast SSH connects remote X programs to the local display automatically and out of sight of the user, without the need for any of these irksome preliminaries.

    By default, all X programs are directed down the secure (encrypted) channel to the local machine, and are thus also safe from prying eyes whilst in transit. The DISPLAY at the remote end can still be set in the usual way if desired, in which case the X connections will be directed along normal, insecure channels. Only in the case of old DGL applications on COSMOS such as buttonfly (which are not pure X applications) might this be desirable.

Obtaining Secure Shell

Many sites nowadays have some form of SSH installed centrally. The current implementation on COSMOS will operate with most of these.

The OpenSSH 'portable' distribution can be downloaded from this UK mirror site. The complete distribution includes, in addition to the server program sshd (compatible with clients using SSH protocol versions 1 and 2), the following client programs: ssh, scp (compatible with servers offering SSH protocol versions 1 and 2) and sftp (compatible with servers offering SSH protocol version 2 only).

A suite of free SSH tools for Win32 platforms (Win95 and later) is also available.

For other platforms, it may be useful to look at the University of Cambridge Computing Service's SSH CD.


Running jobs

Note that the queueing system described here is new and differs significantly from previous versions. We expect to make changes in response to experience and feedback.

Interactive use

Office hours interactive use: This is intended for interactive tasks such as code development, compilation, optimization, job submission and data analysis. Office hours are defined to be 10am-6pm, Monday-Friday.

Please note that interactive use is not appropriate for production code runs which should be performed in the batch queues. By default all interactive activity is automatically confined to the first 8 COSMOS cpus; because this is a limited slice of resources, also required by system services, a socially responsible attitude is essential. Note that there is a special queue called express which can be used for test jobs (up to 16 cpus, 128GB, 2 hours).

Please take care to set the environment variable OMP_NUM_THREADS to an appropriately small value when testing OpenMP jobs interactively (otherwise it is possible to launch enough worker threads to fill the entire system on only 8 cpus), and don't use dplace (to facilitate sharing). To set OMP_NUM_THREADS to N, perform one of the commands below, according to which shell you use:

(bash) $ export OMP_NUM_THREADS=N
(tcsh) > setenv OMP_NUM_THREADS N

Interactive use time-limit: The maximum time for jobs running outside the batch queue system is 30 minutes of cumulative processor time (note that a 4-cpu job can run for 120 minutes in the express queue). After this cpu time-limit is exceeded a job will be automatically terminated by the watchdog program.
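Because the limit is on cumulative processor time, the wall-clock time available to an interactive job shrinks as the number of busy threads grows. The following is a minimal sketch of that arithmetic, assuming every thread is fully busy for the whole run; the function name is hypothetical and is not a COSMOS tool.

```shell
#!/bin/bash
# Hypothetical helper, not a COSMOS command: estimate how many wall-clock
# minutes an interactive job can run before it accrues the cumulative
# cpu-time limit, assuming every thread is 100% busy.
interactive_wall_minutes() {
    local limit_minutes=$1   # cumulative cpu-time limit (30 in this guide)
    local nthreads=$2        # number of busy threads or processes
    echo $(( limit_minutes / nthreads ))   # integer division, rounds down
}

interactive_wall_minutes 30 4    # prints 7
```

In other words, a 4-thread interactive test would be terminated after roughly 7 minutes of real time, which is why the express queue is the better place for such tests.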

Batch queues

Queue     Access        Min CPUs   Default CPUs   Max CPUs   Max memory   Max real time   Max CPUs
                        per job    per job        per job    per job      per job         per queue

super     restricted*   32         64             128        384GB        8hr             144
large     all           16         16             64         256GB        8hr             144
small     all           1          4              16         96GB         8hr             144
express   all           1          4              16         128GB        2hr             148

* For access to the super queue, please contact cosmos_sys with details of your job requirements

Notes

Introduction to LSF

Please make sure jobs are submitted in line with the following procedure - in particular, if you have old bsub or qsub submission scripts, these will not work with the new queueing system and will need to be regenerated using the current automatic writer script (invoked by using the name of the queue, as in small, large etc), or by the cosmos_sub command (use without arguments for usage instructions).

The current batch queueing system is based around Platform LSF, which offers many benefits such as job-level run-time resource monitoring and load balancing. This immediately implies that any old qsub scripts (containing #QSUB style headers) from the era of Cray NQE won't work - they will need regenerating by using one of the currently recommended methods (see below).

Please check that you can run the command

 bqueues

and see a list of queues. If this command produces an error, please check that your bash or tcsh startup scripts are in line with the examples under /home/cosmos/template (all recently created accounts should be already). If this remark does not help please email cosmos_sys.

How to view queue information

Please refer to this summary table of queue parameters.

For more detail on a specific queue do e.g.

bqueues -l large

At the top of the output you should see a description string summarising the intent of the queue and the size of the expected jobs therein:

QUEUE: large
  -- Large COSMOS batch job queue. 
     16-64 cpus (default 16), 256GB memory max, 8 hours max real time per job.

Near the bottom, the USERS parameter indicates access:

USERS: all users

Note the absence of a DISPATCH_WINDOW parameter; in contrast to earlier versions of the queues, all queues accept and dispatch jobs all the time.

How to submit a job

Create a simple file containing the necessary job start commands, e.g.

cat > jobscript
cd /home/cosmos-tmp/your_username/mpijob
mpirun -np 16 dplace -s1 mpiprog < input.dat > output.dat
<Control-D>

The above example is an MPI job, but follow exactly the same procedure for a job using OpenMP parallelism (the OMP_NUM_THREADS environment variable will be set automatically). In the OpenMP case, the above would become (note the different option to dplace):

cat > jobscript
cd /home/cosmos-tmp/your_username/ompjob
dplace -x1 ompprog < input.dat > output.dat
<Control-D>

The use of dplace is as recommended for MPI and OpenMP jobs running in the main (i.e. non-aux) Altix queues; note that it cannot be used in the aux queue (jobs attempting to do so will fail to start). Life is more complicated if you have a job in more than one distinct piece or using hybrid MPI and OpenMP parallelism (e.g. later versions of cosmomc) - contact cosmos_sys for advice with these (also see this FAQ).
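The same kind of jobscript can also be written non-interactively with a shell here-document, which is convenient when generating many similar jobs. This is only a sketch: the function name and its arguments are hypothetical, the program name mpiprog and the file names are taken from the example above, and nothing is executed or submitted.

```shell
#!/bin/bash
# Hypothetical helper: write an MPI jobscript like the one shown above
# without typing it at the terminal. It only creates a text file.
write_mpi_jobscript() {
    local workdir=$1 nprocs=$2 outfile=$3
    # The EOF delimiter is unquoted so that $workdir and $nprocs are
    # expanded now, when the file is written.
    cat > "$outfile" <<EOF
cd $workdir
mpirun -np $nprocs dplace -s1 mpiprog < input.dat > output.dat
EOF
}

# Write the script to a temporary file and show its contents.
tmp=$(mktemp)
write_mpi_jobscript /home/cosmos-tmp/your_username/mpijob 16 "$tmp"
cat "$tmp"
```

The resulting file can then be passed to the submission procedure described in this section in place of a hand-typed jobscript.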

To submit this to e.g. large, do

large jobscript

The automatic writer script will now run and ask questions about the resources the job will need. It explains the implications of these choices and under what circumstances the job might be killed, or not start as a result of the values given.

Please be as accurate as possible.

Giving overlarge numbers may mean that LSF cannot find a suitable window in which to dispatch your job. On the other hand, giving numbers which are wild underestimates in order to ensure dispatch can lead to oversubscription of system resources and has the potential to bring COSMOS down - please don't do this.

Please note that run-time memory usage can now be monitored accurately. In the aux queue (and only there, at the time of writing) jobs may be killed for exceeding stated memory and cpu requirements by a significant margin.

Finally the script optionally submits the job to the chosen queue, or saves the submission script for manual editing or deferred submission.

Note that to submit such an automatically written script, possibly after manual edits, one should do

bsub < large.xxxx

whereas in NQE one might have done e.g. qsub cam-long.xxxx (without the <).

Recently a new method of submission became available: the non-interactive command cosmos_sub. Run it without arguments for detailed usage information. Each of the following examples submits the contents of jobscript as a job called JOB1 to the small queue, using 6 cpus and 100 MB of memory, to run for 120 minutes:

Non-OpenMP job - i.e. 6 processors but no OpenMP threads (e.g. pure MPI, serial farm):
cosmos_sub --submit --job JOB1 -q small -n 6 -s 100 -r 120 jobscript

Simple OpenMP job - i.e. 6 processors and 6 OpenMP threads (e.g. a single OpenMP binary):
cosmos_sub --submit --job JOB1 -q small -t 6 -s 100 -r 120 jobscript

Anything else using 6 processors but 2 OpenMP threads (e.g. cosmomc with 3 chains and 2 OpenMP threads each, or a farm of 3 OpenMP jobs with 2 threads each, etc.):
cosmos_sub --submit --job JOB1 -q small -n 6 -t 2 -s 100 -r 120 jobscript
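When submitting many jobs it can help to assemble the cosmos_sub command line in one place and inspect it before running it. The sketch below is a hypothetical wrapper that only prints the command; it uses no flags beyond those appearing in the examples in this section, and the meaning of each flag should be confirmed from the usage text that cosmos_sub prints when run without arguments.

```shell
#!/bin/bash
# Hypothetical wrapper: build a cosmos_sub command line from the options
# used in this guide and print it (dry run). It does not call cosmos_sub.
# Pass an empty string for ncpus or nthreads to omit -n or -t.
build_cosmos_sub() {
    local job=$1 queue=$2 ncpus=$3 nthreads=$4 mem=$5 minutes=$6 script=$7
    local cmd="cosmos_sub --submit --job $job -q $queue"
    if [ -n "$ncpus" ]; then
        cmd="$cmd -n $ncpus"
    fi
    if [ -n "$nthreads" ]; then
        cmd="$cmd -t $nthreads"
    fi
    echo "$cmd -s $mem -r $minutes $script"
}

build_cosmos_sub JOB1 small 6 "" 100 120 jobscript   # non-OpenMP form
build_cosmos_sub JOB1 small "" 6 100 120 jobscript   # simple OpenMP form
build_cosmos_sub JOB1 small 6 2 100 120 jobscript    # mixed form
```

Each printed line reproduces one of the three example commands, so the output can be checked by eye before it is executed.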

How to view job status

To examine job status, use

bjobs

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
274     sjr20   PEND  large     cosmos.amtp                        May  4 20:15
275     sjr20   PEND  large     cosmos.amtp                        May  4 20:16

to list all your own jobs and

bjobs -u all

to list the jobs of all users. Appending the JOBID to bjobs restricts attention to the corresponding job, i.e.

bjobs 275

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
275     sjr20   PEND  large     cosmos.amtp                        May  4 20:16

Adding an option -l produces more information, e.g. to find out why LSF has not dispatched job 275 (whose STATus is PENDing) one could do

bjobs -l 275

which produces among other verbose information:

 PENDING REASONS:
 User has reached the per-user job slot limit of the queue;

Jobs are initially pending (PEND), while they are awaiting scheduling and dispatch, then when dispatched for execution they enter the RUN state (see table below).

LSF job status values

STAT             Explanation
PEND             Job is pending (not yet started).
PSUSP            Job has been suspended by the user while pending.
RUN              Job is currently running.
USUSP            Job has been suspended by the user while running.
SSUSP            Job has been suspended by the system while running.
DONE             Job has exited normally (exit value 0).
EXIT             Job has exited abnormally (exit value non-zero).
UNKWN or ZOMBI   Indicates some system problem. Please contact cosmos_sys.
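With many jobs queued, a per-state count is often easier to read than the full listing. The following sketch summarises bjobs-style output, assuming the column layout shown in this section (STAT as the third field); the function name is hypothetical, and the sample input is copied from the listing earlier in this section rather than taken from a live system.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state from bjobs-style output on stdin.
# Assumes a one-line header and STAT in the third whitespace-separated field.
count_job_states() {
    awk 'NR > 1 && NF > 0 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample text from this guide; on COSMOS one would instead run:
#   bjobs | count_job_states
count_job_states <<'EOF'
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
274     sjr20   PEND  large     cosmos.amtp                        May  4 20:15
275     sjr20   PEND  large     cosmos.amtp                        May  4 20:16
EOF
```

For the sample input this prints a single line, PEND 2.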

How to kill a job

First find the jobid from bjobs. Then:

bkill jobid

By default, this sends SIGTERM and then after a short delay SIGKILL.

More generally, to send, e.g., SIGKILL direct to jobid, do either:

bkill -s 9 jobid

or

bkill -s KILL jobid

Sending the SIGSTOP signal to sequential jobs or the SIGTSTP to parallel jobs is the same as using bstop.

Sending the SIGCONT signal is the same as using bresume.

How to suspend a job

Jobs can be suspended by the owner while pending (status becomes PSUSP) or while running (status becomes USUSP) using bstop.

To suspend job jobid do

bstop jobid

Running sequential jobs are sent the SIGSTOP signal and running parallel jobs the SIGTSTP signal in order to suspend them.

Alternatively bkill -s STOP can be used to achieve the same effect.

How to resume a job

Jobs which are in either the PSUSP or USUSP states can be resumed by the owner using bresume.

To resume job jobid do

bresume jobid

Running jobs are sent the SIGCONT signal.

Alternatively bkill -s CONT can be used to achieve the same effect.

Other LSF utilities

Another useful utility is bmod (see man bmod) which modifies job parameters after submission.

For additional information on all the above utilities, refer to Running Jobs with Platform LSF.

LSF Hierarchical Fairshare policy

Fair use of resources is controlled by LSF's Hierarchical Fairshare feature (configured at the host partition level).

The top-level share assignments are currently as follows:

camplanckipsdmouser
100062562530010

The command bhpart -r displays the current dynamic priority of all groups and users (based on recent usage and shares), and also shows which of the above resource assignment groupings contains each user. The dynamic priority applies across all Altix queues.


Disk space and backups

Filesystems

The bulk of the disk space on COSMOS has been configured to provide one 100GB, one 1.4TB and one 3.5TB shared filesystem. These are CXFS filesystems regarded as locally attached by each of cosmos, cosmogrid and microcosm, through a 200MB/sec fibre channel storage area network (SAN) connected to a (N+1) RAID disk vault. These hosts and peripherals are said to form a CXFS cluster.

I/O between these local hosts and the filesystems is coordinated via a metadata server (currently microcosm). Metadata-intensive operations (like listing a large directory) can be slightly slower when compared to an ordinary filesystem, attached to a single host; however large reads and writes take place at locally-attached speeds from all hosts.

/home/cosmos

This 100GB filesystem is intended for long-term storage. There is a top-level subdirectory for each project group, under which each project member has a personal directory. There is a quota of 1GB in force on this filesystem, and it is subject to nightly backups (full backups made at midnight on Monday and retained for four weeks, incremental backups made at midnight on other days, recycled weekly). It is automounted within the local network.

/home/cosmos-med

This is a 1.4TB filesystem intended for medium-term scratch work. Each user has a personal directory. There is a quota of 64GB in force, and it is not backed up. It is automounted within the local network.

/home/cosmos-tmp

This is a 3.5TB filesystem intended for short-term scratch work. Each user has a personal directory. There are no quotas currently enforced, but it is not backed up. It is automounted within the local network. Note finally that files may be automatically migrated to the offline RAID store if free space falls critically low. Migration and recall of files is transparently managed by DMF.

General guidelines

Please note the (soft) quotas applying to each filesystem (above); the hard quotas are slightly higher and allow up to 7 days use in excess of the soft limit. To check your own quota information issue the command:

quota -v

Files which are no longer needed for local processing must be deleted, or moved off the facility - please contact cosmos_sys for advice on how to do this for large amounts of data.

Users who are not project members must never exceed a total usage of 1 GB.

All users must take care that their codes only output necessary data and use compression if appropriate to reduce the size of output files (e.g. using gzip, or bzip2).

Data unrelated to code running on COSMOS must not be stored on its filesystems.

NOTE

Data migration and recall

Note that DMF (Data Migration Facility) is now running and is automatically migrating data on /home/cosmos-tmp to the offline RAID.

This means that (preferentially old and large) files on /home/cosmos-tmp will be automatically migrated to a secondary RAID by the system in order to maintain a minimal reasonable level (currently 50%) of free disk space on the main RAID system.

A file migrated in this way will still be visible in its original position in the filesystem (directories are not migrated). The data it contains may also still be present on disk - such files are said to be dual-state, because the data is both online and offline. Other files, such as those targeted for space recovery, may have had their data blocks on disk released, in which case the data only exists on the secondary RAID - such files are offline.

To discover the migration status of your files, use dmls where you would ordinarily use ls. E.g.

dmls -al stephen60-copy-090502.cpio

-rw-r--r--    1 sjr20    GRadmin   15443844096 May  9 06:21 (REG) stephen60-copy-090502.cpio

Here (REG) means this is a regular (unmigrated) file, whereas in

dmls -al stephen60-copy-070602.cpio

-rw-r--r--    1 sjr20    GRadmin   15453611008 Jun  7 20:20 (DUL) stephen60-copy-070602.cpio

(DUL) indicates a dual-state file (data both on disk and on tape). The automatic migration is set to ensure a large fraction (70%) of total capacity on cosmos-tmp is either free or composed of dual-state files, to accelerate recovery of free space when it becomes necessary (at that point it is necessary only to release the data blocks). The owner of a dual-state file can release the data blocks manually via the dmput -r command - e.g. after

dmput -r stephen60-copy-070602.cpio

dmls says:

-rw-r--r--    1 sjr20    GRadmin   15453611008 Jun  7 20:20 (OFL) stephen60-copy-070602.cpio

(OFL) means offline (data on tape only), and the space available on cosmos-tmp according to df should have increased. If the initial state had been (REG) rather than (DUL), dmput -r would have taken longer, as the data would first have had to be migrated to tape. dmput without -r just converts regular files to dual-state files (merely migrating the data without finally releasing the data blocks held on disk).

Similarly, the dmget command operates on offline (OFL) files to recall their data blocks from tape, converting them to dual-state (DUL) files.
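The state tags can also be picked out in a script, for example to check before a batch run whether an input file is still offline. This is a sketch only, assuming the tag appears in parentheses exactly as in the dmls listings above; both function names are hypothetical, and the sample line is copied from this section rather than produced by running dmls.

```shell
#!/bin/bash
# Hypothetical helper: extract the parenthesised three-letter state tag
# from dmls -l style lines on stdin, e.g. (REG), (DUL) or (OFL).
dmf_state() {
    sed -n 's/.*(\([A-Z][A-Z][A-Z]\)).*/\1/p'
}

# Hypothetical helper: describe the states discussed in this section.
describe_dmf_state() {
    case "$1" in
        REG) echo "regular: data on disk only" ;;
        DUL) echo "dual-state: data both online and offline" ;;
        OFL) echo "offline: data blocks on disk have been released" ;;
        *)   echo "state not covered by this sketch" ;;
    esac
}

# Sample line from this guide; on COSMOS one would pipe dmls -l output.
line='-rw-r--r--    1 sjr20    GRadmin   15453611008 Jun  7 20:20 (OFL) stephen60-copy-070602.cpio'
state=$(echo "$line" | dmf_state)
describe_dmf_state "$state"
```

A job wrapper could use such a check to call dmget on any input reported as offline before the main program starts.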

More simply, opening an offline file for reading or writing results in automatic recall and change in status to dual-state. For example, if the contents of a large offline data file like the above need recalling in readiness for a job later in the day, issue a command like:

file stephen60-copy-070602.cpio

to trigger the recall action. Just using touch doesn't work as that acts on the inode which is still on disk.

To efficiently recall all files in a directory named mydir and its subdirectories, use the following command:

dmfind mydir -state MIG -o -state OFL | dmget

Recall speed under a typical load is approximately 30 MB/s. Note that there is no deletion of data involved here - each piece of data exists in either 1 or 2 places, until the owner of the file deletes it from the filesystem using rm in the normal way, after which it is irrecoverable (since cosmos-tmp is a scratch filesystem, where scratch means not backed up).

DON'T USE rm ASSUMING THE DATA WILL STILL BE ON THE OFFLINE STORAGE - UNLESS THE FILENAME
STILL APPEARS IN THE FILESYSTEM IT IS IRRETRIEVABLE.

For other useful user commands see the man pages:

dmput, dmget, dmls, dmfind, dmattr.


Usage guidelines

This section details the guidelines by which COSMOS users have agreed to abide. There are two distinct categories of user guidelines: (i) those required by the University of Cambridge and (ii) those specified by the CCC consortium and the COSMOS team. The guidelines presume that all users will cooperate in sharing this resource efficiently and politely.

University of Cambridge IT conditions of use

The rules made by the University of Cambridge are largely in common with those of any academic institution. For COSMOS users a summary of these is provided for convenience in the following web page:

University of Cambridge Information Technology Syndicate rules

By logging onto COSMOS, users automatically agree to abide by these rules and guidelines. The user application form assumes that they have been read and understood.

Further conditions of use

A. General

The following additional conditions of use are consistent with regulations set by national centres such as the EPCC. In addition, because of our matching funds arrangement with SGI/Intel, users are obliged to acknowledge our sponsors.
  1. Sharing of accounts or passwords is not permitted under any circumstances. (Project members have group access to each other's files so this is not required).
    Please note that it is a vital part of security that passwords are chosen properly. Dictionary words in particular (in any language) offer little if any protection against modern cracking programs. Passwords must be at least 8 characters in length, contain non-alphanumeric characters and avoid any elements derived from personal information such as name, nationality, institution, location etc, or from media references or car registration plates. Tests will occasionally be performed to detect weak passwords.

    Note that the security of the facility rests primarily on the security of user passwords. Do NOT write your COSMOS password on a Post-it note and stick it to your screen! If you access the facility from X windows, do NOT run the command xhost + (the last time we were hacked, this is how it happened; the user's home institution was hacked first, incidentally).

  2. Users should maintain a collaborative link to one of the three CCC centres. Inactive users may be suspended (or resources reduced) after six months.
  3. Users may only make use of COSMOS for the purposes outlined in their User or Project Application forms and they are obliged to inform the CCC when this work has been completed.
  4. All users are under obligation to furnish the CCC with a report of the progress of their work as requested, on at least an annual basis.
  5. Publications of results from work performed on COSMOS must note the use of this UK-CCC facility, which is supported by HEFCE and PPARC, while also including the following acknowledgement to our sponsors:

    Research conducted in cooperation with SGI/Intel
    utilising the Altix 3700 supercomputer.

B. Interactive use

The primary function of COSMOS is to perform large-scale numerical simulations; the following code of conduct for interactive use ensures the focus on this key purpose:
  1. COSMOS is available for interactive use - that is, compiling programs, pre- and post-processing data, submitting batch jobs etc. - only during weekday office hours; these are defined to be 10am-6pm, Monday-Friday.
  2. Project members may log on to monitor and submit jobs to batch queues outside of office hours.
  3. All significant cpu intensive applications must be submitted to the batch queues. Interactive jobs taking longer than 30 minutes of cumulative processor time will be terminated automatically - note that these are confined by the operating system to an 8-cpu sector of the machine, and that the express queue exists for test jobs.
  4. Small jobs which can be performed on workstations at local institutions should not be submitted to COSMOS.

C. Sharing batch queues

  1. Fairness: Individual users and project members must ensure that they are not using an unfair share of resources, particularly when working interactively (the batch queueing system enforces fair usage of resources automatically).
  2. Efficiency: Users must ensure that their jobs use computer resources efficiently, fulfilling stated scalability criteria. They must also optimize their code and use efficient algorithms. Processor usage efficiency is also easily monitored on the Altix.
  3. Users should not in general stack up more than ten jobs in any one queue.
  4. A job submitted under a specific batch queue must conform at run-time to the stated processor range and memory and cpu limits. If not, the job may be terminated automatically by the watchdog program, or by the system administrator if global performance is adversely affected by the unexpected demand on resources.
  5. Batch jobs will not necessarily run in the order in which they have been submitted - LSF implements a "hierarchical fairshare" policy to ensure fair usage and to prevent single users or projects dominating COSMOS for extended periods of time.
  6. Jobs should not automatically resubmit themselves. This is to avoid the development of infinite loops and other antisocial behaviour by failing scripts. Instead requests which continue earlier jobs should be submitted explicitly, and exit if the restart is not successful.

D. Disks and Tapes

Further information on storage is available.

The large temporary partition on COSMOS is available to all users and project members. Consequently, they must not exceed their fair share of disk space and so interfere with the work of others.

  1. Users must not substantially exceed their recommended user or project allocation of disk space without prior permission.
  2. Don't store data on /tmp.
  3. Be sure to recall data back from tape before you run your batch job.

E. Constructive attitudes

Please note that we cannot offer the same level of user support and help as the heavily staffed national centres.

There is a well-defined mechanism by which to request help from the COSMOS team if you are experiencing difficulties - see Getting help.


More information