For security reasons, all COSMOS access is restricted to secure shell (SSH) logins which must recognise both you and the workstation from which you are logging on remotely. The application form has a box in which the (fully qualified) domain names or IP addresses of your workstation(s) can be supplied; additional hosts can be added on request later by contacting cosmos_sys.
New users will be contacted with details of their userid and password. Once these have been received, use e.g. the OpenSSH SSH client to log in to COSMOS via:
ssh -X userid@universe.damtp.cam.ac.uk
Note that from within the DAMTP network, it is usually sufficient to say:
ssh -X universe
All DAMTP systems are pre-registered for access, but all COSMOS users, including those from DAMTP, will have their accounts preconfigured to use the COSMOS application stack in an optimal way. BASH is the default shell for all accounts. All COSMOS accounts share the environment set up by the resource files in /home/cosmos/template/. Individual environments can be customised (shortcuts, aliases, paths, etc.) using .bashrc.local. If any changes are made to the existing files, log in again to make them effective.
X-windows applications should work transparently across the (encrypted) SSH connection (if this doesn't seem to work try using ssh -X). The other machines in the facility, e.g. cosmogrid, microcosm or multiverse, are reached in a similar way. See Using SSH for more details.
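Typing the full host name and -X on every login can be avoided with a host alias in the OpenSSH client configuration. The following is a minimal sketch rather than part of the COSMOS setup: the helper name cosmos_ssh_stanza and the alias universe are illustrative, and the userid is whichever one you were issued.

```shell
# Hypothetical helper: print an OpenSSH client config stanza for COSMOS.
# $1 is your COSMOS userid (a placeholder; use the one you were issued).
cosmos_ssh_stanza() {
    userid=$1
    printf 'Host universe\n'
    printf '    HostName universe.damtp.cam.ac.uk\n'
    printf '    User %s\n' "$userid"
    printf '    ForwardX11 yes\n'   # same effect as passing -X on the command line
}
```

Appending the output to your client configuration, e.g. cosmos_ssh_stanza myuserid >> ~/.ssh/config, should then allow a plain ssh universe from a registered workstation.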
The basic features of the COSMOS filesystems are as follows:
Interactive use is confined (transparently) to the first 12 cpus and only 30 minutes of cumulative cputime can be accrued by a single job running interactively; there is a queue called express designed to handle larger test jobs.
To submit a batch job, first select the best queue by consulting the table below. List the commands necessary to start the job inside a simple file, called e.g. jobscript, remembering to use dplace if and when appropriate (the next step will print examples of this if the recommendation for the queue is not followed). Then, if, for example, you intend to submit to the small queue, run the command:
msub jobscript
This command steps through each required parameter, supplying the queue defaults and advising as needed. At the last step it generates a submission script containing suitable PBS directives and optionally submits it to the queueing system. Please try to be accurate in making resource requests, feeding the numbers reported by watchdog back into later runs if appropriate.
NB Follow the above procedure at least once to generate the latest form of the submission script (older versions should not be expected to work with the current system).
Job status can be monitored via the command showq.
Jobs can be killed before or during execution via the command canceljob.
During execution, the watchdog program may send advisory or warning messages which should be read.
Obtain your user id and register the local machine from which you wish to connect with the Cambridge administration.
Issue the command:
ssh cosmos_userid@cosmos.damtp.cam.ac.uk
Enter your password when prompted.
X clients running on COSMOS will normally display without any setenv DISPLAY, xhost or xauth preliminaries.
The principal benefits of SSH are:
By default all information transmitted over the network after the initial machine-to-machine connection is strongly encrypted, including the user's password and any data sent back by e.g. an X application.
SSH is easy to use and indeed much more straightforward to use with X windows than rlogin or rsh.
Using rlogin typically requires setting an explicit value of DISPLAY on the remote system so as to point back to the local display; in addition, it is usually necessary to explicitly enable access to the local display either by issuing an xhost command on the local machine (very unsafe) or by passing a so-called "magic cookie" from the local to the remote system.
In contrast SSH connects remote X programs to the local display automatically and out of sight of the user, without the need for any of these irksome preliminaries.
By default, all X programs are directed down the secure (encrypted) channel to the local machine, and are thus also safe from prying eyes whilst in transit. The DISPLAY at the remote end can still be set in the usual way if desired, in which case the X connections will be directed along normal, insecure channels. Only in the case of old DGL applications on COSMOS such as buttonfly (which are not pure X applications) might this be desirable.
Many sites nowadays have some form of SSH installed centrally. The current implementation on COSMOS will operate with most of these.
The OpenSSH 'portable' distribution can be downloaded from this UK mirror site. The complete distribution includes, in addition to the server program sshd (compatible with clients using SSH protocol versions 1 and 2), the following client programs: ssh, scp (compatible with servers offering SSH protocol versions 1 and 2) and sftp (compatible with servers offering SSH protocol version 2 only).
A suite of free SSH tools for Win32 platforms (Win95 and later) is also available.
For other platforms, it may be useful to look at the University of Cambridge Computing Service's SSH CD.
Please note that interactive use is not appropriate for production code runs which should be performed in the batch queues. By default all interactive activity is automatically confined to the first 8 COSMOS cpus; because this is a limited slice of resources, also required by system services, a socially responsible attitude is essential. Note that there is a special queue called express which can be used for test jobs (up to 16 cpus, 128GB, 2 hours).
Please take care to set the environment variable OMP_NUM_THREADS to an appropriately small value when testing OpenMP jobs interactively (otherwise it is possible to launch enough worker threads to fill the entire system on only 8 cpus), and don't use dplace (to facilitate sharing). To set OMP_NUM_THREADS to N, perform one of the commands below, according to which shell you use:
(bash) $ export OMP_NUM_THREADS=N
Interactive use time-limit: The maximum time for jobs running outside the batch queue system is 30 minutes of cumulative processor time (note that a 4-cpu job can run for 120 minutes in the express queue). After this cpu time-limit is exceeded a job will be automatically terminated by the watchdog program.
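Because the interactive limit is on cumulative processor time, the wall-clock time available shrinks as more threads are kept busy. The sketch below works this out under the assumption that the 30 minutes are shared evenly by all threads of the job; the function name is illustrative only.

```shell
# Rough wall-clock budget for an interactive run, assuming the 30-minute limit
# is cumulative CPU time shared evenly by all busy threads (an assumption).
interactive_wall_seconds() {
    threads=$1
    limit_seconds=$((30 * 60))      # 30 minutes of cumulative CPU time
    echo $((limit_seconds / threads))
}

interactive_wall_seconds 4   # 4 busy threads: 450 seconds, i.e. 7.5 minutes
```

Anything expected to need longer than this is better suited to the express queue.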
| Queue | Access | Min CPUs per job | Default CPUs per job | Max CPUs per job | Max memory per job | Max real time per job | Max CPUs per queue |
|---|---|---|---|---|---|---|---|
| super | restricted* | 32 | 64 | 128 | 384GB | 8hr | 144 |
| large | all | 16 | 16 | 64 | 256GB | 8hr | 144 |
| small | all | 1 | 4 | 16 | 96GB | 8hr | 144 |
| express | all | 1 | 4 | 16 | 128GB | 2hr | 148 |
* For access to the super queue, please contact cosmos_sys with details of your job requirements
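The per-job limits in the queue table can be turned into a simple lookup when deciding where a production job belongs. This is only a sketch: pick_queue is a hypothetical helper, the limits are copied by hand from the table, and the restricted super queue and the express test queue are deliberately left out.

```shell
# Hypothetical helper: name the first queue in the table whose limits fit a job.
# $1 = CPUs requested, $2 = memory requested in GB (whole numbers).
# The super queue is left out because access to it is restricted.
pick_queue() {
    cpus=$1
    mem_gb=$2
    # Columns: queue  min_cpus  max_cpus  max_mem_gb  (copied from the table)
    while read -r queue min_cpus max_cpus max_mem; do
        if [ "$cpus" -ge "$min_cpus" ] && [ "$cpus" -le "$max_cpus" ] &&
           [ "$mem_gb" -le "$max_mem" ]; then
            echo "$queue"
            return 0
        fi
    done <<EOF
small 1 16 96
large 16 64 256
EOF
    echo "none"
    return 1
}
```

For example, pick_queue 8 32 names small and pick_queue 32 200 names large. A job asking for 8 CPUs and 200GB fits neither queue as it stands, since large requires at least 16 CPUs.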
Notes
The current batch queueing system is based around Platform LSF, which offers many benefits such as job-level run-time resource monitoring and load balancing. This immediately implies that any old qsub scripts (containing #QSUB style headers) from the era of Cray NQE won't work - they will need regenerating by using one of the currently recommended methods (see below).
Please check that you can run the command
bqueues
and see a list of queues. If this command produces an error, please check that your bash or tcsh startup scripts are in line with the examples under /home/cosmos/template (all recently created accounts should be already). If this remark does not help please email cosmos_sys.
For more detail on a specific queue do e.g.
bqueues -l large
At the top of the output you should see a description string summarising the intent of the queue and the size of the expected jobs therein:
QUEUE: large
-- Large COSMOS batch job queue.
16-64 cpus (default 16), 256GB memory max, 8 hours max real time per job.
Near the bottom, the USERS parameter indicates access:
USERS: all users
Note the absence of a DISPATCH_WINDOW parameter; in contrast to earlier versions of the queues, all queues accept and dispatch jobs all the time.
cat > jobscript
cd /home/cosmos-tmp/your_username/mpijob
mpirun -np 16 dplace -s1 mpiprog < input.dat > output.dat
<Control-D>
The above example is an MPI job, but follow exactly the same procedure for a job using OpenMP parallelism (the OMP_NUM_THREADS environment variable will be set automatically). In the OpenMP case, the above would become (note the different option to dplace):
cat > jobscript
cd /home/cosmos-tmp/your_username/ompjob
dplace -x1 ompprog < input.dat > output.dat
<Control-D>
The use of dplace is as recommended for MPI and OpenMP jobs running in the main (i.e. non-aux) Altix queues; note that it cannot be used in the aux queue (jobs attempting to do so will fail to start). Life is more complicated if you have a job in more than one distinct piece or using hybrid MPI and OpenMP parallelism (e.g. later versions of cosmomc) - contact cosmos_sys for advice with these (also see this FAQ).
To submit this to e.g. large, do
large jobscript
The automatic writer script will now run and ask questions about the resources the job will need. It explains the implications of these choices and under what circumstances the job might be killed, or not start as a result of the values given.
Please be as accurate as possible.
Giving overlarge numbers may mean that LSF cannot find a suitable window in which to dispatch your job. On the other hand, giving numbers which are wild underestimates in order to ensure dispatch can lead to oversubscription of system resources and has the potential to bring COSMOS down - please don't do this.
Please note that run-time memory usage can now be monitored accurately. In the aux queue (and only there, at the time of writing) jobs may be killed for exceeding stated memory and cpu requirements by a significant margin.
Finally the script optionally submits the job to the chosen queue, or saves the submission script for manual editing or deferred submission.
Note that to submit such an automatically written script, possibly after manual edits, one should do
bsub < large.xxxx
whereas in NQE one might have done e.g. qsub cam-long.xxxx (without the <).
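The contents of the automatically written script are not reproduced here. As a rough idea of what an LSF submission script with #BSUB directives can look like in general, the sketch below prints one to standard output. It is an assumption-laden illustration, not the output of the COSMOS writer script: print_bsub_script is a hypothetical helper, only the standard bsub options -q, -n, -W and -o are used, and memory requests are omitted because their form is site-specific.

```shell
# Sketch only: print a generic LSF submission script to standard output.
# The real script written by the COSMOS writer may differ; names are illustrative.
# $1 = queue, $2 = CPUs, $3 = run limit as hours:minutes, $4 = job script file
print_bsub_script() {
    queue=$1; cpus=$2; runlimit=$3; jobscript=$4
    printf '#!/bin/bash\n'
    printf '#BSUB -q %s\n' "$queue"        # queue to submit to
    printf '#BSUB -n %s\n' "$cpus"         # number of job slots (cpus)
    printf '#BSUB -W %s\n' "$runlimit"     # wall-clock run limit
    printf '#BSUB -o %s\n' 'output.%J'     # %J expands to the LSF job id
    printf '. ./%s\n' "$jobscript"         # run the commands from the job script
}
```

Something like print_bsub_script large 16 8:00 jobscript > large.manual followed by bsub < large.manual would then submit it, although the script produced by the recommended procedure should be preferred wherever possible.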
A new, non-interactive method of submission has recently become available: the command cosmos_sub. Run it without arguments for detailed usage information. The following examples each submit the contents of jobscript as a job called JOB1 to the small queue, using 6 cpus and 100 MB of memory, to run for 120 minutes:
bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
274 sjr20 PEND large cosmos.amtp May 4 20:15
275 sjr20 PEND large cosmos.amtp May 4 20:16
to list all your own jobs and
bjobs -u all
to list the jobs of all users. Appending the JOBID to bjobs restricts attention to the corresponding job, i.e.
bjobs 275
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
275 sjr20 PEND large cosmos.amtp May 4 20:16
Adding an option -l produces more information, e.g. to find out why LSF has not dispatched job 275 (whose STATus is PENDing) one could do
bjobs -l 275
which produces among other verbose information:
PENDING REASONS: User has reached the per-user job slot limit of the queue;
Jobs are initially pending (PEND), while they are awaiting scheduling and dispatch, then when dispatched for execution they enter the RUN state (see table below).
| STAT | Explanation |
|---|---|
| PEND | Job is pending (not yet started). |
| PSUSP | Job has been suspended by the user while pending. |
| RUN | Job is currently running. |
| USUSP | Job has been suspended by the user while running. |
| SSUSP | Job has been suspended by the system while running. |
| DONE | Job has exited normally (exit value 0). |
| EXIT | Job has exited abnormally (exit value non-zero). |
| UNKWN or ZOMBI | Indicates some system problem. Please contact cosmos_sys. |
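When many jobs are listed, a per-state tally can be easier to read than the full bjobs output. The following sketch assumes the column layout of the bjobs examples in this section (one header line, with STAT in the third column); count_job_states is an illustrative name, not an LSF command.

```shell
# Count jobs in each state from bjobs-style output read on standard input.
# Assumes a header line followed by one job per line, with STAT in column 3.
count_job_states() {
    awk 'NR > 1 { count[$3]++ }
         END { for (state in count) print state, count[state] }' | sort
}
```

For example, bjobs -u all | count_job_states prints one line per state, such as PEND 2 for the two pending jobs in the listing shown earlier.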
bkill jobid
By default, this sends SIGTERM and then after a short delay SIGKILL.
More generally, to send, e.g., SIGKILL direct to jobid, do either:
bkill -s 9 jobid
or
bkill -s KILL jobid
Sending the SIGSTOP signal to sequential jobs or the SIGTSTP signal to parallel jobs is the same as using bstop.
Sending the SIGCONT signal is the same as using bresume.
To suspend job jobid do
bstop jobid
Running sequential jobs are sent the SIGSTOP signal and running parallel jobs the SIGTSTP signal in order to suspend them.
Alternatively bkill -s STOP can be used to achieve the same effect.
To resume job jobid do
bresume jobid
Running jobs are sent the SIGCONT signal.
Alternatively bkill -s CONT can be used to achieve the same effect.
For additional information on all the above utilities, refer to Running Jobs with Platform LSF.
The top-level share assignments are currently as follows:
| cam | planck | ips | dmo | user |
|---|---|---|---|---|
| 1000 | 625 | 625 | 300 | 10 |
The command bhpart -r displays the current dynamic priority of all groups and users (based on recent usage and shares), and also shows which of the above resource assignment groupings contains each user. The dynamic priority applies across all Altix queues.
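To see what the top-level share assignments mean in relative terms, each figure can be expressed as a percentage of the total. The sketch below assumes that a group's nominal entitlement is simply its shares divided by the sum of all shares; the dynamic priority reported by bhpart -r additionally depends on recent usage, which is not modelled here.

```shell
# Convert the top-level share assignments into percentages of the total.
# Assumes a group's entitlement is its shares divided by the sum of all shares.
share_percentages() {
    awk '{ name[NR] = $1; shares[NR] = $2; total += $2 }
         END { for (i = 1; i <= NR; i++)
                   printf "%s %.1f%%\n", name[i], 100 * shares[i] / total }'
}

share_percentages <<EOF
cam 1000
planck 625
ips 625
dmo 300
user 10
EOF
```

With a total of 2560 shares, cam comes out at roughly 39%, planck and ips at roughly 24% each, dmo at roughly 12% and user at well under 1%.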
The bulk of the disk space on COSMOS has been configured to provide one 100GB, one 1.4TB and one 3.5TB shared filesystem. These are CXFS filesystems regarded as locally attached by each of cosmos, cosmogrid and microcosm, through a 200MB/sec fibre channel storage area network (SAN) connected to a (N+1) RAID disk vault. These hosts and peripherals are said to form a CXFS cluster.
I/O between these local hosts and the filesystems is coordinated via a metadata server (currently microcosm). Metadata-intensive operations (like listing a large directory) can be slightly slower when compared to an ordinary filesystem, attached to a single host; however large reads and writes take place at locally-attached speeds from all hosts.
quota -v
Files which are no longer needed for local processing must be deleted, or moved off the facility - please contact cosmos_sys for advice on how to do this for large amounts of data.
Users who are not project members must never exceed a total usage of 1 Gbyte.
All users must take care that their codes only output necessary data and use compression if appropriate to reduce the size of output files (e.g. using gzip, or bzip2).
Data unrelated to code running on COSMOS must not be stored on its filesystems.
NOTE
Note that DMF (Data Migration Facility) is now running and is automatically migrating data on /home/cosmos-tmp to the offline RAID.
This means that (preferentially old and large) files on /home/cosmos-tmp will be automatically migrated to a secondary RAID by the system in order to maintain a minimal reasonable level (currently 50%) of free disk space on the main RAID system.
A file migrated in this way will still be visible in its original position in the filesystem (directories are not migrated). The data it contains may also still be present on disk - such files are said to be dual-state, because the data is both online and offline. Other files, such as those targeted for space recovery, may have had their data blocks on disk released, in which case the data only exists on the secondary RAID - such files are offline.
To discover the migration status of your files, use dmls where you would ordinarily use ls. E.g.
dmls -al stephen60-copy-090502.cpio
-rw-r--r-- 1 sjr20 GRadmin 15443844096 May 9 06:21 (REG) stephen60-copy-090502.cpio
Here (REG) means this is a regular (unmigrated) file, whereas in
dmls -al stephen60-copy-070602.cpio
-rw-r--r-- 1 sjr20 GRadmin 15453611008 Jun 7 20:20 (DUL) stephen60-copy-070602.cpio
(DUL) indicates a dual-state file (data both on disk and on tape). The automatic migration is set to ensure a large fraction (70%) of total capacity on cosmos-tmp is either free or composed of dual-state files, to accelerate recovery of free space when it becomes necessary (at that point it is necessary only to release the data blocks). The owner of a dual-state file can release the data blocks manually via the dmput -r command - e.g. after
dmput -r stephen60-copy-070602.cpio
dmls says:
-rw-r--r-- 1 sjr20 GRadmin 15453611008 Jun 7 20:20 (OFL) stephen60-copy-070602.cpio
OFL implies offline (data on tape only), and the space available on cosmos-tmp according to df should have increased. If the initial state had been (REG) rather than (DUL), dmput -r would have taken longer, as the data would first have had to be migrated to tape. dmput without -r just converts regular files to dual-state files (merely migrating the data without finally releasing the data blocks held on disk).
Similarly, the dmget command operates on offline (OFL) files to recall their data blocks from tape, converting them to dual-state (DUL) files.
More simply, opening an offline file for reading or writing results in automatic recall and change in status to dual-state. For example, if the contents of a large offline data file like the above need recalling in readiness for a job later in the day, issue a command like:
file stephen60-copy-070602.cpio
to trigger the recall action. Just using touch doesn't work as that acts on the inode which is still on disk.
To efficiently recall all files in a directory named mydir and its subdirectories, use the following command:
dmfind mydir -state MIG -o -state OFL | dmget
Recall speed under a typical load is approximately 30 MB/s. Note that there is no deletion of data involved here - each piece of data exists in either 1 or 2 places, until the owner of the file deletes it from the filesystem using rm in the normal way, after which it is irrecoverable (since cosmos-tmp is a scratch filesystem, where scratch means not backed up).
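The quoted recall speed gives a quick way to judge how far ahead of a job a recall should be started. The sketch below totals the sizes of offline files in dmls -al output and divides by 30 MB/s. It assumes the listing layout shown in this section (size in column 5, state in column 9) and takes 1 MB as 10^6 bytes; estimate_recall_seconds is an illustrative name, and the actual speed will vary with load.

```shell
# Estimate how long recalling offline files would take, from `dmls -al` output
# on standard input. Assumes size in column 5 and the state, e.g. (OFL), in
# column 9, and a recall speed of 30 MB/s with 1 MB = 10^6 bytes.
estimate_recall_seconds() {
    awk '$9 == "(OFL)" { bytes += $5 }
         END { printf "%.0f\n", bytes / (30 * 1000 * 1000) }'
}
```

For example, dmls -al | estimate_recall_seconds run in a directory holding only the offline 15453611008-byte file from the example above gives 515, i.e. a little under nine minutes.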
For other useful user commands see the man pages:
This section details the guidelines by which COSMOS users have agreed to abide. There are two distinct categories of user guidelines: (i) those required by the University of Cambridge and (ii) those specified by the CCC consortium and the COSMOS team. The guidelines presume that all users will cooperate in sharing this resource efficiently and politely.
By logging onto COSMOS, users automatically agree to abide by these rules and guidelines. The user application form assumes that they have been read and understood.
Note that the security of the facility rests primarily on the security of user passwords. Do NOT write your COSMOS password on a Post-it and stick it to your screen! If you access the facility from X windows, do NOT run the command xhost + - the last time we were hacked, this is how it happened. (The user's home institution was hacked first, incidentally.)
The large temporary partition on COSMOS is available to all users and project members. Consequently, they must not exceed their fair share of disk space and so interfere with the work of others.
There is a well-defined mechanism by which to request help from the COSMOS team if you are experiencing difficulties - see Getting help.
The period 10:00-14:00 Wednesdays is reserved as a regular planned maintenance (PM) period. This is used to undertake essential system work such as reconfiguration, software updates, engineering visits etc. Although the system may remain available during this period, service may be more vulnerable than usual. Note however that this situation will be advertised, and if downtime is anticipated this will be flagged by email and posted on the news page, as far in advance as possible.
Whenever possible, disruption to service arising from planned site work, e.g. to the cooling plant, will be coordinated so as to coincide with the weekly PM. This may not be possible in all cases, however, and may lead to periods of "special maintenance". Similarly, the availability of hardware or engineers may lead to a PM outside the regular weekly slot.
We regret that we cannot guarantee to monitor service, or to fix problems, outside normal office hours. However, the support team will do what it can, so if you become aware of a serious issue out of hours please email cosmos_sys in the normal way (see Getting help).
In the case of an emergency, e.g. a serious environmental issue, a system failure potentially threatening data integrity, equipment or safety of personnel, we reserve the right to withdraw service without notice.
Please follow the procedures below according to the category of problem.
If serious problems occur outside office hours, regrettably we cannot guarantee to deal with them before the next working day. However, we are on the SGI critical list for system support, so we receive high priority treatment and we enjoy close links with SGI engineers.
Suggestions for improvements (e.g. new software or web page changes/corrections) are welcome, again please email cosmos_help.