New COSMOS user application form
Note: from November 2012, ALL users applying for a new or renewed account on COSMOS@DiRAC systems should follow the SAFE registration procedures for DiRAC, described on the DiRAC Wiki page. The COSMOS user application form above only needs to be filled in by members of the COSMOS Consortium.
For security reasons, all COSMOS@DiRAC access is restricted to secure shell (SSH) logins, which must recognise both you and the workstation from which you are logging on remotely. The application form and the SAFE registration pages have a field in which the (fully qualified) domain names or IP addresses of your workstation(s) can be supplied; additional hosts can be added on request later by contacting cosmos_sys.
New users will be contacted with details of their userid and password. Once these have been received, use any SSH (version 2) client to log in to COSMOS, for example:
ssh -X userid@universe.damtp.cam.ac.uk
or
ssh -X userid@cosmos.damtp.cam.ac.uk
Note that from within the DAMTP (wired) network, it is usually sufficient to say:
ssh -X universe
or
ssh -X cosmos
All COSMOS users, including those from DAMTP, will have their accounts preconfigured for using the COSMOS application stack in an optimal way. Please note that BASH is the default shell for all users, without exception, for technical reasons. (This does not preclude anyone from running C-shell scripts, of course.) All COSMOS accounts share the environment set up using resource files symbolically linked from /home/cosmos/template/. Personal preferences (shortcuts, aliases, paths, etc.) can be customised using .bashrc.local. If any changes are made to the existing files, log in again to make them effective.
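As an illustration of the kind of personal customisation that could go into .bashrc.local, here is a minimal sketch. The alias, the $HOME/bin directory and the editor choice are hypothetical examples, not part of the COSMOS template.

```shell
# ~/.bashrc.local: personal additions (all names below are illustrative)

# A shortcut for a long directory listing.
alias ll='ls -l'

# Put a private bin directory first on the search path, but only once.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                      # already present, do nothing
    *) export PATH="$HOME/bin:$PATH" ;;
esac

# Preferred editor for tools that consult $EDITOR.
export EDITOR=vi
```

Remember to log in again after editing the file so that the changes take effect.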
X-windows applications should work transparently across the (encrypted) SSH connection (if this doesn't seem to work, please ensure that you use ssh flag '-X'). See Using SSH for more details.
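A quick way to see whether X forwarding is in place after logging in is to inspect the DISPLAY variable, which ssh sets on the remote side when forwarding is enabled. The helper name check_x_forwarding below is hypothetical, and the sketch only tests whether the variable is set, not whether an X server is actually reachable.

```shell
# Hypothetical helper: report whether the session has a forwarded display.
check_x_forwarding() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X forwarding looks active (DISPLAY=$DISPLAY)"
    else
        echo "DISPLAY is not set; log out and reconnect with ssh -X" >&2
        return 1
    fi
}

# Run the check; '|| true' keeps a scripted session going if it fails.
check_x_forwarding || true
```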
The basic features of the COSMOS filesystems are as follows:
Initially, interactive use is confined (transparently) on cosmos or
universe to the first 12 cpus, which are shared by all active users;
there are special interactive queues designed to handle larger
development/analysis jobs. See below.
To submit a batch job, job submission scripts (jobscripts, for short) are used. A jobscript is a simple file, called e.g. myjob.sub, containing the commands necessary to request system resources and to start the job. The job can then be submitted to run using this command:
msub myjob.sub
After submission the job status can be monitored via the command showq.
Jobs can be killed before or during execution via the command canceljob.
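To make the workflow concrete, the following sketch writes a very small jobscript and submits it. It is an assumption-laden illustration rather than the official COSMOS template: the #MSUB directives follow common Moab/Torque conventions, the resource request (4 cores, 1 hour, queue small) is an arbitrary example, and my_program is a hypothetical executable.

```shell
# Write a minimal jobscript to myjob.sub (all values are illustrative).
cat > myjob.sub <<'EOF'
#!/bin/bash
# Job name as it will appear in showq (hypothetical).
#MSUB -N myjob
# Target queue; "small" is one of the COSMOS queues.
#MSUB -q small
# Assumed resource request: 4 cores on one node, 1 hour of wall-clock time.
#MSUB -l nodes=1:ppn=4
#MSUB -l walltime=01:00:00

# Match the OpenMP thread count to the cores requested above.
export OMP_NUM_THREADS=4

# Hypothetical executable; replace with your own program.
./my_program > my_program.log 2>&1
EOF

# Submit only where msub is available (i.e. on COSMOS itself).
if command -v msub >/dev/null 2>&1; then
    msub myjob.sub
else
    echo "msub not found; wrote myjob.sub without submitting"
fi
```

Once submitted, the job should appear in the output of showq and can be removed with canceljob followed by its job id.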
See below for the more detailed examples of job submission.
Obtain your user id and register the local machine from which you wish to connect with the Cambridge administration.
Issue the command:
ssh userid@cosmos.damtp.cam.ac.uk
Enter your password when prompted.
X clients running on COSMOS will normally display without
any setenv DISPLAY, xhost or
xauth preliminaries.
DISPLAY at the remote end can
still be set in the usual way if desired, in which case the X
connections will be directed along normal, insecure channels. Only in
the case of old DGL applications on COSMOS such as buttonfly
(which are not pure X applications) might this be desirable.
Most sites nowadays have some form of SSH installed centrally. The current implementation on COSMOS will operate with most of these.
A suite of free SSH tools for Win32 platforms (Win95 and later) is also available.
Most simply, one can run programs straightforwardly on the command line of a login node. Note that provided you have an X server on your local machine, and you enable X-forwarding in your SSH connection (e.g. the -X or -Y options to ssh), then X-windows applications launched on a login node should display on your screen.
The login nodes are similar in terms of hardware to the batch
compute nodes. It is possible nevertheless to run small MPI
jobs on the login nodes for testing purposes using shared
memory. However, the login nodes are finite, shared resources and any
such use must respect other users. In particular, parallel jobs must
be short (i.e. minutes), use no more than 2-4 cores and no more than 2GB
of memory per core, and should be niced (prefixed with
nice -19) so as not to impact interactive
responsiveness.
If you find that you need to make such runs more often than occasionally, or for longer periods, then it may be more appropriate to employ the batch-interactive use described below - antisocial monopolisation of a login node will probably receive harsh treatment from the system administrators.
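The login-node etiquette above can be sketched as a small wrapper that refuses to start anything above 4 cores and always runs the command niced. The function name run_small_test is hypothetical. The sketch uses nice -n 19, the POSIX spelling of nice -19, and it does not enforce the time or memory limits, which remain the user's responsibility.

```shell
# Hypothetical wrapper: run_small_test <ncores> <command> [args...]
run_small_test() {
    ncores=$1
    shift
    # Refuse anything beyond the 4-core ceiling for login-node tests.
    if [ "$ncores" -gt 4 ]; then
        echo "refusing: $ncores cores exceeds the login-node limit of 4" >&2
        return 1
    fi
    # Lowest scheduling priority, so interactive users are not impacted.
    nice -n 19 "$@"
}

# Harmless demonstration; a real test might instead be
#   run_small_test 2 mpirun -np 2 ./my_mpi_test    (hypothetical binary)
run_small_test 2 echo "hello from a niced process"
```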
Please note that interactive use is not appropriate for production code runs, which should be performed via the batch queues. Please take care to set the environment variable OMP_NUM_THREADS to an appropriately small value when testing OpenMP jobs interactively (otherwise it is possible to launch enough worker threads to fill the entire system on only 8 cpus), and don't use dplace (to facilitate sharing). To set OMP_NUM_THREADS to N in bash, the default shell on COSMOS, use:
$ export OMP_NUM_THREADS=N
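As a concrete instance, the sketch below sets a small thread count for the whole session and then shows how to override it for a single command only. The value 4 and the program name in the final comment are arbitrary examples.

```shell
# Session-wide setting: every OpenMP program started from this shell
# will use at most 4 threads.
export OMP_NUM_THREADS=4
echo "session value: $OMP_NUM_THREADS"

# One-off override: the assignment applies to this single command only.
OMP_NUM_THREADS=2 sh -c 'echo "one-off value: $OMP_NUM_THREADS"'

# The session value is unchanged afterwards.
echo "session value again: $OMP_NUM_THREADS"

# In C-shell scripts the equivalent would be: setenv OMP_NUM_THREADS 4
# A test run might then look like: nice -n 19 ./my_openmp_test  (hypothetical)
```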
| Queue | Access | Min CPUs per job | Default CPUs per job | Max CPUs per job | Max memory per job | Max real time per job | Max CPUs per queue |
|---|---|---|---|---|---|---|---|
| super | restricted* | 32 | 64 | 128 | 384GB | 8hr | 144 |
| large | all | 16 | 16 | 64 | 256GB | 8hr | 144 |
| small | all | 1 | 4 | 16 | 96GB | 8hr | 144 |
| express | all | 1 | 4 | 16 | 128GB | 2hr | 148 |
* For access to the super queue, please contact cosmos_sys with details of your job requirements
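Before submitting, it can help to check a planned CPU count against the per-job limits listed for each queue. The sketch below hard-codes the minimum and maximum CPU figures from the queue table. The function name check_request is hypothetical, and the scheduler itself remains the authority on what is accepted.

```shell
# Hypothetical helper: check_request <queue> <ncpus>
check_request() {
    queue=$1
    ncpus=$2
    # Per-job CPU limits (min, max) taken from the queue table.
    case "$queue" in
        super)         min=32; max=128 ;;
        large)         min=16; max=64  ;;
        small|express) min=1;  max=16  ;;
        *) echo "unknown queue: $queue" >&2; return 2 ;;
    esac
    if [ "$ncpus" -ge "$min" ] && [ "$ncpus" -le "$max" ]; then
        echo "ok: $ncpus CPUs fits queue '$queue' ($min-$max)"
    else
        echo "rejected: queue '$queue' accepts $min-$max CPUs per job" >&2
        return 1
    fi
}

check_request large 32
```

A request for 32 CPUs fits the large queue (16 to 64 CPUs per job) but would be rejected for small or express, which stop at 16.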
Notes
showq command:
showq
canceljob jobid
By default, this sends SIGTERM and then, after a short delay, SIGKILL.
More generally, to send, e.g., SIGKILL direct to jobid, do either:
quota -v
Files which are no longer needed for local processing must be deleted, or moved off the facility - please contact cosmos_sys for advice on how to do this for large amounts of data. All users must take care that their codes only output necessary data and use compression if appropriate to reduce the size of output files (e.g. using gzip, or bzip2).
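As a small illustration of the compression advice, the sketch below creates a throwaway text file, compresses it with gzip and verifies the result. The file name output.dat and the temporary directory are placeholders; real savings depend heavily on the data, and binary output often compresses far less well than text.

```shell
# Create a throwaway example file so the sketch is self-contained.
workdir=$(mktemp -d)
seq 1 100000 > "$workdir/output.dat"

# Size before compression.
du -h "$workdir/output.dat"

# Compress in place: output.dat is replaced by output.dat.gz.
# (bzip2 could be used in the same way and produces output.dat.bz2.)
gzip -9 "$workdir/output.dat"

# Size after compression, followed by an integrity check of the archive.
du -h "$workdir/output.dat.gz"
gzip -t "$workdir/output.dat.gz" && echo "archive OK"
```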
Data unrelated to code running on COSMOS must not be stored on its filesystems.
NOTE
This section details the guidelines by which COSMOS users have agreed to abide. There are two distinct categories of user guidelines: (i) those required by the University of Cambridge and (ii) those specified by the CCC consortium and the COSMOS team. The guidelines presume that all users will cooperate in sharing this resource efficiently and politely.
By logging onto COSMOS, users automatically agree to abide by these rules and guidelines. The user application form assumes that they have been read and understood.
Note that the security of the facility rests primarily on the security of user passwords. Do NOT write your COSMOS password on a Post-it note and stick it to your screen! If you access the facility from X windows, do NOT run the command xhost +; the last time we were hacked, this is how it happened. (The user's home institution was hacked first, incidentally.)
The large temporary partition on COSMOS is available to all users and project members. Consequently, they must not exceed their fair share of disk space and so interfere with the work of others.
There is a well-defined mechanism by which to request help from the COSMOS team if you are experiencing difficulties - see Getting help.
The period 10:00-14:00 Wednesdays is reserved as a regular planned maintenance (PM) period. This is used to undertake essential system work such as reconfiguration, software updates, engineering visits etc. Although the system may remain available during this period, service may be more vulnerable than usual. Note however that this situation will be advertised, and if downtime is anticipated this will be flagged by email and posted on the news page, as far in advance as possible.
Whenever possible, disruption to service following from planned site work, e.g. to the cooling plant, will be coordinated so as to coincide with the weekly PM. This may not be possible in all cases, however, and may lead to periods of "special maintenance". Similarly, the availability of hardware or engineers may lead to a PM outside the regular weekly slot.
We regret that we cannot guarantee to monitor service, or to fix problems, outside normal office hours. However, the support team will do what it can, so if you become aware of a serious issue out of hours, please email cosmos_sys in the normal way (see Getting help).
In the case of an emergency, e.g. a serious environmental issue, a system failure potentially threatening data integrity, equipment or safety of personnel, we reserve the right to withdraw service without notice.
Please follow the procedures below according to the category of problem.
If serious problems occur outside office hours, regrettably we cannot guarantee to deal with them before the next working day. However, we are on the SGI critical list for system support, so we receive high priority treatment and we enjoy close links with SGI engineers.
Suggestions for improvements (e.g. new software or web page changes/corrections) are welcome; again, please email cosmos_help.