Disk Space in DAMTP

This is a brief guide to disk spaces in DAMTP

Contents

How much disk space are you using?
Using cleanup and file compression
Tape backups
Data, Scratch and temporary space
NOBACKUP directories

How much disk space are you using?

For most users the graphical quota monitor applet or the quota command (described on the quota info page) is usually sufficient.

To obtain a list of disk space used by each directory use:

  du -sk * .??*

The figure reported by du -sk is in KBytes. See the quota info page for more examples of using du.
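For example, to see which directories are using the most space you can sort the output of du numerically (a minimal sketch; the 2>/dev/null simply hides errors from unreadable entries):

```shell
# Report the size of every entry (including dot-files), largest last
du -sk * .??* 2>/dev/null | sort -n
```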

There are no strict rules about the amount of Data or Scratch disk space usage on the public workstations - you will normally be able to use as much space as you need for your work. However, if you need more than 100 GBytes of long-term storage you should discuss your needs with the Computer Officers.

If you need very large amounts of space, i.e. many hundreds of GBytes, you should seek funding to provide the necessary resources within your own research group.

You can find out how much free space remains on the disk where your files are stored using the df command. Note that the free space is shared with many other users, so treat the figure as a rough guide only.
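For example (the -h flag, supported by GNU and most modern df implementations, prints sizes in human-readable units; the "." restricts the report to the filesystem holding the current directory):

```shell
# Show free space on the filesystem containing the current directory
df -h .
```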

Please check your disk usage regularly. We impose disk quota limits on home directories but not on Data or Scratch spaces.

Note that if a disk becomes full, all users with files and directories on that disk may be unable to continue working on their files.

If a disk fills the COs may take emergency action to free up disk space, either by deleting any files which are growing without limit (usually output from a faulty user program), or by deleting very large files belonging to anyone found to be using more than their fair share of the disk.

Using cleanup and file compression

Please ensure that you delete all files which are not actually needed, especially old backup files, files which can be easily recreated (e.g. executable files, dvi output from TeX) or files which are available from Internet archives. You should also compress very large files, especially text files which often compress by a factor of over 90%.
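As a rough illustration of the saving possible on text files, you can compress a sample file and compare sizes (the filename here is just an example):

```shell
# Create a repetitive text file and compress it in place
seq 1 100000 > sample.txt
ls -l sample.txt          # note the original size
bzip2 -v9 sample.txt      # replaces sample.txt with sample.txt.bz2
ls -l sample.txt.bz2      # the compressed file is much smaller
```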

The cleanup command is provided to perform some of these tasks for you - please run it at least once a week. Running cleanup regularly in your home directory also helps keep you under quota. cleanup does the following:

  • delete *~ .*~ #*# *.o
  • delete .dvi .log .aux files when the .tex file exists
  • delete .bak files when the master file exists
  • delete core and a.out files (even if compressed) not read for over 21 days
  • compress core and a.out files not read for over 2 days
  • compress all files over 100K which have not been used (read, written or run) for over 21 days

cleanup -i will cause cleanup to prompt before deleting any files, -X turns off file compression, -O prevents deletion of .o files, and -c turns off deletion of old core and a.out files.

If no directory name is specified then cleanup will look at all your files. As it runs it outputs the names and sizes of all files it deletes and compresses.

The compression program used is bzip2 -v9 which is very good for most text files. It does not work very well with binary files, but it may be worth trying.

bzip2 appends .bz2 to the name of a compressed file. The -v flag produces statistics on the amount of compression achieved. The -9 flag uses the highest compression level (lower levels take less time).

The bunzip2 command is used to uncompress a .bz2 file (removing the .bz2 suffix in the process). It can also be used in a pipe to provide input for another process without uncompressing the file on disk:

      bunzip2 < filename.bz2 | myprog

Or you can just use bzcat for that purpose, e.g.:

      bzcat filename.bz2 | myprog
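This is handy for, say, searching a compressed file without ever writing an uncompressed copy to disk (words.txt is just an example file created for the demonstration):

```shell
# Search a compressed file directly via a pipe
printf 'alpha\nbeta\ngamma\n' > words.txt
bzip2 -9 words.txt                 # produces words.txt.bz2
bzcat words.txt.bz2 | grep beta    # prints "beta"
```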

Tape backups

All important user files (not Data, Scratch or NOBACKUP directories, see below) are regularly backed up into snapshots held on a machine at the far end of the site.

This typically holds quite a few snapshots. Weekly the latest snapshots are written to tape, currently an LTO-3 Tape robot. Because of the enormous amount of information involved (up to 1.2 TBytes), each user's files are completely backed up to tape only about once a week. A tape backup is kept for about 10 months before the tape is overwritten.

More information about backups can be found in the DAMTP Backups and Archives pages.

Data, Scratch and temporary space

Less valuable data can be stored in what are known as Data and Scratch spaces, e.g. /data/sub/, /data/hostname, /scratch/hostname etc. on some machines (both public and group owned). For a list of computers which have scratch space look in the file /opt/damtp/info/diskspace. For more detailed information about scratch space and how to access different Data and Scratch spaces from different computers check the Data and Scratch space instructions.

UNIX/Linux systems also have /tmp and /var/tmp (tmp is pronounced temp) directories which can be used by anyone to store small amounts of temporary data. Data in /tmp is considered temporary and may be deleted from time to time without warning.

Many standard utilities store temporary working files in these directories. To ensure that a filename is unique, programs may use something like mktemp to generate a unique name (see man mktemp). You can create such filenames using for example:

      TEMPNAME=$(mktemp /tmp/mytempXXXXXX)
      myproc > "$TEMPNAME"
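It is also good practice to remove such files when your script finishes, even if it is interrupted. A minimal sketch using a trap (the echo line stands in for your own program, e.g. myproc):

```shell
#!/bin/sh
# Create a unique temporary file and guarantee it is removed on exit
TEMPNAME=$(mktemp /tmp/mytempXXXXXX)
trap 'rm -f "$TEMPNAME"' EXIT

echo "intermediate results" > "$TEMPNAME"   # stand-in for: myproc > "$TEMPNAME"
wc -l "$TEMPNAME"
```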

Because so many standard programs also use /tmp, filling a high proportion of its space can stop UNIX/Linux programs working correctly (for you and for others). We therefore recommend that you use Data or Scratch spaces (see above) for storage of large volumes of data.

You will usually find a significant performance improvement writing files into a temporary directory or Data or Scratch space which is local to the machine where code is running. This is because these will only use the local hard disk, rather than having to transfer the data across the (slow) network from another host or server.

NOBACKUP directories

Within the constraints of quotas, disk space is no longer at a premium, and users may feel justified in keeping fairly large temporary datasets in their home directories rather than in the scratch or data areas.

For example, a user might transfer large datasets of experimental data from an archive for ease of processing (assuming they have enough quota). This may not cause problems for other users, but will cause problems with the backup system if the backup space (or tapes) fill unexpectedly.

Such data need not be backed up locally as it can always be retrieved from the archive (where it is backed up). The solution is to store the data in a directory called NOBACKUP (upper case) anywhere in your home directory.

If you prefer to retain an existing directory structure for your files you can create soft links to files in the NOBACKUP directory:

      mv big.temp.file ~/NOBACKUP
      ln -s ~/NOBACKUP/big.temp.file big.temp.file 

Please don't use hard links as the file may then be backed up (possibly many times).