Disk Space in DAMTP
This is a brief guide to disk space in DAMTP.
Contents
- How much disk space are you using?
- Using cleanup and file compression
- Tape backups
- Data, Scratch and temporary space
- NOBACKUP directories
How much disk space are you using?
For most users the graphical quota monitor applet or the quota command
(see the quota info page) is usually sufficient.
To obtain a list of disk space used by each directory use:
du -sk * .??*
The figure reported by du -sk is in KBytes. See the quota info page for more examples of using du.
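When a directory holds many entries it can help to sort the du output so that the largest items come first. The following is a minimal sketch; the function name biggest is made up for illustration and is not a command provided in DAMTP.

```shell
# Hypothetical helper: show the ten largest entries (sizes in KBytes)
# in a directory, defaulting to the current one.
biggest() {
    (
        cd "${1:-.}" || exit 1
        # 2>/dev/null hides the error from a pattern that matches nothing
        du -sk * .??* 2>/dev/null | sort -rn | head -n 10
    )
}
```

For example, biggest ~ would list the ten largest items in your home directory, largest first.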
There are no strict rules about the amount of Data or Scratch disk space usage on the public workstations - you will normally be able to use as much space as you need for your work. However, if you need more than 100 GBytes of long-term storage you should discuss your needs with the Computer Officers.
If you need very large amounts of space, i.e. many hundreds of GBytes, you should seek funding to provide the necessary resources within your own research group.
You can find out how much free space remains on the disk where files are
stored using the df command. This is usually of little use, since the disk
will be shared by many other users.
Please check your disk usage regularly. We impose disk quota limits on home directories but not on Data or Scratch spaces.
Note that if a disk becomes full, all users with files and directories on that disk may be unable to continue working on their files.
If a disk fills, the COs may take emergency action to free up disk space, either by deleting any files which are growing without limit (usually output from a faulty user program), or by deleting very large files belonging to anyone found to be using more than their fair share of the disk.
Using cleanup and file compression
Please ensure that you delete all files which are not actually needed, especially old backup files, files which can be easily recreated (e.g. executable files, dvi output from TeX) or files which are available from Internet archives. You should also compress very large files, especially text files, which often shrink by over 90% when compressed.
The cleanup command is provided to perform some of these tasks
for you - please run it at least once a week. Running cleanup regularly in
your home directory also helps keep you under quota. cleanup does
the following:
- delete *~ .*~ #*# *.o
- delete .dvi .log .aux files when the .tex file exists
- delete .bak files when the master file exists
- delete core and a.out files (even if compressed) not read for over 21 days
- compress core and a.out files not read for over 2 days
- compress all files over 100K which have not been used (read, written or run) for over 21 days
cleanup -i will cause cleanup to prompt before deleting any
files, -X turns off file compression, -O prevents
deletion of .o files, and -c turns off deletion of old core and
a.out files.
If no directory name is specified then cleanup will look at all
your files. As it runs it outputs the names and sizes of all files it deletes
and compresses.
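To preview which files fall under the last rule in the list above (over 100K and unused for over 21 days), a find command can approximate it. This is only a sketch under stated assumptions: the function name stale_big_files is hypothetical, 100K is taken to mean 102400 bytes, and access and modification times stand in for "read, written or run". How cleanup itself makes this decision is not shown here, so the two may disagree.

```shell
# Hypothetical helper: list regular files larger than 100K (assumed to be
# 102400 bytes) whose last access and last modification are both more
# than 21 days old. Nothing is deleted or compressed.
stale_big_files() {
    find "${1:-.}" -type f -size +102400c -atime +21 -mtime +21
}
```

Running stale_big_files ~/project (an example path) prints one candidate per line without changing anything.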
The compression program used is bzip2 -v9 which is very good
for most text files. It does not work very well with binary files, but it may
be worth trying.
The bzip2 command appends .bz2 to the name of a compressed
file. The -v flag produces statistics on the amount of compression
achieved. The -9 flag uses the highest compression level (lower
levels take less time).
The bunzip2 command is used to uncompress a .bz2 file (removing
the .bz2 suffix in the process). It can also be used in a pipe to provide input
for another process without uncompressing the file on disk:
bunzip2 < filename.bz2 | myprog
Or you can just use bzcat for that purpose, e.g.:
bzcat filename.bz2 | myprog
Tape backups
All important user files (not Data, Scratch or NOBACKUP directories, see below) are regularly backed up into snapshots held on a machine at the far end of the site.
This machine typically holds quite a few snapshots. Each week the latest snapshots are written to tape, currently using an LTO-3 tape robot. Because of the enormous amount of information involved (up to 1.2 TBytes), each user's files are completely backed up to tape only about once a week. A tape backup is kept for about 10 months before the tape is overwritten.
More information about backups can be found in the DAMTP Backups and Archives pages.
Data, Scratch and temporary space
Less valuable data can be stored in what are known as Data and
Scratch spaces, e.g. /data/sub/,
/data/hostname, /scratch/hostname
etc., on some machines (both public and group owned). For a list of computers
which have scratch space look in the file /opt/damtp/info/diskspace. For more detailed information
about scratch space and how to access different Data and Scratch spaces from
different computers check the Data and Scratch space
instructions.
UNIX/Linux systems also have /tmp and /var/tmp
(tmp is pronounced tump) directories which can be used by anyone to store small
amounts of temporary data. Data in /tmp is considered temporary and will get
deleted from time to time without warning.
Many standard utilities store temporary working files in these
directories. To ensure that a filename is unique, programs may use something
like mktemp to generate a unique name (see
man mktemp). You can create such filenames using for
example:
TEMPNAME=$(mktemp /tmp/mytempXXXXXX)
myproc > "$TEMPNAME"
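Temporary files created this way are easy to leave behind in /tmp. One common pattern, sketched below, is to remove the file automatically with a trap. The function name count_output_lines is hypothetical and exists only to make the example self-contained; the command passed to it plays the role of myproc above.

```shell
# Hypothetical helper: run a command, capture its output in a unique
# temporary file, print how many lines it produced, and always remove
# the file afterwards.
count_output_lines() (
    tmpfile=$(mktemp /tmp/mytempXXXXXX) || exit 1
    # The function body runs in a subshell, so this EXIT trap fires
    # when the function finishes, whether or not the command succeeded.
    trap 'rm -f "$tmpfile"' EXIT
    "$@" > "$tmpfile" || exit 1
    lines=$(wc -l < "$tmpfile")
    echo "$lines"
)
```

For example, count_output_lines ls /etc prints the number of entries in /etc and leaves no file behind in /tmp.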
Because so many standard programs also use /tmp, filling a high proportion of it can stop UNIX/Linux programs working correctly (for you and for others). We therefore recommend that you use Data or Scratch spaces (see above) for storing large volumes of data.
You will usually find a significant performance improvement writing files into a temporary directory or Data or Scratch space which is local to the machine where code is running. This is because these will only use the local hard disk, rather than having to transfer the data across the (slow) network from another host or server.
NOBACKUP directories
Within the constraints of quotas, disk space is no longer at a premium, and users may feel justified in keeping fairly large temporary datasets in their home directories rather than in the scratch or data areas.
For example, a user might transfer large datasets of experimental data from an archive for ease of processing (assuming they have enough quota). This may not cause problems for other users, but will cause problems with the backup system if the backup space (or a tape) fills unexpectedly.
Such data need not be backed up locally as it can always be retrieved from the archive (where it is backed up). The solution is to store the data in a directory called NOBACKUP (upper case) anywhere in your home directory.
If you prefer to retain an existing directory structure for your files you can create soft links to files in the NOBACKUP directory:
mv big.temp.file ~/NOBACKUP
ln -s ~/NOBACKUP/big.temp.file big.temp.file
Please don't use hard links as the file may then be backed up (possibly many times).
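To check whether any file in a NOBACKUP directory has picked up a hard link, the link count can be inspected with find. This is a minimal sketch; the function name hardlinked_in_nobackup is hypothetical, and the default path assumes the directory sits directly under your home directory.

```shell
# Hypothetical helper: list regular files under a NOBACKUP directory
# that have more than one hard link (i.e. another name elsewhere).
hardlinked_in_nobackup() {
    find "${1:-$HOME/NOBACKUP}" -type f -links +1
}
```

An empty result means that no file under the directory has a second hard link; soft links created with ln -s do not raise the link count.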