Data and Scratch Spaces

The terms Data and Scratch refer to various storage areas that can hold large amounts of data. These areas are not backed up, but the data on them is not deleted unless there is a hardware/software/human fault or we are forced to wipe the disks.

The Data spaces are much more likely to survive re-installation of machines, while Scratch spaces are usually cleared during re-installation (they are usually just the remaining space on the disk used for the system install).

Classes of storage

The current DAMTP classes of storage are:

Storage class     Description
Home directories  Very reliable, for valuable data. Quota'd, backed up regularly (more than once a day), archived and mirrored.
Data              Reliable scratch storage for longer-term storage of data. Dedicated disks, no quotas, not backed up, unlikely to be lost/deleted unless there is a hardware fault. Some of these may be stored on reliable/redundant servers (e.g. /data/septal).
Scratch           No quotas, not backed up, VERY likely to be lost when a machine is upgraded or re-installed, or if there is a hardware fault. Mainly intended for short-term use by jobs running on the machine where the scratch is local.
tmp               No quotas, not backed up, files are removed after a period, or on reboot (depending on the system). Intended for smaller, very short-lived files.

Use of Data and Scratch

It is safer to keep data for longer periods in Data or Scratch than in /tmp. We recommend keeping there only data that can be regenerated (e.g. the output from a program) or data that you have copies of on other systems, CDs/DVDs, external (USB) storage, etc.

Please do take the time to consider where the best location for your work is. In particular, please ensure all of your original work, including source code, is kept on a drive which is regularly backed up. To be clear, /data and /scratch are not backed up.

List of Data and Scratch Space - /opt/damtp/info/diskspace

You can use as much Data or Scratch space as you need, remembering that it is a shared resource. The file /opt/damtp/info/diskspace contains a list of the Data and Scratch space, what group each of the computers belongs to, how much scratch space it has and how much is unused, in the form:

  Data Store             Group           Total G  Free G
  /scratch/job           PUB               111.8    36.3
  /scratch/joel          PUB                34.0    26.6
  /data/subiculum        SYSTEM           1374.1   122.6
  */data/subiculum can also be used as /data/sub
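
As a rough illustration, the listing can be filtered from the command line. The sketch below assumes the real file has the same whitespace-separated four-column layout as the excerpt above. The function name free_space and its optional group argument are made up for this example and are not an installed DAMTP command.

```shell
# free_space [FILE] [GROUPS]
# Print "free-GB path" for each data store in the given groups,
# largest free space first.
# FILE defaults to the path named in this document; GROUPS is a
# "|"-separated list (hypothetical parameter), defaulting to PUB|SYSTEM.
free_space() {
    file=${1:-/opt/damtp/info/diskspace}
    groups=${2:-PUB|SYSTEM}
    # Keep only rows whose first column is an absolute path (this skips
    # the header and the "*" footnote) and whose group column matches.
    awk -v g="^(${groups})\$" '$1 ~ /^\// && $2 ~ g { print $4, $1 }' "$file" |
        LC_ALL=C sort -rn
}

# Example (commented out so that sourcing this file has no side effects):
# free_space
# free_space /opt/damtp/info/diskspace 'PUB|SYSTEM|MYGROUP'
```

Replace MYGROUP with a group you are actually a member of; the name here is only a placeholder.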

To use a Data or Scratch space, change directory to the data store and make yourself a directory there (for Scratch spaces you usually have to do this inside the public/ directory). Most people use their login name as the directory name.
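
The steps above can be sketched as a small shell function. This is only an illustration: the name make_store_dir is hypothetical, and the rule "use public/ if it exists, otherwise the top level" is an assumption based on the conventions described on this page.

```shell
# make_store_dir STORE [NAME]
# Create a personal directory in a data store and print its path.
# NAME defaults to your login name.
make_store_dir() {
    store=$1
    name=${2:-$(id -un)}
    if [ -d "$store/public" ]; then
        # Scratch spaces usually expect user directories under public/
        target="$store/public/$name"
    else
        # Assumed fallback: the top level of the store is writable
        target="$store/$name"
    fi
    mkdir -p "$target" && printf '%s\n' "$target"
}

# Example (commented out so that sourcing this file has no side effects):
# make_store_dir /scratch/job
```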

You do not need to be logged into the machine which holds the Data or Scratch space -- all DAMTP Linux/UNIX computers can access all the data stores (though access is faster locally). For example, if you are logged into nipah you can still use narmada's scratch space via the path /scratch/narmada/public/... etc.

Files held in these directories are not deleted automatically so you should tidy up when your files are no longer needed.
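
One possible way to find candidates for tidying is to list your own files that have not been modified for a while. The function name stale_files and the 90-day default are arbitrary choices for this sketch, not a departmental policy.

```shell
# stale_files DIR [DAYS]
# List regular files under DIR that you own and that were last
# modified more than DAYS days ago (default 90). Nothing is deleted.
stale_files() {
    find "$1" -type f -user "$(id -u)" -mtime +"${2:-90}" -print
}

# Example (commented out so that sourcing this file has no side effects):
# stale_files "/data/septal/$USER" 180
```

Review the list before removing anything; the function only prints paths.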

Sometimes it may look as if a particular computer cannot see another computer's scratch space. This happens when you (for example) use ls /scratch/ -- all you see are the scratch spaces already mounted (if any). But you can still access the Data or Scratch space if you use the complete name. This is because the auto-mounter only keeps mounted those spaces that are actually in use at any given time.
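
A minimal sketch of checking a store by its complete name, rather than relying on a directory listing, is given here. The function name store_reachable is hypothetical; it simply tries to enter the directory, which is the kind of access that prompts an auto-mounter to mount it.

```shell
# store_reachable PATH
# Succeed if PATH can be entered. Referring to the full path is what
# triggers the auto-mounter; the subshell leaves your current
# directory unchanged.
store_reachable() {
    ( cd "$1" ) 2>/dev/null
}

# Example (commented out so that sourcing this file has no side effects):
# store_reachable /data/septal && echo "septal is available"
```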

The list above mentions which group(s) a Data or Scratch space belongs to. Please only use data stores from the PUB or SYSTEM groups, or from groups that you are a member of. If you are unsure which computers you have access to, use the command access-list to find out which computers you can log into.

Using Gnome to access the data and scratch spaces

To access a data or scratch space open the Computer icon on the desktop. From inside that area select File from the menu then Open Location and then type in the path to the scratch or data area and select Open.

[Screenshot: the Open Location dialog]

[Screenshot: typing in the path to the data area]

Septal is a data space which anyone with a DAMTP account is welcome to use.

Remember to create a folder named after your CRSid to store your files and data in.

To delete any files you create, highlight the file(s) and press shift-delete. Don't use the usual "move to Trash" as this fails and causes a "Not on the same filesystem" error message window to appear.

Accessing Data Areas from a DAMTP Windows Computer

You can put a soft-link in your home directory like so:

cd
ln -s /data/septal/CRSID ./septal-CRSID
where CRSID = your login ID

And then use a DAMTP Windows computer and look in your home directory (N: drive).

Frequently asked questions (with answers)

FAQ: What is scratch, the historic use of scratch in DAMTP

Originally scratch areas were the spaces left over on the disks after whatever the Operating System (OS) needed. Disks were often a little larger than the OS needed but the total space for scratch on each machine was still small by modern standards. The spaces were never really intended to be used for more than transient storage of large datasets needed by running jobs.

When machines were to be upgraded/re-installed, these small scratch areas could quickly be copied off and back again after the install was done. A few machines also had disks bought _just_ for storing scratch data, and in more recent times we have even joined/striped disks together to make even larger volumes.

FAQ: Why did anything need to change?

As disks have grown in capacity, the sizes of scratch spaces have reached a point where it can take many hours to copy the data. For example, on a typical 100 Megabit network, a modest 250 GigaBytes of data will take over 12 hours to transfer. And that of course assumes that we have free space elsewhere to store it all, and that nothing else is trying to use the network or disks at either end at the same time. Verifying the checksums for each transfer also takes some additional time.
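
The arithmetic behind that estimate can be checked with a small sketch. It assumes decimal units (1 GigaByte = 10^9 bytes, 1 Megabit = 10^6 bits). The function name and the efficiency parameter are illustrative, and the 45% figure is an assumed value chosen only to show what sustained throughput a transfer of roughly 12 hours corresponds to; it is not a measurement.

```shell
# transfer_hours SIZE_GB RATE_MBIT [EFFICIENCY]
# Estimate the hours needed to move SIZE_GB gigabytes over a link of
# RATE_MBIT megabits per second. EFFICIENCY (0..1, default 1) is the
# assumed fraction of the line rate actually achieved.
transfer_hours() {
    LC_ALL=C awk -v gb="$1" -v mbit="$2" -v eff="${3:-1}" 'BEGIN {
        bits = gb * 1e9 * 8                  # data size in bits
        secs = bits / (mbit * 1e6 * eff)     # seconds at the effective rate
        printf "%.1f\n", secs / 3600
    }'
}

transfer_hours 250 100         # full line rate: prints 5.6
transfer_hours 250 100 0.45    # assumed 45% efficiency: prints 12.3
```

So even at the full line rate the copy needs about 5.6 hours, and protocol overhead or contention can plausibly push it past 12.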

Spending that length of time copying data clearly reduces the rate at which we can afford to upgrade machines, so upgrading 300 machines over a short time (like the summer vacation) is almost impossible.

FAQ: Has anything else changed? Can you give more details?

You may recognise "data" as being very like our old definition of "scratch", but the "dedicated disks" restriction is there so that we can avoid needing to copy/restore the files during system work. Those groups needing very large amounts of reliable storage can have redundant setups to store their data.

Unlike the older "scratch" directories, /data/ spaces in general do not have a world-writable public/ directory; the top level itself is world-writable. Simply make a directory for your own use in the top level to store your files, e.g. mkdir /data/sub/$USER

In the past all the scratch areas were accessed as /home/scratch-name (mostly /home/scratch/host name). Apparently this was confusing some people into thinking that these were "home directories" and so backed up. From now on most scratch and data areas will be accessed using paths like /scratch/name and /data/name, though some historical entries under /home will remain, and we will arrange that the old paths still work (for some time).

In general we would expect that most machines (with single disks) will be set up with only "scratch" space roughly like:

  boot-disk:   some space for OS, rest for "scratch"

Machines with more than one disk may (optionally) be set up like:

  boot-disk: some space for OS, rest for "scratch"
  non-boot-disks: "data" volume(s)

If the relative volatility of "scratch" storage is a problem on machines with only one disk, then we may be able to add a small (cheap) disk to be the new boot disk and move the original disk to be "data" (though someone will need to buy the disk, and there will still be a need to transfer files etc).

To aid with some of the transfers and to add useful amounts of "data" storage, the department has installed a somewhat more modern server which is exporting nearly 1.4 TBytes of (RAID-5) disk. This can already be accessed under /data/sub/ (or /data/subiculum). This machine is connected to the core Gigabit switch so should support significant rates of access from machines (most of which are only connected at 100 Megabit speeds). A couple of groups' scratch areas have already been converted to "data" areas as part of the testing.

In future, when planning for new machines, please consider whether any (non-valuable) files which will be stored on them should be preserved over re-installs and upgrades. If so, you should ensure that the machine has enough disks to be set up to provide "data" storage. If in doubt please ask us.