Details of DAMTP File Backups and Archiving
All home directories are backed up more than once per day. Some files in home directories are not backed up at all, such as everything under a NOBACKUP directory. We do not back up /scratch/ or /data/ files, or those held in the various /tmp/ directories.
The primary reason for keeping these backups is so that data can be recovered in the event of a disaster or catastrophic event (e.g. several disks failing simultaneously, or a fire). That said, we can often recover accidentally deleted or damaged files.
We have two forms of backups to disk:
- One is simply a regular copy of the home directory files, made 3 times per day (starting at the hours 05, 13 and 21). These backup copies can now be accessed directly by users; see the Recovery of Deleted File(s) page for more details.
- The other takes snapshots, i.e. multiple copies of the trees (sharing data where possible). We have currently allocated about 3TB for the snapshots of DAMTP home directories. This snapshot facility is shared with DPMMS and the Statslab, and is also used to hold copies of important system files to aid in disaster recovery. Once a week the latest snapshot is also written to tape for longer-term storage. These backup tapes are currently reused after about 10 months.
Backup Exclusions
Currently for most DAMTP file-systems we exclude from the snapshots files which match the following patterns:
NOBACKUP
.Trash
**/.mozilla/**/Cache*
**/Library/Caches*
urlclassifier*.sqlite
XUL.mfasl
XPC.mfasl
**/.java/deployment/cache
.Spotlight-V100
.thumbnails
**/.adobe/Acrobat/*/Cache
**/.adobe/Acrobat/*/Temp
.fontconfig
.fonts.cache*
**/.mcop/trader-cache
**/.local/share/Trash
**/Library/Logs
**/.TeXmacs/system/cache
**/.openoffice.org**/user/uno_packages/cache
**/.openoffice.org**/user/registry/cache
**/.kde/share/cache
.jpi_cache
**/.Mathematica*/FrontEnd/*_Caches
**/.netscape/*cache*
**/.opera/cache*
**/Library/Preferences/*Cache
**/Library/Mail/IMAP*@*
*.sqlite-journal
**/evolution/cache
**/.macromedia/**/#SharedObjects
**/.adobe/Flash_Player/AssetCache
**/.googleearth/Temp
**/.googleearth/Cache
**/evolution/mail/imap
**/.thunderbird/**/ImapMail/**
**/.thunderbird/**/Cache
**/Thunderbird/**/ImapMail/**
**/Thunderbird/**/Cache
.cache
.mcrCache*
These are rsync patterns, so XUL.mfasl as a leafname is excluded
anywhere, while **/.mozilla/**/Cache* excludes anything called Cache* at least 2
levels below a directory called .mozilla/ (in any directory).
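The two cases just described can be illustrated with a small Python sketch. This is only a simplified approximation of how rsync interprets such patterns, not rsync's actual matcher: the function names pattern_to_regex and is_excluded are hypothetical, paths are assumed to be relative to the top of the tree being backed up, and only the wildcards *, ** and ? are handled. It also tests a single path at a time, whereas rsync skips everything beneath an excluded directory.

```python
import re
from pathlib import PurePosixPath


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified rsync-style pattern into a compiled regex."""
    parts = []
    i = 0
    # Assumption: a leading "**/" may also match at the top of the tree.
    if pattern.startswith("**/"):
        parts.append("(?:.*/)?")
        i = 3
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # '**' may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # '*' stays within one path component
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # '?' is one character, never a slash
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(path: str, pattern: str) -> bool:
    """Return True if this one path matches the exclude pattern."""
    regex = pattern_to_regex(pattern)
    if "/" not in pattern and "**" not in pattern:
        # Bare leafname pattern: compare against the final component only.
        return regex.fullmatch(PurePosixPath(path).name) is not None
    # Otherwise compare against the whole relative path.
    return regex.fullmatch(path) is not None


if __name__ == "__main__":
    moz = "**/.mozilla/**/Cache*"
    print(is_excluded("alice/docs/XUL.mfasl", "XUL.mfasl"))
    print(is_excluded("alice/.mozilla/firefox/abc.default/Cache", moz))
    print(is_excluded("alice/.mozilla/Cache", moz))
```

In this sketch the first two calls report a match, while the third does not, because Cache sits directly inside .mozilla/ rather than at least 2 levels below it.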
The excludes are intended to spare us from backing up files which are not considered important, e.g. those which are simply caches, or forms of content which can easily be regenerated if lost. For example, the various IMAP directories listed are caches of indexing material (etc) held on the relevant remote IMAP server.
Overriding the exclude patterns, we do back up files matching the following patterns:
**/.thunderbird/**/ImapMail/**/
**/.thunderbird/**/ImapMail/**.dat
**/Thunderbird/**/ImapMail/**/
**/Thunderbird/**/ImapMail/**.dat
These are used for storing various pieces of Thunderbird state, despite being in locations which appear to be local caches of remote data.
If you believe that we are accidentally excluding other material which is actually valuable please let us know.
See the rsync manual if you want to see why ** is used rather than *.
Archive
Backup archives are kept going back at least 3 months for home directories. Therefore if something is accidentally deleted during the last 3 months we should be able to retrieve it for you.
What about big data files?
Large data files should be stored in Data or Scratch spaces, which are not backed up.
So how many copies do we keep?
We arrange to make 'snapshots' of the contents of home directories roughly once every 11-12 hours (more frequently for secretarial directories). The snapshots are held on a server in a different location from the main server (for safety), and are pruned as they get older, so after a few days we only store one per day, then one per week, one per 30 days and then one per 90 days.
The actual number we can afford to keep may change but currently for most users we are keeping about 101 snapshots arranged as:
45x11h, 30x26h, 22x7d, 3x30d, 1x90d
so after ~20 days of roughly 11-hourly snapshots we then keep copies spaced at about 26 hours for another 30 days and then ones spaced at 7 day intervals for another 22 weeks etc.
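The arithmetic behind that schedule can be checked with a short Python sketch. The counts and spacings are taken from the line above; the names SCHEDULE and tier_spans are hypothetical, and laying the tiers end to end gives only a nominal figure, since the real pruning may not line up exactly.

```python
from datetime import timedelta

# (number of snapshots, spacing in hours), as quoted in the schedule above
SCHEDULE = [(45, 11), (30, 26), (22, 7 * 24), (3, 30 * 24), (1, 90 * 24)]


def tier_spans(schedule):
    """Nominal time covered by each tier: count multiplied by spacing."""
    return [timedelta(hours=count * hours) for count, hours in schedule]


total_snapshots = sum(count for count, _ in SCHEDULE)
spans = tier_spans(SCHEDULE)
total_span = sum(spans, timedelta())

if __name__ == "__main__":
    print("snapshots kept:", total_snapshots)
    for (count, hours), span in zip(SCHEDULE, spans):
        days = span.total_seconds() / 86400
        print(f"{count} x {hours}h covers about {days:.1f} days")
    print("nominal total:", total_span.days, "days")
```

Under these assumptions the first tier covers about 20.6 days and the second 32.5 days, in line with the rough figures given above, and the 101 snapshots together span a little over a year.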
In addition, once a week (at the weekend) we take the latest snapshots and write them to tape (using Bacula); the tapes are held for about 10 months. Recovery of files from tape is much slower, so it should only be requested if there is no other option.
Some of my files don't need backing up
If you know that a file or files don't need backing up, you can signal this to the backup systems. Doing so speeds up the backups of important files and reduces the risk that the backups will run out of space for much more important files (e.g. your papers!).
As can be seen from the exclude patterns above, any directory called
NOBACKUP is ignored by the snapshot backups. No files or
directories under it are backed up by either the main or the incremental
backups.
Thus if you have a set of files which can be reconstructed by running code
or by fetching from the original site, or which you have backed up yourself on a
known safe medium, you can arrange for them to be stored under a NOBACKUP
directory.
Clearly you should not do this for any file whose loss would be a problem.
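As a minimal sketch of how this might be done from Python, the helpers below check whether a path already lies under a NOBACKUP directory and move a file or directory into one. The function names under_nobackup and move_to_nobackup are hypothetical, and placing NOBACKUP at the top of the home directory is only an assumption for the example: a directory called NOBACKUP at any level is treated the same way.

```python
import shutil
from pathlib import Path, PurePosixPath


def under_nobackup(path: str) -> bool:
    """True if any component of the path is a directory named NOBACKUP."""
    return "NOBACKUP" in PurePosixPath(path).parts


def move_to_nobackup(src: Path, home: Path) -> Path:
    """Move src into home/NOBACKUP (created if missing); return the new path."""
    target_dir = home / "NOBACKUP"
    target_dir.mkdir(exist_ok=True)
    dest = target_dir / src.name
    if dest.exists():
        # Refuse to overwrite anything that is already there.
        raise FileExistsError(dest)
    shutil.move(str(src), str(dest))
    return dest


if __name__ == "__main__":
    print(under_nobackup("/home/alice/NOBACKUP/sim/output.dat"))
    print(under_nobackup("/home/alice/papers/NOBACKUP.txt"))
```

Note that a file merely named NOBACKUP.txt does not count as being under a NOBACKUP directory; only a path component that is exactly NOBACKUP does.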
Is this true for all machines?
Some groups may do their own backups, either in addition to those run by the department or instead of them. If in doubt, ask the people who look after your group's machines.