
Introduction - tardis sshd for the laptop data backup service

This document describes the setup used on temporal-lobe (and maybe other boxes later) to run as tardis.damtp.cam.ac.uk for the laptop data backup service.

The server consists of two parts:

  1. an sshd running on an extra IP address (tardis in our case) with a somewhat special config which uses the ForceCommand option to avoid the user getting direct shell access
  2. the script that this runs, which handles adding/replacing ssh keys and running rsync with the extra options to make the snapshots

Currently we build a different (newer) version of openssh for the sshd since we were using features which were not in the standard rhel/sl version, and have a package which contains the ssh start/stop script and basic config.

[ If we have several boxes, e.g. holding files for different users, then they all need to accept the same keys and probably run with a single set of shared ssh keys themselves. We also need a way for users to tell which one they ought to connect to, though we can arrange (by magic) for it to work with any of them, just not as efficiently as connecting to the box where their storage is local. ]

This document is intended to document the setup and allow the service to be easily re-installed or moved to different hardware as needed.

For more details of servers in DAMTP see the servers page.

For information about the sl5-ssh service see sl5-ssh. For information about the system rsync snapshots see snapshots. This system is based on parts of both of those (with extra stuff of course).

Currently the master copies of the bits for making the RPMs (and wikified notes!) live in: invent/Building-info/backups/sshd-virt-tardis/ while the laptopwrap script lives in ~invent/Building-info/backups/laptopwrap (the sample scripts for unix users also live in ~invent/Building-info/backups/tardis/).

Preparation

A pair of simple rpm packages were built (with rpmbuild as usual), signed and inserted into our repos (details of that elsewhere), so that all sl5 machines can get access to them as normal via yum.

There are two rpm packages: one provides a replacement sshd binary, updated to a version which supports the features we need, and the other contains the init-script and sshd config needed to start the tardis service.

The tardis-openssh package is based on the Fedora openssh SRPM simplified a bit since we don't need some of the extra bits that they build, and with a prefix setting so that it won't touch any of the files used by the main system openssh packages (so they can be updated as normal as/when sl make updates). Currently it is only built for x86_64 hardware but has been added into the central repos.

The sshd-virt-tardis package contains the magic init-script and config files needed to start up the extra sshd and cause the tardis-wrapper code to be run when a client makes rsync connections.

It is an rpm partly so that we can make use of the rpm scripts and triggers mechanism, to cause the service to be restarted if the package itself (for the config, say) or openssh is updated. It also ensures that we have properly documented all the steps needed to install the files, and allows for fairly easy installation on hosts.

The sshd-virt-tardis package is a noarch rpm and contains just two files, an init-script and the modified sshd_config we use. It requires that a host-specific config file and the (potentially shared) ssh-keys be installed on each machine running the service (currently only temporal-lobe).

 131.111.17.221 is the tardis address used on temporal-lobe

Note that if/when we install this on more than one machine, DNS changes will be needed, and the software (at both client and server ends) would need to be changed to understand how data is split over the servers. Unlike the ssh service, the servers cannot really be treated as independent devices, since data copied to the service must be accessible to the user, and for best performance clients ought to transfer the files to the right place. Also, for the --fake-super magic to work, rsync needs local access to the files in order to reach the xattrs for each file.

One possible mechanism would be that the client would connect to tardis.damtp and present the username+tag and be given a specific server to re-connect to (which will always hold their data).

e.g. simplifying a bit a client might do:

  ssh $user@tardis.damtp.cam.ac.uk getserver $user $tag

and get back a string like:

  tardis-03.damtp.cam.ac.uk

the client would then need to connect to that name (with the same credentials etc) to store/retrieve their data. Each user would need to be assigned to a specific server so they always see the right file-system.

If we move/split a server we would then need to update the mapping after copying their existing backups.

If performance isn't a big problem then each tardis server could simply forward the request to the right server for this user. That would avoid the need for any extra client support, but obviously the data would then pass through two hosts.
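The lookup-then-reconnect idea could be sketched on the client side roughly as follows. This is only an illustration: getserver is the proposed command and does not exist yet, the function names and the SSH_CMD override are hypothetical, and the tardis-NN naming scheme is assumed from the example reply above.

```shell
#!/bin/sh
# Sketch only: "getserver" is the proposed (not yet implemented) lookup command.

# Sanity-check a reply: only accept names shaped like tardis-NN.damtp.cam.ac.uk
# (an assumed naming scheme, taken from the example reply).
valid_tardis_host() {
    case "$1" in
        *[!a-z0-9.-]*)                  return 1 ;;  # stray characters, newlines etc.
        tardis-[0-9]*.damtp.cam.ac.uk)  return 0 ;;
        *)                              return 1 ;;
    esac
}

# Ask the front-end which server holds the backups for this user+tag.
# SSH_CMD can be overridden (e.g. with a stub) for testing.
lookup_server() {
    user=$1 tag=$2
    reply=$(${SSH_CMD:-ssh} "$user@tardis.damtp.cam.ac.uk" getserver "$user" "$tag") || return 1
    if valid_tardis_host "$reply"; then
        printf '%s\n' "$reply"
    else
        echo "unexpected getserver reply: $reply" >&2
        return 1
    fi
}
```

A client would then run something like server=$(lookup_server "$USER" "$TAG") and use "$server:$TAG" as the rsync destination in place of tardis:$TAG.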

Mechanism

Just like the ssh.damtp.cam.ac.uk setup we arrange to run a second sshd on each host running as the service-name(s), listening on the service IP addresses.

Just like the sl5 version of the ssh setup we avoid needing to change any existing system sshd/sshd_config files - since they might be overwritten by an update to a package.

To avoid problems with hosts not in the DNS being denied access you may want to add:

sshd-virt-tardis: ALL

to /etc/hosts.allow, otherwise libwrap may deny access - it did for us until we fixed it!

In this case the extra sshd runs with a ForceCommand option, so rather than running the command(s) that rsync (etc) supply, it calls rsync with options suitable for making the backup copy, making hard-links where possible.

Thus a client simply does the following (or the windows equivalent):

SSH='ssh -x2 -c arcfour'
TAG=mylaptop3
LDIR=/local/scratch/public/tardis-tests
rsync -aSHR -e "$SSH" -z --omit-dir-times $LDIR tardis:$TAG

and on the server that will run the tardis-wrapper, which works out the right directory for this user+tag, looks for the newest of the previous backups, and runs rsync to a fresh directory with --link-dest set to the previous one. In this case (for me in one case) the code ends up changing directory to /local/tardis/laptops/jp107/mylaptop3 and then running:

/usr/local/bin/rsync --server -lOHogDtprRSze.is --chmod Du+rwX --fake-super --link-dest=/local/tardis/laptops/jp107/mylaptop3/2009-03-10--19:39:54/ . .TmpSnap/

and afterwards renames .TmpSnap to a timestamped name of the form 2009-01-24--02:11:41, and marks the oldest backup for deletion if we already have enough. Some example output with lots of debugging is:

$ rsync -aSHR -e "$SSH" -z --omit-dir-times $LDIR tardis:$TAG
PRI(4):Thu Mar 12 18:53:51 2009: Running as jp107 backupsdir=/local/tardis/laptops/jp107 tag=mylaptop3, cmd=rsync --server -lOHogDtprRSze.is . mylaptop3
PRI(4):Thu Mar 12 18:53:51 2009: cmd=rsync --server -lOHogDtprRSze.is . mylaptop3 argv=-1
PRI(0):Thu Mar 12 18:53:51 2009: rsync being called with some options we expect cmdopts=--server -lOHogDtprRSze.is tag=mylaptop3
PRI(5):Thu Mar 12 18:53:51 2009: Using /local/tardis/laptops/jp107/mylaptop3 for backups of mylaptop3 (for jp107)
PRI(4):Thu Mar 12 18:53:51 2009: Using link-dest from 2009-03-12--18:53:15 for 2009-03-12--18:53:51
PRI(5):Thu Mar 12 18:53:51 2009: rsync command will be: /usr/local/bin/rsync --server -lOHogDtprRSze.is --chmod Du+rwX --fake-super --link-dest=/local/tardis/laptops/jp107/mylaptop3/2009-03-12--18:53:15/ . .TmpSnap/
PRI(5):Thu Mar 12 18:53:51 2009: Renamed temporary tree to 2009-03-12--18:53:51
PRI(5):Thu Mar 12 18:53:51 2009: Symlink latest at 2009-03-12--18:53:51
PRI(4):Thu Mar 12 18:53:51 2009: Skipping 2009-03-12--18:53:51 it is latest
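The bookkeeping around the server-side rsync (find the newest snapshot, link against it, then rename .TmpSnap and repoint latest) can be sketched as below. This is not the real tardis-wrapper code: the function names are made up for illustration, and it assumes that snapshot names always follow the YYYY-MM-DD--HH:MM:SS pattern seen in the log, so that plain string sorting is chronological.

```shell
#!/bin/sh
# Sketch of the snapshot bookkeeping around the server-side rsync
# (hypothetical helper names, not the real wrapper).

# Print the newest timestamped snapshot directory under $1 (nothing if none).
# Names like 2009-03-12--18:53:51 sort chronologically as plain strings.
newest_snapshot() {
    ( cd "$1" 2>/dev/null || exit 0
      ls -d [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]--* 2>/dev/null | sort | tail -n 1 )
}

# Build the --link-dest option for the next run (prints nothing on a first backup).
link_dest_opt() {
    prev=$(newest_snapshot "$1")
    [ -n "$prev" ] && printf '%s=%s/%s/\n' --link-dest "$1" "$prev"
    return 0
}

# After rsync has filled .TmpSnap/, give it its final name and repoint "latest".
finish_snapshot() {
    dir=$1 stamp=${2:-$(date '+%Y-%m-%d--%H:%M:%S')}
    mv "$dir/.TmpSnap" "$dir/$stamp" &&
    ln -sfn "$stamp" "$dir/latest" &&
    printf '%s\n' "$stamp"
}
```

Writing into .TmpSnap and renaming only at the end means an interrupted transfer never shows up as a dated snapshot.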

The other features of the tardis-wrapper are:

At the moment the code allows (for each user+tag) a fixed number of backups (set to 30 during testing but who knows what we should set it to in production), and it simply removes the oldest when the limit is reached. We also have a version marked 'LastKnownGood' which will never be removed.

Currently (during testing) the deletion isn't actually done; it just renames into a Zapped/ directory which can be cleaned up later. We don't want to force the user to stay connected just to wait for the cleanup, so that should probably just trigger a tidy-up to happen after they disconnect. Depending on how many people end up connecting at the same time, the deletions might cause a significant load.
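The retention rule described above might look something like the following sketch. The function name is hypothetical, and it assumes that latest and LastKnownGood are symlinks to bare snapshot names (as in the list output) and that protected snapshots still count towards the limit; the real wrapper may well count differently.

```shell
#!/bin/sh
# Sketch of the retention rule: keep at most $2 snapshots under $1, moving
# the oldest into Zapped/ (as during testing) rather than deleting them.
# The snapshots that "latest" and "LastKnownGood" point at are never moved.
prune_snapshots() {
    dir=$1 keep=$2
    mkdir -p "$dir/Zapped" || return 1
    latest=$(readlink "$dir/latest" 2>/dev/null)
    good=$(readlink "$dir/LastKnownGood" 2>/dev/null)
    # Timestamp names contain no whitespace, so word splitting is safe here.
    snaps=$(cd "$dir" && ls -d [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]--* 2>/dev/null | sort)
    count=0
    for snap in $snaps; do count=$((count + 1)); done
    # Walk the snapshots oldest first until we are back within the limit.
    for snap in $snaps; do
        [ "$count" -le "$keep" ] && break
        if [ "$snap" = "$latest" ] || [ "$snap" = "$good" ]; then
            continue
        fi
        mv "$dir/$snap" "$dir/Zapped/$snap" && count=$((count - 1))
    done
}
```

Emptying Zapped/ can then be left to a separate job, so the client does not have to stay connected while it runs.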

We currently set a quota of 50G per user. It isn't actually enforced yet, but it can be just by altering the mount options (see the uqnoenforce and gqnoenforce options in the fstab entry). There is a script which adds the users to /etc/access.tardis (prepending access.tardis.local) and sets up quotas on /local/tardis for any users who don't have limits already. This is currently run from a daily cron job.

Here are a few sample commands being run from a linux box showing the debugging (which is sent to stderr):

$ $SSH tardis.damtp.cam.ac.uk quota
PRI(4):Thu Mar 12 19:12:01 2009: Running as jp107 backupsdir=/local/tardis/laptops/jp107 tag=quota, cmd=quota
2851796 20000000
PRI(4):Thu Mar 12 19:12:01 2009: cmd=quota argv=-1

$ $SSH tardis.damtp.cam.ac.uk list 
PRI(4):Thu Mar 12 19:12:44 2009: Running as jp107 backupsdir=/local/tardis/laptops/jp107 tag=list, cmd=list
JSPHP-backup
JSPHP-tard
mylaptop2
mylaptop3
mytardis1
PRI(4):Thu Mar 12 19:12:44 2009: cmd=list argv=-1

$ $SSH tardis.damtp.cam.ac.uk list mylaptop3
PRI(4):Thu Mar 12 19:13:00 2009: Running as jp107 backupsdir=/local/tardis/laptops/jp107 tag=mylaptop3, cmd=list mylaptop3
mylaptop3/2009-03-10--17:38:07
mylaptop3/2009-03-10--17:38:18
mylaptop3/2009-03-10--17:38:55
mylaptop3/2009-03-10--19:39:54
mylaptop3/2009-03-12--18:52:13
mylaptop3/2009-03-12--18:53:15
mylaptop3/2009-03-12--18:53:51
mylaptop3/LastKnownGood  ->  2009-03-10--17:38:07
mylaptop3/latest  ->  2009-03-12--18:53:51
PRI(4):Thu Mar 12 19:13:00 2009: cmd=list mylaptop3 argv=-1

$ $SSH tardis.damtp.cam.ac.uk showpath mylaptop3
PRI(4):Thu Mar 12 19:21:43 2009: Running as jp107 backupsdir=/local/tardis/laptops/jp107 tag=mylaptop3, cmd=showpath mylaptop3
/local/tardis/laptops/jp107/mylaptop3
PRI(4):Thu Mar 12 19:21:43 2009: cmd=showpath mylaptop3 argv=-1

$ $SSH tardis.damtp.cam.ac.uk quota          
PRI(4):Thu Mar 12 19:21:53 2009: Running as jp107 backupsdir=/local/tardis/laptops/jp107 tag=quota, cmd=quota
2851796 20000000
PRI(4):Thu Mar 12 19:21:53 2009: cmd=quota argv=-1

$ $SSH tardis.damtp.cam.ac.uk showpath mylaptop3
PRI(4):Thu Mar 12 19:22:03 2009: Running as jp107 backupsdir=/local/tardis/laptops/jp107 tag=mylaptop3, cmd=showpath mylaptop3
/local/tardis/laptops/jp107/mylaptop3
PRI(4):Thu Mar 12 19:22:03 2009: cmd=showpath mylaptop3 argv=-1

$ $SSH tardis.damtp.cam.ac.uk quota mylaptop3
PRI(4):Thu Mar 12 19:22:07 2009: Running as jp107 backupsdir=/local/tardis/laptops/jp107 tag=mylaptop3, cmd=quota mylaptop3
2851796 20000000
LastKnownGood -> 2009-03-10--17:38:07
PRI(4):Thu Mar 12 19:22:07 2009: cmd=quota mylaptop3 argv=-1

$ sftp tardis.damtp.cam.ac.uk
Connecting to tardis.damtp.cam.ac.uk...
sftp> ls
.
..
JSPHP-backup
JSPHP-tard
mylaptop2
mylaptop3
mytardis1
sftp> pwd
Remote working directory: /local/tardis/laptops/jp107
sftp> 
sftp> cd mylaptop3
sftp> ls
.
..
2009-03-10--17:38:07
2009-03-10--17:38:18
2009-03-10--17:38:55
2009-03-10--19:39:54
2009-03-12--18:52:13
2009-03-12--18:53:15
2009-03-12--18:53:51
LastKnownGood
latest

If you wonder why the quota command also optionally shows the LastKnownGood it is to avoid the Windows client needing to make a second ssh connection just to get that info and display it.
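Since sshd passes the command the client asked for to a ForceCommand script in the SSH_ORIGINAL_COMMAND environment variable, the wrapper's handling of the requests shown above could be sketched as a simple classification step. The function names and the set of characters allowed in a tag are assumptions made for this illustration, not taken from the real tardis-wrapper.

```shell
#!/bin/sh
# Sketch of how a ForceCommand wrapper might classify the client's request.
# It would be called as: classify_request "${SSH_ORIGINAL_COMMAND-}"
# Prints one of: rsync, list, quota, showpath, deny.
classify_request() {
    case "$1" in
        "rsync --server "*)     echo rsync ;;
        list|"list "*)          echo list ;;
        quota|"quota "*)        echo quota ;;
        showpath|"showpath "*)  echo showpath ;;
        *)                      echo deny ;;
    esac
}

# A tag becomes a directory name, so reject anything that could escape the
# user's backup directory. The allowed character set is an assumption.
valid_tag() {
    case "$1" in
        ""|.*|*[!A-Za-z0-9._-]*) return 1 ;;  # empty, leading dot, or odd characters
        *)                       return 0 ;;
    esac
}
```

The client's command is only ever inspected, never executed as given; the wrapper builds its own rsync command line from the validated pieces.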

Installation

set up file system

After installing the machine we need to create a large store for the data. We are using XFS for this because it seems better suited than ext3 to handling large volumes of data. To use XFS we first need to pull in the xfs-filesystem and xfsprogs rpms (available in our standard sl repos). The former pulls in the xfs kernel modules and the latter the tools like mkfs.xfs and xfs_info etc.

$ ssh temporal-lobe df -hlP /local/tardis/
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/TempLobeSys00-tardis  4.0T  5.5G  4.0T   1% /local/tardis

so about 4TB should be available. That was created by making a large LV and then running mkfs.xfs on it thus:

mkfs.xfs -d su=64k,sw=6 -i attr=2 -l internal,version=2  /dev/TempLobeSys00/tardis

See the mkfs.xfs man page for more details of the options.

$ xfs_info /dev/TempLobeSys00/tardis
meta-data=/dev/TempLobeSys00/tardis isize=256    agcount=32, agsize=32770048 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1048641536, imaxpct=25
         =                       sunit=16     swidth=96 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
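As a quick cross-check, the sunit and swidth values reported by xfs_info follow from the su=64k,sw=6 options given to mkfs.xfs once they are expressed in 4096-byte filesystem blocks. The variable names below are only for illustration.

```shell
#!/bin/sh
# Relate the mkfs.xfs stripe options to the xfs_info output (values in blocks).
su_bytes=$((64 * 1024))    # su=64k: stripe unit in bytes
bsize=4096                 # filesystem block size from xfs_info
sw=6                       # sw=6: stripe width as a multiple of su

sunit_blks=$((su_bytes / bsize))     # matches "sunit=16"
swidth_blks=$((sunit_blks * sw))     # matches "swidth=96 blks"

# Data blocks from xfs_info converted to GiB (262144 blocks of 4096 bytes per GiB).
size_gib=$((1048641536 / 262144))

echo "sunit=$sunit_blks swidth=$swidth_blks size=${size_gib}GiB"
```

The result of about 4000 GiB is roughly consistent with the 4.0T that df reports.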

The current fstab line is (broken here to avoid wrapping):

$ grep tardis /etc/fstab
/dev/TempLobeSys00/tardis       /local/tardis   xfs \
 defaults,nosuid,nodev,uqnoenforce,gqnoenforce,noatime,context=user_u:object_r:file_t:s0 \
 1 2

Note that this is on the same VG as the boot system - because we have all disks on the PERC/6i raid controller so they are all in a single RAID-5 set.

install tardis rpm

On each machine, log in and become root, then refresh the yum cache of repo-data:

 yum makecache

install the rpm:

 yum install sshd-virt-tardis

this should also pull in the required tardis-openssh package. At this point you can check that the new service now exists, e.g.

 $ service sshd-virt-tardis status
 Status of sshd tardis: No pidfile found

That error is quite normal since there is no pidfile yet!

add ssh-keys and ipaddr

To make it live we need to first copy over the ssh keys and then tell it the IP address and device to use.

Installing the tardis (possibly shared) ssh keys:

 cd /etc/sshd-virt-tardis/
 rsync cingulum:/opt/ssh-secrets/Virtual/ssh-tardis.tar ./ssh-tardis.tar
 tar -xpf ssh-tardis.tar

Specifying the IP address for this host to use:

 printf "# Use this address on $(hostname)\nIPADDR=131.111.17.221\nDEV=eth0\n" > ipaddr

testing ssh service

Then check it starts up ok with:

 service sshd-virt-tardis start

That init-script briefly checks that the address isn't already being used by something else. If it is (e.g. due to a typo/accident), you will get a message something like:

 $ service sshd-virt-tardis start 
 ARPING 131.111.17.221 from 0.0.0.0 eth0                    [FAILED]
 Unicast reply from 131.111.17.221 [......] for 131.111.17.221 [......] 0.692ms
 Sent 1 probes (1 broadcast(s))
 Received 1 response(s)

If all is ok you just get a short delay and a normal startup message like:

 $ service sshd-virt-tardis start
 Starting sshd [tardis]                                    [  OK  ]

now we can check that it is running ok:

 $ service sshd-virt-tardis status
 Status of sshd tardis: sshd [tardis] is running (20288)
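The address-in-use check that the init-script performs before starting could be approximated as below. This is a guess at the approach, not the actual init-script: the function names are hypothetical, and it relies on the duplicate address detection mode (-D) of the iputils arping, which exits non-zero if another host answers for the address.

```shell
#!/bin/sh
# Sketch of a pre-start check (the real init-script may differ).

# Read IPADDR and DEV from an ipaddr file without sourcing it as shell code.
# Prints "IPADDR DEV" on success.
read_ipaddr_file() {
    ip=$(sed -n 's/^IPADDR=//p' "$1" | tail -n 1)
    dev=$(sed -n 's/^DEV=//p' "$1" | tail -n 1)
    case "$ip" in
        ""|*[!0-9.]*) echo "bad or missing IPADDR in $1" >&2; return 1 ;;
    esac
    [ -n "$dev" ] || { echo "missing DEV in $1" >&2; return 1; }
    printf '%s %s\n' "$ip" "$dev"
}

# Duplicate address detection: succeeds only if nobody answers for address $1
# on interface $2. Needs root and a real interface, so it is not run here.
address_is_free() {
    arping -q -D -c 2 -w 3 -I "$2" "$1"
}
```

The parsed values from /etc/sshd-virt-tardis/ipaddr would be passed to address_is_free before the extra address is brought up.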

Finally check that the chkconfig entry was added ok by the rpm %post, and if so reboot just to double-check that all comes up as expected.

 $ chkconfig --list sshd-virt-tardis
 sshd-virt-tardis       0:off   1:off   2:off   3:on    4:on    5:on    6:off
 $
 $ reboot

directories and other stuff

Of course you also need to create the directory for the backups to go into and point the tardis-wrapper at it. To be explained once that is all known.

Need to document the commands to install the xfs support, create the LV, create the filesystem, add it to fstab etc.
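Until that is written up properly, the steps might be sketched as follows. The VG and LV names, package names, mkfs.xfs options and mount point are taken from the sections above; the LV size, the helper names and the RUN switch are assumptions for this illustration. By default the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the storage setup steps. By default this only prints the
# commands; set RUN=1 (as root) to actually execute them.
VG=TempLobeSys00
LV=tardis
LV_SIZE=${LV_SIZE:-4T}     # assumption: not necessarily the size actually used
MNT=/local/tardis

# Either run a command or just show it, depending on RUN.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

setup_tardis_store() {
    run yum install -y xfs-filesystem xfsprogs
    run lvcreate -L "$LV_SIZE" -n "$LV" "$VG"
    run mkfs.xfs -d su=64k,sw=6 -i attr=2 -l internal,version=2 "/dev/$VG/$LV"
    run mkdir -p "$MNT"
    # Assumes the fstab line shown earlier has already been added by hand.
    run mount "$MNT"
}
```

Calling setup_tardis_store with RUN unset gives a dry-run listing that can be reviewed before anything touches the disks.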

Removal

To remove the setup, just remove the package. That will cleanly shut down the service first (and take away the extra IP address), e.g. run as root:

 yum remove sshd-virt-tardis

Then one can simply put the service with that IP address on another host by following the instructions above.

One should also clean up the files in /etc/sshd-virt-tardis/ (the ssh-keys and ipaddr file) so they won't get picked up if the package is later re-installed.

Of course any existing backup data would also need to be moved.