Department of Applied Mathematics and Theoretical Physics

The Faculty of Mathematics seeks to recruit a High Performance Computing (HPC) Manager & Research Software Engineering (RSE) Project Manager to start as soon as possible.

The role holder is expected to provide oversight of the Faculty of Mathematics Computing Development Platform, an HPC and data analytics supercomputer facility, together with Research Software Engineers supporting related programming efforts within Faculty research groups.

The role holder will be responsible for the strategic design and enhancement of the Computing Development Platform and will provide second-tier support to research staff using the system. They will also be expected to take technical responsibility for the Faculty of Mathematics' HPC training resource. This encompasses design, tier-two support and service improvement through the identification and resolution of fundamental, systemic causes of service failure. It also includes the management and resolution of any service disruption. The role holder will take responsibility for the development and ongoing operation of services for HPC application development, delivery and training, and for high performance computing, including design, specification, procurement, commissioning and support.

The role will provide senior level expertise to oversee the employment and coordination of Research Software Engineers based in individual research groups. The role facilitates RSE teamwork in the areas of testing, profiling and improving the performance of parallel code in preparation for production runs on external HPC facilities, using in-depth optimisation and tuning of code submitted by users and improvement of parallel scaling characteristics. The role advises Faculty members on grant applications for RSE support and hardware procurement and supports them by offering RSE-based assistance.

The role holder is expected to have experience of programming in C, C++, Fortran 90 and Python, of using scripting languages such as Bash and Perl, and of parallel programming using OpenMP and MPI. They will have a track record of administering and integrating Linux operating systems in a research environment, with experience of sustainable configuration management and automation, and of configuring and managing Linux HPC clusters and massive SMP systems (including the management of queuing systems such as Moab/Torque and Slurm). Their experience of software development will include standard software engineering practices such as source control systems, and more advanced techniques of compilation, optimisation and installation for a variety of scientific HPC applications. Knowledge of storage sub-systems, co-processors (such as the Xeon Phi) and accelerators (such as GPUs) would be desirable.

The role holder will need to become proficient in a variety of techniques related to developing code for HPC systems: benchmarking, debugging, profiling and optimisation, including vectorisation, of parallel applications (including on KNL and large shared-memory systems). They will be expected to have experience of visualisation techniques for massively parallel computations; of programming on HPC architectures with co-processors and accelerators, local offload in situ and offload-over-fabric; and of managing CXFS filesystems.

Limited funding: The funds for this post are available for 3 years in the first instance.

Please quote reference LE25714 on your application and in any correspondence about this vacancy.

Mar 14th 2021

