
Deep Learning Approaches for PDE-based Image Analysis and Beyond: From the Total Variation Flow to Medieval Paper Analysis

Abstract: Partial differential equations (PDEs) play a fundamental role in the mathematical modelling of many processes and systems in the physical, biological and other sciences, as well as in engineering and computer science. Image analysis is a prime example of a field where PDEs have triggered many innovations. One such PDE that has gained considerable attention in the last few years is the total variation (TV) flow. The TV flow generates a scale-space representation of an image based on the TV functional. This gradient flow has desirable features for images, such as the preservation of sharp edges, and enables spectral, scale, and texture analysis. Based on the solution to the TV flow, a non-linear spectral decomposition can be derived. Because it extracts spectral components corresponding to objects of different size and contrast, this decomposition enables filtering, feature transfer, image fusion and other applications. However, obtaining the spectral TV decomposition requires solving the governing PDE - the TV flow - which in turn involves multiple non-smooth optimisation problems, and is therefore computationally highly intensive.
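For orientation, the objects named above can be written down in the form commonly used in the spectral TV literature. This is a sketch under assumptions: the symbols $f$ (input image), $u$ (flow solution), $\phi$ (spectral response) and $\bar f$ (mean of $f$) are notation chosen here, and the thesis itself may use different conventions or boundary conditions.

```latex
\begin{align}
  % TV flow: gradient flow of the TV functional, started at the image f
  \partial_t u(t,x) &= \operatorname{div}\!\left(\frac{\nabla u(t,x)}{|\nabla u(t,x)|}\right),
  \qquad u(0,x) = f(x), \\
  % spectral response at scale t
  \phi(t,x) &= t\,\partial_{tt} u(t,x), \\
  % reconstruction of the image from all scales plus its mean
  f(x) &= \int_0^\infty \phi(t,x)\,\mathrm{d}t + \bar f .
\end{align}
```

The first equation is formal: where $\nabla u = 0$ the right-hand side is not defined pointwise and has to be understood through the subdifferential of TV. Filtering then amounts to weighting $\phi(t,x)$ over $t$ before integrating, for example keeping only a band of scales.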

In the first part of the thesis, we present a supervised neural network approximation of the spectral TV decomposition which significantly speeds up its numerical solution. We report speedups of up to four orders of magnitude in the processing of megapixel-size images, compared to classical GPU implementations of spectral TV. Our proposed network, the TVspecNET, is able to implicitly learn the underlying PDE and, despite being entirely data-driven, inherits equivariances of the model-based transform. To the best of our knowledge, this is the first approach towards learning a non-linear spectral decomposition of images. The TVspecNET, however, is designed as a supervised learning approach and as such relies on ground truth data. It is additionally constrained to produce fixed spectral bands of the image. We therefore extend the work to learn the TV flow solution in the third part of the thesis.

Learning the solution to PDEs has been a rapidly growing area at the intersection of machine learning and PDEs. The recent success of deep neural networks at various approximation tasks has motivated their use in the numerical solution of PDEs. So-called physics-informed neural networks (PINNs) and their variants have been shown to successfully approximate the solutions of a wide range of PDEs. However, before the advent of deep learning, many classical numerical methods had been developed to approximate PDE solutions on a discrete level. The finite element method (FEM), for instance, is one standard methodology to do so. So far, PINNs and FEM have mainly been studied in isolation from each other. In the second part of the thesis, we compare the two methodologies in a systematic computational study. We employ both methods to numerically solve various linear and non-linear PDEs: the Poisson equation in 1D, 2D, and 3D, the Allen-Cahn equation in 1D, and the semilinear Schrödinger equation in 1D and 2D. We then compare computational costs and approximation accuracies. In terms of solution time and accuracy, PINNs were not able to outperform FEM in our study. In some experiments, however, they were faster at evaluating the computed solution.
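To make the FEM side of such a comparison concrete, here is a minimal sketch for the simplest test case mentioned above, the 1D Poisson equation -u'' = f on (0, 1) with u(0) = u(1) = 0. It is not the implementation used in the study: the function name `solve_poisson_1d_p1`, the uniform mesh, the piecewise-linear elements and the nodal quadrature for the load vector are all assumptions made for illustration.

```python
import math


def solve_poisson_1d_p1(f, n_elements):
    """Solve -u'' = f on (0, 1), u(0) = u(1) = 0, with linear finite elements.

    Illustrative assumptions: uniform mesh, load vector approximated by
    h * f(x_i) (nodal quadrature). Returns interior nodes and nodal values.
    """
    h = 1.0 / n_elements
    m = n_elements - 1                       # number of interior nodes
    xs = [(i + 1) * h for i in range(m)]

    # P1 stiffness matrix on a uniform mesh: tridiagonal (1/h) * [-1, 2, -1]
    diag = [2.0 / h] * m
    off = -1.0 / h
    rhs = [h * f(x) for x in xs]

    # Thomas algorithm: forward elimination ...
    for i in range(1, m):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]

    # ... and back substitution
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return xs, u


if __name__ == "__main__":
    # Manufactured solution u(x) = sin(pi x), so f(x) = pi^2 sin(pi x)
    source = lambda x: math.pi ** 2 * math.sin(math.pi * x)
    for n in (16, 32, 64):
        xs, u = solve_poisson_1d_p1(source, n)
        err = max(abs(ui - math.sin(math.pi * x)) for x, ui in zip(xs, u))
        print(n, err)
```

A PINN for the same problem would instead minimise the residual of the PDE at sampled points with respect to network weights, which is why the two approaches differ so much in solution time even on a problem this small.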

In the third part of the thesis, we consider the deep learning approximation of the TV flow solution. In contrast to the TVspecNET, which learns the entire spectral TV decomposition pipeline, this unsupervised approach is inspired by the PINN framework, is more flexible in terms of scale representation, and does not require ground truth data. Computing the TV flow is challenging because the subdifferential of TV is not a singleton unless the image has no constant regions. Numerical methods either modify the gradient of the image in constant regions to make sure that the subdifferential is single-valued, or use an implicit scheme which requires solving multiple non-smooth optimisation problems. The first option includes FEM approaches; however, the gradient modifications introduce artefacts. The second option is the classical approach to solving the TV flow. Even with state-of-the-art convex optimisation techniques, it is often prohibitively expensive, which strongly motivates the use of alternative, faster approaches. Inspired by and extending the framework of PINNs, we propose the TVflowNET, an unsupervised neural network approach to approximate the solution of the TV flow given an initial image and a time instance. We require no ground truth data but rather make use of the PDE for optimisation of the network parameters. We circumvent the challenges related to the subdifferential by additionally learning the related diffusivity term. We significantly reduce the computation time and show that the TVflowNET approximates the TV flow solution with high fidelity for different image sizes and image types. Additionally, we compare different network architecture designs as well as training regimes to highlight the fidelity of our approach.
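The first of the two classical options, modifying the gradient so that the flow is single-valued, can be illustrated on a 1D signal. The sketch below is not the TVflowNET and not the thesis's reference solver: it assumes the common regularisation that replaces |u'| by sqrt(u'^2 + eps^2), explicit time stepping and zero-flux boundaries, and the names `tv_energy`, `tv_flow_step` and `run_tv_flow` are hypothetical.

```python
import math


def tv_energy(u, eps):
    """Regularised total variation of a 1D signal (assumed eps-smoothing)."""
    return sum(math.sqrt((u[i + 1] - u[i]) ** 2 + eps ** 2)
               for i in range(len(u) - 1))


def tv_flow_step(u, dt, eps):
    """One explicit step of u_t = d/dx( u_x / sqrt(u_x^2 + eps^2) )."""
    n = len(u)
    # Flux between neighbouring samples i and i+1; eps keeps it well defined
    # in constant regions, where the unregularised flow is multi-valued.
    flux = [(u[i + 1] - u[i]) / math.sqrt((u[i + 1] - u[i]) ** 2 + eps ** 2)
            for i in range(n - 1)]
    padded = [0.0] + flux + [0.0]            # zero flux through the boundary
    return [u[i] + dt * (padded[i + 1] - padded[i]) for i in range(n)]


def run_tv_flow(u0, dt, eps, steps):
    """Iterate the explicit scheme and record the energy after every step."""
    u = list(u0)
    energies = [tv_energy(u, eps)]
    for _ in range(steps):
        u = tv_flow_step(u, dt, eps)
        energies.append(tv_energy(u, eps))
    return u, energies


if __name__ == "__main__":
    step_signal = [0.0] * 8 + [1.0] * 8
    # The time step has to shrink with eps for the explicit scheme to be stable.
    u, energies = run_tv_flow(step_signal, dt=0.01, eps=0.1, steps=200)
    print(max(u) - min(u), energies[0], energies[-1])
```

The trade-off described in the text is visible in the parameters: a larger eps allows larger time steps but smears edges more, while a small eps stays closer to the true TV flow at the price of many tiny steps.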

The last part of the thesis concerns the application of the spectral TV decomposition to medieval paper analysis. Medieval paper, a handmade product, is made with a mould which leaves an indelible imprint on the sheet of paper. This imprint includes chain lines, laid lines and watermarks which are often visible on the sheet. Extracting these features allows the identification of paper stock and gives information about chronology, localisation and movement of manuscripts and people. Most computational work on feature extraction for paper analysis has so far focused on radiography or transmitted light images. While these imaging methods provide clear visualisation of the features of interest, they are expensive and time-consuming to acquire and not feasible for smaller institutions. However, reflected light images of medieval paper manuscripts are abundant and possibly cheaper to acquire. We propose algorithms to detect and extract the laid and chain lines from reflected light images. We tackle the main drawbacks of reflected light images, that is, the low contrast of chain and laid lines and intensity jumps due to noise and degradation, by employing the spectral TV decomposition, and we develop methods for subsequent chain and laid line extraction. Our results clearly demonstrate the feasibility of using reflected light images in paper analysis. This work enables feature extraction for paper manuscripts that have otherwise not been analysed due to a lack of appropriate images. We also open the door for paper stock identification at scale.