Towards Learning the Geometry of Data: From Diffusion Models to Riemannian Geometry
Abstract: This thesis establishes novel connections between generative modelling, self-supervised learning, and Riemannian geometry, paving the way for learning the intrinsic geometry of data manifolds. In chapter 3, we introduce CAFLOW, a conditional normalising flow that improves image-to-image translation by hierarchically modelling image distributions across scales. In chapter 4, we introduce non-uniform diffusion models, which diffuse high-frequency information faster and low-frequency information more slowly, creating a multi-scale structure similar to that of multi-scale normalising flows. This approach outperforms standard diffusion models in both image quality and generation speed. We also introduce a novel estimator for the conditional score function based on non-uniform diffusion, and present the first rigorous proof of consistency for the Conditional Denoising Estimator (CDE), the most widely used loss function for training conditional diffusion models. In chapter 6, we introduce ScoreVAE, a novel Variational Auto-encoder (VAE) that alleviates the typical VAE limitation of blurry reconstructions by combining a frozen pretrained diffusion model with a learnable time-dependent encoder to model the reconstruction distribution. In chapter 5, we establish the fundamental connection between diffusion models and data geometry by proving that diffusion models approximate the normal bundle of data manifolds. Building on this insight, we develop a new state-of-the-art method for estimating the local intrinsic dimension (LID), which has been successfully applied to studying memorisation and generalisation in diffusion models, as well as to improving out-of-distribution detection. In chapter 7, we take the first step towards a scalable method for learning the geometry of the data manifold.
We construct the pullback Riemannian metric defined by the pseudo score function, which leads to closed-form expressions for fundamental geometric constructs, such as geodesics, distances, and exponential/logarithmic maps. Moreover, the proposed Riemannian structure yields a Riemannian Auto-encoder (RAE) that simultaneously learns the intrinsic dimension of the data manifold, a global chart mapping from the manifold to the intrinsic latent space (encoder), and the inverse chart mapping from latent space back to the manifold (decoder).