Generative AI has seen tremendous successes recently, most notably the chatbot ChatGPT and the DALLE2 software creating realistic images and artwork from text descriptions. Underlying these and other generative AI systems are usually neural networks trained to produce text, images, audio, or video from text inputs. The aim of this talk is to develop an understanding of the fundamental capabilities of generative neural networks. Specifically and mathematically speaking, we consider the realization of high-dimensional random vectors from one-dimensional random variables through deep neural networks. The resulting random vectors follow prescribed conditional probability distributions, where the conditioning represents the text input of the generative system and its output can be text, images, audio, or video. It is shown that every d-dimensional probability distribution can be generated through deep ReLU networks out of a 1-dimensional uniform input distribution. What is more, this is possible without incurring a cost—in terms of approximation error as measured in Wasserstein-distance—relative to generating the d-dimensional target distribution from d independent random variables. This is enabled by a space-filling approach which realizes a Wasserstein-optimal transport map and elicits the importance of network depth in driving the Wasserstein distance between the target distribution and its neural network approximation to zero. Finally, we show that the number of bits needed to encode the corresponding generative networks equals the fundamental limit for encoding probability distributions (by any method) as dictated by quantization theory according to Graf and Luschgy. This result also characterizes the minimum amount of information that needs to be extracted from training data so as to be able to generate a desired output at a prescribed accuracy and establishes that generative ReLU networks can attain this minimum.

This is joint work with D. Perekrestenko and L. Eberhard