
Department of Applied Mathematics and Theoretical Physics

Data depth is a statistical function that measures the centrality of an observation with respect to a probability distribution, with an empirical measure (on a data set) being its most important particular case. By exploiting the geometry of the data, the depth function is fully non-parametric, robust to both outliers and heavy-tailed distributions, and satisfies desirable invariances. By dint of these advantages, it is used in a variety of tasks as a generalisation of quantiles to higher dimensions and as an alternative to the probability density. Introduced in the second half of the twentieth century and having undergone theoretical and computational developments since then, data depth has become a universal methodology for ordering complex data and is now employed in numerous applications: supervised and unsupervised machine learning, robust optimisation, financial risk assessment, statistical quality control, extreme value theory, imputation of missing data, etc.
In this presentation, we shall survey the notion of data depth and highlight its most relevant advantages. We shall start with the formal definition of a statistical data depth function and a study of its most important properties. This will be followed by an assortment and analysis of commonly used depth notions. Furthermore, relevant computational aspects will be considered, including the most recent advances that allow depth-based applications at contemporary scale. The presentation will be accompanied by application examples on synthetic and real data sets. Finally, several open questions will be discussed in the outlook.
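As a concrete illustration of the idea (not drawn from the talk itself), the sketch below computes one of the simplest classical depth notions, Mahalanobis depth, for points relative to an empirical data cloud; the function name and the synthetic data are illustrative choices:

```python
import numpy as np

def mahalanobis_depth(x, data):
    """Mahalanobis depth of point x w.r.t. the empirical distribution of data.

    D(x) = 1 / (1 + (x - mu)^T S^{-1} (x - mu)),
    where mu and S are the sample mean and covariance of the data.
    Values lie in (0, 1]; more central (deeper) points score higher.
    """
    mu = data.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(data, rowvar=False))
    d = x - mu
    return 1.0 / (1.0 + d @ S_inv @ d)

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))  # synthetic bivariate sample

center = np.zeros(2)
outlier = np.array([5.0, 5.0])
print(mahalanobis_depth(center, data), mahalanobis_depth(outlier, data))
```

The centre of the cloud receives a depth close to 1, while the distant point receives a depth close to 0, giving the centre-outward ordering that generalises univariate quantiles. Unlike Mahalanobis depth, notions such as halfspace (Tukey) depth are also robust to outliers in the reference sample itself, which is one of the computational trade-offs the abstract alludes to.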

Further information

Time:

May 6th 2025
10:15 to 11:15

Venue:

Seminar Room 1, Newton Institute

Speaker:

Pavlo Mozharovskyi (Telecom Paris, Institut Polytechnique de Paris)

Series:

Isaac Newton Institute Seminar Series