A Deep Learning Approach to Automated Coeliac Disease Diagnosis
Abstract: Coeliac disease (CD) is a small intestinal autoimmune disorder triggered by the consumption of gluten. Manifestations are variable and non-specific, ranging from no symptoms, through fatigue, diarrhoea, and vomiting, to, in rare cases, lymphoma and duodenal cancer. The treatment consists of a lifelong gluten-free diet, making early and accurate diagnosis critically important.
The current gold standard for CD diagnosis is the procurement and assessment of a hematoxylin and eosin (H&E)-stained duodenal biopsy. However, this is an unavoidably subjective process with low interobserver concordance, resulting in as many as 8 out of 9 affected individuals remaining undetected. There is a clear unmet need for a new gold standard in CD diagnosis.
A potential method for developing such a diagnostic tool is digital image processing. Modern machine learning techniques, specifically deep learning, have found success in reliably automating complex image processing tasks. Additionally, biopsies can now be scanned digitally as whole-slide images (WSIs), allowing biopsy analysis to be automated using digital image processing. In this thesis, I address the following question: “Can modern machine learning techniques provide a sensitive, objective, and reproducible test for CD by learning from WSIs of H&E-stained duodenal biopsies?”
Firstly, I describe in detail how pathologists diagnose CD, how biopsies are represented digitally, and the advantages and challenges of using digital image processing to automate CD diagnosis. Secondly, I assess the use of traditional image processing techniques for providing an automated CD diagnosis tool. I then develop and analyse the tools required for a deep learning approach: I build an image processing tool that can segment the tissue in WSIs from background and artefacts, and I compare different data processing techniques to improve the generalizability of a deep-learning-based CD diagnosis tool. Finally, I use this tissue segmentation tool and these data processing techniques to develop a deep-learning-based CD diagnosis tool that focuses on accuracy, generalizability, and interpretability.
This thesis aims to assess whether modern machine learning can provide a much-needed new gold standard in CD diagnostics.