Vision-based classification for underwater safety critical applications

Luiza Ribeiro Marnet

Research output: Theses, PhD thesis

Abstract

Computer vision is crucial for ensuring a safe future for the quickly growing underwater infrastructure monitoring industry, which currently relies primarily on labour-intensive and costly manual methods. Over the last twelve years, deep learning models have revolutionized the field of computer vision and have been applied across various domains. These models could assist in conducting underwater surveys, for example, by analyzing video and image data from equipment inspection and environment monitoring, thereby automating the process and reducing the time spent on visual inspections. Despite the success of deep learning models, underwater images pose challenges that in-air images do not. Underwater images are generally of poor quality, with issues such as blur, haziness, non-uniform illumination, and color degradation. These factors raise concerns about the performance of well-known deep learning models in underwater applications.

As a subset of machine learning, deep learning models are data-driven, and the quality of the training data impacts the model’s performance. Furthermore, as a consequence of their large number of parameters, they are data-greedy, requiring large datasets for effective training. However, large underwater datasets are difficult to generate and not widely available. Collecting underwater data relies on the availability of underwater vehicles, specialists to perform the survey, and favorable weather and water conditions. Once collected, the data needs to be labeled. Data annotation is an extremely laborious task that is prone to human error, which reduces the quality of the final dataset.

Furthermore, a well-known problem in deep learning is the overconfidence of the models, which can predict outputs with high probability even for inputs that are out of distribution (OOD) with respect to the training data. This dissertation leverages this overconfidence by employing predictive uncertainty to identify the most important images for labelling, addressing limited financial or human resources for data annotation.

An extensive review of the state of the art in deep learning models applied to underwater images concluded that predictive uncertainty is rarely used in this field. The literature review also revealed the scarcity of large underwater datasets and the researchers’ efforts to develop models and training strategies to overcome this limitation. This lack of publicly available datasets is even more pronounced for industrial applications. To address this gap, this thesis supported the development and release of three publicly available underwater RGB datasets: MIMIR (synthetic), SubPipe, and MarinaPipe (real-world datasets). The contribution to the MIMIR project consisted of evaluating the dataset's usability in the context of pipeline segmentation. In the SubPipe and MarinaPipe projects, the work consisted of annotating and evaluating the datasets for image segmentation. MarinaPipe was recorded in very shallow waters, resulting in images with visible sunlight rays, and contains pipelines occluded by sediment from the marina floor. SubPipe was recorded in deeper waters, leading to darker images, with many parts of the pipeline covered by sand. The links for downloading these three datasets are available in the REMARO GitHub (https://github.com/remaro-network).

Overconfidence in deep learning models is a serious concern. However, it is possible to leverage this overconfidence by calculating the predictive uncertainty to assess the models’ lack of knowledge and use this information to reduce the effort required to generate labeled datasets. This thesis investigated the hypothesis that, given a model trained on synthetic data, the predictive uncertainty enables the selection of real-world images about which the model demonstrates little to no knowledge, for fine-tuning the model and bridging the synthetic-to-reality gap that exists even for photorealistic images. Selecting images based on uncertainty, calculated with Monte Carlo dropout, resulted in a model with better performance than randomly selecting the same number of images.
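
The uncertainty-based selection described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that the class probabilities from several stochastic forward passes (dropout kept active at inference) have already been computed for each candidate image, and it scores each image by the entropy of the averaged prediction. The function names, the image identifiers, and the toy probabilities are hypothetical, and the aggregation of per-pixel uncertainty that a segmentation model would need is not modelled here.

```python
import math


def mean_distribution(samples):
    """Average class probabilities over the stochastic forward passes."""
    n_passes = len(samples)
    n_classes = len(samples[0])
    return [sum(s[c] for s in samples) / n_passes for c in range(n_classes)]


def predictive_entropy(samples):
    """Entropy (in nats) of the mean predictive distribution."""
    mean = mean_distribution(samples)
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def select_most_uncertain(pool, k):
    """Return the k image ids with the highest predictive entropy.

    `pool` maps a (hypothetical) image id to the list of per-pass
    probability vectors obtained with Monte Carlo dropout.
    """
    ranked = sorted(
        pool,
        key=lambda image_id: predictive_entropy(pool[image_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy pool: three images, three stochastic passes each, two classes.
# The numbers are made up for illustration only.
pool = {
    "img_a": [[0.98, 0.02], [0.97, 0.03], [0.99, 0.01]],
    "img_b": [[0.90, 0.10], [0.20, 0.80], [0.55, 0.45]],
    "img_c": [[0.70, 0.30], [0.75, 0.25], [0.65, 0.35]],
}

# Images to send for annotation, most uncertain first.
selected = select_most_uncertain(pool, 2)
print(selected)
```

In this toy pool the confidently predicted image is left out of the selection, which is the behaviour a random baseline of the same size would not guarantee.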

Additionally, this research explored using predictive uncertainty calculated with Monte Carlo dropout for active learning in the underwater domain. When training with real underwater pipeline images and using uncertainty-driven active learning for selecting images, at least 15.9% fewer annotations were needed to achieve the same performance as a model trained on randomly selected images. In addition, this PhD research trained a generalized few-shot segmentation model and used predictive uncertainty to evaluate the reliability of the predictions using mutual information and entropy.
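
For reference, the two uncertainty measures named above are commonly defined as follows for Monte Carlo dropout. The abstract does not state the exact formulation used in the thesis, so this is the standard textbook form, with assumed notation: $T$ stochastic forward passes with sampled weights $\hat{\theta}_t$, $C$ classes, and input $x$ (for segmentation, the quantities are computed per pixel).

```latex
% Mean predictive distribution over T stochastic forward passes
\bar{p}_c(x) = \frac{1}{T} \sum_{t=1}^{T} p(y = c \mid x, \hat{\theta}_t)

% Predictive entropy: total uncertainty of the averaged prediction
H[y \mid x] = - \sum_{c=1}^{C} \bar{p}_c(x) \, \log \bar{p}_c(x)

% Mutual information: predictive entropy minus the mean per-pass entropy
I[y ; \theta \mid x] = H[y \mid x]
  + \frac{1}{T} \sum_{t=1}^{T} \sum_{c=1}^{C}
    p(y = c \mid x, \hat{\theta}_t) \, \log p(y = c \mid x, \hat{\theta}_t)
```

Under these definitions, the mutual information is large when the individual passes are confident but disagree with each other, whereas the entropy is also large when every pass is individually unsure.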

The experiments performed indicate that predictive uncertainty both increases the reliability of deep learning models and optimizes the use of human and financial resources available for data annotation by selecting the most informative data samples.

Original language: English
Qualification: PhD
Supervisor(s)
  • Wasowski, Andrzej, Principal Supervisor
  • Grasshof, Stella, Co-supervisor
Award date: 11 Mar 2025
Print ISBNs: 978-87-7949-536-4
Publication status: Published - 2025
