
Davide Salomoni

Adjunct professor

Department of Pharmacy and Biotechnology

Teaching

Dissertation topics suggested by the teacher.

  • Integration of Big Data standards in bioinformatics-related processes to address data quality and data lineage

With the rapid generation of large and complex biomedical datasets, ensuring data quality and traceability is fundamental to guaranteeing the reliability and reproducibility of machine learning and artificial intelligence applications, as well as effective data governance. Although the use of containers (e.g., Docker, Apptainer) and workflow managers (e.g., Nextflow, Snakemake) is essential for managing and executing data analysis pipelines, complete information about data quality and data lineage is equally important. However, consistent and standardized treatment of these aspects is often missing. This thesis will focus on integrating state-of-the-art Big Data management practices to improve biomedical data governance. In this way, we aim to enhance the reliability and impact of biomedical research, fostering better collaboration and innovation within the scientific community. The thesis work will be validated on both synthetic and real-world data.

  • Genomics Data Encryption

This thesis aims to investigate and prototype secure methods for accessing sensitive genomic data in distributed computing environments, including High-Performance Computing (HPC) clusters and Cloud platforms. The work will explore encryption and access control strategies that keep data protected, even from system administrators, while leaving it usable within bioinformatics workflows. Technologies to be evaluated for their potential to enable transparent, on-the-fly decryption include FUSE-based virtual filesystems and Crypt4GH, the official encryption standard of the Global Alliance for Genomics and Health (GA4GH), which has also been adopted by the ELIXIR Federated EGA. The system will be designed for integration with workflow managers like Nextflow and Snakemake, supporting reproducible and privacy-preserving analysis pipelines. The goal is to identify and prototype practical solutions that balance security, performance, and usability in real-world genomics research.

  • A Chatbot for the clinical annotation of sequence variants

This thesis aims to develop a Chatbot capable of assisting clinicians in the annotation of sequence variants, leveraging the latest advancements in Large Language Models (LLMs). This will require the integration of multiple data sources, such as ClinVar, gnomAD, Humsavar, and others. This is a complex thesis that will be carried out in close collaboration with Prof. Emidio Capriotti and collaborators, with ISO-certified data centers (e.g., those of INFN), and with the Vall d’Hebron University Hospital in Barcelona, Spain, a leading institution in the field of clinical genomics. The Chatbot is expected to interact with clinicians, providing them with relevant information and insights about genetic variants, including their potential clinical significance, associated diseases, and available treatments. The goal is to create a user-friendly tool that enhances clinical decision-making and improves patient outcomes by facilitating the interpretation of complex genomic data.