91407 - LABORATORIO DI GENOMICA COMPARATA

Academic Year 2020/2021

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Programme: Second cycle degree programme (LM) in Biodiversity and Evolution (code 9075)

Learning outcomes

After this course the student will master the basic computational and bioinformatics methods needed to handle and analyze genomic data. The course provides theoretical and practical skills for working with High-Throughput Sequencing data through a pipeline that includes quality checking and filtering, de novo assembly of genomes and transcriptomes, mapping and variant discovery, RNA-Seq, normalization, transcript quantification, identification of differentially expressed genes, annotation of coding and non-coding elements, and orthology identification. Everything is presented in a comparative framework, to identify the evolutionary components that contributed to shaping extant organisms.

Course contents

1. Course introduction

2. Technologies and applications

Technologies: 454 Pyrosequencing; Reversible Terminator Sequencing (Illumina); Ion Semiconductor Sequencing (Ion Torrent); Single-Molecule Real-Time (SMRT) Sequencing (PacBio); Nanopore Sequencing; Comparison of sequencing platforms.
Applications: Shotgun Sequencing; RAD-Seq; Hybrid Enrichment; RNA-Seq; Single-Cell genomics and transcriptomics.

3. Practical computing skills

“Big Data” in Biology, robust and reproducible research, experimental design, data and documentation management.
The Unix Shell: streams, redirections, Unix pipe, process management and interactions, connection to remote machines with SSH, maintaining long-running jobs (nohup, screen), data download (wget, curl, scp), data integrity (md5), looking at differences between data, data compression.
Manipulating text data: head, tail, less, wc, ls, cut, grep, sort, uniq, join, sed; Awk and Bioawk.
Sequence data: FASTA format, FASTQ format, base calling quality, trimming, parsing.
Alignment data: SAM, BAM, samtools, sort and index, variant calling.
Primer of Shell scripting.
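
As an illustration of the FASTQ format and base calling quality topics listed above, the following is a minimal sketch, not course material. It assumes strict 4-line FASTQ records and Phred+33 quality encoding (the encoding used by current Illumina output); the function names are hypothetical and chosen only for this example.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples.

    Assumption: every record spans exactly 4 lines
    (header, sequence, '+' separator, quality string).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


# A single toy record, written inline instead of being read from a file.
record = ["@read1", "ACGT", "+", "II5!"]
for name, seq, qual in parse_fastq(record):
    print(name, seq, phred_scores(qual))
```

A Phred score of 20 corresponds to a 1% chance of a wrong base call, which is why Q20 is a common trimming threshold.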

4. Database and bioinformatics resources

Public database overview.

5. Genome and transcriptome assembly

Data quality and filtering.
Assembly strategies: Greedy Assemblies; Overlap-Layout-Consensus (OLC) Assemblies; K-mer Assemblies Using de Bruijn Graphs; comparison of assembly strategies.
De novo assembly; scaffolding; hybrid assembly; RNA-Seq; metagenomics.
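
The k-mer strategy listed above can be sketched as follows. This is a toy construction of a de Bruijn graph under simplifying assumptions (error-free reads, single strand, no handling of repeats or coverage); the function name is hypothetical, and real assemblers add many steps that are not shown here.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; each k-mer found in a read adds one edge
    from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy example: one read of length 5, k = 3 gives the k-mers ACG, CGT, GTC.
graph = de_bruijn_edges(["ACGTC"], 3)
for node, successors in graph.items():
    print(node, "->", successors)
```

Walking the edges from AC through CG and GT to TC spells the original sequence ACGTC, which is the idea behind assembling contigs from paths in the graph.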

6. Transcriptomics

Analysis of differential transcription, data normalization and comparative methods; mapping and transcript quantification; DEGs, fold change, Gene Set Enrichment Analysis (GSEA).
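
To make normalization and fold change concrete, here is a minimal sketch that uses counts per million (CPM) as a simple library-size normalization and a log2 fold change with a pseudocount. Both choices are assumptions made for illustration. Dedicated differential expression tools use more elaborate statistical models, and the gene names and counts below are invented.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control); the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Invented raw counts for two genes in two conditions.
control = {"geneA": 250, "geneB": 750}
treated = {"geneA": 600, "geneB": 400}

cpm_control = counts_per_million(control)
cpm_treated = counts_per_million(treated)

for gene in control:
    lfc = log2_fold_change(cpm_control[gene], cpm_treated[gene])
    print(gene, round(lfc, 2))
```

A positive value means higher expression in the treated sample, and a value of 1 corresponds to a doubling.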

7. Annotation

Databases: Gene Ontology; UniProt; Pfam; Ensembl; KEGG.
Tools: BLAST; HMMER; InterProScan; Multiple Sequence Alignment; Alignment Masking; Mapping Sequence Reads; Whole-Genome Alignments.
Finding genes: homology, orthologs and paralogs; profile Hidden Markov Models; gene ontology and “the Ortholog Conjecture”.

8. Variant discovery and genotyping

Variant calling; VCF, GATK, FreeBayes.

9. Phylogenomics and comparative genomics

The comparative approach: strengths, caveats, methodological hurdles. Sources of errors and incongruences in phylogenomic analyses: systematic errors; missing data; taxon sampling; gene sampling; incongruences between species tree and gene trees.
Phylogenetic markers and future perspectives.

Readings/Bibliography

  • Vince Buffalo “Bioinformatics Data Skills”, O’Reilly.

  • Christoph Bleidorn “Phylogenomics”, Springer.

  • Scientific publications and online material.

Teaching methods

The course alternates theoretical lectures with practical, hands-on sessions in which students can run analyses on a dataset of their choice.
Immediately after part 3 of the programme, each student will choose a biological problem and a publicly available dataset and develop a project on them, which will be evaluated for the final grade.

Before taking this course, it is highly recommended to have attended the following courses:

91400 - Biometria Evoluzionistica ed Ecologica,
91360 - Genetica di Popolazione ed Evoluzione Molecolare,
91789 - Evoluzione e Filogenesi (C.I.),
91399 - Evoluzione del Genoma.

Assessment methods

Evaluation of the project and brief interview (focused on the project).

The requirements for the projects and the submission procedures will be explained during the introductory lesson.

Teaching tools

Slides, scientific papers, online material, hands-on sessions on the PC, use of a high performance workstation.

Office hours

See the website of Fabrizio Ghiselli