66573 - LABORATORY OF BIOINFORMATICS 2

SDGs

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Partnerships for the goals

Academic Year 2018/2019

Learning outcomes

At the end of the course, the student has acquired expertise in selecting and/or developing bioinformatics tools and applying them to important problems in Bioinformatics, and has demonstrated the ability to handle a research project autonomously. The student will be acquainted with:

  • analyzing a research project that requires a bioinformatic approach;
  • developing the project workflow with all the necessary steps;
  • evaluating the possible risks of failure and the probability of success;
  • applying selected bioinformatics tools to obtain the project outcomes;
  • developing the required software, if necessary;
  • analyzing the results in terms of their transferability to a wet lab;
  • drawing conclusions in terms of benefits vs. putative costs.

Course contents

Choice of a project: 1) literature search on the state of the art and critical reading; 2) cost/benefit analysis; 3) application of the putative results; 4) discussion of putative solutions; 5) development of the project workflow; 6) writing a paper on the selected topic and its solution within a bioinformatic framework, following the format of a given journal (Bioinformatics). Discussion of the results in relation to the expected goal. Some lectures will address open problems in Bioinformatics:

  • Protein-protein interactions and their prediction
  • SNP annotation and some test cases

Readings/Bibliography

Selected reviews and articles shared via cloud storage

Teaching methods

Lectures and practical sessions. Development of a project in the field of Bioinformatics.

Assessment methods

The final assessment evaluates whether the student has acquired expertise in the field of Bioinformatics and comprises the following:

Submission of a paper, in the format required by the leading journal in the field (Bioinformatics), at least two working days before the oral exam. The oral exam will include a discussion of the following:

  • Overview of the project
  • Brief introduction to the biological problem
  • Statistical description of the protein dataset used to solve the problem at hand
  • Representation of protein sequences using orthogonal vectors
  • Sliding windows and input encoding
  • Basic vector algebra: addition, subtraction, dot product, mean vector
  • Development of simple linear classifiers
  • Geometrical interpretation of the linear classifier/s
  • Scoring indices: confusion matrix, sensitivity, specificity, accuracy, Matthews correlation coefficient
  • The cross-validation procedure
  • Development of a machine learning approach
  • Evaluation of the reasons why linear classifiers are outperformed by machine learning approaches
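The items on orthogonal vectors and sliding windows can be illustrated with a minimal sketch. Here we read "orthogonal vectors" as one-hot encoding: each of the 20 standard residues maps to a 20-dimensional vector with a single 1, so distinct residues have a zero dot product. The alphabet order, the zero vector for unknown residues and padding, and the window size are assumptions for illustration, not necessarily the choices made in the course.

```python
# Sketch: one-hot ("orthogonal vector") encoding of protein residues and
# sliding-window input encoding for per-residue prediction.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues


def one_hot(residue):
    """Return a 20-dimensional orthogonal vector for one residue.

    Unknown residues (e.g. 'X') map to the all-zero vector (an assumption).
    """
    vec = [0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1
    return vec


def window_encoding(sequence, position, half_window):
    """Concatenate one-hot vectors of the residues in a window centred on `position`.

    Positions outside the sequence are padded with zero vectors, so every
    window has length (2 * half_window + 1) * 20.
    """
    features = []
    for i in range(position - half_window, position + half_window + 1):
        if 0 <= i < len(sequence):
            features.extend(one_hot(sequence[i]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features


def encode_sequence(sequence, half_window=3):
    """One input vector per residue, as used by per-residue classifiers."""
    return [window_encoding(sequence, p, half_window) for p in range(len(sequence))]


if __name__ == "__main__":
    vectors = encode_sequence("MKTAYIAK", half_window=2)
    print(len(vectors), len(vectors[0]))  # 8 100
    print(sum(vectors[0]))  # 3 (two padded positions before the first residue)
```

Zero padding at the sequence ends keeps every input vector the same length, which is what fixed-size classifiers such as SVMs require.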
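A minimal sketch of how basic vector algebra (addition, subtraction, dot product, mean vector) yields a simple linear classifier, together with a k-fold cross-validation loop. It assumes labels 1/0 and uses the mean-vector rule: w is the difference between the positive and negative class means, and the decision hyperplane w·x + b = 0 passes through the midpoint of the two means. Geometrically this plane is the perpendicular bisector of the segment joining the means, so it assigns each point to the nearer mean in Euclidean distance. This is one possible "simple linear classifier", not necessarily the one used in the course, and the interleaved fold assignment is a simplification; in protein work folds are often built so that similar sequences do not fall in both training and test sets.

```python
# Minimal sketch: mean-vector linear classifier with k-fold cross-validation.
# Labels are assumed to be 1 (positive) and 0 (negative).


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    total = [0.0] * len(vectors[0])
    for v in vectors:
        total = add(total, v)
    return [t / len(vectors) for t in total]


def train_mean_classifier(X, y):
    """Return (w, b) for the hyperplane bisecting the two class means.

    w = m_pos - m_neg points from the negative mean to the positive mean;
    the plane w.x + b = 0 passes through the midpoint of the two means.
    """
    m_pos = mean_vector([x for x, label in zip(X, y) if label == 1])
    m_neg = mean_vector([x for x, label in zip(X, y) if label == 0])
    w = sub(m_pos, m_neg)
    midpoint = [c / 2 for c in add(m_pos, m_neg)]
    b = -dot(w, midpoint)
    return w, b


def predict(model, x):
    w, b = model
    return 1 if dot(w, x) + b > 0 else 0


def cross_validate(X, y, k):
    """Average accuracy over k folds (fold i holds examples i, i+k, i+2k, ...)."""
    accuracies = []
    for i in range(k):
        test_idx = list(range(i, len(X), k))
        train_idx = [j for j in range(len(X)) if j % k != i]
        model = train_mean_classifier([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    w, b = train_mean_classifier(X, y)
    print(w)                            # [5.0, 5.0]
    print(cross_validate(X, y, k=4))    # 1.0
```

On data that are not linearly separable this rule, like any linear classifier, misclassifies points that lie on the "wrong" side of the plane, which is one way to motivate the non-linear (kernel) methods listed above.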
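The scoring indices listed above can be computed from the four confusion-matrix counts, as in the sketch below. It uses the standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N, and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Two assumptions are worth stating: undefined ratios are reported as 0.0 by convention, and some bioinformatics papers use "specificity" for TP/(TP+FP) instead, so the convention adopted in the course should be checked.

```python
# Sketch: scoring indices for a binary (1 = positive, 0 = negative) classifier.
import math


def confusion_matrix(y_true, y_pred):
    """Return (TP, TN, FP, FN) counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def scores(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient.

    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "MCC": mcc}


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    counts = confusion_matrix(y_true, y_pred)
    print(counts)           # (3, 5, 1, 1)
    print(scores(*counts))  # MCC = (3*5 - 1*1) / sqrt(4*4*6*6) = 14/24
```

Unlike accuracy, MCC uses all four counts symmetrically, which makes it more informative when positive and negative examples are strongly unbalanced, as is common for residue-level predictions.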

The candidate's technical capabilities will be checked on:

  • Knowledge of the libsvm package for SVM development
  • Description of the input file format
  • Description of the command-line options: SVM types, kernels, hyper-parameters
  • Grid-search procedure for hyper-parameter optimization
  • The Python interface to libsvm
  • Examples of usage
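The libsvm items above can be tied together in a short sketch. Each line of a libsvm input file has the form `<label> <index1>:<value1> <index2>:<value2> ...`, with 1-based ascending indices; zero-valued features may be omitted. The grid search below scans C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^3, 2^1, ..., 2^-15, the coarse grid suggested in the libsvm practical guide, for a C-SVC (`-s 0`) with an RBF kernel (`-t 2`), scoring each pair by n-fold cross-validation (`-v`). The helper names (`to_libsvm_line`, `grid_options`, `grid_search`), the file name `train.svm`, and the use of the `libsvm-official` Python package are assumptions for illustration.

```python
# Sketch: libsvm input format and a (C, gamma) grid search for an RBF C-SVC.
# Helper names are hypothetical; the option strings use svm-train's flags.
import itertools
import os


def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    Indices are 1-based and ascending; zero-valued features are omitted,
    which the sparse libsvm format allows.
    """
    items = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + items)


def write_libsvm_file(path, labels, vectors):
    with open(path, "w") as handle:
        for label, vec in zip(labels, vectors):
            handle.write(to_libsvm_line(label, vec) + "\n")


def grid_options(folds=5):
    """Yield (log2 C, log2 gamma, svm-train options) over a coarse log2 grid.

    -s 0: C-SVC, -t 2: RBF kernel, -v: n-fold cross-validation, -q: quiet.
    """
    for lc, lg in itertools.product(range(-5, 16, 2), range(3, -16, -2)):
        yield lc, lg, f"-s 0 -t 2 -c {2.0 ** lc} -g {2.0 ** lg} -v {folds} -q"


def grid_search(evaluate, folds=5):
    """Return (score, log2 C, log2 gamma, options) with the highest score.

    `evaluate` maps an option string to a cross-validation score; with libsvm
    installed it can wrap svm_train, which returns CV accuracy when -v is set.
    """
    best = None
    for lc, lg, opts in grid_options(folds):
        score = evaluate(opts)
        if best is None or score > best[0]:
            best = (score, lc, lg, opts)
    return best


if __name__ == "__main__":
    print(to_libsvm_line(1, [0, 1, 0, 0.5]))  # 1 2:1 4:0.5
    try:
        # Assumes the libsvm Python bindings (e.g. the libsvm-official package).
        from libsvm.svmutil import svm_read_problem, svm_train
    except ImportError:
        svm_train = None
    if svm_train is not None and os.path.exists("train.svm"):  # hypothetical file
        y, x = svm_read_problem("train.svm")
        print(grid_search(lambda opts: svm_train(y, x, opts)))
```

After the coarse scan, a finer grid around the best (C, gamma) pair is typically explored, and the final model is retrained on the whole training set without `-v`.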

Finally, the candidate should also demonstrate expertise on the state of the art of two major hot topics in Bioinformatics:

  • Protein-protein interactions and their prediction
  • SNP annotation and some test cases

Teaching tools

Online public databases, PubMed, and course materials (lecture PDFs, selected articles) shared via cloud storage.

Office hours

See the website of Rita Casadio

See the website of Castrense Savojardo