# 69443 - ALGORITHMS AND DATA STRUCTURES FOR COMPUTATIONAL BIOLOGY

### Course Unit Page

• Teacher Zeynep Kiziltan

• Learning modules Zeynep Kiziltan (Modulo 1)
Ivan Lanese (Modulo 2)
Alan Albert Bertossi (Modulo 3)

• Credits 10

• SSD INF/01

• Language English

• Campus of Bologna

• Degree Programme Second cycle degree programme (LM) in Bioinformatics (cod. 8020)

## Learning outcomes

At the end of the course, the student has the basic Knoledge in design and analysis of correct and efficient algorithms and data structures. In particular, the student has basic knowledge on algorithms and data structures. The student will be able to: design correct and efficient algorithm for common computational tasks; analyze existing algorithms and data structures; design and analyze new algorithms and data structures.

## Course contents

Introduction to algorithms and basic data structures: definitions, algorithms, simple data structures. Lists, arrays, simple operations, examples, implementation. Complex data structures, computation efficiency and relation between computation efficiency and data structures. Stacks and queues (definition, examples, implementation). Trees, visit algorithms of a tree and tree operations. Sets, dictionaries and hash tables, their operations and implementation concepts. Graphs, visit algorithms for graphs, operations and implementation concepts. Exercises and tests ( graphs and trees, priority queues and heaps). Binary trees, search trees, balanced binary trees, B-trees, AVL trees. Merge-find sets. Suffix tries, trees, and arrays. Algorithms. Decision tree. Analysis of computational complexity of algorithms. O(), Omega() and Theta() functions. Iterative and recursive programming methodologies. Divide and conquer. Recursion. Greedy techniques (knapsack problem, radix sorting problem, Huffman codes). Local search. Enumeration. Binary search. Sorting algorithms. Sequence pattern analysis and matching. Sequence alignment. Algorithm optimization. Complexity analysis of algorithms. Hard problems. Branch & bound. Approximation. Heuristics. Exercises and tests.

It is fundamental to use the electronic slides and material provided during the classes, including live exercises. The slides will be made available on the IOL platform.

Suggested readings (not to be necessarily purchased):

Books more specific on computational biology topics:

- Carlos Setubal, Joao Meidanis, Introduction to computational molecular biology, Cengage Learning eds.;
- 1997 C. Gibas, P. Jambeck, Developing Bioinformatics Computer Skills, O'Reilly media, 2001

Books more general on computer algorithms and data structures topics:

- Alfred V. Aho, Jeffrey D. Ullman, John E. Hopcroft. Data Structures and Algorithms. Addison Wesley, 1983.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms. McGraw-Hill, 2001.
- Donald E. Knuth. The Art of Computer Programming, Volumes 1-3. Addison-Wesley Professional, 1998.
- S.B. Kishor Data Structures, Edition 3. Das Ganu Prakashan, Nagpur, 2008.

## Teaching methods

The course is structured with live lessons, illustrating the theoretical aspects of the topics concerning the data structures and algorithmic design for computational biology. After the introduction of basic definitions and concepts, the course illustrates the design of simple data structures and the relationship with the costs of execution (computational efficiency) and memory occupation (space) on the computer architecture. Some results are presented and demonstrated concerning the design of efficient algorithms and their relationship with efficient and well designed data structures, functional to the algorithm execution. Some programming and algorithmic design techniques are introduced and discussed in terms of computational and space complexity: iterative, recursive, divide et impera, greedy, dynamic programming. Finally, space is given to the application of design concepts towards the realization of solutions for exercises and analysis methodologies in bioinformatics and molecular biology. In addition to the live lessons, live exercises in class are provided.

## Assessment methods

The final exam is composed of a written part (3 hours) and possibly an oral part (30 minutes). During the written part, it is NOT allowed the use of books and notes, calculator and communication devices, including Internet access. The written part is composed of two sections: the first section concerns the design and analysis of algorithms and data structures oriented to the maximum computational and space efficiency by adopting the techniques and knowledge learnt during the classes. Specifically, the candidates must develop the pseudo code of the algorithm and the corresponding well designed data structures, by commenting and motivating the choices made, and defining the space and time complexity of execution. The first exercise will takes more or less 90 minutes, and the weigth of the score is more or less 50% of the total score. The second section of the written part concerns exercises on the topics of the classes, including interpretation and design of the concepts learned. Such exercises are derived and similar to those made in class. The time limit is 90 minutes and the weight is more or less 50% of the total score. The calendar of written exams is provided by the end of the classes, and at least one writen exam session is allocated in June, July, September, October, December and February. It is mandatory to subscribe to the list of participants on almaesami within the announced deadlines. Provided that the candidate passes the written exam (with evaluation greater or equal to 18/30), s/he may be scheduled for the oral part in case the written exam evaluation is low. In that case, to pass the course, it is mandatory to receive a sufficient evaluation in both the written and oral parts.

## Teaching tools

Electronic slides and projector.
Personal computer.
White board.