84530 - Database and Big Data Technologies M

Course Unit Page

Academic Year 2019/2020

Learning outcomes

Knowledge of the building principles of database management systems (DBMSs). Ability to design physical databases. Technologies for the management of big data.

Course contents

Prior knowledge and understanding of models and tools for organizing, managing and designing relational databases is required to profit from this course. This is usually obtained by passing the exam of "Information Systems T" in the Computer Engineering bachelor's degree.

  1. Architecture of a DBMS
    Main modules and their roles.
  2. The physical database
    Memory management: devices, pages and files. Representing attributes and tuples. Reading and writing disk pages: the buffer manager. File types. Cost evaluation of some basic file operations.
  3. Mono-dimensional indices
    Index types. Tree indices: B-tree and B+-tree. Hash indices: static hash, dynamic hash (linear hashing, extendible hashing).
  4. Multidimensional (spatial) data and indices
    Spatial queries. Point indices (k-D and k-D-B-tree), indices for spatial objects (R-tree), GiST.
  5. Implementing relational operations
    Logical and physical operators: sort (external Z-way sort-merge), selection (sequential scan, single index, multiple indices), projection (sort-based, hash-based, index-based), join (nested loops, block nested loops, merge scan, hash join), set operators (union and difference), aggregation operators.
  6. Query processing
    Steps of the evaluation process. Semantic checks and catalogs. Rewriting SQL queries. Statistical profiles: average values and histograms. Estimating costs and result size. Access plans: evaluation using materialization and pipeline. The optimization process: enumerating access plans and domination rules. Determining the optimal access plan using dynamic programming.
  7. Transaction management
    Concurrency control: problems, locks and the Strict 2PL protocol. Fault tolerance: log file, WAL protocol, buffer and commit management, checkpoint and DB dump.
  8. Physical design of databases
    Query workload, index selection. Performance tuning (indices, schema and queries).
  9. Ranking of results
    Motivations and limits of existing solutions for Top-k queries. SQL extensions for ranking results.
    Mono- and multi-dimensional Top-k queries: attribute space, attribute weighting, distance functions, limits of B+-tree query processing.
    R-tree-based algorithms: k nearest neighbor and distance browsing. Top-k join queries: sorted and random access, scoring functions, relationships with distance functions. B0, FA, TA, CA, and NRA algorithms.
  10. Skyline queries
    Concept of domination, relationship with scoring functions, index-based algorithms (BBS) and non-index algorithms (NL, BNL, SFS, SaLSa).
  11. Big Data & NoSQL DBMS
    Hadoop and MapReduce. Non-relational data models. The CAP theorem.

Teaching methods

The course is taught using slides presented during lectures.

Fluent spoken and written Italian is a necessary prerequisite: all lectures and tutorials will be in Italian. However, the provided slides are in English.

Assessment methods

Achievement will be assessed by means of a final exam, based on an analytical assessment of the "expected learning outcomes" described above. To assess this properly, the examination consists of an oral exam with each of the two teachers, and the final score takes into account the marks obtained in both oral exams. The dates of the oral exams should be arranged with both teachers. When several students take the exam together, a written exam (with open-ended questions) may be required.

Higher grades will be awarded to students who demonstrate an organic understanding of the subject, a high ability for critical application, and a clear and concise presentation of the contents. To obtain a passing grade, students are required to demonstrate at least a knowledge of the key concepts of the subject, some ability for critical application, and a comprehensible use of technical language. A failing grade will be awarded if the student shows knowledge gaps in key concepts of the subject, inappropriate use of language, and/or logical flaws in the analysis of the subject.

Teaching tools

Classroom lessons will be held using slides, supplemented by the blackboard for working through exercises.

Links to further information

http://www-db.disi.unibo.it/courses/TBD/

Office hours

See the website of Marco Patella

See the website of Paolo Ciaccia