72944 - Database Technologies M

Academic Year 2014/2015

  • Docente: Marco Patella
  • Credits: 8
  • SSD: ING-INF/05
  • Language: Italian
  • Moduli: Marco Patella (Modulo 1) Paolo Ciaccia (Modulo 2)
  • Teaching Mode: In-person learning (entirely or partially) (Modulo 1); In-person learning (entirely or partially) (Modulo 2)
  • Campus: Bologna
  • Corso: Second cycle degree programme (LM) in Computer Engineering (cod. 0937)

Learning outcomes

Knowledge of building principles of DataBase Management Systems. Ability to design physical databases.

Course contents

  1. Architecture of a DBMS
    Main modules and their roles.
  2. The physical Data Base
    Memory management: devices, pages and files. Representing attributes and tuples. Reading and writing disk pages: the buffer manager. File types. Cost evaluation of some basic file operations.
  3. Mono-dimensional indices
    Index types. Tree indices: B-tree and B+-tree. Hash indices: static hash, dynamic hash (linear hashing, extendible hashing).
  4. Multidimensional (spatial) data and indices
    Spatial queries: range, k-nearest neighbor, spatial join. Point indices (k-D and k-D-B-tree, hB-tree, Grid file), indices for spatial objects (R-tree), GiST, space transformation techniques.
  5. Implementing relational operations
    Logical and physical operators: sort (external Z-way sort-merge), selection (sequential scann, single index, multiple indices), projection (sort-based, hash-based, index-based), join (nested loops, block nested loops, merge scan, hash join), set operators (union and difference), aggregation operators.
  6. Query processing
    Steps of the evaluation process. Semantic checks and catalogs. Rewriting SQL queries. Statistical profiles: average values and histograms. Estimating costs and result size. Access plans: evaluation using materialization and pipeline. The optimization process: enumerating access plans and domination rules. Determining the optimal access plan using dynamic programming.
  7. Transaction management
    Concurrency control: problems, lock and Strict 2PL protocol. Fault tolerance: log file, WAL protocol, buffer and commit management, checkpoint and DB dump.
  8. Physical design of DataBases
    Query workload, index selection. Performance tuning (indices, schema and queries).
  9. Ranking of results
    Motivations and limits of existing solutions for Top-k queries. SQL extensions for ranking results.
    Mono- e multi-dimensional Top-k queries: attributes space, attributes weighing, distance functions, limits of B+-tree query processing.
    R-tree-based algorithms: k nearest neighbor and distance browsing. Top-k join queries: sorted and random access, scoring functions, relationships with distance functions. B0, FA, TA, CA, and NRA algorithms.
  10. Skyline queries
    Concept of domination, relationship with scoring functions, index-based algorithms (BBS) and non-index algorithms (NL, BNL, SFS, SaLSa).

Teaching methods

The course is provided by means of slides displayed during lecture hours.

Assessment methods

Student assessment is made by means of an oral exam: both teachers should be contacted in order to fix the exam date.

Teaching tools

Classroom lessons will be held using slides, which will be integrated with the use of the blackboard for the development of exercises.

Links to further information

http://www-db.deis.unibo.it/courses/TBD/

Office hours

See the website of Marco Patella

See the website of Paolo Ciaccia