41574 - Information Systems (Graduate Course)

Academic Year 2009/2010

  • Docente: Paolo Ciaccia
  • Credits: 6
  • SSD: ING-INF/05
  • Language: Italian
  • Teaching Mode: In-person learning (entirely or partially)
  • Campus: Bologna
  • Corso: Second cycle degree programme (LS) in Computer Engineering (cod. 0234)

Learning outcomes

This advanced course aims to provide the student with the knowledge needed to solve, both efficiently and effectively, several search problems arising in modern information systems, in particular those arising when dealing with non-standard data types, such as text documents, XML data, images, geometric objects, Web pages, and time series. For each of these scenarios basic approaches able to find the "best" data according to user preferences will be introduced and discussed

Course contents

  1. Introduction
    Course organization, peculiarities of modern information systems, application scenarios (e-commerce, search engines, multimedia systems, selective dissemination of information).
  2. Top-k queries
    Limits of traditional DBMS's. SQL extensions: the Stop After clause and the Stop operator. Multidimensional Top-k queries: attribute space, weights, distance functions, limits of  B+-tree-based evaluation. The R-tree: basic principles, algorithms for range and k nearest neighbors queries, distance browsing
  3. Multi-sources Top-k queries
    Mediators and wrappers (sketch). Sorted and random accesses, scoring functions. The B0, A0 and TA algorithms.
  4. Preference relations
    Quantitative and qualitative preferences, preference relations, weak and partial orders, preference composition. The BMO (Best-Matches-Only) query model: the Best operator and the BNL (Block-Nested-Loops) algorithm. Skyline queries: the BBS (Branch and Bound Skyline) algorithm.
  5. Text Information Retrieval
    Boolean queries, inverted index, stemming and thesauri. Term weighting: the tf.idf method. The Vector Space model. Precision and recall. Web search engines: why links affect relevance. Google PageRank. Hubs and Authorities. Approximate queries on XML documents: basic principles 
  6. Multimedia Information Retrieval
    Content-based search, feature extraction and feature approximation. The Filter & Refine (F&R) strategy: the L-B lemma, multi-step k nearest neighbors queries. Image data bases: color spaces, color histograms and quadratic distance, texture and Gabor filters (sketch), shape and parametric representation. Time series: dmensionality reduction using DFT (Discrete Fourier Transform) and PAA (Piecewise Aggregate Approximation), the DTW (Dynamic Time Warping) distance. Relevance feedback: principles and examples, basic techniques
  7. Multimedia indexing
    Vector spaces: the R-tree (object insertion and node split algorithms). Metric spaces: indexing principles, the M-tree. High-dimensional spaces: the curse of dimensionality, the X-tree and the VA-file

Links to further information

http://www-db.deis.unibo.it/courses/SI-LS/

Office hours

See the website of Paolo Ciaccia