- Docente: Paolo Ciaccia
- Credits: 6
- SSD: ING-INF/05
- Language: Italian
- Teaching Mode: In-person learning (entirely or partially)
- Campus: Bologna
- Corso: Second cycle degree programme (LS) in Computer Engineering (cod. 0234)
Learning outcomes
This advanced course aims to provide the student with the knowledge needed to solve, both efficiently and effectively, several search problems arising in modern information systems, in particular those arising when dealing with non-standard data types, such as text documents, XML data, images, geometric objects, Web pages, and time series. For each of these scenarios basic approaches able to find the "best" data according to user preferences will be introduced and discussed
Course contents
- Introduction
Course organization, peculiarities of modern information systems, application scenarios (e-commerce, search engines, multimedia systems, selective dissemination of information). - Top-k queries
Limits of traditional DBMS's. SQL extensions: the Stop After clause and the Stop operator. Multidimensional Top-k queries: attribute space, weights, distance functions, limits of B+-tree-based evaluation. The R-tree: basic principles, algorithms for range and k nearest neighbors queries, distance browsing - Multi-sources Top-k queries
Mediators and wrappers (sketch). Sorted and random accesses, scoring functions. The B0, A0 and TA algorithms. - Preference relations
Quantitative and qualitative preferences, preference relations, weak and partial orders, preference composition. The BMO (Best-Matches-Only) query model: the Best operator and the BNL (Block-Nested-Loops) algorithm. Skyline queries: the BBS (Branch and Bound Skyline) algorithm. - Text Information Retrieval
Boolean queries, inverted index, stemming and thesauri. Term weighting: the tf.idf method. The Vector Space model. Precision and recall. Web search engines: why links affect relevance. Google PageRank. Hubs and Authorities. Approximate queries on XML documents: basic principles - Multimedia Information Retrieval
Content-based search, feature extraction and feature approximation. The Filter & Refine (F&R) strategy: the L-B lemma, multi-step k nearest neighbors queries. Image data bases: color spaces, color histograms and quadratic distance, texture and Gabor filters (sketch), shape and parametric representation. Time series: dmensionality reduction using DFT (Discrete Fourier Transform) and PAA (Piecewise Aggregate Approximation), the DTW (Dynamic Time Warping) distance. Relevance feedback: principles and examples, basic techniques - Multimedia indexing
Vector spaces: the R-tree (object insertion and node split algorithms). Metric spaces: indexing principles, the M-tree. High-dimensional spaces: the curse of dimensionality, the X-tree and the VA-file
Links to further information
http://www-db.deis.unibo.it/courses/SI-LS/
Office hours
See the website of Paolo Ciaccia