30711 - RECORD LINKAGE

Academic Year 2023/2024

  • Teacher: Edoardo Redivo
  • Credits: 6
  • SSD: SECS-S/01
  • Language of instruction: English
  • Modules: Edoardo Redivo (Module 1), Edoardo Redivo (Module 2)
  • Teaching mode: Conventional - In-person lectures (Module 1), Conventional - In-person lectures (Module 2)
  • Campus: Bologna
  • Degree programme: Bachelor's degree (Laurea) in Statistical Sciences (code 8873)

Learning outcomes

By the end of the course, the student will know the methods for linking information that refers to the same statistical unit when that information is held in different archives and the unit is not identified by an error-free code. The student will be able to perform exact matching by means of deterministic and probabilistic record linkage, and to use the basic tools of statistical matching.

Course contents

- The statistical formalisation of the record linkage problem

- Deterministic record linkage

- Comparison functions

- Deterministic and probabilistic blocking

- Probabilistic record linkage (Fellegi-Sunter model)

- Estimation via the EM algorithm

- Record linkage as an assignment problem

- Supervised classification for record linkage

- Unsupervised classification for record linkage

- More recent developments and Bayesian models for record linkage

Readings/Bibliography

Christen, P. (2012). Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Berlin: Springer, pp.270. ISBN: 978-3-642-43001-5.

Herzog, T.N., Scheuren, F.J., Winkler, W.E. (2007). Data Quality and Record Linkage Techniques. New York: Springer, pp.227. ISBN: 978-0-387-69502-0.

Binette, O., Steorts, R.C. (2022). “(Almost) All of Entity Resolution.” Science Advances 8 (12). https://doi.org/10.1126/sciadv.abi8021.

Teaching methods

Lectures and tutorials in R.

Assessment methods

Written exam, taken using R, comprising both practical and theoretical exercises.

Paper notes and resources are allowed, while electronic and online resources are not.

Optional oral exam, focused on theoretical understanding and open only to students with a passing grade (≥18/30) in the written part. The oral exam can change the written exam grade by at most ±3 points.

Teaching tools

Slides and blackboard.

Office hours

See the website of Edoardo Redivo