30711 - RECORD LINKAGE

Academic Year 2022/2023

  • Lecturer: Riccardo D'Alberto
  • Credits: 6
  • Scientific-disciplinary sector (SSD): SECS-S/01
  • Language of instruction: English
  • Modules: Riccardo D'Alberto (Module 1), Riccardo D'Alberto (Module 2)
  • Teaching mode: Conventional, in-person lectures (Module 1 and Module 2)
  • Campus: Bologna
  • Degree programme: Bachelor's degree in Statistical Sciences (cod. 8873)

Learning outcomes

At the end of the course, the student will know the methods for linking information that refers to the same statistical unit when that information is stored in different archives and the unit is not identified by an error-free code. The student will be able to perform exact matching by means of deterministic and probabilistic record linkage, and to apply the basic tools of statistical matching.
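A minimal sketch of the two linkage approaches named above, written in Python so that it is self-contained (the course itself uses R). The records, field names, m- and u-probabilities, and thresholds are all hypothetical; in practice the m- and u-probabilities are estimated from the data, for example with the EM algorithm. The probabilistic part follows the Fellegi-Sunter scheme: it sums per-field log-likelihood ratios into a composite weight and compares it with an upper and a lower threshold.

```python
import math

# Hypothetical toy archives describing (partly) the same people.
# Neither archive carries an error-free common identifier.
archive_a = [
    {"id": "A1", "surname": "rossi", "name": "mario", "birth_year": 1980},
    {"id": "A2", "surname": "bianchi", "name": "anna", "birth_year": 1975},
]
archive_b = [
    {"id": "B1", "surname": "rossi", "name": "mario", "birth_year": 1980},
    {"id": "B2", "surname": "bianchi", "name": "ana", "birth_year": 1975},  # typo in name
    {"id": "B3", "surname": "verdi", "name": "luca", "birth_year": 1990},
]
FIELDS = ["surname", "name", "birth_year"]

# Assumed m-probabilities P(agree | same unit) and u-probabilities
# P(agree | different units); in practice they are estimated, e.g. via EM.
M = {"surname": 0.95, "name": 0.90, "birth_year": 0.97}
U = {"surname": 0.01, "name": 0.05, "birth_year": 0.02}


def deterministic_link(a, b):
    """Exact (deterministic) linkage: link a pair only if every field agrees."""
    return [(r["id"], s["id"]) for r in a for s in b
            if all(r[f] == s[f] for f in FIELDS)]


def fs_weight(r, s):
    """Fellegi-Sunter composite weight: sum of per-field log2 likelihood ratios."""
    w = 0.0
    for f in FIELDS:
        if r[f] == s[f]:
            w += math.log2(M[f] / U[f])
        else:
            w += math.log2((1 - M[f]) / (1 - U[f]))
    return w


def classify(r, s, upper=5.0, lower=0.0):
    """Assumed thresholds: link above `upper`, non-link below `lower`,
    clerical review in between."""
    w = fs_weight(r, s)
    if w > upper:
        return "link"
    if w < lower:
        return "non-link"
    return "possible link"


def probabilistic_link(a, b):
    """Return the pairs classified as links by the Fellegi-Sunter rule."""
    return [(r["id"], s["id"]) for r in a for s in b if classify(r, s) == "link"]


print(deterministic_link(archive_a, archive_b))  # [('A1', 'B1')]
print(probabilistic_link(archive_a, archive_b))  # [('A1', 'B1'), ('A2', 'B2')]
```

The typo in the name of record B2 makes exact matching miss the pair (A2, B2), while its probabilistic weight still exceeds the upper threshold because surname and birth year agree.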

Course contents

- The conditions for using a database for statistical purposes.

- Data quality properties and how to measure them.

- Improving data quality through editing, imputation, and record linkage.

- The question of merging lists.

- The problem of duplication.

- Conditional independence and statistical matching techniques.

- Automatic data editing and imputation.

- Non-random (deterministic) and probabilistic record linkage.

- Blocking techniques.

- The problem of disclosure and access to microdata.

- Examples in economics, health statistics, and official statistics.

- Examples using the R software.
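Blocking, listed among the contents above, reduces the number of record pairs to compare: instead of all |A| × |B| pairs, only records sharing a blocking key are compared. The following is a minimal Python sketch under stated assumptions; the toy records and the key (birth year plus first letter of the surname) are hypothetical. A known trade-off is that a true match with an error in a key field lands in different blocks and is missed, which is why several passes with different keys are often used.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy archives: (record id, surname, birth year).
archive_a = [("A1", "rossi", 1980), ("A2", "bianchi", 1975), ("A3", "russo", 1980)]
archive_b = [("B1", "rossi", 1980), ("B2", "bianchi", 1975), ("B3", "verdi", 1990)]


def blocking_key(record):
    """Assumed blocking key: birth year plus first letter of the surname."""
    _, surname, birth_year = record
    return (birth_year, surname[0])


def candidate_pairs(a, b, key=blocking_key):
    """Index archive b by blocking key, then compare each record of a
    only with the records of b that fall in the same block."""
    blocks = defaultdict(list)
    for s in b:
        blocks[key(s)].append(s)
    return [(r[0], s[0]) for r in a for s in blocks.get(key(r), [])]


all_pairs = list(product(archive_a, archive_b))
candidates = candidate_pairs(archive_a, archive_b)
print(len(all_pairs), len(candidates))  # 9 3
print(candidates)  # [('A1', 'B1'), ('A2', 'B2'), ('A3', 'B1')]
```

The candidate pairs would then be scored by a deterministic or probabilistic comparison step; the pair (A3, B1) survives blocking only because the key is coarse, and the comparison step is what rejects it.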

Readings/Bibliography

Christen, P. (2012). Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Berlin: Springer, pp.270. ISBN: 978-3-642-43001-5.

D'Orazio, M., Di Zio, M., Scanu, M. (2006). Statistical Matching: Theory and Practice. Chichester: Wiley & Sons, pp.272. ISBN: 978-0-470-02353-2.

Herzog, T.N., Scheuren, F.J., Winkler, W.E. (2007). Data Quality and Record Linkage Techniques. New York: Springer, pp.227. ISBN: 978-0-387-69502-0.

Zhang, L.-C., Chambers, R.L. (eds.) (2019). Analysis of Integrated Data. Boca Raton: Chapman & Hall/CRC Press, pp.256. ISBN: 978-1-4987-2798-3.

Further bibliographical references, papers, technical reports, R scripts and data sets will be given during the course.

Teaching methods

Lectures and practical exercises with the software R.

Assessment methods

The final exam for this module of the course consists of a written essay AND an oral exam.

The written essay is a take-home assignment based on a case study, data set(s), and/or scientific articles proposed by the student by the end of the lecture period and approved by the teacher. The essay must be sent to the teacher at least 5 days before the oral exam. It will be discussed during the oral exam, which will also cover the theoretical and practical topics treated during the lectures.

A final overall mark for the two modules of the course will be proposed to each student once the exams for BOTH modules have been taken.

Further details on the written essay and the work to be done will be given during the course.

Teaching tools

Slides outlining the content of the lessons, as well as additional materials (e.g., scientific articles, technical reports, data sets, R scripts), will be available through Virtuale.

Office hours

See the website of Riccardo D'Alberto

SDGs

  • Quality education
  • Industry, innovation and infrastructure
  • Reduced inequalities
  • Partnerships for the goals

This course contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.