30711 - Record Linkage

  • Teacher Daniela Cocchi

  • Credits 6

  • SSD SECS-S/01

  • Teaching Mode Traditional lectures

  • Language English

  • Campus of Bologna

  • Degree Programme First cycle degree programme (L) in Statistical Sciences (cod. 8873)

  • Teaching resources on Virtuale


This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education; Industry, innovation and infrastructure; Reduced inequalities; Partnerships for the goals

Academic Year 2021/2022

Learning outcomes

By the end of the course the student will know the methods for linking information that refers to the same statistical unit when this information is held in different archives and the unit is not identified by an error-free code. The student will be able to perform exact matching, by means of deterministic and probabilistic record linkage, and to use the basic tools of statistical matching.
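
As an illustration of the difference between deterministic and probabilistic linkage, here is a minimal Python sketch. It is not course material: the toy records, the field names, the m- and u-probabilities and the threshold are all assumed values chosen for the example, and the weight is a simplified Fellegi-Sunter style sum of log-likelihood ratios over exact field agreements.

```python
from math import log2

# Hypothetical toy records from two archives with no shared error-free code.
archive_a = [
    {"name": "maria rossi", "birth_year": 1980, "city": "bologna"},
    {"name": "luca bianchi", "birth_year": 1975, "city": "modena"},
]
archive_b = [
    {"name": "maria rossi", "birth_year": 1980, "city": "bologna"},
    {"name": "luca bianci", "birth_year": 1975, "city": "modena"},  # typo in name
]

FIELDS = ("name", "birth_year", "city")


def deterministic_match(a, b):
    """Deterministic rule: link only if every comparison field agrees exactly."""
    return all(a[f] == b[f] for f in FIELDS)


# Assumed (not estimated) probabilities, for illustration only:
# M[f] = P(field f agrees | records refer to the same unit)
# U[f] = P(field f agrees | records refer to different units)
M = {"name": 0.90, "birth_year": 0.95, "city": 0.90}
U = {"name": 0.01, "birth_year": 0.05, "city": 0.20}


def match_weight(a, b):
    """Sum of per-field log2 likelihood ratios (agreement or disagreement)."""
    total = 0.0
    for f in FIELDS:
        if a[f] == b[f]:
            total += log2(M[f] / U[f])
        else:
            total += log2((1 - M[f]) / (1 - U[f]))
    return total


THRESHOLD = 0.0  # assumed cut-off; in practice chosen from target error rates


def probabilistic_match(a, b):
    """Probabilistic rule: link when the total weight exceeds the threshold."""
    return match_weight(a, b) > THRESHOLD


if __name__ == "__main__":
    for a in archive_a:
        for b in archive_b:
            print(a["name"], "|", b["name"],
                  "| deterministic:", deterministic_match(a, b),
                  "| weight:", round(match_weight(a, b), 2))
```

In this sketch the misspelled name causes the deterministic rule to miss the second pair, while the probabilistic rule still links it because the agreements on birth year and city outweigh the single disagreement.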

Course contents

Improving data quality through editing, imputation and record linkage.

The conditions for using a database for statistical purposes.

Data quality properties and how to measure them.

The question of merging lists.

Conditional independence and capture-recapture methods.

Automatic data editing and imputation.

Non-random (deterministic) and probabilistic record linkage.

Blocking techniques.

The problem of duplication.

The problem of disclosure and access to microdata.

Examples in economics, official statistics and health statistics.


Readings/Bibliography

T. N. Herzog, F. J. Scheuren, W. E. Winkler (2007) Data Quality and Record Linkage Techniques, Springer ISBN 978-0-387-69502-0

Istat (2002) Metodi statistici per il record linkage (edited by M. Scanu)

P. Christen (2012) Data Matching. Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection, Springer ISBN 978-3-642-43001-5

Further bibliographical references will be given during the course.

Teaching methods


Assessment methods

The final exam for this module of the course is a written test that also includes theory questions.

The online test will be performed via “Esami on Line” (EOL). Zoom will be the platform for identification and monitoring.


A final overall mark for the two modules of the course (record linkage and databases) will be proposed to each student after the exams for both modules.

Teaching tools

Slides outlining the content of the lessons will be available.

Office hours

See the website of Daniela Cocchi