30711 - Record Linkage

Academic Year 2018/2019

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Degree programme: First cycle degree programme (L) in Statistical Sciences (cod. 8873)

Learning outcomes

By the end of the course the student will know methods for linking information that refers to the same statistical unit when that information is held in different archives and the unit is not identified by an error-free code. The student will be able to carry out exact matching by means of deterministic and probabilistic record linkage, and to use the basic tools of statistical matching.

Course contents

Improving data quality through editing, imputation and record linkage.

The conditions for using a database for statistical purposes.

Data quality properties and how to measure them.

The question of merging lists.

Conditional independence and capture-recapture methods.

Automatic data editing and imputation.

Deterministic (non-random) and probabilistic record linkage.

Blocking techniques.

The problem of duplication.

The problem of disclosure and access to microdata.

Examples in economics, official statistics and health statistics.
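
As an illustration of how blocking and probabilistic record linkage fit together, here is a minimal sketch in Python. It uses a Fellegi-Sunter-style agreement weight, which is one common formulation and is an assumption here, since the syllabus does not name a specific model. The field names, the m- and u-probabilities, the two thresholds and the sample records are all hypothetical values chosen for the example, not course material.

```python
from math import log2

# Hypothetical parameters (illustrative only):
# m = P(field agrees | the two records refer to the same unit)
# u = P(field agrees | the two records refer to different units)
PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "city": (0.80, 0.20),
}


def match_weight(rec_a, rec_b, params=PARAMS):
    """Sum the log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += log2(m / u)              # agreement raises the weight
        else:
            total += log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def candidate_pairs(file_a, file_b, block_key):
    """Blocking: only pair records that share the same blocking key value."""
    blocks = {}
    for rec in file_b:
        blocks.setdefault(rec[block_key], []).append(rec)
    for rec_a in file_a:
        for rec_b in blocks.get(rec_a[block_key], []):
            yield rec_a, rec_b


def classify(weight, lower=0.0, upper=5.0):
    """Three-way decision using two hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Made-up records from two archives without a shared error-free identifier.
file_a = [{"id": "a1", "surname": "Rossi", "birth_year": 1980, "city": "Bologna"}]
file_b = [
    {"id": "b1", "surname": "Rossi", "birth_year": 1980, "city": "Bologna"},
    {"id": "b2", "surname": "Rossi", "birth_year": 1975, "city": "Milano"},
    {"id": "b3", "surname": "Bianchi", "birth_year": 1980, "city": "Bologna"},
]

results = [
    (a["id"], b["id"], classify(match_weight(a, b)))
    for a, b in candidate_pairs(file_a, file_b, block_key="surname")
]
for row in results:
    print(row)
```

Blocking on surname means that record b3 is never compared with a1, which saves comparisons but would miss a true match whose surname was misrecorded. A deterministic rule would instead link only the pairs that agree exactly on a chosen set of fields.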

Readings/Bibliography

T. N. Herzog, F. J. Scheuren, W. E. Winkler (2007) Data Quality and Record Linkage Techniques, Springer ISBN 978-0-387-69502-0

Istat (2002) Metodi statistici per il record linkage (edited by M. Scanu)

P. Christen (2012) Data Matching. Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection, Springer ISBN 978-3-642-43001-5

Further bibliographical references will be given during the course.

Teaching methods

Lectures

Assessment methods

The final exam for this part of the course takes place after the end of the course, immediately after the final test for the databases part. The exam is a written test that also contains theory questions. A final overall mark will then be proposed.

Teaching tools

In addition to the lectures, some seminars will be given by professionals.

Office hours

See the website of Daniela Cocchi