85173 - ANALYSIS OF CATEGORICAL DATA

Academic Year 2018/2019

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Statistical Sciences (code 9222)

Learning outcomes

By the end of the course, the student has acquired knowledge of descriptive and probabilistic methods for the analysis of contingency tables. The student is also able to choose the most appropriate method for the multivariate analysis of a given categorical dataset and to interpret the results obtained.

Course contents

Introduction (2 hours)

  • Data matrices and contingency tables
  • Descriptive and inferential techniques for the analysis of contingency tables

Descriptive methods (10 hours)

  • Geometric concepts in multidimensional space
  • Matrix decompositions (spectral, singular value), low-rank matrix approximation and multidimensional analysis
  • Theory and algebra of simple correspondence analysis
  • Canonical correlation analysis of contingency tables
  • Multiple correspondence analysis

Probabilistic methods (10 hours)

  • Probability structure for contingency tables
  • Loglinear models for contingency tables
  • Model specification
  • Parameter estimation and interpretation
  • Goodness-of-fit measures
  • Model selection and comparison
  • Diagnostics for checking models

R functions and packages for the analysis of contingency tables (8 hours)

  • Syntax, usage and output of functions and packages available in the R environment for the analysis of contingency tables
  • Examples of analyses based on the use of such functions and packages

The reported number of hours is an estimate that covers both theoretical and practical lessons. Practical lessons will take place weekly in a computer laboratory, starting from the third week of lessons.

Readings/Bibliography

Compulsory readings

  • M. Greenacre. Theory and applications of correspondence analysis. London: Academic Press, 1984. Chapters 1-5
  • A. Agresti. Categorical data analysis, Second edition. Hoboken: John Wiley & Sons, 2002. Chapters 1-3, 8-9
  • O. Nenadic, M. Greenacre. Correspondence analysis in R, with two- and three-dimensional graphics: the ca package. Journal of Statistical Software. May 2007, Volume 20, Issue 3
  • Additional readings concerning topics not included in the recommended textbooks (to be announced during the lessons)
  • Teacher's lecture notes with the slides used by the teacher during the lessons

The teacher's lecture notes are available to all enrolled students on the platform "Insegnamenti online - Supporto online alla didattica" (https://iol.unibo.it/). To access the platform, students must log in with their username and password. The lecture notes have been available on the platform since 10 September 2018.

Since these lecture notes consist only of the slides used by the teacher during the lessons, exam preparation cannot be based solely on them; students are expected to prepare for the exam using all the compulsory readings as well.

Teaching methods

Theoretical lessons in a lecture hall and practical lessons in a computer laboratory using the R computing environment. The R scripts used during the practical lessons are available in the teacher's lecture notes.

Although attending lessons is not mandatory, it is strongly recommended.

Assessment methods

The exam tests each student's preparation at both a theoretical and a practical level.

The exam is written, lasts two hours and takes place in a lecture hall. It is composed of four parts with open questions: some concern the theoretical aspects of the statistical methods, while others focus mainly on the ability to use the methods for data analysis and to interpret the results. The latter questions require solving numerical exercises. In some cases, results obtained from the analysis of a real data set using the R packages illustrated during the practical lessons may be provided. Consulting textbooks or notes during the written exam is not allowed. A pocket calculator is necessary. The maximum mark for each exercise is 8. The overall mark of the exam is given by the sum of the marks in the four parts, which is expressed on a scale of 30.

Further useful information about the exams

  • In order to take the exam, students are required to register for it through the Almaesami platform.
  • Exams can only be taken in the official exam sessions.
  • An identity card is required to take part in the exam.

Teaching tools

Lessons are delivered using slides, which are contained in the teacher's lecture notes available to all enrolled students on the platform "Insegnamenti online - Supporto online alla didattica" (https://iol.unibo.it/). These notes can be used to prepare for the exam in conjunction with the recommended textbooks.

Office hours

See the website of Gabriele Soffritti