- Teacher: Gabriele Soffritti
- Credits: 10
- SSD: SECS-S/01
- Language: Italian
- Teaching Mode: In-person learning (entirely or partially)
- Campus: Bologna
- Degree programme: Second cycle degree programme (LM) in Statistical Sciences (cod. 8875)
Learning outcomes
The main goal is to give students a working knowledge of the main descriptive and probabilistic methods for the analysis of contingency tables (simple and multiple correspondence analysis, log-linear models, latent variable models). Students will also be able to choose the most suitable method for a multivariate analysis of a given categorical dataset and to interpret the results obtained.
Course contents
- Descriptive and inferential techniques for contingency tables.
- Log-linear models for two-way and multi-way contingency tables: model specification, parameter estimation and goodness-of-fit measures.
- Linear latent variable models for categorical data: an introduction. Latent class models: model specification, parameter estimation and goodness-of-fit measures. Logit-normit and normit-normit latent trait models: model specification, parameter estimation and goodness-of-fit measures.
- Singular value decomposition. Eckart-Young theorem.
- Simple and multiple correspondence analysis.
- R packages for categorical data analysis.
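As an illustration of the log-linear topic listed above, the following is a minimal sketch, not course material, of the independence model for a two-way contingency table. For this model the maximum likelihood fitted counts have the closed form (row total × column total) / n, and goodness of fit can be summarised by the Pearson statistic X² and the likelihood-ratio statistic G². The function name `independence_fit` and the example counts are hypothetical, and the sketch assumes every row and column total is positive.

```python
import math


def independence_fit(table):
    """Fit the independence log-linear model to a two-way table of counts.

    Returns the fitted counts, the Pearson X^2 statistic, the
    likelihood-ratio G^2 statistic and the residual degrees of freedom.
    Assumes all row and column totals are positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Closed-form ML fitted counts under independence: m_ij = n_i+ * n_+j / n
    fitted = [[r * c / n for c in col_totals] for r in row_totals]

    x2 = 0.0
    g2 = 0.0
    for obs_row, fit_row in zip(table, fitted):
        for obs, fit in zip(obs_row, fit_row):
            x2 += (obs - fit) ** 2 / fit
            if obs > 0:  # empty cells contribute nothing to G^2
                g2 += 2.0 * obs * math.log(obs / fit)

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return fitted, x2, g2, df


# Hypothetical 2 x 2 table of counts, for illustration only
counts = [[20, 30], [30, 20]]
fitted, x2, g2, df = independence_fit(counts)
print(fitted, round(x2, 3), round(g2, 3), df)
```

Both statistics are compared with a chi-square distribution with (I - 1)(J - 1) degrees of freedom, where I and J are the numbers of rows and columns.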
Readings/Bibliography
M. Greenacre. Theory and applications of correspondence analysis. London: Academic Press, 1984. Chapters 1-5.
A. Agresti. Categorical data analysis, Second edition. Hoboken: John Wiley & Sons, 2002. Chapters 1-3, 8-9.
D. J. Bartholomew, M. Knott, I. Moustaki. Latent variable models and factor analysis: a unified approach, Third edition. Chichester, UK: Wiley, 2011. Chapters 1-2, 4-6.
O. Nenadic, M. Greenacre. Correspondence analysis in R, with two- and three-dimensional graphics: the ca package. Journal of Statistical Software. May 2007, Volume 20, Issue 3.
D. A. Linzer, J. B. Lewis. poLCA: an R package for polytomous variable latent class analysis. Journal of Statistical Software. June 2011, Volume 42, Issue 10.
D. Rizopoulos. ltm: an R package for latent variable modeling and item response theory analyses. Journal of Statistical Software. November 2006, Volume 17, Issue 5.
Teaching methods
Theoretical lessons in a lecture hall and practical lessons in a computer laboratory. The datasets and R scripts used during the practical lessons are made available on the laboratory network at the beginning of each lesson.
Assessment methods
The exam assesses each student's preparation at both a theoretical and a practical level.
The exam is composed of two parts: the first is mandatory, the second is optional.
The mandatory part is written. It lasts two hours and takes place in a classroom. Some questions concern the theoretical aspects of the statistical methods; others focus mainly on the ability to use the methods for data analysis and to interpret the results. These latter questions require solving numerical exercises. In some cases, results obtained from the analysis of a real dataset using the R packages illustrated during the practical lessons may be provided. Consulting textbooks or notes during the written exam is not allowed. A pocket calculator is required.
After the written exam each student is assigned a mark out of 30. If the mark is at least 20/30, the student may ask to take the second part of the exam. The optional part is oral and consists of an additional question concerning the theoretical aspects of the statistical methods. After this oral exam, the student is assigned a second mark, which is a score between -2 and +2. The overall mark is the sum of the two marks.
Teaching tools
Lectures are delivered using slides that can be found on the AMS Campus website, where examples of written tests are also available. The slides should be used to prepare for the exam in conjunction with the explanations provided in the recommended textbooks.
Office hours
See the website of Gabriele Soffritti