28177 - Statistical Models

Course Unit Page


This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.

Quality education

Academic Year 2021/2022

Learning outcomes

By the end of the course the student should know the basic theory of normal linear models and generalized linear models. In particular, the student should be able to:
  • define a statistical model;
  • formulate the normal linear model, estimate its parameters and test their significance;
  • use variable selection procedures;
  • define a generalized linear model by combining a random component and a linear predictor through an appropriate link function;
  • estimate the parameters of a generalized linear model and test their significance;
  • evaluate the goodness of fit of a model and detect violations of the model assumptions.

Course contents

  • Basic framework for linear models
    • model specification and assumptions;
    • parameter estimation: least squares and maximum likelihood methods;
    • coefficient of determination R²: definition and properties;
    • finite and asymptotic properties of the estimators;
    • hypothesis testing on regression coefficients.
  • Regression diagnostics
    • residuals: definitions and properties;
    • influential observations and leverage points;
    • multicollinearity.
  • Model selection
    • effects of model misspecification;
    • stepwise methods;
    • best subset selection.
  • Inclusion of qualitative regressors
    • dummy variable coding;
    • interactions between regressors.
  • Some special cases
    • one-way ANOVA;
    • two-way ANOVA.
  • Generalized linear models
    • general definition: linear predictor, link function, random component;
    • maximum likelihood estimation;
    • hypothesis testing on model parameters.


Recommended readings (a detailed list of selected chapters and sections is available on IOL):

Kutner M. H., Nachtsheim C. J., Neter J., Li W. (2005). Applied Linear Statistical Models (5th edition). McGraw-Hill.

Handouts provided by the teacher.

Other readings:

Fox J. (2016). Applied Regression Analysis and Generalized Linear Models (3rd edition). Sage.

Weisberg S. (2005). Applied Linear Regression (3rd edition). Wiley.

Teaching methods

Class lectures

Tutorial sessions in computer lab

All students are required to complete Modules 1 and 2 of the online Health and Safety training.

Assessment methods

The exam assesses each student's preparation at both a theoretical and a practical level.

The exam consists of two parts, which must be taken in the same exam sitting.

The first mandatory part is a one-hour theory exam. It focuses on the theoretical and practical properties of linear and generalized linear models and consists of a (possibly computer-based) quiz with both multiple-choice and open-answer questions. For multiple-choice questions, each correct answer scores 1 point, each wrong answer -0.20 points, and each missing answer 0 points. Each open-answer question receives a mark between 0 and 2, depending on the correctness of the answer and the appropriateness of the terminology. The number of multiple-choice and open-answer questions may vary from sitting to sitting, but the maximum mark is always 16. Consulting textbooks or personal notes during the written exam is not allowed.
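The scoring rule for the theory exam can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `theory_mark` is hypothetical, and so is any particular split between multiple-choice and open-answer questions (the page states only that the mix may vary while the maximum stays at 16). Exact fractions are used so that the -0.20 penalty introduces no rounding noise. The page does not say whether a negative total is raised to 0, so the sketch does not clamp it.

```python
from fractions import Fraction

# Points per multiple-choice answer, as stated in the assessment rules.
MC_CORRECT = Fraction(1)
MC_WRONG = Fraction(-1, 5)  # -0.20 points; missing answers contribute 0

def theory_mark(n_correct: int, n_wrong: int, open_marks: list[Fraction]) -> Fraction:
    """Hypothetical helper: multiple-choice points plus open-answer marks (each 0 to 2)."""
    for m in open_marks:
        if not 0 <= m <= 2:
            raise ValueError("each open-answer mark must lie between 0 and 2")
    mc_points = n_correct * MC_CORRECT + n_wrong * MC_WRONG
    return mc_points + sum(open_marks, Fraction(0))

# Hypothetical sitting: 10 multiple-choice + 3 open-answer questions (max 10 + 3 * 2 = 16).
score = theory_mark(8, 1, [Fraction(2), Fraction(3, 2), Fraction(1)])
print(float(score))
```

With that hypothetical 10 + 3 split, a student with 8 correct answers, 1 wrong answer, 1 missing answer, and open-answer marks of 2, 1.5 and 1 would score 8 - 0.20 + 4.5 = 12.3 out of 16.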

For the first exam sitting only, students may split the mandatory theory exam into two partial exams. The first partial theory exam takes place after the first 5 weeks of the course and covers the topics addressed in the first part of the course. The second partial theory exam is scheduled after the end of the course and covers the topics addressed in the second part of the course. Students must take both partial theory exams; in particular, a student may register for the second partial exam only after taking the first one.

The second mandatory part is a one-hour computer-based practical exam. It assesses the student's ability to solve practical problems using linear and generalized linear models. The exam consists of two exercises: for each exercise, students are asked to write an R script that solves it and to report the results obtained with that script. Each exercise receives a mark between 0 and 8, depending on the correctness of the answer and on the appropriateness and correctness of the corresponding R script. Consulting textbooks or personal notes during the practical exam is allowed.

In case of access restrictions due to the COVID-19 pandemic, students will be allowed to take both mandatory exams online, via the esamionline platform.

The final mark is the sum of the marks obtained in the written (theory) exam and in the practical exam, for a maximum of 32. Non-integer final marks are rounded down to the nearest integer. A final mark of 32 corresponds to 30 cum laude; any other final mark larger than 30 is lowered to 30.
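The rules for combining the two parts can be sketched as follows. This is a minimal illustration, not an official grading tool: the function name `final_mark` and its `(mark, cum_laude)` return value are hypothetical. It assumes the theory mark (maximum 16) and the practical mark (two exercises of at most 8 each, so maximum 16) are already known. The page does not state a pass threshold, so the sketch does not model failing marks.

```python
import math
from fractions import Fraction

def final_mark(theory: Fraction, practical: Fraction) -> tuple[int, bool]:
    """Hypothetical helper: return (final mark, cum laude flag) from the two partial marks."""
    total = math.floor(theory + practical)  # non-integer totals are rounded down
    if total == 32:
        return 30, True                     # 32 corresponds to 30 cum laude
    return min(total, 30), False            # any other total above 30 is lowered to 30

print(final_mark(Fraction(123, 10), Fraction(13)))
```

For example, a total of 25.3 gives 25; a total of 31.5 is rounded down to 31 and then lowered to 30; a total of exactly 32 gives 30 cum laude.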

In case of failure or rejection of the overall mark, students must repeat the whole exam in one of the following sittings.

Office hours

See the website of Giuliano Galimberti