28177 - Statistical Models

Course Unit Page

Academic Year 2019/2020

Learning outcomes

By the end of the course the student should know the basic theory of normal linear models and generalized linear models. In particular, the student should be able to:

  • define a statistical model;
  • formulate the normal linear model, estimate its parameters and test their significance;
  • use variable selection procedures;
  • define a generalized linear model by combining a random component with a linear predictor through a suitable link function;
  • estimate the parameters of a generalized linear model and test their significance;
  • evaluate the goodness of fit of a model and detect violations of model assumptions.

Course contents

  • Basic framework for linear models
    • model specification and assumptions;
    • parameter estimation: least squares and maximum likelihood methods;
    • coefficient of determination R²: definition and properties;
    • finite and asymptotic properties of the estimators;
    • hypothesis testing on regression coefficients.
  • Regression diagnostics
    • residuals: definitions and properties;
    • influential observations and leverage points;
    • multicollinearity.
  • Model selection
    • effects of model misspecification;
    • stepwise methods;
    • best subset selection.
  • Inclusion of qualitative regressors
    • dummy variable coding;
    • interactions between regressors.
  • Some special cases
    • one-way ANOVA;
    • two-way ANOVA.
  • Generalized linear models
    • general definition: linear predictor, link function, random component;
    • maximum likelihood estimation;
    • hypothesis testing on model parameters.

Readings/Bibliography

Kutner, M. H., Nachtsheim, C. J., Neter, J., Li, W. (2005). Applied Linear Statistical Models (5th edition). McGraw-Hill.

Fox, J. (2016). Applied Regression Analysis and Generalized Linear Models (3rd edition). Sage.

Weisberg, S. (2005). Applied Linear Regression (3rd edition). Wiley.

Handouts.

Teaching methods

Class lectures

Tutorial sessions in computer laboratory

Assessment methods

The exam will test the qualifications of each student on both a theoretical and a practical level.

The exam is composed of three parts: the first two are mandatory, the third one is optional. The three parts have to be taken during the same exam sitting.

The first mandatory part is a written exam. It lasts one hour. This part focuses on the theoretical properties of linear and generalized linear models and contains both multiple-choice and open-answer questions. Consulting textbooks or personal notes during the written exam is not allowed. The evaluation of the written exam is given in marks out of 16.

For the first sitting only, students have the option of splitting the mandatory written exam into two partial exams. The first partial written exam takes place after the first 5 weeks and focuses on the topics covered during the first part of the course. The second partial written exam is scheduled after the end of the course and covers the topics addressed during the second part of the course. Students who choose this option must take both partial written exams. In particular, in order to register for the second partial exam, a student must have taken the first partial exam.

The second mandatory part is a practical exam. It lasts one hour and takes place after the written exam in a computer laboratory. The practical exam assesses the student's ability to solve practical problems using linear and generalized linear models. Students will be asked to write an R script and to answer questions by reporting results obtained using that script. Consulting textbooks or personal notes during the practical exam is allowed. The evaluation of the practical exam is given in marks out of 16.

The final mark for the mandatory parts is given by the sum of the marks obtained in the written and in the practical exam. Non-integer final marks are rounded down to the nearest integer. Final marks larger than 30 are rounded down to 30, except that a final mark equal to 32 is considered 30 cum laude.

The optional part is an oral exam. Only students with a final mark for the mandatory parts equal to or larger than 18 can take this optional part. The oral exam consists of additional questions concerning the theoretical properties of linear and generalized linear models. The evaluation of the oral exam is given in marks out of 30.

The overall mark is given by:

  • The final mark for the mandatory parts, if a student does not take the oral exam;
  • The average of the marks for the mandatory parts and for the optional part, if a student also takes the oral exam.
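
The marking rules above can be sketched in a few lines of code. This is only an illustrative sketch in Python (the practical exam itself uses R), and the function names are hypothetical. The page does not say how rounding or cum laude apply once the oral exam is averaged in, so the sketch assumes the plain, unrounded average is returned in that case.

```python
import math
from typing import Optional, Tuple


def mandatory_mark(written: float, practical: float) -> Tuple[int, bool]:
    """Return (final mark, cum laude flag) for the two mandatory parts.

    Both inputs are marks out of 16.
    """
    total = math.floor(written + practical)  # non-integer sums are rounded down
    if total == 32:
        return 30, True  # a sum of 32 is considered 30 cum laude
    return min(total, 30), False  # any other sum above 30 is capped at 30


def overall_mark(written: float, practical: float,
                 oral: Optional[float] = None) -> float:
    """Return the overall numeric mark, with or without the optional oral exam.

    Assumption: with an oral mark (out of 30), the plain unrounded average
    is returned, since the page does not specify any further rounding.
    """
    mark, _ = mandatory_mark(written, practical)
    if oral is None:
        return mark
    if mark < 18:
        # the oral exam is open only to students with at least 18
        raise ValueError("oral exam requires a mandatory mark of at least 18")
    return (mark + oral) / 2
```

For example, a student with 14.5 in the written exam and 15 in the practical exam obtains 29 for the mandatory parts; with marks of 13 and 13 and an oral mark of 30, the overall mark under the averaging assumption is 28.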

If a student fails the exam or rejects the overall mark, the whole exam must be repeated at one of the subsequent sittings.

Office hours

See the website of Giuliano Galimberti