87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS

Course Unit Page

Academic Year 2018/2019

Learning outcomes

At the end of the course the student will be acquainted with the main statistical concepts used in physics. After a review of the fundamentals of probability theory, parametric inferential statistics will be introduced, from point estimation and confidence intervals to hypothesis testing and goodness-of-fit. Each topic will be addressed in both the Bayesian and the frequentist approaches. Dedicated practical sessions will allow the student to become familiar with these conceptual tools through applications in nuclear and subnuclear physics.

Course contents

The course is structured as follows.

For all students:

  • Module 1, theory (lecturer M. Sioli)

Only for Applied Physics students:

  • Module 2a, exercises and complements (lecturer C. Sala)

Only for Nuclear and Subnuclear Physics students:

  • Module 2b, exercises and complements (lecturer T. Chiarusi)
  • Module 3b, laboratory (lecturer G. Sirri)

 

Program of Module 1:

Concept of probability: axiomatic, combinatorial, frequentist and subjective. Conditional probability. Statistical independence. Bayes' theorem. 


Random variables and probability density functions. Multivariate distributions. Marginal and conditional densities. Functions of random variables. Distribution moments: expectation value, variance, covariance. Error propagation in the presence of correlated variables. Examples of probability distributions: Binomial, Multinomial, Poisson, Exponential, Normal (including multivariate), Chi-square, Breit-Wigner, Landau.

Characteristic functions and their applications. Central Limit Theorem.

Statistical inference. Fisher information. Test statistics and sufficient statistics.

Monte Carlo method: convergence criteria, law of large numbers, calculation of integrals and their uncertainties. Variance reduction techniques. Random number generators. Sampling from a generic distribution.

General properties of statistical estimators. Test statistics and estimators. Estimators for the expectation value, variance and correlation. Variance of the estimators. The maximum likelihood method. Score and Fisher information. Uncertainties of multi-parameter estimators, including correlations. Extended maximum likelihood. Bayesian estimators, Jeffreys priors. Least squares method.

Hypothesis testing. Simple hypotheses. Efficiency and power of a test. Neyman-Pearson lemma. Linear tests, Fisher's discriminant. Multivariate methods: neural networks, boosted decision trees, k-nearest neighbors. Statistical significance. P-values. Look-elsewhere effect. The chi-square method for hypothesis testing.

Exact methods for the construction of confidence intervals. Gaussian and Poisson cases. Unified (Feldman-Cousins) approach. Bayesian method. CLs method. Systematic errors and nuisance parameters in the calculation of confidence intervals: frequentist and Bayesian methods. Asymptotic properties.

 

Program of Module 2a:

Introduction to R and RStudio.

Generation of random variables and probability distributions. Law of large numbers. Central limit theorem.

Hypothesis testing. Student's t-test. Fisher's F-test. P-value: statistical significance and power.

Maximum likelihood estimation. Linear regression. Correlation. Analysis of variance (ANOVA). Generalized linear models.

Multivariate linear regression. Multicollinearity. Lasso and ridge penalization.

 

Program of Module 2b:

Exercises and complements.

 

Program of Module 3b:


Laboratory: elements of C++ and ROOT. RooFit: workspace and factory, composite models, multi-dimensional models. Use of RooStats to compute confidence intervals (profile likelihood, Feldman-Cousins, Bayesian intervals), with and without nuisance parameters. Use of TMVA as a classifier; description of the TMVAGui graphical interface.

Readings/Bibliography

Bibliography for Module 1:
  • Frederick James, Statistical Methods in Experimental Physics, World Scientific, 2007

Bibliography for Module 2a:

  • John Maindonald, W. John Braun, Data Analysis and Graphics Using R: An Example-Based Approach, Cambridge University Press, 2003
  • Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2013

Bibliography for Module 2b and Module 3b:

  • Glen Cowan, Statistical Data Analysis, Oxford Univ. Press, 1998
  • O. Behnke et al., Data Analysis in High Energy Physics: A Practical Guide to Statistical Methods, Wiley, 2013
  • A. G. Frodesen, O. Skjeggestad, H. Tøft, Probability and Statistics in Particle Physics, Universitetsforlaget, 1979
  • G. D'Agostini, Bayesian Reasoning in Data Analysis: A Critical Introduction, World Scientific Publishing, 2003

Teaching methods

Lectures, and laboratory sessions in which statistical software tools are used to solve practical problems.

Assessment methods

For Applied Physics students:

Oral examination covering both modules, held in a single joint session (the examining board consists of the lecturers of Modules 1 and 2a). For Module 2a only, the oral examination can alternatively be replaced by completing a small project described during the lectures.

 

For Nuclear and Subnuclear Physics students:

Oral examination covering all three modules, held in a single joint session (the examining board consists of the lecturers of Modules 1, 2b and 3b). Only students who have completed and submitted the compulsory laboratory exercises are admitted to the examination, even though these exercises do not count toward the final grade. The examination consists of questions on general theory and also assesses problem-solving skills and knowledge of the software tools presented during the lectures.

Teaching tools

Lecture notes are available on Insegnamenti OnLine. In case of problems, write an email to the relevant lecturer.

Office hours

See the website of Maximiliano Sioli

See the website of Tommaso Chiarusi

See the website of Gabriele Sirri