87944 - Statistical Data Analysis for Nuclear and Subnuclear Physics

Academic Year 2023/2024

Learning outcomes

At the end of the course, the student will be acquainted with the main statistical concepts used in physics. After a review of the fundamentals of probability theory, parametric statistical inference will be introduced, from point estimation and confidence intervals to hypothesis testing and goodness-of-fit. Each topic will be addressed within both the Bayesian and the frequentist approaches. Dedicated practical sessions will allow the student to become familiar with these conceptual tools through applications in nuclear and subnuclear physics.

Course contents

The structure of the course is the following.

For all students:

  • Module 1, theory (lecturer M. Sioli)

Only for Applied Physics Students:

  • Module 2a, exercises and complements (lecturer C. Sala)

Only for Nuclear and Subnuclear Physics Students:

  • Module 2b, exercises and complements (lecturer M. Negrini)
  • Module 3b, laboratory (lecturer G. Sirri)

Program of Module 1:

Concept of probability: axiomatic, combinatorial, frequentist and subjective. Conditional probability. Statistical independence. Bayes' theorem. 
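As an illustration of Bayes' theorem, the minimal Python sketch below updates the prior probabilities of two mutually exclusive, exhaustive hypotheses after an observation. The function name `posterior` and all numbers are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' theorem for mutually exclusive, exhaustive hypotheses H_i.

    prior[i]      = P(H_i)
    likelihood[i] = P(data | H_i)
    Returns the list of P(H_i | data).
    """
    # P(data) from the law of total probability
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    return [p * l / evidence for p, l in zip(prior, likelihood)]


# Hypothetical numbers: 1% of tracks are muons; a muon tag fires for 95% of
# muons and for 2% of pions. What is P(muon | tag)?
post = posterior(prior=[0.01, 0.99], likelihood=[0.95, 0.02])
print(post[0])  # about 0.324
```

Even a fairly efficient tag gives a posterior well below 50% here, because the prior probability of a muon is small.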


Random variables and probability density functions. Multivariate distributions. Marginal and conditional densities. Functions of random variables. Distribution moments: expectation value, variance, covariance. Error propagation in the presence of correlated variables. Examples of probability distributions: Binomial, Multinomial, Poisson, Exponential, Normal (multivariate), Chi-square, Breit-Wigner, Landau.
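Linear (first-order) error propagation with correlated variables can be written as var(f) ≈ gᵀVg, where g is the gradient of f and V the covariance matrix. The Python sketch below applies it to the hypothetical function f(x, y) = x·y; the helper name `propagate` and the numerical values are assumptions for illustration.

```python
import math


def propagate(grad, cov):
    """First-order error propagation: var(f) ~ sum_ij g_i V_ij g_j."""
    n = len(grad)
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))


# f(x, y) = x * y evaluated at illustrative values
x, y = 2.0, 3.0
sx, sy, rho = 0.1, 0.2, 0.5          # uncertainties and correlation coefficient
grad = [y, x]                         # (df/dx, df/dy)

cov = [[sx**2, rho * sx * sy],
       [rho * sx * sy, sy**2]]
var_f = propagate(grad, cov)

cov_uncorr = [[sx**2, 0.0], [0.0, sy**2]]   # same inputs, correlation ignored
var_uncorr = propagate(grad, cov_uncorr)

print(math.sqrt(var_f), math.sqrt(var_uncorr))
```

The off-diagonal term 2·(∂f/∂x)(∂f/∂y)·cov(x, y) raises the variance from 0.25 to 0.37 here; neglecting a positive correlation would underestimate the uncertainty.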

Characteristic functions and their applications. Central Limit Theorem.
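The Central Limit Theorem can be checked numerically: standardized sums of independent Uniform(0, 1) variables approach a standard normal distribution. The Python sketch below is illustrative only; the function name `clt_sample`, the seed and the sample sizes are arbitrary assumptions.

```python
import random
import statistics


def clt_sample(n_terms, n_samples, rng):
    """Standardized sums of n_terms Uniform(0, 1) variables.

    Each uniform has mean 1/2 and variance 1/12, so the sum has mean n/2 and
    variance n/12; by the CLT the standardized sum tends to N(0, 1).
    """
    mu, sigma = n_terms / 2, (n_terms / 12) ** 0.5
    return [(sum(rng.random() for _ in range(n_terms)) - mu) / sigma
            for _ in range(n_samples)]


rng = random.Random(42)
z = clt_sample(n_terms=12, n_samples=20000, rng=rng)

m = statistics.fmean(z)
s = statistics.stdev(z)
frac = sum(abs(v) < 1 for v in z) / len(z)   # Gaussian value is about 0.683
print(m, s, frac)
```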

Statistical inference. Fisher information. Test statistics and sufficient statistics.

Monte Carlo method: convergence criteria, law of large numbers, calculation of integrals and their uncertainties. Variance reduction. Random number generators. Sampling from a generic distribution.
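Two of the Monte Carlo topics above, integral estimation with its statistical uncertainty and sampling from a given distribution, can be sketched in a few lines of Python. This is a minimal illustration under assumed choices: crude (unweighted) Monte Carlo, the test integrand 4/(1 + x²) on [0, 1] (whose integral is π), and inverse-transform sampling of an exponential. The function names are hypothetical.

```python
import math
import random


def mc_integrate(f, a, b, n, rng):
    """Crude Monte Carlo estimate of the integral of f over [a, b].

    Returns (estimate, standard error); the error decreases as 1/sqrt(n).
    """
    vals = [f(a + (b - a) * rng.random()) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return (b - a) * mean, (b - a) * math.sqrt(var / n)


def sample_exponential(tau, rng):
    """Inverse-transform sampling: if U ~ Uniform(0, 1), -tau*ln(1 - U) ~ Exp(tau)."""
    return -tau * math.log(1.0 - rng.random())


rng = random.Random(1)
est, err = mc_integrate(lambda x: 4.0 / (1.0 + x * x), 0.0, 1.0, 100000, rng)
print(f"{est:.4f} +- {err:.4f}  (exact: {math.pi:.4f})")

taus = [sample_exponential(2.0, rng) for _ in range(50000)]
print(sum(taus) / len(taus))  # should be close to tau = 2
```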

General properties of statistical estimators. Test statistics and estimators. Estimators for the expectation value, variance and correlation. Variance of the estimators. The maximum likelihood method. Score and Fisher information. Uncertainties of multi-parameter estimators, including correlations. Extended maximum likelihood. Bayesian estimators, Jeffreys priors. Least squares method.
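For exponentially distributed decay times, the maximum likelihood estimate of the mean lifetime is the sample mean, and the Fisher information I(τ) = n/τ² gives var(τ̂) ≈ τ̂²/n. The Python sketch below, with simulated data, a hypothetical true lifetime τ = 2 and hypothetical function names, also cross-checks this uncertainty against the curvature of ln L, σ² ≈ −1/(d²ln L/dτ²).

```python
import math
import random


def exp_log_likelihood(tau, data):
    """ln L(tau) for i.i.d. exponential decay times with mean lifetime tau."""
    return sum(-math.log(tau) - t / tau for t in data)


def mle_exponential(data):
    """Analytic ML estimate and its uncertainty from the Fisher information.

    d lnL / d tau = 0 gives tau_hat = mean(t); I(tau) = n / tau^2, so
    var(tau_hat) ~ 1 / I(tau_hat) = tau_hat^2 / n.
    """
    n = len(data)
    tau_hat = sum(data) / n
    return tau_hat, tau_hat / math.sqrt(n)


rng = random.Random(7)
data = [rng.expovariate(1.0 / 2.0) for _ in range(1000)]  # simulated, true tau = 2
tau_hat, sigma = mle_exponential(data)

# Numerical second derivative of ln L at the maximum (central differences)
h = 1e-3
d2 = (exp_log_likelihood(tau_hat + h, data)
      - 2.0 * exp_log_likelihood(tau_hat, data)
      + exp_log_likelihood(tau_hat - h, data)) / h**2
print(tau_hat, sigma, math.sqrt(-1.0 / d2))
```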

Hypothesis testing. Simple hypotheses. Efficiency and power of the test. Neyman-Pearson lemma. Linear tests, Fisher's discriminant. Multivariate methods: Neural Networks, Boosted Decision Trees, k-Nearest Neighbors. Statistical significance and p-values. Look-Elsewhere Effect. Chi-square method for hypothesis testing.
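A p-value is commonly quoted as a one-sided Gaussian significance Z = Φ⁻¹(1 − p). The Python sketch below computes the p-value of a simple counting experiment (observing at least n events when b background events are expected) and converts it to Z with `statistics.NormalDist`. The helper names and the counts (12 observed, 4 expected) are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_p_value(n_obs, b):
    """p-value for observing >= n_obs events when background b is expected.

    p = P(N >= n_obs | b) = 1 - sum_{k < n_obs} exp(-b) b^k / k!
    """
    cdf = sum(math.exp(-b) * b**k / math.factorial(k) for k in range(n_obs))
    return 1.0 - cdf


def significance(p):
    """One-sided conversion of a p-value to Z standard deviations."""
    return NormalDist().inv_cdf(1.0 - p)


print(significance(2.8665e-7))        # the conventional "5 sigma" threshold
p = poisson_p_value(n_obs=12, b=4.0)  # illustrative counts
print(p, significance(p))
```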

Exact methods for the construction of confidence intervals: Gaussian and Poisson cases. Unified (Feldman-Cousins) approach. Bayesian method. CLs method. Systematic errors and nuisance parameters in the calculation of confidence intervals: frequentist and Bayesian methods. Asymptotic properties.
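In the Poisson case without background, the classical (Neyman) upper limit μ_up at confidence level CL solves P(N ≤ n_obs | μ_up) = 1 − CL. The Python sketch below finds it by bisection; the function names and the search bracket are assumptions for illustration. For n_obs = 0 at 90% CL it reproduces the familiar value ln 10 ≈ 2.30.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))


def poisson_upper_limit(n_obs, cl=0.90, tol=1e-10):
    """Classical upper limit on a Poisson mean with no background.

    Solves P(N <= n_obs | mu_up) = 1 - cl by bisection; the CDF decreases
    monotonically in mu, so the root is unique.
    """
    lo, hi = 0.0, n_obs + 50.0  # assumed bracket, generous for small n_obs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid  # CDF still too large: the limit lies at larger mu
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in range(4):
    print(n, round(poisson_upper_limit(n), 3))
```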

 

Program of Module 2a:

Introduction to R and RStudio.

Generation of random variables and probability distributions. Law of large numbers. Central limit theorem.

Hypothesis testing. Student's t-test. Fisher's F-test. p-values: statistical significance and power.
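The two-sample Student's t statistic with pooled variance is shown below as a minimal Python sketch (the module itself uses R, where `t.test()` performs the full test). The function name `student_t` and the data are hypothetical; turning t into a p-value additionally requires the CDF of the t distribution with the returned degrees of freedom, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Two-sample Student's t statistic assuming equal variances.

    t = (mean_x - mean_y) / (s_p * sqrt(1/n_x + 1/n_y)),  dof = n_x + n_y - 2
    """
    nx, ny = len(x), len(y)
    # Pooled estimate of the common variance
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / nx + 1.0 / ny))
    return t, nx + ny - 2


t, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # illustrative data
print(t, dof)
```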

Maximum Likelihood Estimation. Linear regression. Correlation. Analysis of Variance (ANOVA). Generalized linear models.

Multivariate linear regression. Multicollinearity. Lasso and Ridge penalizations.
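The effect of a ridge penalty is easiest to see with a single predictor: after centering, minimizing Σ(y − a − b·x)² + λb² gives b = S_xy / (S_xx + λ), which shrinks the ordinary least-squares slope toward zero. The Python sketch below assumes the common convention that the intercept is not penalized; the function name `ridge_1d` and the data are hypothetical. (The lasso has no closed form in general and is not shown.)

```python
def ridge_1d(x, y, lam):
    """Ridge regression with one predictor; the intercept is not penalized.

    lam = 0 recovers ordinary least squares.
    Returns (intercept, slope).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / (sxx + lam)   # penalized slope
    return my - b * mx, b    # intercept keeps the fit through (mean x, mean y)


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]          # illustrative data lying on y = 1 + 2x
print(ridge_1d(x, y, 0.0))        # ordinary least squares
print(ridge_1d(x, y, 5.0))        # slope shrunk by the penalty
```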

Program of Module 2b:

Exercises and complements.

Program of Module 3b:


Lab: elements of C++ and ROOT. RooFit: workspace, factory, composite models, multidimensional models. Use of RooStats to compute confidence intervals (profile likelihood, Feldman-Cousins, Bayesian intervals) with and without nuisance parameters. Use of TMVA as a classifier; description of TMVAGui.

Readings/Bibliography

Bibliography for Module 1:
  • Frederick James, Statistical Methods in Experimental Physics, World Scientific, 2007

Bibliography for Module 2a:

  • John Maindonald and W. John Braun, Data Analysis and Graphics Using R: An Example-Based Approach, Cambridge University Press, 2003
  • Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2013

Bibliography for Module 2b and Module 3b:

  • Glen Cowan, Statistical Data Analysis, Oxford Univ. Press, 1998
  • O. Behnke et al., Data Analysis in High Energy Physics: A Practical Guide to Statistical Methods, Wiley, 2013
  • A. G. Frodesen, O. Skjeggestad, H. Toft, Probability and Statistics in Particle Physics, Universitetsforlaget, 1979
  • G. D'Agostini, Bayesian reasoning in data analysis - A critical introduction, World Scientific Publishing, 2003

Teaching methods

Lectures and laboratory sessions in which statistical tools are used to solve practical problems.

Regarding the teaching methods of this course unit, all students attending Modules 2a and 3b must complete Modules 1 and 2 of the online Health and Safety course [https://www.unibo.it/en/services-and-opportunities/health-and-assistance/health-and-safety/online-course-on-health-and-safety-in-study-and-internship-areas].

Assessment methods

The assessment consists of a two-hour written exam comprising:

1. a theory question

2. an exercise

3. a question on the laboratory part, in which you will be asked to comment on a block of code

Some parts of the written exam may differ depending on the chosen track (Module 2a or Modules 2b+3b).

To obtain 30/30 cum laude, you must score 30/30 in the written exam and take an additional oral exam.

Note that only students who have completed and submitted the compulsory laboratory exercises will be admitted to the written exam (even though these exercises do not count toward the final grade).

Teaching tools

Lecture notes are available on Virtuale. In case of problems, write an email to the respective lecturer.

Office hours

See the website of Maximiliano Sioli

See the website of Matteo Negrini

See the website of Gabriele Sirri