96082 - Computer Science for High Energy Physics

Academic Year 2023/2024

  • Modules: Maximiliano Sioli (Module 1), Matteo Negrini (Module 2), Gabriele Sirri (Module 3), Francesco Giacomini (Module 4), Andrea Chierici (Module 5)
  • Teaching Mode: Traditional lectures (Modules 1 to 5)
  • Campus: Bologna
  • Degree programme: Second cycle degree programme (LM) in Advanced Methods in Particle Physics (code 5810)

Learning outcomes

The course provides an introduction to the software and hardware used in a typical High Energy Physics experiment. By the end of the course, students will have acquired advanced knowledge of programming languages and tools for data processing. Furthermore, students will learn the fundamental aspects of a data center dedicated to scientific computing.

Course contents

Module 1

Concept of probability: axiomatic, combinatorial, frequentist and subjective. Conditional probability. Statistical independence. Bayes' theorem.

Random variables and probability density functions. Multivariate distributions. Marginal and conditional densities. Functions of random variables. Distribution moments: expectation value, variance, covariance. Error propagation in the presence of correlated variables. Examples of probability distributions: Binomial, Multinomial, Poisson, Exponential, Normal (multivariate), Chi-square, Breit-Wigner, Landau.

Characteristic functions and their applications. Central Limit Theorem.

Statistical inference. Fisher information. Test statistics and sufficient test statistics.

Monte Carlo method: convergence criteria, law of large numbers, calculation of integrals and their uncertainties. Variance reduction. Random number generators. Sampling a generic distribution.

Generalities on statistical estimators. Test statistics and estimators. Estimators for the expectation value, variance and correlation. Variance of the estimators. The maximum likelihood method. Score and Fisher information. Uncertainties of multi-parameter estimators, including correlations. Extended maximum likelihood. Bayesian estimators, Jeffreys priors. Least squares method.

Hypothesis testing. Simple hypotheses. Efficiency and power of a test. Neyman-Pearson lemma. Linear test, Fisher's discriminant. Multivariate methods: Neural Networks, Boosted Decision Trees, k-Nearest Neighbors. Statistical significance. p-values. Look-Elsewhere Effect. Chi-square method for hypothesis testing.

Exact methods for the construction of confidence intervals. Gaussian and Poisson cases. Unified approach. Bayesian method. CLs method. Systematic errors and nuisance parameters in the calculation of confidence intervals. Frequentist and Bayesian methods. Asymptotic properties.

Module 2

Exercises and supplementary material for Module 1.

Module 3

Lab activity. Elements of ROOT. RooFit: Workspace, Factory, composite models, multi-dimensional models. Use of RooStats to compute confidence intervals (Profile Likelihood, Feldman-Cousins, Bayesian intervals) with and without nuisance parameters. Use of TMVA as a classifier; description of TMVAGui.

Module 4

Application of C++ programming techniques to scientific software development, including data abstraction, polymorphism, generic programming, concurrency and parallelism. Use of modern C++ to safely and efficiently exploit the memory hierarchy and the heterogeneous nature of current computer architectures. Introduction to elements of software engineering and use of effective development tools. Introduction to elements of operating systems and computer architecture.

Module 5

The module will provide basic concepts of computing infrastructures for data processing and for running scientific applications, focusing in particular on the Infrastructure-as-a-Service (IaaS) Cloud paradigm. The module will start with an introduction to different computing facilities (from the laptop to the data center) and to Big Data, describing how they relate to scientific applications. It will continue with a description of the building blocks of modern data centers and of how they are abstracted by Cloud computing models.

Students will be granted access to a limited set of Cloud resources and services in order to complete the exercises assigned during the module. Containers, in particular Docker containers, will be introduced, as will the concept of High Performance Computing (HPC). The module will end with an overview of the emerging "Fog" and "Edge" computing paradigms and how they are linked to Cloud infrastructures.

Readings/Bibliography

For Module 1:

  • Frederick James, Statistical Methods in Experimental Physics, World Scientific, 2007

For Modules 2 and 3:

  • Glen Cowan, Statistical Data Analysis, Oxford Univ. Press, 1998
  • O. Behnke et al., Data Analysis in High Energy Physics: A Practical Guide to Statistical Methods, Wiley, 2013
  • A. G. Frodesen, O. Skjeggestad, H. Toft, Probability and Statistics in Particle Physics, Universitetsforlaget, 1979
  • G. D'Agostini, Bayesian reasoning in data analysis - A critical introduction, World Scientific Publishing, 2003

For Module 4:

  • B. Stroustrup, Programming: Principles and Practice Using C++, 2nd edition, Addison-Wesley
  • B. Stroustrup, The C++ Programming Language, 4th edition, Addison-Wesley
  • C++ reference [https://en.cppreference.com/w/]

For Module 5:

  • Winn L. Rosch, Hardware Bible, 6th edition, Que
  • Jim Ledin, Dave Farley, Modern Computer Architecture and Organization, 2nd edition, Packt Publishing
  • Nigel Poulton, Docker Deep Dive, Packt Publishing
  • Bill Wilder, Cloud Architecture Patterns, O'Reilly
  • Thomas Erl, Wajid Khattak, Paul Buhler, Big Data Fundamentals: Concepts, Drivers & Techniques, Pearson

Teaching methods

Lectures and laboratory sessions that use programming and statistical tools to solve practical problems, including resources available on the Cloud.

Assessment methods

For the first three modules, assessment is by a two-hour written exam consisting of:

  1. a theory question;
  2. an exercise;
  3. a question on the Lab part, in which you will be asked to comment on a block of code.

To obtain 30/30 cum laude, you must have achieved 30/30 in the written exam and take an additional oral exam.

Admission to the written exam is granted to students who have completed and submitted the compulsory laboratory exercises (even though these exercises are not used in the final grading).

For modules 4 and 5, the exam consists of a project, to be prepared independently, and an oral part.

Teaching tools

Lecture notes and accompanying material are made available either on Virtuale or in Git repositories.

Office hours

See the website of Francesco Giacomini

See the website of Maximiliano Sioli

See the website of Matteo Negrini

See the website of Gabriele Sirri

See the website of Andrea Chierici