- Teacher: Romain Jacques Madar
- Credits: 6
- SSD: FIS/01
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Bologna
- Degree programme: Second cycle degree programme (LM) in Advanced Methods in Particle Physics (cod. 5810)
Learning outcomes
Students will acquire extended knowledge of the Python language and of computing tools for handling and manipulating large datasets. The programming course provides the prerequisites for advanced applications in the machine learning module. By the end, students will be able to write programs that solve simple problems using the methodologies covered in the lectures.
Course contents
Place of teaching: Université Clermont Auvergne, Clermont-Ferrand
MODULE 1
This course introduces the basics of statistics and modern methodologies and algorithms for solving complex data analysis problems with artificial intelligence and machine learning (ML). The first part of the lecture covers samples (description and definition of basic quantities: size, dimension, i.i.d.; empirical quantities: sample mean, sample variance, quantiles; propagation of uncertainties; binned samples: definition, probability law), statistical models (definition; ingredients of statistical models: observables, parameters of interest, nuisance parameters, dependent and independent variables; likelihood function and extended likelihood function; composite statistical models; introduction to the treatment of nuisance parameters), inference (introduction to the inference problem, introduction to the frequentist and the Bayesian approaches), and parameter estimation (definition of an estimator; properties of estimators: consistency, bias, efficiency; methods for estimating parameters: maximum likelihood, least squares, Bayesian inference). The second part covers basic concepts of machine learning (introduction to ML, deep learning and representation learning, training and testing, cross-validation, bias-variance decomposition, curse of dimensionality), regression with linear models (simple example: polynomial curve fitting, linear basis function models, regularization, likelihood and regression), and classification (linear models for classification, perceptron algorithm, linear discriminant analysis, logistic regression, artificial neural networks, popular NN algorithms).
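As an illustration of the empirical quantities listed for the first part (sample mean, sample variance, quantiles), the following is a minimal sketch using NumPy. The sample itself, its size, and the Gaussian parameters are assumptions chosen for the example; they are not taken from the course material.

```python
import numpy as np

# Hypothetical i.i.d. sample: 10,000 draws from a Gaussian with
# mean 2.0 and standard deviation 1.5 (values chosen for illustration).
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=2.0, scale=1.5, size=10_000)

# Empirical quantities of the sample.
sample_mean = sample.mean()
sample_var = sample.var(ddof=1)          # ddof=1 gives the unbiased estimator
median, q90 = np.quantile(sample, [0.5, 0.9])

# Standard error of the sample mean, sqrt(variance / sample size).
mean_err = np.sqrt(sample_var / sample.size)

print(f"mean = {sample_mean:.3f} +/- {mean_err:.3f}")
print(f"variance = {sample_var:.3f}, median = {median:.3f}, 90% quantile = {q90:.3f}")
```

For a Gaussian model, the sample mean is also the maximum likelihood estimate of the mean, which links the empirical quantities to the parameter estimation topics listed above.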
MODULE 2
The programming part of the lecture covers a practical introduction (objects, collections, functions, loops, a few Pythonic idioms, and basic file manipulation), an introduction to NumPy (NumPy arrays vs Python lists, vectorization, (fancy) indexing, broadcasting), the Python data analysis ecosystem (overview; data representation: matplotlib; importing and manipulating data: pandas; mathematics, physics and engineering: scipy), and the basics of image processing (loading/plotting, colors, grey scale, image filters: kernel, blocks, sliding windows). The second part of the lecture deals with the manipulation of data, known as data mining, and includes data preprocessing (data visualization, data cleaning, data space transformation), clustering (hierarchical clustering, partitional clustering), association rules, feature reduction (feature extraction, feature reduction) and hands-on sessions.
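The NumPy topics listed for the programming part (arrays vs lists, vectorization, fancy indexing, broadcasting) can be sketched as follows. This is only an illustrative example; the variable names and values are assumptions and do not come from the course material.

```python
import numpy as np

# Python list vs NumPy array: the list needs an explicit loop,
# the array applies the operation element-wise (vectorization).
values = [1.0, 2.0, 3.0, 4.0]
squares_loop = [v**2 for v in values]
arr = np.array(values)
squares_vec = arr**2

# Broadcasting: the per-column mean has shape (2,) and is subtracted
# from every row of the (3, 2) array without an explicit loop.
points = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
centered = points - points.mean(axis=0)

# Boolean-mask indexing and fancy (integer-array) indexing.
selected = arr[arr > 2.0]   # elements strictly greater than 2
picked = arr[[0, 3]]        # first and last elements

print(squares_vec, centered, selected, picked, sep="\n")
```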
Readings/Bibliography
Scientific literature and specific publications are distributed during the class.
Teaching methods
MODULE 1
Lecture (50%) and problem-based teaching (50%).
MODULE 2
Lecture (70%) and problem-based teaching (30%).
Assessment methods
Examination: Oral or written examination.
Graded modules
Teaching tools
Classrooms equipped with computers are used for the hands-on sessions. Python, NumPy, and Scikit software and libraries are used throughout the four elements of the course.
Office hours
See the website of Romain Jacques Madar