96542 - Statistical Methods and Data Mining

Academic Year 2022/2023

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Degree programme: First cycle degree programme (L) in Marketing and Economics of the Agro-Industrial System (code 5833)

Learning outcomes

The course introduces the methods of descriptive and inferential statistics, together with the methods and models used to extract relevant information from large amounts of data. At the end of the course, the student knows: the concepts of descriptive statistics; the basics of statistical inference and the significance tests used in analysis of variance and regression; database management systems (DBMS); the main data mining techniques and how to apply methods and models to extract relevant information from large amounts of data; the main IT tools for statistical analysis and data mining. Furthermore, the student is able to critically analyze the main statistical sources (national, European and global) and the related structural analyses of the agricultural and agri-food sector, and to build data visualization tools.

Course contents

1. Introduction to Statistics and Data Mining (total teaching unit: 2 hours)

 

2. Descriptive statistics (total teaching unit: 10 hours)

2.1 Univariate statistical analysis

2.1.1 Numerical and graphical representation of distributions

2.1.2 Average values, measures of variability and concentration

2.1.3 Density curves, normal and standard normal distributions

2.2 Bivariate statistical analysis

2.2.1 Two-way (contingency) tables

2.2.2 Scatter plots, correlation and simple linear regression (notes on multiple regression)

 

3. From data analysis to statistical inference (total teaching unit: 10 hours)

3.1 Probability and sampling

3.1.1 General rules of probability and random variables

3.1.2 Sampling and central limit theorem

3.1.3 Confidence intervals

3.1.4 Tests of significance

3.2 Inference on variables

3.2.1 Inference for the mean and for the proportion of a population

3.2.2 Comparison of two means

3.3 Inference about relationships

3.3.1 The chi-square test

3.3.2 Comparison of more than two means: one-way analysis of variance

3.3.3 Inference for regression

 

4. Data Mining (total teaching unit: 18 hours)

4.1 Introduction to databases

4.2 Introduction to Data Mining

4.3 Regression and Classification

4.4 Resampling

4.5 Association, principal component analysis, clustering

5. Practical application exercises (total teaching unit: 20 hours)

5.1 Database management

5.2 Statistical analysis with a spreadsheet

5.3 Data Mining Applications

5.4 Questionnaire design and analysis

Readings/Bibliography

Fulvia Mecatti, "Statistica di base. Come, quando, perché.", McGraw-Hill

Lecture notes and material made available by the teacher.

Specific educational material, in electronic format.

Teaching methods

The course is divided into four teaching units, each of which includes:

- a theoretical component;

- a practical/application component.

For each topic, the teacher first presents the theory and then shows how the analysis tools are applied.

Students take part in the lessons using a laptop.

Assessment methods

Learning is assessed in two ways:

1) The first is a practical technical test that verifies the level of knowledge acquired. It is reserved for students who work in structured groups during the lessons. The work consists of carrying out a business case by applying quantitative analyses.

2) The second is an individual written test made up of multiple-choice questions and exercises.

Both methods aim to assess not only the skills acquired in the teaching units, but also whether the student has developed an integrated view of the topics covered in the lessons.

Teaching tools

1) MS Office suite, made available by the University;

2) MS Forms, the software used to create the questionnaires;

3) Tutorials prepared by the teacher;

4) ISTAT, EUROSTAT and FAOSTAT databases.

Office hours

See the website of Giuseppe Palladino