96802 - DATA MINING FOR BUSINESS AND MARKET RESEARCH

Academic Year 2025/2026

  • Lecturer: Ida D'Attoma
  • Credits: 10
  • SSD: SECS-S/03
  • Language: English
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Statistics, Economics and Business (cod. 8876)

Learning outcomes

This course will present the main data mining methods used for knowledge discovery in business, using both internal and external data. With an emphasis on data analysis and the use of statistical software, special attention will be devoted to techniques that help single out relationships of interdependence and patterns in business and market research phenomena. Students will learn, hands-on, how to organize and analyse market research data. In particular, at the end of the course students will be able to:

  • independently run a complete data mining process (from data pre-processing to the interpretation of the results obtained);
  • choose the statistical methodology best suited to the problem at hand;
  • critically interpret empirical results.

Course contents

1. INTRODUCTION: data-analytic thinking, overview of Data Mining, from business problems to Data Mining tasks, the Data Mining process; real-world business challenges.

2. DATA EXPLORATION AND PREPARATION: data objects and attribute types, data matrices and their transformations, data cleaning.

3. STATISTICAL AND DATA MINING SOFTWARE: introduction to SAS; SAS LAB tutorial on data organization and data pre-processing using real datasets.

4. MULTIDIMENSIONAL DATA ANALYSIS & DIMENSIONALITY REDUCTION: Principal component analysis and its variants (e.g., PCA of ranks); Multiple Correspondence Analysis - categorical pattern detection. Theory and practice with SAS.

5. PROXIMITY MEASURES: distance and similarity for mixed data.

6. CLUSTERING: hierarchical, partitional and hybrid clustering; understanding the results of clustering.

7. PROFILING: deriving typical behavioural segments. 

8. CO-OCCURRENCES AND ASSOCIATIONS: finding items that go together. Theory and application of the main association rule algorithms in SAS.

9. DATA MINING SCORING: theory and practice.

10. CAUSAL ML AND ADVANCED LAB: causal inference fundamentals; application of causal ML algorithms in business analytics for decision support; evaluation of a marketing campaign using causal ML in SAS; targeting and interpretation of causal results.

Readings/Bibliography

The primary textbooks for the course are:

  • (Required) Tufféry, S. (2011). Data Mining and Statistics for Decision Making. John Wiley & Sons, Ltd. Chapters 1-3, 7, 9-10, 12.

    You can check its availability at: https://sol.unibo.it/SebinaOpac/query/tuffery?context=catalogo
  • (Required) Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
    This book is available online at https://www.hsph.harvard.edu/miguelhernan/causal-inference-book/
  • (Suggested) Further readings:

    Becker S.O., Ichino A. (2002), Estimation of average treatment effects based on propensity scores, The Stata Journal, 2(4), 358-377.

    Dehejia R.H., Wahba S. (1999), Causal effects in nonexperimental studies: reevaluating the evaluation of training programs, Journal of the American Statistical Association, 94, 1053-1062.

    Dehejia R.H., Wahba S. (2002), Propensity score matching methods for nonexperimental causal studies, The Review of Economics and Statistics, 84(1), 151-161.

Additional teaching material will be made available to students using the e-learning platform https://virtuale.unibo.it/

Teaching methods

  • Lectures introducing theory & applications
  • SAS lab sessions using real market/business datasets
  • Home assignments for self-practice (solutions provided for self‑assessment)
  • In view of the type of activities and teaching methods adopted, attendance of this training activity requires all students to have previously completed Modules 1 and 2 of the workplace safety training, in e-learning mode [https://elearning-sicurezza.unibo.it/].

Assessment methods

Attending and non-attending students take the same written examination, consisting of open questions on theoretical issues (40% of the final grade) and a section requiring the production and/or interpretation of statistical outputs (60% of the final grade). The theoretical section tests students' knowledge of the main terminology and concepts associated with the data mining methods used to deal with business data, the strengths and limitations of each method, and the data mining techniques used to analyse different types of data and business problems. The practical section tests the ability to produce and interpret statistical outputs and to translate them into applied conclusions in a business context. Typical exam questions will be made available during the course.

All students are given tasks of the same difficulty and the same amount of time. The exam is a 2-hour written test with two open questions on theory and two or three practical exercises using the SAS software. The points awarded for correct answers to each question will be reported in the exam outline. The final grade is out of thirty. The exam is "closed-book": students are not allowed to consult references or other sources of theoretical information during the exam.

Students may choose to take the partial exams during the course and/or the total exam. The first partial exam covers the first 30 hours of the course; the second partial exam covers the second part of the course and takes place during the first total exam call. The dates of the total and partial exam sessions are published in the AlmaEsami application.

Evaluation judgment scale

Both the mid-term (partial) exams and the final exam are graded according to the following scale:

<18 (failed)

18-23 (sufficient): sufficient preparation but relating to a limited number of the course contents;

24-27 (good): adequate preparation but with some gaps with respect to the course contents;

28-30 (very good): very in-depth knowledge of all the course contents;

30 with honors (excellent): excellent knowledge of the course contents.

Teaching tools

The UNIBO e-learning platform (VIRTUALE) will be used to share teaching materials and to set periodic home assignments. The teaching material includes:

  • Lecture notes summarising theoretical topics explained in class
  • Open data and lecture notes to follow the practical sessions
  • Miscellanea: exercises, solutions to assignments, sample exams, follow-up materials
  • Software SAS on Demand for Academics (https://www.sas.com/en_us/software/on-demand-for-academics.html)

Office hours

See the website of Ida D'Attoma

SDGs

Quality education

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.