96802 - DATA MINING FOR BUSINESS AND MARKET RESEARCH

Academic Year 2022/2023

  • Teacher: Ida D'Attoma
  • Credits: 10
  • SSD: SECS-S/03
  • Language: English
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Statistics, Economics and Business (cod. 8876)

Learning outcomes

This course presents the main data mining methods used for knowledge discovery in business, employing internal and external data. With an emphasis on data analysis and on the use of software, special attention will be devoted to techniques that help single out relationships of interdependence and patterns in business and market research phenomena. Students will learn, hands-on, how to organize and analyse market research data. In particular, at the end of the course students will be able to: - independently run a complete data mining process (from data pre-processing to the interpretation of the results obtained); - choose the statistical methodology best suited to the problem at hand; - critically interpret empirical results.

Course contents

1. INTRODUCTION: data-analytic thinking, overview of Data Mining, from business problems to Data Mining tasks, the Data Mining process.

2. DATA EXPLORATION AND PREPARATION: data objects and attributes type, data matrices and their transformations, data cleaning.

3. STATISTICAL AND DATA MINING SOFTWARE: introduction to SAS; SAS LAB tutorial on data organization and data preprocessing using real datasets.

4. DATA REDUCTION: Principal component analysis and its variants.

5. PROXIMITY MEASURES: distance and similarity.

6. CLUSTERING: hierarchical and partitional methods; understanding the results of clustering.

7. PROFILING: finding typical behavior.

8. CO-OCCURRENCES AND ASSOCIATIONS: finding items that go together.

9. DATA MINING SCORING.

10. Look-alike modeling for prospecting
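
As a purely illustrative aside on topic 5 (proximity measures), the sketch below computes one distance and one similarity. It is not course material: the course labs use SAS, whereas this example is written in Python, and the function names and toy data are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two numeric vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(a, b):
    """Jaccard similarity between two collections of items (e.g. baskets)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed here: two empty sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical toy data: two customers described by two numeric attributes.
customer_a = [3.0, 4.0]
customer_b = [0.0, 0.0]
print(euclidean_distance(customer_a, customer_b))

# Hypothetical toy data: two market baskets sharing two of four distinct items.
basket_a = {"bread", "milk", "eggs"}
basket_b = {"milk", "eggs", "coffee"}
print(jaccard_similarity(basket_a, basket_b))
```

A distance grows as objects become less alike, while a similarity grows as they become more alike; which one is appropriate depends on the attribute types (numeric versus set-valued here).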

Readings/Bibliography

The primary text for the course is:

  • (Necessary) Tufféry, S. (2011) Data Mining and Statistics for Decision Making. John Wiley & Sons, Ltd. Chapters: 1, 2, 3, 7, 9, 10, 12.

    You can check its availability at: https://sol.unibo.it/SebinaOpac/query/tuffery?context=catalogo

In addition, we will use:

  • (Suggested) Provost, F. & Fawcett, T. (2013) Data Science for Business. O'Reilly Media, Inc. Chapters: 1, 2, 6, 12.

    You can check its availability at: https://sol.unibo.it/SebinaOpac/query/data%20science%20for%20business?bib=UBOST&context=catalogo

Additional teaching material will be made available to students using the e-learning platform https://virtuale.unibo.it/

Teaching methods

Face-to-face lectures present the theoretical and applied contents related to the various data mining methods. After each theoretical session, a practical tutorial is devoted to applications on real datasets.

Applications are introduced and replicated during the computer laboratory session using SAS statistical software.

Students are invited to solve and discuss empirical case studies. Home assignments will serve to reinforce class concepts and build familiarity with data analysis and interpretation. Home assignments will be ungraded. However, solutions (or simply teacher feedback) will be provided for self-assessment.

In view of the type of activities and teaching methods adopted, attending this course requires all students to have first completed Modules 1 and 2 of the safety training in the workplace, in e-learning mode [https://elearning-sicurezza.unibo.it/].

Assessment methods

Attending and non-attending students will take a written examination consisting of open questions on theoretical issues (40% of the final grade) and a section requiring the production and/or interpretation of statistical outputs (60% of the final grade).

The open questions section tests the student's knowledge of the theoretical topics. In particular, it tests knowledge of the main terminology and concepts associated with the data mining methods used to deal with business data, the strengths and limitations of each method, and the data mining techniques used to analyze different types of data and business problems. The practical section tests the ability to produce and interpret statistical outputs and to translate them into applied conclusions in a business context. Typical exam questions will be made available during the course.

All students are given tasks of the same difficulty and the same amount of time. It is a 2-hour written exam with two open questions on theory and two or three practical exercises using the SAS software. The points awarded for a correct answer to each question will be reported in the exam outline. The final grade is out of thirty. The exam is "closed-book": students are not allowed to consult references or theoretical information sources while performing the task.

Evaluation judgment scale

The assessment of the mid-term and final exam will be based on the following grid:

<18 (failed)

18-23 (sufficient): sufficient preparation, but limited to a small part of the course contents;

24-27 (good): adequate preparation but with some gaps with respect to the course contents;

28-30 (very good): very in-depth knowledge of all the course contents;

30 with honors (excellent): excellent knowledge of the course contents.

Teaching tools

The UNIBO e-learning platform (VIRTUALE) will be used to share teaching materials and to give students periodic home assignments. The teaching material includes:

  • Lecture notes summarising theoretical topics explained in class
  • Open data and lecture notes to follow the practical sessions
  • Miscellanea: exercises, solutions to assignments, sample exams, follow-up materials
  • SAS OnDemand for Academics software (https://www.sas.com/en_us/software/on-demand-for-academics.html)

Office hours

See the website of Ida D'Attoma

SDGs

Quality education

This teaching activity contributes to the achievement of the Sustainable Development Goals of the UN 2030 Agenda.