75194 - Data Mining M

Academic Year 2017/2018

Learning outcomes

At the end of the course, students know the principles and the main use cases of data mining algorithms. They are able to understand and apply a wide set of analysis algorithms to extract useful relationships from large datasets. They can also design a process of data selection, transformation, analysis and interpretation to support strategic decisions.

Course contents

MODULE 1 (Data Mining) - Prof. Claudio Sartori

Process of knowledge discovery

  • Definition of objectives
  • Selection of data sources
  • Filtering, reconciliation and data transformation
  • Data mining
  • Validation and presentation of the results

Data Mining techniques

  • Classification with decision trees, neural networks and other algorithms
  • Association rules
  • Clustering/segmentation

Processes and systems

  • Analysis of case studies
  • Examples with commercial data mining systems
  • Architectures of systems with data mining components
  • Standards for data mining components: PMML

 

MODULE 2 (Big Data Techniques) - Prof. Federico Ravaldi

At the end of the course, students know the principles, concepts and main use cases of Big Data. They are able to understand and apply new types of methodologies, technologies and architectures, in particular the Hadoop ecosystem. Several real case studies are presented to support this.

(Big) Data Revolution

  • Data growth and Big Data hype
  • Technological enablers
  • Big Data fundamentals and definitions
  • Types of (Big) Data
  • Main concepts behind Big Data

Big Data Paradigm Shift

  • New roles and opportunities
  • Organizational models and approaches
  • Technology proliferation
  • Methodologies

Big Data architectures

  • NoSQL
  • Hadoop
  • Hadoop Ecosystem
  • The evolving role of the Enterprise Data Warehouse
  • Data Lake
  • Geospatial and Location Intelligence systems

Case Studies

Readings/Bibliography

Module 1

Tan, Steinbach, Kumar, "Introduction to Data Mining", Addison-Wesley, 2005. ISBN: 0321321367

or

Witten, Frank, Hall, "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann. ISBN: 0123748569 (3rd edition) or ISBN: 0128042915 (4th edition, 2016)

The list of the relevant parts of the textbook will be made available at the beginning of the course.

The copies of the slides used in the classroom will be made available as an additional reference.

Module 2

Rizzi, Golfarelli, "Data Warehouse Design: Modern Principles and Methodologies", 2009. ISBN-10: 0071610391

The copies of the slides used in the classroom will be made available as an additional reference.


Teaching methods

Most lectures are held in traditional classrooms and rely on slides. Case studies based on open-source software are also presented.

Students can arrange a Project Activity of Data Mining (4 CFU) directly with each teacher, choosing among the provided topics according to their own preferences.

Assessment methods

The exam consists of an oral examination.

The student must answer four questions, two for each module. Each question is evaluated as follows:

  • fundamental ideas on the topic: 2 points
  • ability to discuss the technical details, at the level discussed during classes: 0-3 points
  • ability to apply the concepts to examples: 0-2 points

To take the exam, students must register through AlmaEsami, the usual UniBO web application.

Teaching tools

In traditional classrooms, the lectures will make extensive use of slides.

For the in-class exercises, students can bring their own laptops; the teacher will provide instructions for installing the relevant software.

Office hours

See the website of Claudio Sartori

See the website of Federico Ravaldi