75194 - Data Mining M

Academic Year 2016/2017

Learning outcomes

At the end of the course, students know the principles and main use cases of data mining algorithms. They are able to understand and apply a wide range of analysis algorithms to extract useful relationships from large datasets. They can also design a process of data selection, transformation, analysis and interpretation to support strategic decisions.

Course contents

MODULE 1 (Data Mining) - Prof. Claudio Sartori

Process of knowledge discovery

  • Definition of objectives
  • Selection of data sources
  • Filtering, reconciliation and data transformation
  • Data mining
  • Validation and presentation of the results

Data Mining techniques

  • Classification with decision trees, neural networks and other algorithms
  • Association rules
  • Clustering/segmentation

Processes and systems

  • Analysis of case studies
  • Examples with commercial data mining systems
  • Architectures of systems with data mining components
  • Standards for data mining components: PMML.

 

MODULE 2 (Big Data Techniques) - Prof. Federico Ravaldi

At the end of the course, students know the principles, concepts and main use cases of Big Data. They are able to understand and apply new methodologies, technologies and architectures, in particular the Hadoop ecosystem, with the help of several real case studies presented during the course.

(Big) Data Revolution

  • Data growth and Big Data hype
  • Technological enablers
  • Big Data fundamentals and definitions
  • Types of (Big) Data
  • Main concepts behind Big Data

Big Data Paradigm Shift

  • New roles and opportunities
  • Organizational models and approaches
  • Technology proliferation
  • Methodologies

Big Data architectures

  • NoSQL
  • Hadoop
  • Hadoop Ecosystem
  • The evolving role of the Enterprise Data Warehouse
  • Data Lake
  • Geospatial and Location Intelligence systems

Case Studies

Readings/Bibliography

Education material provided by the teachers (copies of the slides used in the classroom, scientific literature).

Additional reading:

Module 1: Tan, Steinbach, Kumar, "Introduction to Data Mining", Addison-Wesley, 2005. ISBN-10: 0321321367

Module 2: Rizzi, Golfarelli, "Data Warehouse Design: Modern Principles and Methodologies", 2009. ISBN-10: 0071610391

Teaching methods

Most lectures take place in traditional classrooms and rely on slides. Case studies based on open-source software are also presented.

Students can arrange a Project Activity of Data Mining (4 CFU) directly with each teacher, choosing among the provided topics according to their own preferences.

Assessment methods

The exam consists of an oral examination.

The student must answer four questions, two for each module. Each question is evaluated as follows:

  • fundamental ideas on the topic: 2 points
  • ability to discuss the technical details, at the level covered in class: 0-3 points
  • ability to apply the concepts to examples: 0-2 points

To take the exam, students must register through AlmaEsami, the usual UniBO web application.

Teaching tools

Lectures in traditional classrooms make extensive use of slides.

Laboratory activities use open-source tools.

Links to further information

http://www-db.disi.unibo.it/courses/DM

Office hours

See the website of Claudio Sartori

See the website of Federico Ravaldi