90909 - Workshop 2 (WS7)

Academic Year 2021/2022

  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Politics Administration and Organization (cod. 9085)

    Also valid for Second cycle degree programme (LM) in International Relations (cod. 9084)

Learning outcomes

The course introduces students to one of the main statistical packages for data analysis and to the basic elements of micro-data management and analysis. By the end of the course, students will be familiar with the package's interface and able to: load data of different types and file formats into the package, perform basic data management operations, and conduct univariate and multivariate statistical analyses using the software introduced during the course.

Course contents

BIG DATA TECHNIQUES WITH R - part II

This workshop covers machine learning techniques for classification and clustering, with a special focus on their applications in Text Mining.

Topics will be introduced theoretically and then put into practice with R-based software during the laboratory hours.

In more detail, the course contents are:

  • Algorithms for classification (kNN, SVM, logistic regression);
  • Algorithms for clustering (k-means, mean-shift clustering, hierarchical clustering);
  • Techniques for pre-processing textual data;
  • Techniques and algorithms for Text Mining;
  • Presentation of case studies and applications of Text Mining.

Readings/Bibliography

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer.

Slides by the teacher

Teaching methods

In-person lectures (if possible)

Assessment methods

Evaluation of a final project

Teaching tools

Slides by the teacher

Office hours

See the website of Elena Morotti