- Lecturer: Ignazio Drudi
- Credits: 8
- SSD: SECS-S/03
- Language: Italian
- Teaching Mode: Traditional lectures
- Campus: Bologna
- Degree programme: First cycle degree (L) in Statistical Sciences (code 8873)
Learning outcomes
The course introduces students to the complexity of how statistical data are organized today, in particular the data available on the Internet, in both "structured" and "unstructured" form. Its main objective is to develop statistical skills that combine the essential foundations of measurement and statistical methodology in the step that leads up to the statistical analysis of phenomena: the evaluation, selection and synthesis of the data sources that are now available with relative ease. More specifically, by the end of the course students will be able to:
1. Find their way, through targeted searches, among the many heterogeneous data sources available today, both "official" sources (Istat, Eurostat, OECD, World Bank, ...) and "unconventional" ones (Google statistics, Facebook, Twitter, eBay, TripAdvisor, ...).
2. Understand the basics of the main query language for large databases, i.e. the logic and syntax of SQL, with particular attention to data integration problems, and specifically to record matching, both "exact" and "statistical", using SQL.
3. Define an effective strategy for extracting and summarizing data from large databases (so-called "Big Data"), using the most appropriate statistical techniques to build the most suitable database for the statistical analysis of phenomena.
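To make the difference between "exact" and "statistical" matching in point 2 concrete, here is a minimal sketch using Python's standard-library `sqlite3` module rather than the R tools used in the course. The tables, names and the 2-year tolerance are hypothetical. The exact match links records only when they share an identical key; the approximate match is a simplified nearest-neighbour stand-in for statistical matching, pairing each survey record with the registry record closest in age, even when the two records describe different units.

```python
import sqlite3

# Hypothetical data: two sources with only partially overlapping units.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE survey (id INTEGER, name TEXT, age INTEGER, income REAL);
CREATE TABLE registry (id INTEGER, name TEXT, age INTEGER, region TEXT);
INSERT INTO survey VALUES (1, 'Rossi', 34, 28000), (2, 'Bianchi', 51, 41000),
                          (3, 'Verdi', 27, 19000);
INSERT INTO registry VALUES (1, 'Rossi', 34, 'Emilia-Romagna'), (4, 'Neri', 50, 'Lazio'),
                            (5, 'Gialli', 26, 'Veneto');
""")

# Exact matching: a record is linked only when the key is identical.
exact = cur.execute("""
SELECT s.id, s.name, r.region
FROM survey AS s
JOIN registry AS r ON s.id = r.id
""").fetchall()

# Approximate matching (simplified, hypothetical rule): for each survey record,
# take the registry record with the smallest age gap, provided the gap is <= 2.
approx = cur.execute("""
SELECT s.id, r.id, r.region
FROM survey AS s
JOIN registry AS r ON ABS(s.age - r.age) <= 2
WHERE ABS(s.age - r.age) = (
    SELECT MIN(ABS(s.age - r2.age)) FROM registry AS r2
)
ORDER BY s.id, r.id
""").fetchall()

print(exact)
print(approx)
```

Only Rossi is linked exactly, whereas the approximate rule also attaches a region to Bianchi and Verdi by borrowing it from similar registry records. A real statistical-matching exercise would typically use several common variables and a proper distance function.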
Course contents
Introduction to the world of Big Data, distinction between structured and unstructured databases
Introduction to the SQL language
Basic concepts and tools of Web mining: APIs, scraping, etc.
Web interaction libraries available on CRAN for R
Introduction to the processing of textual data, reactions and social networks
Sentiment analysis: polarity and judgment of lemmas and word forms
Introduction to the fundamentals of machine learning and to the analysis of contextual meaning
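Scraping and lexicon-based polarity scoring, two of the topics listed above, can be combined in a small pipeline. The course itself works in R. The sketch below uses only Python's standard library so that it runs without extra packages. The page markup (`<div class="post">`), the inline HTML that stands in for a downloaded page, and the five-word lexicon are all hypothetical. Scores are computed on raw word forms, with no lemmatization.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: in a real exercise this HTML would be downloaded from a site or API.
PAGE = """
<html><body>
  <div class="post">Great hotel, friendly staff and excellent breakfast</div>
  <div class="post">Terrible service and a dirty room</div>
  <div class="post">The room was on the third floor</div>
</body></html>
"""

class PostExtractor(HTMLParser):
    """Collects the text of every <div class="post"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.in_post = False
        self.posts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "div" and ("class", "post") in attrs:
            self.in_post = True
            self.posts.append("")

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_post = False

    def handle_data(self, data):
        if self.in_post:
            self.posts[-1] += data

# Toy polarity lexicon (hypothetical); real analyses rely on curated lexicons.
LEXICON = {"great": 1, "friendly": 1, "excellent": 1, "terrible": -1, "dirty": -1}

def polarity(text):
    """Sum of lexicon scores over the lower-cased word forms of a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

parser = PostExtractor()
parser.feed(PAGE)
scores = [polarity(post) for post in parser.posts]
for post, score in zip(parser.posts, scores):
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    print(f"{score:+d} {label}: {post.strip()}")
```

Summing word scores is the simplest polarity rule. Normalizing by text length, or handling negation, are common refinements.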
Readings/Bibliography
Handouts distributed during the course and made available on AlmaDL
Teaching methods
Lectures (about 30%)
Practical sessions with R software (about 20%)
Web scraping tutorials (about 50%)
Assessment methods
Web scraping exercise, polarity analysis of posts, analysis of posts, cluster analysis of texts
Teaching tools
Classrooms equipped with Wi-Fi and power outlets
Computer lab
Office hours
See the website of Ignazio Drudi