79201 - Statistical Use of Online Economic Databases

Academic Year 2019/2020

  • Teacher: Ignazio Drudi
  • Credits: 8
  • SSD: SECS-S/03
  • Language: Italian
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Degree programme: First cycle degree programme (L) in Statistical Sciences (cod. 8873)

Learning outcomes

The course introduces the complexity of the modern organization of statistical data, in particular data available on the Internet, in both "structured" and "unstructured" form. Its main objective is to build statistical skills that combine the indispensable foundations of measurement and statistical methodology with the step that leads to the threshold of statistical analysis: the evaluation, selection and synthesis of the data that are available today with relative ease. More specifically, at the end of the course the student will be able to:

1. Navigate, through targeted searches, the many heterogeneous data sources available today, both "official" sources (Istat, Eurostat, OECD, World Bank, ...) and "unconventional" ones (Google statistics, Facebook, Twitter, eBay, TripAdvisor, ...).

2. Have a basic knowledge of the main query language for large databases, i.e. the logic and syntax of SQL, with particular attention to data integration problems, specifically record matching, both "exact" and "statistical", using SQL.

3. Define an effective strategy for extracting and synthesizing data from large databases (so-called "Big Data"), using the most appropriate statistical techniques to build the database best suited to the statistical analysis of the phenomena.
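
As an illustration of the "exact" matching mentioned in point 2, a minimal sketch follows. It runs SQL from Python through the standard-library sqlite3 module, used here only as a small stand-in for a large database. The tables source_a and source_b, their columns and all values are invented for illustration and do not come from the course materials; "statistical" matching, where no common key exists, is not covered by this sketch.

```python
import sqlite3

# Toy, made-up records; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_a (unit_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE source_b (unit_id TEXT PRIMARY KEY, value REAL);
    INSERT INTO source_a VALUES
        ('U1', 'Unit one'), ('U2', 'Unit two'), ('U3', 'Unit three');
    INSERT INTO source_b VALUES
        ('U1', 10.5), ('U3', 7.0), ('U4', 3.2);
""")

# Exact matching: keep only units whose key appears in both sources.
matched = conn.execute("""
    SELECT a.unit_id, a.name, b.value
    FROM source_a AS a
    INNER JOIN source_b AS b ON a.unit_id = b.unit_id
    ORDER BY a.unit_id
""").fetchall()

# Units of source_a with no exact counterpart in source_b.
unmatched = conn.execute("""
    SELECT a.unit_id
    FROM source_a AS a
    LEFT JOIN source_b AS b ON a.unit_id = b.unit_id
    WHERE b.unit_id IS NULL
    ORDER BY a.unit_id
""").fetchall()

print(matched)
print(unmatched)
```

In this sketch the units left in `unmatched` are the ones for which exact matching fails; they would be the natural candidates for a statistical matching step.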

Course contents

Introduction to the world of Big Data, distinction between structured and unstructured databases

Introduction to the SQL language

Concepts and basic tools of Web mining: API, scraping, etc.

Web interaction libraries available for R on CRAN

Introduction to the processing of textual data, reactions and social networks

Sentiment analysis, polarity and judgment of lemmas and forms of language

Introduction to the fundamentals of machine learning and to the analysis of contextual meaning

Readings/Bibliography

Handouts distributed during the course and deposited in AlmaDL

Teaching methods

Lectures (about 30%)

Practice sessions with the R software (about 20%)

Web scraping tutorials (about 50%)

Assessment methods

Web scraping exercise, post polarity analysis, post analysis, text cluster analysis

Teaching tools

Classrooms equipped with Wi-Fi and power outlets

Computer lab

Office hours

See the website of Ignazio Drudi