- Teacher: Marzia Freo
- Credits: 3
- SSD: SECS-S/03
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Bologna
- Course: Second cycle degree programme (LM) in Statistics, Economics and Business (cod. 8876)
Learning outcomes
This short course on web scraping is taught by Dr. Jacek Lewkowicz (University of Warsaw).
By the end of the course, students acquire the tools for extracting data from the web. The workshop introduces techniques for the automated extraction of content. In particular, students are able to:
- gather and mine information from the web and social media using the most suitable software,
- create custom web and social media scrapers,
- use R/Python packages for data analysis.
Course contents
- the structure of webpages
- web/social media scraping with selected libraries
- authentication
- introduction to browser emulators
- alternative data scraping
- methods to prevent web scraping
- data analysis
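As an illustration of the first two topics only, the following is a minimal sketch of how the structure of a webpage can be used to extract content. It is not course material: the sample HTML, the `LinkExtractor` class and the `extract_links` function are hypothetical names chosen for this example, and it uses only Python's standard-library `html.parser` rather than any library selected for the course.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Course list</h1>
    <a href="/courses/web-scraping">Web scraping</a>
    <a href="/courses/data-analysis">Data analysis</a>
  </body>
</html>
"""


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # extracted (href, text) pairs
        self._href = None    # href of the anchor currently being read
        self._text = []      # text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        # An opening <a> tag starts a new link; remember its href attribute.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # The closing </a> tag completes the link.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return all (href, text) pairs found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(href, text)
```

The parser walks the document tag by tag, which is why knowing the page structure (which tags and attributes hold the data) comes before writing any scraper.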
Readings/Bibliography
- R. Lawson (2015). Web Scraping with Python. Packt Publishing.
- R. Mitchell (2015). Web Scraping with Python: Collecting Data from the Modern Web. O’Reilly Media.
- S. Munzert, Ch. Rubba, P. Meissner, D. Nyhuis (2015). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley.
Teaching methods
Lectures and lab tutorials.
Students are invited to bring their own laptops.
Assessment methods
Attendance
Teaching tools
Lab tutorials & teaching notes
Office hours
See the website of Marzia Freo