86380 - Web Scraping/Social Media Scraping in R/Python Seminar

Academic Year 2017/2018

  • Teacher: Marzia Freo
  • Credits: 3
  • SSD: SECS-S/03
  • Language: English
  • Teaching Mode: Traditional lectures
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Statistics, Economics and Business (cod. 8876)

Learning outcomes

This short course on web scraping is taught by Dr. Jacek Lewkowicz (University of Warsaw).

At the end of the course, the student has acquired the tools for extracting data from the web. The workshop introduces techniques for the automated extraction of content. In particular, the student is able to:

  • gather and mine information from the web and social media using the most suitable software,
  • create custom web/social media scrapers,
  • use R/Python packages for data analysis.

Course contents

  • the structure of webpages
  • web/social media scraping with selected libraries
  • authentication
  • introduction to browser emulators
  • alternative data scraping
  • methods to prevent web scraping
  • data analysis

Readings/Bibliography

  • R. Lawson (2015). Web Scraping with Python. Packt Publishing.
  • R. Mitchell (2015). Web Scraping with Python: Collecting Data from the Modern Web. O’Reilly Media.
  • S. Munzert, Ch. Rubba, P. Meissner, D. Nyhuis (2015). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley.

Teaching methods

Lectures and lab tutorials.

Students are invited to participate with their own laptops.

Assessment methods

Attendance

Teaching tools

Lab tutorials & teaching notes

Office hours

See the website of Marzia Freo