- Instructor: Piotr Cwiakowski
- Credits: 3
- Language of instruction: English
- Teaching mode: In person and remote - Blended Learning
- Campus: Bologna
- Degree programme: First cycle degree (Laurea) in Statistical Sciences (code 8873)
- From 04/12/2024 to 12/12/2024
Learning outcomes
By the end of the course, students will have developed advanced expertise in analyzing real-world phenomena with statistical methods. They will be able to:
- implement appropriate advanced statistical analyses using statistical software (SAS, R, or SPSS);
- interpret the output of the procedures;
- critically collate results and conclusions;
- present the main results and conclusions in concise summaries;
- work independently on practical data analysis problems.
Course contents
- Text cleaning and text standardization (including stemming, lemmatization, and stopword removal).
- Creating a document-term matrix with different term weightings.
- Data wrangling in text mining.
- Searching for relationships and patterns between words.
- Visualization techniques for text mining analysis.
- Unsupervised machine learning methods for text analysis (clustering, sentiment analysis, dimensionality reduction).
- Supervised machine learning methods and simple feature engineering for text data (Naive Bayes, KNN, decision trees, SVM, random forest).
- R software and R infrastructure for text mining and machine learning (packages: tm, tidytext, quanteda, caret, mlr).
Readings/Bibliography
- Bird, Steven, Ewan Klein, and Edward Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009.
- Feldman, Ronen, and James Sanger. The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press, 2007.
- Friedl, Jeffrey E. F. Mastering regular expressions. O'Reilly Media, Inc., 2006.
- Kumar, Ashish, and Avinash Paul. Mastering Text Mining with R. Packt Publishing Ltd, 2016.
- Kwartler, Ted. Text mining in practice with R. John Wiley & Sons, 2017.
- Manning, Christopher D., and Hinrich Schütze. Foundations of statistical natural language processing. Cambridge, MA: MIT Press, 1999.
- Meyer, David, Kurt Hornik, and Ingo Feinerer. "Text mining infrastructure in R." Journal of Statistical Software 25.5 (2008): 1-54.
- Silge, Julia, and David Robinson. Text mining with R: A tidy approach. O'Reilly Media, Inc., 2017.
- Weiss, Sholom M., et al. Text mining: predictive methods for analyzing unstructured information. Springer Science & Business Media, 2010.
Teaching methods
Lectures and lab tutorials
Assessment methods
Attendance, take-home project.
Teaching tools
Lab tutorials & teaching notes
Office hours
See the website of Piotr Cwiakowski