Academic Year 2018/2019
- Teacher: Piotr Cwiakowski
- Credits: 3
- Language: English
- Teaching Mode: Traditional lectures
- Campus: Bologna
- Course: First cycle degree programme (L) in Statistical Sciences (code 8873)
Learning outcomes
By the end of the course, students will have developed advanced expertise in analyzing real-world phenomena with statistical methods. They will be able to:
- implement appropriate advanced statistical analyses using statistical software (SAS, R, or SPSS);
- interpret the output of the procedures;
- critically collate results and conclusions;
- present the main results and conclusions in concise summaries;
- work independently on practical data analysis problems.
Course contents
- Review of popular clustering and classification methods (Naive Bayes, k-means, decision trees, SVM, kNN).
- Searching for relationships and patterns between words.
- Visualization techniques for text mining analysis.
- Case studies and examples of text mining from, among other sources, social media (Facebook, Twitter).
- R software for the analysis.
Readings/Bibliography
Handouts provided by the teacher.
Suggested readings:
- Kumar, Ashish, and Avinash Paul. Mastering Text Mining with R. Packt Publishing, 2016.
- Feldman, Ronen, and James Sanger. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, 2007.
- Friedl, Jeffrey E. F. Mastering Regular Expressions. O'Reilly Media, Inc., 2006.
- Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999.
- Weiss, Sholom M., et al. Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer Science & Business Media, 2010.
Teaching methods
Computer lab sessions.
Assessment methods
Students are asked to write a report (approximately 15 pages) on an applied text mining project. The report should state the applied problem chosen by the student, describe the appropriate methodology, and comment on the results obtained.
Teaching tools
Lab tutorials & teaching notes.
Office hours
See the website of Piotr Cwiakowski