79301 - Lab 1

Course Unit Page

Academic Year 2018/2019

Learning outcomes

By the end of the course, students will have developed advanced expertise in analyzing real-world phenomena using statistical methods. In particular, they will be able to:

  • implement appropriate advanced statistical analyses using statistical software (SAS, R, or SPSS);

  • interpret the output of the procedures;

  • critically collate results and conclusions;

  • present the main results and conclusions in the form of concise summaries;

  • work independently on practical data analysis problems.

Course contents

  • Review of popular clustering and classification methods (Naive Bayes, K-means, Decision Trees, SVM, kNN).

  • Searching for relationships and patterns between words.

  • Visualization techniques for text mining analysis.

  • Case studies and examples of text mining from, among other sources, social media (Facebook, Twitter).

  • R software for the analysis.

Readings/Bibliography

Handouts provided by the teacher.

Suggested readings:

  • Kumar, Ashish, and Avinash Paul. Mastering Text Mining with R. Packt Publishing, 2016.

  • Feldman, Ronen, and James Sanger. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, 2007.

  • Friedl, Jeffrey E. F. Mastering Regular Expressions. O'Reilly Media, Inc., 2006.

  • Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999.

  • Weiss, Sholom M., et al. Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer Science & Business Media, 2010.

Teaching methods

Computer lab sessions.

Assessment methods

Students are required to write a report (of approximately 15 pages) on an applied text mining project. The report should contain a statement of the applied problem chosen by the student, a description of the appropriate methodology, and comments on the results obtained.

Teaching tools

Lab tutorials & teaching notes.

Office hours

See the website of Piotr Cwiakowski