CIS573 - Classifying Documents With Supervised Machine Learning

Course Description

In this course, you will begin applying machine learning methods to document-term matrices (DTMs). You will use a DTM to create training and test sets with the scikit-learn package in Python, an important first step in categorizing documents. You will also examine different models and determine how to select the most appropriate one for your particular natural language processing task. Finally, after you have chosen, trained, and tested a model, you will work with several evaluation metrics to measure how well it performed. The technical skills and evaluation processes you study in this course will provide valuable experience for the workplace and beyond.
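As a rough illustration of the workflow described above, the following sketch builds a DTM, splits it into training and test sets, trains a classifier, and evaluates it. It is not taken from the course materials: the toy corpus, the labels, the use of CountVectorizer to build the DTM, and the choice of a multinomial naive Bayes model are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus and labels, invented for this example.
docs = [
    "the team won the match with a late goal",
    "the striker scored twice in the final game",
    "the coach praised the defense after the match",
    "fans cheered as the team lifted the trophy",
    "the new laptop ships with a faster processor",
    "the software update fixes several security bugs",
    "developers released a new version of the compiler",
    "the phone has a larger battery and a better camera",
]
labels = ["sports"] * 4 + ["tech"] * 4

# Build the document-term matrix: one row per document, one column per term.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Split the DTM rows into training and test sets, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    dtm, labels, test_size=0.25, random_state=0, stratify=labels
)

# Train one candidate model (an assumption; other classifiers fit the same API).
model = MultinomialNB()
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred, labels=["sports", "tech"])

print("DTM shape:", dtm.shape)
print("Test accuracy:", accuracy)
print("Confusion matrix (rows = true, columns = predicted):")
print(matrix)
```

With only eight documents the scores are not meaningful; the point is the sequence of steps. Because the vocabulary here is learned from the full corpus before splitting, a stricter setup would fit the vectorizer on the training documents only.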

You are required to have completed the following courses or have equivalent experience before taking this course:

Natural Language Processing Fundamentals
Transforming Text Into Numeric Vectors

Faculty Author

Dr. Oleg Melnikov

Benefits to the Learner

Create training and test sets from document-term matrices
Train classification models to categorize documents
Evaluate the model on the test set to measure how well it generalizes

Target Audience

Engineers
Software developers
Computer scientists new to NLP
Data scientists
Analysts
Researchers
Linguists

Cornell Bowers College of Computing and Information Science