
Course Description

In this course, you will use machine learning methods to extend your work with document term matrices (DTMs). You will split a DTM into train and test sets with the scikit-learn package in Python, an important first step in categorizing documents. You will also examine different models and determine how to select the most appropriate one for your particular natural language processing task. Finally, after you have chosen, trained, and tested a model, you will work with several evaluation metrics to measure how well it performed. The technical skills and evaluation processes you study in the course will provide valuable experience for the workplace and beyond.

You are required to have completed the following courses or have equivalent experience before taking this course:

  • Natural Language Processing Fundamentals
  • Transforming Text Into Numeric Vectors

Faculty Author

Dr. Oleg Melnikov

Benefits to the Learner

  • Create train and test sets from document term matrices
  • Train classification models to categorize documents
  • Evaluate the model on the test set to measure how well it generalizes
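
The three outcomes above can be sketched in a few lines of scikit-learn. This is a minimal illustration and is not taken from the course materials: the toy corpus, the labels, the choice of CountVectorizer to build the DTM, and the choice of logistic regression as the classifier are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus and labels, invented for illustration only.
docs = [
    "the team won the match in overtime",
    "the striker scored two goals",
    "the coach praised the defense",
    "fans cheered as the game ended",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
    "the update fixes a software bug",
    "the chip maker announced a new processor",
]
labels = ["sports"] * 4 + ["tech"] * 4

# Build a document term matrix: one row per document, one column per term.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Split the DTM rows into train and test sets, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    dtm, labels, test_size=0.25, stratify=labels, random_state=0
)

# Train a classifier (logistic regression is an assumed choice here).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set to estimate generalization.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred, zero_division=0))
```

With only eight documents the scores are not meaningful; the point is the order of the steps. In practice the vocabulary is often learned from the training documents alone, so that no information from the test set reaches the model.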

Target Audience

  • Engineers
  • Software developers
  • Computer scientists new to NLP
  • Data scientists
  • Analysts
  • Researchers
  • Linguists
Cornell Bowers College of Computing and Information Science