Course Description

If you want to compare two large bodies of text, you can compare the text directly: turn each text into tokens, then measure how many tokens overlap. Often, however, you want to know not just whether two texts differ (a binary comparison) but how much they differ (a graded, or fuzzy, comparison). In this course, you will transform text into numeric vectors, which lets you perform arithmetic on textual information to calculate similarity. This is a classical natural language processing (NLP) technique, and it begins with creating different kinds of vectors. You will create both sparse and dense vectors, and you will compare vectors of different sizes to see how each captures information. Finally, you will measure similarity among document vectors, which is the real power of turning text into vectors. Determining how similar two or more documents are is a common use of NLP, and you will practice this technique through hands-on exercises and projects.
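
As a rough illustration of the two ideas above (token overlap versus vector similarity), here is a minimal Python sketch. It is not taken from the course materials: the helper names `tokenize`, `jaccard_overlap`, and `cosine_similarity` are hypothetical, the tokenizer is a simple lowercase regular expression, and the vectors are plain word counts rather than the sparse and dense representations the course builds.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_overlap(text_a, text_b):
    """Token-overlap comparison: shared unique tokens divided by all unique tokens."""
    tokens_a, tokens_b = set(tokenize(text_a)), set(tokenize(text_b))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cosine_similarity(text_a, text_b):
    """Vector comparison: cosine of the angle between word-count vectors."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    a = "the cat sat"
    b = "the cat ran"
    print(jaccard_overlap(a, b))
    print(cosine_similarity(a, b))
```

Both functions return a score between 0 and 1 rather than a yes/no answer, which is the graded comparison the description refers to. Count vectors are only one possible choice; weighted or dense vectors would plug into the same cosine calculation.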

You are required to have completed the following course or have equivalent experience before taking this course:

  • Natural Language Processing Fundamentals

Faculty Author

Dr. Oleg Melnikov

Benefits to the Learner

  • Apply models to a body of text in order to create document vectors
  • Create document vectors using advanced feature engineering techniques
  • Measure similarity among different document vectors

Target Audience

  • Engineers
  • Software developers
  • Computer scientists new to NLP
  • Data scientists
  • Analysts
  • Researchers
  • Linguists
Cornell Bowers College of Computing and Information Science