CIS572 - Transforming Text Into Numeric Vectors

Course Description

If you want to compare two large bodies of text, you can work with the text itself: split each text into tokens, then compare how many tokens they share. Sometimes, however, knowing that two texts differ (a binary comparison) is not enough; you want to know how much they differ, which calls for a fuzzy comparison. In this course, you will transform text into numeric vectors, which lets you perform arithmetic operations on textual information to calculate similarity. This is a classical natural language processing (NLP) technique, and it begins with building vectors of different kinds. You will create both sparse and dense vectors, and you will compare vectors of different sizes to see how much information each one captures. Finally, you will measure similarity among document vectors, which is the real power of turning text into vectors. Determining how similar two or more documents are is a common NLP task, and you will practice this technique through hands-on exercises and projects.
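To make the binary-versus-fuzzy distinction concrete, here is a minimal sketch in plain Python. It assumes a simple regex tokenizer, raw term counts (a sparse bag-of-words vector), and cosine similarity as the similarity measure. These are common choices, but the function names (`tokenize`, `count_vector`, `cosine_similarity`) are hypothetical and not taken from the course materials.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def count_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Map a token list onto a shared vocabulary as a bag-of-words count vector."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = no shared terms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_u * norm_v)


docs = [
    "The cat sat on the mat.",
    "A cat sat on a mat.",
    "Stock prices fell sharply today.",
]
token_lists = [tokenize(d) for d in docs]
vocabulary = sorted(set().union(*token_lists))  # one dimension per distinct token
vectors = [count_vector(t, vocabulary) for t in token_lists]

# Binary comparison: are the token sets identical? The answer is only True or False.
print(set(token_lists[0]) == set(token_lists[1]))  # False
# Fuzzy comparison: how similar are the count vectors?
print(round(cosine_similarity(vectors[0], vectors[1]), 3))  # 0.5
print(round(cosine_similarity(vectors[0], vectors[2]), 3))  # 0.0
```

The set comparison only reports that the first two sentences differ, while the vectors show that they are partly similar and that the third sentence shares no terms with the first. The score of 0.5 also shows a limitation of raw counts: the repeated words "the" and "a" weigh as much as the content words.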

You are required to have completed the following course or have equivalent experience before taking this course:

Natural Language Processing Fundamentals

Faculty Author

Dr. Oleg Melnikov

Benefits to the Learner

Apply models to a body of text to create document vectors
Create document vectors using advanced feature engineering techniques
Measure similarity among different document vectors

Target Audience

Engineers
Software developers
Computer scientists new to NLP
Data scientists
Analysts
Researchers
Linguists

Cornell Bowers College of Computing and Information Science