Understanding Vector Space Models

You shall know a word by the company it keeps. - John Firth

We will delve into the concept of Vector Spaces by developing an intuition behind this idea through some simple examples. There are many applications and algorithms that make use of Vector Space models, but this post focuses on the general idea behind vector spaces.

Beginner’s guide to understanding Vector Space Models

Suppose you have two questions: “Where are you heading?” and “Where are you from?” The two sentences share every word except the last, yet they have different meanings.

On the other hand, you could have two more questions whose words are completely different but which mean the same thing.

Vector space models help you identify whether the questions in each pair are similar in meaning, regardless of how many words they share.

They can be used to identify similarity for question answering, paraphrasing, and summarization.

Vector space models will also allow you to capture dependencies between words.

Example 1: Consider the sentence “You eat cereal from a bowl.” Here, you can see that the words cereal and bowl are related.

Word relations

Example 2: Now look at another sentence: “You buy something and someone else sells it.” It says that someone sells something because someone else buys it, so the second half of the sentence depends on the first half. With vector space models, you will be able to capture this and many other types of relationships among different sets of words.

Phrase relations

Vector space models are used in information extraction to answer questions in the style of who, what, where, how, and so on, as well as in machine translation and chatbot programming.

Essence of Vector Space Models: Vector space models build their representations by identifying the context around each word in the text. That context is what captures the word’s relative meaning.

To recap, vector space models allow you to represent words and documents as vectors. This captures the relative meaning.

Co-occurrence matrices

We can construct our vectors from a co-occurrence matrix. Depending on the task we are trying to solve, several designs are possible. We will also see how to encode a word or a document as a vector. To get a vector space model using a word by word design, we’ll make a co-occurrence matrix and extract vector representations for the words in our corpus.

Similarly, we can get a Vector Space Model using a word by document design.

Finally, we’ll see how, in a vector space, we can find relationships between word vectors, such as their similarity.

The co-occurrence of two different words is the number of times that they appear in your corpus together within a certain word distance k.

Word-by-word design

For instance, suppose that your corpus has the following two sentences.

The row of the co-occurrence matrix corresponding to the word data with a k value equal to two would be populated with the above values.

For the column corresponding to the word simple, you’d get a value equal to two, because data and simple co-occur in the first sentence within a distance of one word and in the second sentence within a distance of two words.

The row of the co-occurrence matrix corresponding to the word data would look like this if you consider the co-occurrence with the words simple, raw, like, and I. In this case, the vector representation of the word data would be [2, 1, 1, 0].

What is n here? With a word by word design, you get a representation with n entries, where n is between one and the size of your entire vocabulary.
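The counting described above can be sketched in a few lines of Python. The two sentences in this snippet are assumed for illustration, since the example sentences themselves are not reproduced in the text; they were chosen so that the counts agree with the row described for data. The helper name `cooccurrence_row` and the whitespace tokenization are also assumptions, not part of any library.

```python
from collections import Counter

def cooccurrence_row(corpus, target, k):
    """Count how often each word appears within k positions of `target`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()  # naive whitespace tokenization (assumption)
        for i, token in enumerate(tokens):
            if token != target:
                continue
            # Look at up to k words on each side of the target word.
            start = max(0, i - k)
            end = min(len(tokens), i + k + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# Hypothetical two-sentence corpus, assumed for illustration.
corpus = ["I like simple data", "I prefer simple raw data"]
row = cooccurrence_row(corpus, target="data", k=2)

# Read off the row for a chosen set of context words.
context_words = ["simple", "raw", "like", "I"]
data_vector = [row[w] for w in context_words]
print(data_vector)
```

With these assumed sentences and k = 2, the printed vector is [2, 1, 1, 0]. A larger k widens the window, so more distant words such as I would start to contribute.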

Word-by-Document design

For a word by document design, the process is quite similar. In this case, you’ll count the times that words from your vocabulary appear in documents that belong to specific categories.

For instance, you could have a corpus consisting of documents on different topics like entertainment, economy, and machine learning. Here, you’d count the number of times that your words appear in the documents that belong to each of the three categories.

In this example, suppose that the word data appears 500 times in documents from your corpus related to entertainment, 6,620 times in economy documents, and 9,320 times in documents related to machine learning. The word film appears 7,000, 4,000, and 1,000 times in those same three categories, respectively.
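As a rough sketch of the word by document design, the snippet below counts vocabulary words per category in a tiny made-up corpus. The documents, the category labels, and the helper name `word_by_document` are all assumptions for illustration; a real corpus would be far larger and would need proper tokenization.

```python
from collections import Counter

def word_by_document(docs_by_category, vocabulary):
    """Return (categories, matrix), where matrix[word] lists counts per category."""
    categories = list(docs_by_category)
    matrix = {word: [0] * len(categories) for word in vocabulary}
    for col, category in enumerate(categories):
        for doc in docs_by_category[category]:
            counts = Counter(doc.lower().split())  # naive tokenization (assumption)
            for word in vocabulary:
                matrix[word][col] += counts[word]
    return categories, matrix

# Toy corpus, invented for illustration only.
docs_by_category = {
    "entertainment": ["the film won an award", "a film about data"],
    "economy": ["market data and trade data"],
    "machine learning": ["training data test data more data"],
}
categories, matrix = word_by_document(docs_by_category, ["data", "film"])
print(categories)
print(matrix)
```

Each entry of `matrix` is one row of the word by document table. Reading the same position across all rows gives the column for one category.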

Once you’ve constructed the representations for multiple sets of documents or words, you’ll get your vector space.

Vector spaces

Let’s take the matrix from the last example. You could take a representation for the words data and film from the rows of the table. However, you can also take a representation for every category of documents by looking at the columns. Each column has two entries, one for data and one for film, so the vector space of categories has two dimensions.

If you plot the number of times that the words data and film appear in each type of document on a 2-D plane, you get the vector representation for the categories: entertainment, economy, and ML.

Note that in this space, it is easy to see that the economy and machine learning documents are much more similar to each other than either is to the entertainment category.
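One way to check this impression numerically is to treat each category as a point (count of data, count of film) and compare straight-line distances. The snippet below is a small sketch using the counts from the example; Euclidean distance is just one possible measure, chosen here for simplicity, and the helper name `distance` is an assumption.

```python
import math

# Category vectors: (count of "data", count of "film"), from the example counts.
vectors = {
    "entertainment": (500, 7000),
    "economy": (6620, 4000),
    "machine learning": (9320, 1000),
}

def distance(a, b):
    """Euclidean (straight-line) distance between two category vectors."""
    return math.dist(vectors[a], vectors[b])  # math.dist requires Python 3.8+

pairs = [
    ("economy", "machine learning"),
    ("entertainment", "economy"),
    ("entertainment", "machine learning"),
]
for a, b in pairs:
    print(f"{a} vs {b}: {distance(a, b):.0f}")
```

Economy and machine learning come out closest, at roughly 4,036 apart, compared with roughly 6,816 and 10,667 for the two pairs involving entertainment.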

In my next post, we’ll make comparisons between vector representations using the cosine similarity and the Euclidean distance in order to get the angle and distance between them.


  • So far, you’ve seen how to get vector spaces with two different designs, word by word and word by document, by counting either the co-occurrence of words with each other or the occurrence of words in the documents of each category.
  • I also showed you that in vector spaces, you can determine relationships between types of documents, such as their similarity.