Vector Spaces for Document Similarity
Excerpt: Introduction How can a search engine determine whether two documents are similar? In our post on search engine operation we looked at how search engines retrieve information on documents given a query string. Vector methods are also used to decide whether documents are related. Example Let’s make this concrete. Say we have three toy documents that have been parsed into their keywords, so that all the unimportant words have been removed. For example, commonly occurring words such as "as", "of", "the", and so on are removed. We might be left with: document 1 = {news, information, campaign, raise, awareness} document 2…
Tags: search engine operation, similarity, term space, vector representation
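
To make the vector idea in the excerpt a little more tangible, here is a minimal Python sketch. It treats each keyword list as a vector of term counts over a shared vocabulary and compares two vectors with cosine similarity. Cosine similarity is one common choice and is an assumption here; the excerpt does not say which measure the post uses. `doc1` is document 1 from the excerpt, while `doc_other` is a hypothetical keyword list made up for illustration (it is not the truncated document 2), and the helper names `term_vector` and `cosine_similarity` are likewise illustrative.

```python
import math
from collections import Counter


def term_vector(keywords, vocabulary):
    """Map a keyword list to a vector of term counts over the vocabulary."""
    counts = Counter(keywords)
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Document 1 as given in the excerpt.
doc1 = ["news", "information", "campaign", "raise", "awareness"]
# Hypothetical second document, invented for this sketch only.
doc_other = ["news", "campaign", "election", "vote"]

# The term space: one dimension per distinct keyword across both documents.
vocabulary = sorted(set(doc1) | set(doc_other))

v1 = term_vector(doc1, vocabulary)
v2 = term_vector(doc_other, vocabulary)
similarity = cosine_similarity(v1, v2)
print(f"cosine similarity: {similarity:.3f}")
```

A score near 1 means the two documents share most of their keywords, and a score of 0 means they share none. Raw counts are used here for simplicity; a real system would more likely weight terms, for example with TF-IDF.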

