Full Text Search

Full-text search is a technique for searching computer-stored text documents based on their entire contents. Full-text searching techniques became common around the middle of the 1970s. Before that, document retrieval typically relied on a researcher assigning keywords to each piece of text to represent its subject. Documents could then be retrieved using Boolean combinations of those keywords, e.g. ("encyclopedia" AND "online") NOT "encarta". This mechanism had limitations, particularly that it depended on human indexers, and indexing is a skilled and time-consuming task.

Full-text searching was expected to get around this, because the entire text of a document could be stored for retrieval. In theory, any document that mentioned the word(s) the searcher was interested in would be retrieved, bypassing the manual indexing process and making retrieval more complete. In practice there were difficulties. First, a document might mention a search word only in passing, in a way tangential to its subject, and still be retrieved along with the documents that were really relevant. Second, a document might describe a concept using a different word from the one the searcher used, and so be missed.

More sophisticated search techniques were therefore required to support full-text searching. One was to record the position of each word in the document, which allows the retrieval of documents containing the phrase "full text" rather than merely the words "full" and "text" somewhere in the text. Another was to rank the retrieved documents by how often the search words occur in them, on the assumption that documents mentioning a word more often are more relevant to it.

The most common approach to full-text search is to generate a complete index, or concordance, for all of the searchable documents. For each word (except stop words, which are too common to be useful), an entry is made that lists the exact position of every occurrence of that word within the database of documents. From such a list it is relatively simple to retrieve all the documents that match a query without having to scan each document. Although full-text searching over very small document collections can be done by serial scanning, indexing is the preferred method for almost all full-text searching.
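
The indexing approach described above can be illustrated with a minimal sketch in Python. It is not a production design: the stop-word list, the tokenizer, the sample documents, and the function names (build_index, search_all, search_phrase) are all assumptions made for this example. The index maps each word to the documents containing it and the word positions within each document, which is enough to support both phrase matching and ranking by frequency of occurrence.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each non-stop word to {doc_id: [word positions in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word][doc_id].append(position)
    return index


def search_all(index, query):
    """Return IDs of documents containing every query word.

    Results are ranked by the total number of occurrences of the query
    words (most frequent first); ties are broken by document ID.
    """
    words = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]
    matches = set(postings[0])
    for posting in postings[1:]:
        matches &= set(posting)

    def score(doc_id):
        return sum(len(posting[doc_id]) for posting in postings)

    return sorted(matches, key=lambda d: (-score(d), d))


def search_phrase(index, phrase):
    """Return IDs of documents in which the phrase occurs as consecutive words."""
    # Keep each word's offset within the phrase so skipped stop words
    # still count toward the distance between the remaining words.
    terms = [(i, w) for i, w in enumerate(tokenize(phrase)) if w not in STOP_WORDS]
    if not terms:
        return []
    base, first_word = terms[0]
    result = []
    for doc_id, starts in index.get(first_word, {}).items():
        for start in starts:
            if all(
                start + offset - base in index.get(word, {}).get(doc_id, [])
                for offset, word in terms
            ):
                result.append(doc_id)
                break
    return sorted(result)


# Sample documents, invented for this example.
docs = {
    1: "Full text search scans the full text of every document.",
    2: "The text was printed in full.",
    3: "An index makes search fast.",
}
index = build_index(docs)

print(search_all(index, "full text"))     # [1, 2]: document 1 ranks first
print(search_phrase(index, "full text"))  # [1]: only document 1 has the phrase
```

Document 2 contains both "full" and "text" but not next to each other, so it matches the word query and not the phrase query. Neither search scans the document texts; both work only from the position lists stored in the index.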

 
