Information Retrieval (IR)
Information retrieval is essentially a matter of deciding which documents in a collection should be retrieved to satisfy a user’s need for information.
The retrieval decision is made by comparing the terms of the query with the index terms (important words or phrases) that appear in each document.
The decision may be binary (retrieve/reject), or it may involve estimating the degree of relevance that the document has to the query.
| System | Data Object | Primary Operation | Database Size |
| --- | --- | --- | --- |
| IR (Information Retrieval) | document | (probabilistic) retrieval | small to very large |
| (relational) DBMS | table | (deterministic) retrieval | small to very large |
| AI (Artificial Intelligence) | logical statements | inference | usually small |
Retrieval Algorithms
A retrieval algorithm retrieves documents or text whose information content is relevant to a user’s information need. There are two basic approaches:
- Sequential scanning of the text:
No extra memory is needed, because no index is built.
The running time is proportional to the size of the text.
- Indexed text:
The search is faster, because a lookup in the index replaces a scan of the full text.
The index size is proportional to the database size.
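The trade-off between the two approaches can be illustrated with a minimal Python sketch. It assumes a tiny in-memory collection and whitespace tokenization; the names `sequential_scan`, `build_index`, and the sample `docs` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def sequential_scan(docs, term):
    """Scan every document; no extra storage, time grows with text size."""
    term = term.lower()
    return [doc_id for doc_id, text in docs.items()
            if term in text.lower().split()]


def build_index(docs):
    """Build an inverted index: each word maps to the ids of documents containing it.

    The index costs extra memory roughly proportional to the collection size.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample collection.
docs = {
    1: "information retrieval finds documents",
    2: "a database stores tables",
    3: "retrieval of text documents",
}

scan_hits = sequential_scan(docs, "retrieval")

index = build_index(docs)
# A lookup touches only the entry for the term, not the full text.
index_hits = sorted(index.get("retrieval", set()))

print(scan_hits, index_hits)
```

Both calls return the same documents; the difference is that the scan reads all of the text on every query, while the index is built once and then answers each query with a single dictionary lookup.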
To decide the order in which documents are displayed to the user, the search engine applies a ranking algorithm to the pages that contain the keywords.
For example, it may count the number of times the keyword appears on a page.
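The keyword-counting idea can be sketched as follows. This is only an illustration of ranking by raw occurrence count, assuming whitespace tokenization; the function name `rank_by_term_count` and the sample `pages` are hypothetical, and real search engines combine many more signals.

```python
def rank_by_term_count(pages, keyword):
    """Rank pages containing the keyword by how often it occurs, most frequent first."""
    keyword = keyword.lower()
    counts = {}
    for page_id, text in pages.items():
        n = text.lower().split().count(keyword)
        if n > 0:  # pages without the keyword are not retrieved at all
            counts[page_id] = n
    # Sort by descending count; break ties by page id for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample pages.
pages = {
    "a": "search engines rank pages",
    "b": "rank rank rank",
    "c": "nothing here",
    "d": "rank and rank",
}

ranking = rank_by_term_count(pages, "rank")
print(ranking)
```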
Query Processing
Query processing is the activity of analyzing a query and comparing it to indexes to find relevant items.
A user enters one or more keywords, along with Boolean modifiers such as “and,” “or,” or “not,” into a search engine, which then searches its index of Web pages for the keywords.
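A Boolean query of this kind can be evaluated with set operations on an inverted index. The sketch below is a simplification under stated assumptions: it evaluates the query strictly left to right, without operator precedence or parentheses, and treats adjacent keywords as joined by "and". The names `build_index`, `evaluate`, and the sample `pages` are hypothetical.

```python
def build_index(pages):
    """Map each word to the set of page ids that contain it."""
    index = {}
    for page_id, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index


def evaluate(query, index, all_ids):
    """Evaluate a Boolean keyword query left to right (no precedence, no parentheses)."""
    result = None
    op = "and"        # operator to apply to the next keyword
    negate = False    # set when the next keyword is preceded by "not"
    for token in query.lower().split():
        if token in ("and", "or"):
            op = token
        elif token == "not":
            negate = True
        else:
            postings = index.get(token, set())
            if negate:
                postings = all_ids - postings  # complement within the collection
                negate = False
            if result is None:
                result = set(postings)
            elif op == "and":
                result = result & postings
            else:
                result = result | postings
            op = "and"  # assumption: adjacent keywords default to "and"
    return result if result is not None else set()


# Hypothetical sample pages.
pages = {
    1: "information retrieval systems",
    2: "database retrieval",
    3: "database systems",
}
index = build_index(pages)
all_ids = set(pages)

print(evaluate("retrieval and not database", index, all_ids))
print(evaluate("information or database", index, all_ids))
```

Each keyword is resolved to a set of page ids from the index, so "and" becomes set intersection, "or" becomes union, and "not" becomes the complement relative to all pages in the collection.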