World Wide Web Search Technologies
Traditional information retrieval technologies have been applied to web search, but without much success.
The biggest hurdles include:
- Space:
The volume of web data is unprecedented.
The Web is estimated to hold about 50 billion pages, and it grows by more than one million pages each day.
- Speed:
The data size, combined with the need for real-time responses, makes most information retrieval technologies impractical.
Numerous search technologies have been applied to web search, but no dominant method has yet emerged.
Apart from traditional indexing, current web search technologies fall into six categories:
- Hyperlink exploration:
A hyperlink on a web page represents an implicit endorsement of the page it points to.
- Information retrieval:
Relevance feedback and data clustering are two of the most popular IR techniques used by search engines.
Unlike data clustering, relevance feedback has not yet been applied in any commercial product because it requires some interaction with users.
- Metasearches:
Metasearch engines query several other search engines simultaneously and present the results in a single integrated format.
- SQL approaches:
This approach views the Web as a huge database in which each record corresponds to a page, and uses an SQL-like language to support effective and flexible query processing.
- Content-based multimedia searches:
Few content-based multimedia search engines exist, mainly because automated multimedia indexing is difficult.
- Others:
These are ad hoc methods that fall outside the above categories.
Examples include work on making the components of web search more efficient and effective, such as better ranking algorithms and more efficient crawlers.
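
To make the hyperlink exploration category more concrete, the sketch below scores pages by the endorsements they receive through links. It uses a PageRank-style power iteration, chosen here only as one well-known instance of the idea; the text above does not name a specific algorithm. The function name `link_scores`, the `damping` and `iterations` parameters, and the four-page `toy_web` graph are all assumptions made for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by the endorsements they receive through hyperlinks.

    `links` maps each page to the list of pages it points to.
    All names and default values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform score distribution that sums to 1.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of inbound links.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its endorsement evenly among its link targets.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical toy web: pages a, b and d all link to c, and c links back to a.
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = link_scores(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph, page c receives the most links and so ranks first, while page a ranks second because its single inbound link comes from the highly scored c. A production search engine would combine such a link score with text relevance and would need sparse, distributed data structures to cope with the scale described above.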