Slide 5.4: Information provided by HTML files


Information Provided by HTML Files

The Web flourishes because of its format-free style. The lack of a unifying structure has helped popularize the Web, but the resulting complexity also makes Web searches difficult. HTML pages provide the following information:

Audio/figure/flash/table/video captions: A caption is usually a description of the subject.

Content: Web page content provides the most accurate, full-text information. However, it is also the information least used by search engines, since extracting meaning from full content is still far less practical than using the other sources.

Descriptions: Web page descriptions can either be constructed from the meta tags or submitted by webmasters or reviewers. A meta tag is an HTML tag that provides information about a web page, such as its author, expiration date, and a list of keywords.

Hyperlinks: Hyperlinks contain high-quality semantic clues to a page’s topic. A hyperlink to a web page represents an implicit endorsement of the page being pointed to.

Hyperlink text: Hyperlink text is normally a title or brief summary of the target page.

Keywords: Keywords can be extracted from full-text documents or meta tags. Before keywords are obtained from a full-text document, filtering operations are applied to it. Typical operations include removing common words using a list of stopwords and converting upper-case letters to lower case.

Page titles: The title tag defines the title of an HTML document.

Text with a different font, style, color, or size: Emphasized text is usually given a different font, style, color, or size to highlight its importance.

The first sentences: The first sentences of a web page are usually an introduction or an abstract.
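As a rough illustration of how an indexer might collect several of the sources listed above (page title, meta tags, hyperlinks, and hyperlink text), here is a minimal sketch using only Python's standard `html.parser` module. The class name `PageInfoParser`, the sample page, and the link target `slide5.html` are hypothetical and chosen only for illustration; a real crawler would also need to handle malformed markup, character encodings, and relative URLs.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the title, meta tags, and hyperlinks (with their text) of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # meta name -> content
        self.links = []       # (href, hyperlink text) pairs
        self._in_title = False
        self._href = None     # href of the <a> element currently open, if any
        self._text = []       # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)


# Hypothetical sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<title>Automatic Indexing</title>
<meta name="author" content="A. Author">
<meta name="keywords" content="indexing, search, web">
</head><body>
<p>See <a href="slide5.html">Automatic indexing methods</a>.</p>
</body></html>
"""

parser = PageInfoParser()
parser.feed(SAMPLE_HTML)
print(parser.title)   # page title
print(parser.meta)    # author and keywords from the meta tags
print(parser.links)   # hyperlink target paired with its hyperlink text
```

Pairing each link target with its text reflects the point that hyperlink text often serves as a brief summary of the target page.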
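The keyword filtering operations described above (stopword removal and lower-casing) can be sketched as follows. The stopword list here is a deliberately tiny, hypothetical one, and ranking the remaining terms by frequency is an assumption added for illustration; the slide does not prescribe a particular selection rule.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "or", "the", "to"}


def extract_keywords(text, top_n=5):
    """Return up to top_n of the most frequent non-stopword terms in text."""
    # Convert to lower case first, then split into alphanumeric tokens.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Drop common words that carry little meaning for retrieval.
    kept = [w for w in words if w not in STOPWORDS]
    # Assumption: frequency is used as a simple measure of importance.
    return [word for word, _ in Counter(kept).most_common(top_n)]


sample = "The Web is a web of pages, and the pages link to other pages."
print(extract_keywords(sample, top_n=2))  # → ['pages', 'web']
```

Lower-casing before the stopword check means "The" and "the" are treated as the same word, so both are filtered out.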

