A typical web graph consists of web pages as nodes and hyperlinks as edges connecting related pages.
Web structure mining is the process of using graph theory to analyse the node and connection structure of a web site.
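As a minimal sketch of this graph view, the Python snippet below stores a small site as an adjacency list and counts incoming and outgoing hyperlinks per page. The page names and the helper functions `in_degrees` and `out_degrees` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy site: each page (node) maps to the pages it links to (edges).
links = {
    "index.html": ["about.html", "products.html"],
    "about.html": ["index.html"],
    "products.html": ["index.html", "about.html"],
}


def in_degrees(graph):
    """Count the hyperlinks pointing to each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            # Pages that are linked to but have no entry of their own start at 0.
            counts[target] = counts.get(target, 0) + 1
    return counts


def out_degrees(graph):
    """Count the hyperlinks leaving each page."""
    return {page: len(targets) for page, targets in graph.items()}


print(in_degrees(links))
print(out_degrees(links))
```

Degree counts like these are among the simplest structural measures that can be read off such a graph.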
According to the type of web structural data, web structure mining can be divided into two kinds:
Inter-page hyperlink level:
The first kind of web structure mining extracts patterns from hyperlinks on the Web.
A hyperlink is a structural component that connects the web page to a different location.
Intra-page document level:
The other kind of web structure mining is mining the document structure.
It uses the tree-like structure of a page to analyse and describe the HTML (HyperText Markup Language) or XML (eXtensible Markup Language) tags within the web page.
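The two levels described above can be illustrated with a short Python sketch built on the standard library's `html.parser` module. The class `StructureParser`, the helper `analyse`, and the sample markup are assumptions made for this example, not an established tool: the parser collects `href` targets (hyperlink level) and tracks tag nesting depth (document level).

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they never deepen the tag tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StructureParser(HTMLParser):
    """Collects hyperlinks and records the nesting depth of each tag."""

    def __init__(self):
        super().__init__()
        self.links = []      # hyperlink level: href targets found on the page
        self.outline = []    # document level: (depth, tag) for each start tag
        self.max_depth = 0
        self._stack = []     # currently open tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        self.outline.append((len(self._stack), tag))
        if tag not in VOID_TAGS:
            self._stack.append(tag)
            self.max_depth = max(self.max_depth, len(self._stack))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


def analyse(html):
    """Parse an HTML string and return the populated parser."""
    parser = StructureParser()
    parser.feed(html)
    return parser


sample = """<html><body>
<h1>Title</h1>
<p>See <a href="about.html">about</a> and
<a href="https://example.org/">example</a>.</p>
</body></html>"""

result = analyse(sample)
print(result.links)
for depth, tag in result.outline:
    print("  " * depth + tag)
```

The extracted links could feed a graph of pages, while the indented outline shows the tag tree of a single page. Real pages often contain malformed markup, so a more tolerant parser may be needed in practice.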
Motivation to Study Hyperlink Structure
Hyperlinks serve two main purposes.
Pure navigation
Pointing to pages with authority on the same topic as the page containing the link, where authority is a set of ideas or statements supporting a topic.
This can be used to retrieve useful information from the Web.