B-Tree Evloution
The major goal of a file structure design is to discover a general method for efficiently accessing and maintaining an index that is too large to hold in memory. The problems the designers confronted were
- Searching the index is slow because it involves seeking to different disk tracks. The ideal searching must be faster than binary searching.
- By using a simple index such as the one in Chapter 7, inserting a key into that index involves moving a large number of the other keys. Insertion and deletion must be as fast as search.
The following file structures were designed to solve the two problems:
- Binary search tree (late 1950s): It is a binary tree where every node's left subtree has values less than the node's value, and every right subtree has values greater. The major problem is the lack of an effective strategy of balancing the tree.
- AVL tree (1962): It is a balanced binary search tree where the height of the two subtrees of a node differs by at most one. It provides an acceptable solutions to the second problem above but not the first.
- B-tree (early 1970s): It is a balanced search tree in which every node has between ⌈m/2⌉ and m children, where m>1 is a fixed integer. The B-tree is, de facto, the standard organization for indexes in a database system.
- B+ tree (late 1970s): It is a B-tree in which keys are stored in the leaves. By using this structure, a file is able to be accessed sequentially with efficiency, which is not possible for B-tree.
- Hashing (1970s): It is a scheme for providing rapid access to data items which are distinguished by some key. Each data item to be stored is associated with a key, e.g. the name of a person.