CSci351 Introduction to File Processing: Homework 3 Solutions

Due date: Wednesday, March 19, 2008 in class
Each homework may have a different weight, which is based on its difficulty level.
Absolutely no copying others' work

  1.     (20%)
    1. How many comparisons would be required on average to find a record using sequential search in a 300,000-record disk file?
      Ans>
        150,000

    2. If the record is not in the file, how many comparisons are required?
      Ans>
        300,000

    3. If the file is blocked so that 40 records are stored per block, how many disk accesses are required on average?
      Ans>
        If 40 records are stored per block, then there are 300,000 ÷ 40 = 7,500 blocks. So, the average number of accesses, assuming that the record is in the file, is 7,500 ÷ 2 = 3,750.

    4. What if only one record is stored per block?
      Ans>
        If only one record is stored in a block, 150,000 accesses are required.
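The four sub-answers above all follow from the same two rules (a hit scans half the file on average; a miss scans all of it), applied to records or to blocks. A small Python sketch, using only the figures given in the problem, confirms the arithmetic:

```python
def avg_accesses_found(n):
    # Sequential search examines half the file on average when the
    # record is present.
    return n / 2

def accesses_missing(n):
    # A miss requires examining the entire file.
    return n

def avg_block_accesses(n, records_per_block):
    # With blocking, each disk access retrieves one whole block, and on
    # average half of the blocks are read before the record is found.
    blocks = n // records_per_block
    return blocks / 2

n = 300_000
print(avg_accesses_found(n))        # 150,000 comparisons
print(accesses_missing(n))          # 300,000 comparisons
print(avg_block_accesses(n, 40))    # 7,500 blocks, 3,750 accesses
print(avg_block_accesses(n, 1))     # 150,000 accesses
```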

  2. In our discussion of the uses of relative record numbers (RRNs), we suggest that you can create a file in which there is a direct correspondence between a primary key, such as membership number, and RRN, so we can find a person's record by knowing just the membership number.     (25%)
    1. What kinds of difficulties can you envision with this simple correspondence between membership number and RRN?
      Ans>
        The problems arise when there are deletions from the file.

    2. What happens if we want to delete a name?
      Ans>
        Unless we assign old membership numbers to new members, a confusing practice in most cases, we cannot reuse the space corresponding to deleted records. So, if there are many deletions, a great deal of space will be wasted.

    3. What happens if we change the information in a record in a variable-length record file and the new record is longer?
      Ans>
        If we change a variable-length record and make it longer, the record will not fit in its old space. It will have to be placed at a different relative position in the file, and hence will have to be given a new RRN. This in turn requires that the corresponding membership number be changed.
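The appeal of the membership-number-to-RRN scheme is that, with fixed-length records, an RRN converts to a byte offset with a single multiplication, so no index is needed. A minimal sketch (the record length and the start-at-1 numbering are assumptions for illustration, not from the problem):

```python
RECORD_LEN = 64  # assumed fixed record length in bytes

def membership_to_rrn(member_no: int) -> int:
    # Hypothetical scheme: membership numbers are issued sequentially
    # starting at 1, so member N occupies RRN N - 1.
    return member_no - 1

def rrn_to_offset(rrn: int) -> int:
    # With fixed-length records the byte offset is a direct function of
    # the RRN; a seek() to this offset plus one read fetches the record.
    return rrn * RECORD_LEN

print(rrn_to_offset(membership_to_rrn(5)))  # member 5 -> RRN 4 -> byte 256
```

The difficulties described above show up immediately in this scheme: deleting member 3 leaves RRN 2 as a permanent hole unless membership number 3 is reissued to someone else.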

  3. Suppose a file must remain sorted. How does this affect the range of placement strategies available?     (15%)
    Ans>
      If the file must remain sorted, a record's position is dictated by its key, so we are not free to place it in whichever hole a strategy such as first fit or best fit would choose. A new or enlarged record must go into its proper position in key order, which in general means sliding neighboring records to make room. In effect, keeping the file sorted rules out the placement strategies.

  4. Why do placement strategies make sense only with variable-length record files?     (10%)
    Ans>
      With fixed-length records, every slot is the same size, so any new record fits any vacated slot equally well; there is nothing for a strategy to decide. Placement strategies matter only when records, and the holes left by deletions, vary in size, because then the choice of hole determines how much space is lost to fragmentation.
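Placement strategies choose which hole on the avail list receives a new record. A minimal sketch of two classic strategies, first fit and best fit (the function names and the plain-list representation of the avail list are my own, for illustration):

```python
def first_fit(avail, size):
    # Return the index of the first hole big enough for the record.
    for i, hole in enumerate(avail):
        if hole >= size:
            return i
    return None

def best_fit(avail, size):
    # Return the index of the smallest hole big enough for the record,
    # minimizing the leftover fragment.
    fits = [(hole, i) for i, hole in enumerate(avail) if hole >= size]
    return min(fits)[1] if fits else None

avail = [100, 40, 60]          # hole sizes, in bytes
print(first_fit(avail, 50))    # 0: the 100-byte hole comes first
print(best_fit(avail, 50))     # 2: the 60-byte hole wastes the least
```

With fixed-length records every hole has the same size, so both functions always make the same choice; only variable-length holes give a strategy anything to decide.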

  5. Compare the average-case performance of binary search with sequential search for a file of 10,000 records, assuming     (30%)
    1. That the records being sought are guaranteed to be in the file,
      Ans>
        We assume that sequential search is performed on an unsorted file. Our measure of performance is the number of disk accesses for reading or writing. To be realistic, we distinguish between random accesses (ra) and sequential accesses (sa). An access can involve a read or a write operation; we assume that reads and writes take the same amount of time. For simplicity, we also assume that the file is unblocked.
        Sequential: n/2 sa = 10,000 ÷ 2 sa = 5,000 sa
        Binary: ⌊log2 n⌋ + ½ ra = ⌊log2 10,000⌋ + ½ ra = 13.5 ra
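Both figures can be computed directly from the formulas above; in Python, `math.floor(math.log2(n))` supplies the ⌊log2 n⌋ term:

```python
import math

def seq_avg_found(n):
    # Average sequential accesses when the record is present: n/2.
    return n / 2

def bin_avg_found(n):
    # Average random accesses for binary search on a sorted file
    # when the record is present: floor(log2 n) + 1/2.
    return math.floor(math.log2(n)) + 0.5

n = 10_000
print(seq_avg_found(n))   # 5000.0 sa
print(bin_avg_found(n))   # 13.5 ra
```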

    2. That half of the time the records being sought are not in the file,
      Ans>
        Sequential: (n/2 + n) / 2 = 3n/4 sa = 7,500 sa
        Binary: [(⌊log2 n⌋ + ½) + (⌊log2 n⌋ + 1)] / 2 = ⌊log2 n⌋ + ¾ ra = ⌊log2 10,000⌋ + ¾ ra = 13.75 ra
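The half-missing case simply averages the hit cost and the miss cost; a short Python check of the two formulas above:

```python
import math

def seq_avg_half_missing(n):
    # Half the searches cost n/2 sa (hit), half cost n sa (miss).
    return (n / 2 + n) / 2

def bin_avg_half_missing(n):
    # Hit: floor(log2 n) + 1/2 ra.  Miss: floor(log2 n) + 1 ra.
    hit = math.floor(math.log2(n)) + 0.5
    miss = math.floor(math.log2(n)) + 1
    return (hit + miss) / 2

n = 10_000
print(seq_avg_half_missing(n))  # 7500.0 sa
print(bin_avg_half_missing(n))  # 13.75 ra
```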

    3. That half of the time the records being sought are not in the file and that missing records must be inserted, and
      Ans>
      • Sequential: To determine whether the record is or is not in the file: 3n/4 sa = 3×10,000÷4 sa = 7,500 sa
        When the record being sought is not in the file, one extra access, a sequential write, appends the new record. This happens half the time, so it represents an average of ½ sa.
        Result: 3n/4 + ½ sa = (3n + 2) / 4 sa = (3×10,000+2)÷4 sa = 7,500.5 sa
      • Binary: When the record being sought is not in the file, we need to insert the new record while keeping the file in sorted order. Viewing the file as a sorted array, we can start at the end of the file and move backward to the point where the new record is to be inserted, shifting each record down by one position. On average this requires n/2 backward sequential reads and n/2 corresponding backward sequential writes, or a total of n sa. Since an insertion occurs only half the time, it contributes an average of n/2 sa.
        Result: ⌊log2 n⌋ + ¾ ra + n/2 sa = ⌊log2 10,000⌋ + ¾ ra + 10,000/2 sa = 13.75 ra + 5,000 sa
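Putting the pieces of part (c) together in Python, with the search costs from part (b) plus the insertion costs derived above:

```python
import math

n = 10_000

# Sequential: 3n/4 sa to decide hit or miss, plus 1/2 sa on average for
# the appending write, which happens half the time.
seq = 3 * n / 4 + 0.5

# Binary: the part (b) search cost in random accesses, plus n/2
# sequential accesses on average for sliding records during insertion.
bin_ra = math.floor(math.log2(n)) + 0.75
bin_sa = n / 2

print(seq)      # 7500.5 sa
print(bin_ra)   # 13.75 ra
print(bin_sa)   # 5000.0 sa
```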

    4. If the records are blocked with 40 records per block.
      Ans>
        Blocking decreases the number of sequential accesses by a factor of 40. For binary search, the search now runs over 10,000 ÷ 40 = 250 blocks, so the average becomes ⌊log2 250⌋ + ½ = 7.5 ra instead of 13.5 ra, a reduction of about log2 40 ≈ 5.3 random accesses (exactly 6 here, because of the floors).
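The blocked averages can be computed directly by substituting the number of blocks for the number of records in the part (a) formulas:

```python
import math

n, b = 10_000, 40
blocks = n // b  # 250 blocks

# Sequential search now reads blocks, not records: a 40x reduction.
seq_blocked = blocks / 2

# Binary search runs over 250 blocks instead of 10,000 records.
bin_blocked = math.floor(math.log2(blocks)) + 0.5

print(seq_blocked)   # 125.0 sa, down from 5,000 sa
print(bin_blocked)   # 7.5 ra, down from 13.5 ra
```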