4
Introduction to Database Systems4 Files of Records v Page or block is OK when doing I/O, but higher levels of DBMS operate on records, and files of records. v FILE: a collection of pages, each containing a collection of records. Must support: –insert/delete/modify record –read a particular record (specified using record id ) –scan all records (possibly with some conditions on the records to be retrieved)

7
Introduction to Database Systems7 Unordered (Heap) Files v Simplest file structure contains records in no particular order. v As file grows and shrinks, disk pages are allocated and de-allocated. v To support record level operations, we must: –keep track of the pages in a file –keep track of free space on pages –keep track of the records on a page v There are many alternatives for keeping track of this.

9
Introduction to Database Systems9 Heap File Using a Page Directory v The entry for a page can include the number of free bytes on the page. v The directory is a collection of pages; linked list implementation is just one alternative. Data Page 1 Data Page 2 Data Page N Header Page DIRECTORY

10
Introduction to Database Systems10 Indexes v A Heap file allows us to retrieve records: –by specifying the rid, or –by scanning all records sequentially v Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g., –Find all students in the “CS” department –Find all students with a gpa > 3 v Indexes enable us to answer value-based (associative) queries efficiently.

11
Introduction to Database Systems11 Alternative File Organizations Many alternatives exist, each ideal for some situation, and not so good in others: – Heap files: Suitable when typical access is a file scan retrieving all records. – Sorted Files: Best if records must be retrieved in some order, or only a `range’ of records is needed. – Hashed Files: Good for equality selections. u File is a collection of buckets. Bucket = primary page plus zero or more overflow pages. u Hashing function h : h ( r ) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.

12
Introduction to Database Systems12 Cost Model for Analysis Ignore CPU costs, for simplicity: – B: The number of data pages – R: Number of records per page – D: (Average) time to read or write disk page –Measuring number of page I/O’s ignores gains of sequential I/O; thus, even I/O cost is only approximated. –Average-case analysis; based on several simplistic assumptions. * Good enough to show the overall trends!