Wednesday, 28 March 2018

Buffer pages required for external sorting of a file in database

Buffer pages required for external sorting of a file in database

Let us suppose that you have a file
with 8500 pages and 3 available buffer pages. If you use external sorting to
sort the above said file, answer the following questions;

a) How many runs will you produce in
the first pass?

b) To sort the entire file, how many
passes do you need?

c) How much is involved in terms of I/O
cost to sort the file?

d) If you like to sort the file in
just two passes, then what is the number of buffer pages you need?

Solution:

a) If Nf is the number of file pages
and Nb is the number of available buffer pages then the first pass requires ⌈Nf/Nb⌉ [ceiling(Nf/Nb)]
sorted runs.

For our case,

Number of file pages, Nf =
8500

Number of available buffer pages, Nb
= 3

The number of sorted runs in the first
pass = ⌈8500/3⌉
= 2834 runs

b) The number of passes required to
sort the entire file can be calculated using the equation ⌈logNb-1⌈Nf/Nb⌉⌉+1.

Number of passes to sort complete file
= ⌈log2⌈8500/3⌉⌉+1

=
⌈log22834⌉+1 = 9 passes

c)
Since each page is read and written once per pass, the total number of page
I/Os for sorting the file is 2 * Nf * (#passes):

Total
I/O cost for sorting the file = 2 * 8500
* 9 = 1,53,000

d) In Pass 0, ⌈Nf/Nb⌉ runs
are produced. In Pass 1, we must be able to merge this many runs; i.e., Nb − 1 ≥⌈Nf/Nb⌉.
This implies that Nb must
at least be large enough to satisfy Nb
* (Nb − 1) ≥ Nf; this can be used to
guess at Nb, and the
guess must be validated by checking the first inequality. Thus;

with 8500 pages in the file, Nb = 93 satisfies both
inequalities, but Nb = 92
does not. So we need 93 buffer pages to sort the entire file in just two
passes.

[Note:
to find the value of Nb, factorize the equation Nb * (Nb-1)
= Nf which can be written as a quadratic equation Nb2-Nb-Nf
= 0. The equation to be solved is Nb2 - Nb - 8500
= 0]