I know the formula for finding the median of grouped data: $$\mathrm{Median} = L_m + \left [ \frac { \frac{n}{2} - F_{m-1} }{f_m} \right ] \times c$$
and I know what all the letters stand for. Can anyone provide a derivation of it? I am very curious about where it comes from.

@Shahab thank you for a bounty on such an old question
– Shivam Patel, Jul 22 '14 at 17:12

"I know what all the letters stand for." Good for you! I don't. Why don't you tell us what they stand for?
– Rahul, Jul 28 '14 at 4:27

@Rahul: $L_m$ is the lower limit of the median class, $n$ is the total number of observations, $F_{m-1}$ is the cumulative frequency of the class preceding the median class, $f_m$ is the frequency of the median class, $c$ is the class width.
– Shahab, Jul 29 '14 at 10:10

1 Answer

This formula is the result of a linear interpolation, which identifies the median under the assumption that data are uniformly distributed within the median class.

To derive the formula, note that $N/2$ is the number of observations below the median, where $N$ is the total number of observations (written $n$ in the question). Hence $N/2 - F_{m-1}$ is the number of observations that lie within the median class and below the median, where $F_{m-1}$ is the cumulative frequency of all classes below the median class.

As a result, the fraction $\displaystyle\frac {N/2 - F_{m-1}}{f_m}$ (where $f_m$ is the frequency of the median class) represents the proportion of data values in the median class that are below the median.
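As a quick illustration with made-up numbers (not taken from the question): suppose $N = 50$, the classes below the median class contain $F_{m-1} = 18$ observations, and the median class itself contains $f_m = 20$. Then

$$\frac{N/2 - F_{m-1}}{f_m} = \frac{25 - 18}{20} = \frac{7}{20} = 0.35,$$

so $7$ of the $20$ observations in the median class, i.e. $35\%$ of them, lie below the median.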

Now if we assume that the data are uniformly distributed (i.e., equally spaced) within the median class, multiplying this fraction by $c$ (the total width of the median class) gives the distance from the lower limit of the median class to the median. Adding this distance to $L_m$ (the lower limit of the median class), we get the final formula $\displaystyle L_m + \left [ \frac { \frac{N}{2} - F_{m-1} }{f_m} \right ] \cdot c$, which identifies the median.
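The interpolation can also be checked numerically. The following is only a minimal sketch: the function name `grouped_median`, the equal class width, and the frequency table are all assumptions made up for illustration, not part of the question.

```python
from itertools import accumulate


def grouped_median(lower_limits, width, freqs):
    """Median of grouped data by linear interpolation in the median class.

    Assumes equal-width classes listed in increasing order.
    """
    n = sum(freqs)                      # total number of observations (N)
    half = n / 2
    cum = list(accumulate(freqs))       # cumulative frequencies
    # Median class: first class whose cumulative frequency reaches N/2.
    m = next(i for i, F in enumerate(cum) if F >= half)
    F_prev = cum[m - 1] if m > 0 else 0  # F_{m-1}
    # L_m + ((N/2 - F_{m-1}) / f_m) * c
    return lower_limits[m] + (half - F_prev) * width / freqs[m]


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_limits = [0, 10, 20, 30, 40]
freqs = [5, 13, 20, 8, 4]
print(grouped_median(lower_limits, 10, freqs))
```

Here $N = 50$, the median class is 20-30 with $F_{m-1} = 18$ and $f_m = 20$, so the formula gives $20 + \frac{25 - 18}{20} \cdot 10 = 23.5$.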