We discuss the problem of selectivity estimation for range queries in real
metric datasets, which include spatial, or dimensional, datasets as a special
case. The main contribution of this paper is that, surprisingly, many diverse
datasets follow a "power law". This is the first analysis of distance
distributions for metric datasets.

We named the exponent of our power law as the "Distance Exponent". We
show that it plays an important role for the analysis of real, metric
datasets. Specifically, we show (a) how to use it to derive formulas
for selectivity estimation of range queries and (b) how to measure it
quickly from a metric index tree (like an M-tree).

We do experiments on many real datasets (road intersections of U.S.
counties, vectors characteristics extracted from face matching systems,
distance matrixes) and synthetic datasets (Sierpinsky triangle and
2-dimensional line). The experiments show that our selectivity estimation
formulas are accurate, always being within one standard deviation from the
measurements. Moreover, that our algorithm to estimate the "distance exponent"
gives less than 20% error, while it saves orders of magnitude in computation
time.

17 pages

*Currently on leave at Carnegie Mellon University from Mathematics and
Computer Science Institute, University of Sao Paulo at S. Carlos,
13560-970 Sao Carlos, Sao Paulo, Brazil.