Does $\phi \in C$ correspond to hyperrectangles of arbitrary dimension?

When the text says "assign the smallest hyperrectangle containing
these points", I assume it means both in the sense of smallest
dimension as well as size. I'm not sure whether this means the
$\phi_i$ are of fixed dimension. I think so.

However, my main source of perplexity is the sentence beginning
"Clearly, for each $\phi$..." and ending "on the boundary of the
hyperrectangle". I've no idea why this would be true.

There is a similar setup earlier in the book, Section 4.5, but this
talks about hyperplanes, and is easier to follow.

No doubt I'm misreading or something. Clarifications
appreciated.

Regards, Faheem.

(Some background to the extract that follows.)

In what follows, $d$ is just a fixed integer. I guess it represents the dimension.

Suppose you have a classifier

$$ \phi: \mathbb{R}^d \longrightarrow \{0, 1\}$$

and $n$ ordered pairs $(X_i, Y_i)$, where $X_i \in \mathbb{R}^d$ are the data values,
and $Y_i\in \{0,1\}$ are the correct classifications of the $X_i$.
Then the empirical error (or risk) of $\phi$ is

$$\hat{L}_n(\phi) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{\phi(X_i) \neq Y_i\}.$$

Let $C$ be the class of classifiers assigning 1 to those $x$ contained
in a closed hyperrectangle and 0 to all other points. Then a
classifier minimizing the empirical error $\hat{L}_n(\phi)$ over all
$\phi \in C$ may be obtained by the following algorithm: to each
$2d$-tuple $(X_{i_1}, \dots, X_{i_{2d}})$ of points from $X_1,\dots, X_n$,
assign the smallest hyperrectangle containing these points. If we
assume that $X$ has a density, then the points $X_1,\dots, X_n$ are in
general position with probability one. This way we obtain at most ${n
\choose 2d}$ sets. Let $\phi_i$ correspond to the $i$-th such
hyperrectangle, that is, the one assigning 1 to those $x$ contained in
the hyperrectangle, and 0 to other points. Clearly, for each $\phi \in
C$, there exists a $\phi_i$, $i = 1, \dots, {n \choose 2d}$, such that

$$\phi(X_j) = \phi_i(X_j)$$

for all $X_j$, except possibly for those on the boundary of the
hyperrectangle. Since the points are in general position, there are at
most $2d$ such points. Therefore, if we select a classifier
$\hat{\phi}$ among $\phi_1, \dots, \phi_{{n \choose 2d}}$ to minimize the
empirical error, then it approximately minimizes the empirical error
over the whole class $C$ as well.
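For concreteness, here is a brute-force Python sketch of the algorithm as I read it (the function names are mine, and I assume axis-aligned hyperrectangles so that "smallest" is well defined):

```python
from itertools import combinations

def min_rect(points):
    """Smallest axis-aligned hyperrectangle containing the given points,
    as a (lower, upper) pair of coordinate tuples."""
    d = len(points[0])
    lower = tuple(min(p[k] for p in points) for k in range(d))
    upper = tuple(max(p[k] for p in points) for k in range(d))
    return lower, upper

def contains(rect, x):
    """Is x inside the closed hyperrectangle rect = (lower, upper)?"""
    lower, upper = rect
    return all(lo <= xi <= hi for lo, xi, hi in zip(lower, x, upper))

def best_rectangle(X, Y):
    """Minimize the empirical error over the at most C(n, 2d)
    hyperrectangles spanned by 2d of the sample points."""
    d, n = len(X[0]), len(X)
    best_rect, best_err = None, float("inf")
    for subset in combinations(range(n), min(2 * d, n)):
        rect = min_rect([X[i] for i in subset])
        # empirical error: fraction of samples the rectangle misclassifies
        err = sum(int(contains(rect, x)) != y for x, y in zip(X, Y)) / n
        if err < best_err:
            best_rect, best_err = rect, err
    return best_rect, best_err
```

In 1D ($d = 1$, so $2d = 2$), `best_rectangle([(0.0,), (1.0,), (2.0,), (3.0,)], [0, 1, 1, 0])` picks the interval $[1, 2]$ with empirical error 0. (A complete implementation would presumably also allow rectangles spanned by fewer than $2d$ points, e.g. the classifier labelling every point 0.)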

1 Answer

Yes, a hyperrectangle is a generalisation of a rectangle to higher dimensions. Here the data is given by points in $\mathbb{R}^d$, so the hyperrectangles are all of that dimension. As with all such algorithms, you need a way to get a handle on the set of classifiers, so rather than the infinite class of all hyperrectangles in dimension $d$, the choice will be from the ${n \choose 2d}$ hyperrectangles of dimension $d$, each of which is the smallest containing some choice of $2d$ of the data points.

For example, if the data consisted of 1000 points in $\mathbb{R}^2$, rather than the infinite class of all rectangles, we confine ourselves to the ${1000 \choose 4}$ rectangles which minimally contain a subset of 4 of the data points.

One task then is to show that the best of this finite set is almost as good as the best of all the hyperrectangles -- good in the sense that were the data points each labelled $+$ or $-$, the $+$s would be best separated from the $-$s. The argument claims that for each hyperrectangle there will be one from the finite set agreeing with it except for a small number of points on the boundary, at most the number of faces, $2d$, of the hyperrectangle. In the example above, it says that for any rectangle there is one in the set of ${1000 \choose 4}$ which encloses exactly the same points, except possibly for 4 on the boundary. Not completely obvious, I agree.

Edit: If the hyperrectangles are restricted to have sides parallel to the axes, then it is obvious. Judging from problem 11.6 on page 183, this may well be the case.
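In the axis-parallel case the shrinking argument can be checked directly: replacing a rectangle by the bounding box of the sample points it encloses leaves the classification of every sample point unchanged, and the bounding box is one of the finitely many candidates. A small Python sketch (the helper names are mine):

```python
def contains(rect, x):
    """Is x inside the closed axis-aligned hyperrectangle rect = (lower, upper)?"""
    lower, upper = rect
    return all(lo <= xi <= hi for lo, xi, hi in zip(lower, x, upper))

def shrink_to_points(rect, X):
    """Shrink rect to the smallest axis-aligned hyperrectangle containing
    exactly the sample points of X that rect encloses."""
    inside = [x for x in X if contains(rect, x)]
    d = len(X[0])
    lower = tuple(min(p[k] for p in inside) for k in range(d))
    upper = tuple(max(p[k] for p in inside) for k in range(d))
    return lower, upper

# Shrinking does not change which sample points are enclosed:
X = [(0, 0), (1, 2), (3, 1), (4, 4), (2, 3)]
rect = ((0.5, 0.5), (3.5, 3.5))
small = shrink_to_points(rect, X)
assert [contains(small, x) for x in X] == [contains(rect, x) for x in X]
```

Here the arbitrary rectangle shrinks to $((1,1),(3,3))$, the bounding box of the three enclosed points, and both rectangles classify all five sample points identically.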

Hi David, thanks very much for your kind and helpful answer. Right, the points and hyperrectangles are of dimension $d$; this makes things clearer. It is still not clear to me what is meant by "smallest hyperrectangle containing these points". If there is a fixed set of axes and the sides of the hyperrectangle are parallel to them, then it is clear, if we take "smallest" to mean "smallest area". Otherwise it is not clear to me that, of the infinitely many hyperrectangles containing a set of points, one has uniquely smallest area. Regards, Faheem
– Faheem Mitha, May 26 '10 at 16:34

Also, the statement "Clearly, for each $\phi \in C$, there exists a $\phi_i$" can't be true without the fixed-axes assumption.
– Faheem Mitha, May 26 '10 at 16:37

"Smallest" must mean "smallest d-dimensional volume". I tend to think axes parallel to axes is meant. Otherwise, in the plane, if you had 6 points arranged as the vertices of a regular hexagon, and enclosed these in a rectangle, it seems to me that this would classify differently from a smallest rectangle about any 4 of the 6 points. With lines parallel to axes, you can just shrink the rectangle until sides each hit a point.
– David Corfield, May 26 '10 at 18:48