There are no rules constraining the specific hypotheses in a hypothesis set. It can be any subset of the power set of the underlying set of points.

OK, so if we assume that H is such that it "takes any 1 point and sets it to + or -, with the other points--if any--getting the opposite sign" as proposed by nkatz, then:
for N=1, H={+,-} and shatters this 1 point
for N=2, H={+-,-+} and does not shatter these two points.
for N=3, H={++-, +-+, -++, --+, -+-, +--} and does shatter two points.

What is the VC dimension of this hypothesis set? From the N=2 case, I might conclude that it's 1 (it fails to shatter two points), but from the N=3 case I might conclude it's 2 (it shatters 2 of the points within this set of points). Where am I going wrong?

OK, so if we assume that H is such that it "takes any 1 point and sets it to + or -, with the other points--if any--getting the opposite sign" as proposed by nkatz, then:
for N=1, H={+,-} and shatters this 1 point
for N=2, H={+-,-+} and does not shatter these two points.
for N=3, H={++-, +-+, -++, --+, -+-, +--} and does shatter two points.

What is the VC dimension of this hypothesis set? From the N=2 case, I might conclude that it's 1 (it fails to shatter two points), but from the N=3 case I might conclude it's 2 (it shatters 2 of the points within this set of points). Where am I going wrong?

Unquestionably, the VC-dimensions of the first two hypothesis sets are 1, and that of the third hypothesis set is 2, by the definition.

There is a proviso to my glib claim above that absolutely any subset can be a hypothesis. While this is true when the underlying set is finite or countable, when it is uncountably infinite I believe it only makes sense to have hypotheses which are measurable sets, according to the probability distribution associated with samples.

[Almost anything that makes any sense is likely to satisfy this, but it is not difficult to construct pathological examples even when the sample space is as simple as the unit interval with the uniform distribution].

I am still confused on how to piece everything together. I like your much cleaner version, but I do have one comment. Having disjoint hypothesis sets does not necessarily mean that the set of dichotomies they create will also be disjoint.

For example, let H1 and H2 be the positive and negative 1d rays, respectively. These two hypothesis sets are disjoint. However, given any data set they can both create the dichotomies of all +1 or all -1. They won't have much overlap beyond that (and maybe THAT is the point), but they won't be entirely disjoint as far as our inequalities are concerned.

Quote:

Originally Posted by TTHotShot

Great work everyone - this question had me scratching my head until I found this post. The one thing I don't understand is the above statment. Am I missing something? It seems like it should be an inequality.

@ marek, TTHotShot: Well, I agree with your remark, I was too quick: I have not explored the link between a hypothesis set and the dichotomies it generates on a set of points well enough. As a consequence the assertion that in general is wrong in general. Instead, it should be an inequality (≤), and probably quite a loose one, since there can be an overlap in the dichotomies generated by and . Additionally, the definition of involves the determination of a maximum, and the independent search of 2 max on the RHS is a lot less constraining than one max on the LHS.

@ marek, TTHotShot: Well, I agree with your remark, I was too quick: I have not explored the link between a hypothesis set and the dichotomies it generates on a set of points well enough. As a consequence the assertion that in general is wrong in general. Instead, it should be an inequality (≤), and probably quite a loose one, since there can be an overlap in the dichotomies generated by and . Additionally, the definition of involves the determination of a maximum, and the independent search of 2 max on the RHS is a lot less constraining than one max on the LHS.

I agree that the inequality here feels rather loose, as that is what troubled me initially (since it feels like we can do better). However once we were able to generate examples of hypothesis sets that did result in disjoint dichotomies, and since those examples could readily be extended to broad families of such examples, that seals the deal. We are proving a general case argument. So even though many times we should expect to be well under the bound, those special examples that reach equality prevent us from making the bound any tighter.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.