I don't understand: it seems that the initial W* is already a good separator for S, because for each sample Xi we get <Xi, W*> = -m^(-0.5), which is negative (as we want the prediction to be for every point). So the algorithm would not make any mistakes on S', whereas section C claims the algorithm would make m mistakes.
What am I missing? Should the label of every sample point be 1 instead of -1?