1. matches(...) returns a boolean: it tells you whether the entire input matches the pattern, not which part matched.
2. HTML isn't a regular language, so regex isn't the best tool for parsing it. Use an HTML parser.

However, if all you want to do is obtain the only subsequence that consists entirely of digits, then the fact that the input String is valid HTML becomes irrelevant. Just use replaceAll(...) to replace every non-digit with the empty String.
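
As a rough sketch of that last suggestion (the class name, the method name, and the sample input are made up for illustration, since the original input isn't shown), it could look like this:

```java
public class DigitsOnly {
    // Replaces every non-digit character (\D) with the empty String,
    // so only the digits of the input remain, in their original order.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: markup containing exactly one run of digits.
        String html = "<td class=\"count\">12345</td>";
        System.out.println(digitsOnly(html)); // prints 12345
    }
}
```

Note that this only works under the assumption stated above: if the markup contains digits anywhere else (a tag such as h1, an attribute value), they end up concatenated into the result as well.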