Discussing this with Ezio on IRC, we decided that it probably makes more sense to do this check outside the scanner as preliminary validation of the input passed in via the public API. That will minimise the overhead and also avoids any potential side effects if "idx==0" is ever true in cases we're not currently testing.
The tests from the current patches should be OK, though.
Ezio also found that, for Py3k, adding an explicit check for non-str input and throwing an appropriate error would also be an improvement over the status quo:
>>> import json
>>> json.loads(b'')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ncoghlan/devel/py3k/Lib/json/__init__.py", line 316, in loads
return _default_decoder.decode(s)
File "/home/ncoghlan/devel/py3k/Lib/json/decoder.py", line 344, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: can't use a string pattern on a bytes-like object