Broken crawler behavior with my binary protofeed file

Over the past few days, I've noticed a disturbing uptick in 404s coming
from a certain big web-indexing robot. The requests were complete
nonsense, like this:

"GET /w/2013/03/31/snark/filesystem.png&quot;&gt;&lt;img HTTP/1.1"

Got that? It's actually picking up the HTML entities for a quotation
mark, a greater-than sign, and a less-than sign, followed by the
beginning of an "img src"!

I finally figured out what was going on this morning. They've been
fetching my protofeed file and parsing it as if it were HTML!
Yes, my half-baked protobuf-based feed file from last month has been
linked a few times, and it started getting indexed. Then, for some
reason, they decided the blobs of text within that binary protobuf
were indexable, and went at it. The result is the mess above.
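Only the robot's owners know how its link extractor really works, but here is a minimal Python sketch of one way to end up with exactly that URL. It assumes a hypothetical crawler heuristic that grabs anything starting with "/w/" up to the next whitespace byte, applied to raw bytes. The protobuf field, the `naive_extract` helper, and the escaped HTML payload are all made up for illustration; they are not the real contents of the feed file.

```python
import re

def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Hypothetical payload: HTML that was entity-escaped before being stored.
payload = (b'&lt;a href=&quot;/w/2013/03/31/snark/filesystem.png&quot;&gt;'
           b'&lt;img src=&quot;thumb.png&quot;&gt;&lt;/a&gt;')

# Hypothetical message: field 1 = varint 150, then field 2 as a
# length-delimited string (tag = (field_number << 3) | wire_type 2).
tag = bytes([(2 << 3) | 2])  # 0x12
blob = b'\x08\x96\x01' + tag + encode_varint(len(payload)) + payload

def naive_extract(data: bytes) -> list[str]:
    """Hypothetical heuristic: '/w/' plus everything up to whitespace."""
    return [m.decode('latin-1') for m in re.findall(rb'/w/\S+', data)]

print(naive_extract(blob))
# ['/w/2013/03/31/snark/filesystem.png&quot;&gt;&lt;img']
```

The entities end up in the path because this heuristic never decodes them, and the first space it hits is the one after "img".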

I will note that I have been serving it as "text/plain" for lack of a
better MIME type. It's definitely not going out as "text/html", in
other words.

For now, I've "solved" it by blocking this file in robots.txt. Let this
be a warning to anyone who links to binary data from their web pages.
If your binary blob contains anything resembling HTML, crawlers might
start following the "links" inside it, and this is probably not what
you want.
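If you want to check that a robots.txt rule does what you think before deploying it, Python's standard `urllib.robotparser` can evaluate it locally. This is a minimal sketch: "/protofeed.pb" is a hypothetical path, since the post doesn't give the feed file's actual URL, and "ExampleBot" and example.com are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; substitute the real path of the binary file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /protofeed.pb
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The binary feed is off limits; ordinary pages are still crawlable.
print(rp.can_fetch("ExampleBot", "https://example.com/protofeed.pb"))
print(rp.can_fetch("ExampleBot", "https://example.com/w/2013/03/31/snark/"))
```

Disallow rules are prefix matches, so this also blocks any other path starting with "/protofeed.pb". Keep in mind that robots.txt only asks well-behaved crawlers to stay away; it doesn't stop anyone from fetching the file.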