Also, it can give valuable information to other programmers: they might be browsing the tests to see how they're supposed to use this functionality, and when they see this test they'll immediately know that it isn't implemented yet.

How about this:
You are changing the behavior of your API. If you followed a test-first approach, you'd need to write a new test for your changes: let it fail with the NotSupportedException, remove the Assert.Throws from the other test, and then implement the new feature. This is not a bad test; it is you not following the principles.
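
A minimal sketch of that flow, assuming NUnit (the RavenDB calls are written from memory of the 1.x client and are illustrative, not exact):

    using System;
    using NUnit.Framework;
    using Raven.Abstractions.Commands; // 1.x-era namespace, from memory
    using Raven.Client.Shard;

    [TestFixture]
    public class ShardedDeferTests
    {
        private ShardedDocumentStore shardedStore; // assumed: created in a [SetUp] method

        // Step 1: the existing negative test documents today's limitation.
        [Test]
        public void Defer_throws_until_sharding_supports_it()
        {
            using (var session = shardedStore.OpenSession())
            {
                Assert.Throws<NotSupportedException>(
                    () => session.Advanced.Defer(new DeleteCommandData { Key = "users/1" }));
            }
        }

        // Step 2: test-first for the new behavior. It fails with
        // NotSupportedException today; once you start implementing,
        // delete the negative test above and make this one pass.
        [Test]
        public void Defer_executes_against_the_owning_shard()
        {
            using (var session = shardedStore.OpenSession())
            {
                session.Advanced.Defer(new DeleteCommandData { Key = "users/1" });
                session.SaveChanges(); // must no longer throw once implemented
            }
        }
    }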

I didn't want a broken test failing my build for a week while I waited for a new RavenDB build, so I wrote it as a negative test. When I upgraded RavenDB and the test failed, it was a nice reminder about the issue.

In some ways I think the negative test had /some/ value - if I ever wondered whether the problem still existed, I had proof of it. But having a test 'fail' because something 'worked' did leave a strange feeling. I'm not sure what the solution for this should be. Maybe Assert.Inconclusive()?
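
Something like this would capture it, assuming NUnit (MSTest has an equivalent Assert.Inconclusive); GetLastWrittenEtag stands in for whichever call is known to be broken:

    [Test]
    public void GetLastWrittenEtag_is_still_unsupported_when_sharded()
    {
        // shardedStore: assumed fixture field, set up elsewhere
        try
        {
            shardedStore.GetLastWrittenEtag();
        }
        catch (NotSupportedException)
        {
            return; // still broken, exactly as expected
        }

        // The call worked: the fix has landed, so flag it without
        // counting it as a hard failure.
        Assert.Inconclusive("GetLastWrittenEtag works now - remove the workaround.");
    }

One trade-off: most build servers treat an inconclusive result as non-failing, so the reminder is gentler than a red build - which may be exactly what you want, or not.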

As mentioned before, the problem ain't the test itself. It may be that it is hidden in an improper location, without giving a semantic clue.

I do have a test file dedicated to "unsupported" features and "defects" in third-party components. When I update any of them (including RavenDB) and the defects are fixed, I know right away and can clean up the workarounds in the code.
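
For illustration, such a file might look like this (the class name is made up; the Url behavior is the one from the post):

    using System;
    using NUnit.Framework;
    using Raven.Client.Shard; // 1.x-era namespace, from memory

    // Each test pins down a known defect or unsupported feature in a
    // third-party component. When an upgrade fixes one, its test fails
    // and points at the workaround that can now be removed.
    [TestFixture]
    public class ThirdPartyDefects
    {
        private ShardedDocumentStore shardedStore; // assumed: created in [SetUp]

        [Test]
        public void RavenDB_sharded_store_has_no_single_url()
        {
            Assert.Throws<NotSupportedException>(() => { var _ = shardedStore.Url; });
        }

        // ...one test per known third-party defect or unsupported feature...
    }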

So I don't think it is a test; I believe it is a "note" to your team, or to your future self, for a time when you are not actively thinking about it.

Parts of that test are actually quite important. shardedDocumentStore.Url can't possibly return a result that makes sense - if it returned (for example) the first shard's URL, that would have been a bug. The same goes for DatabaseCommands and AsyncDatabaseCommands.

I think it's even important for GetLastWrittenEtag and Defer - because as long as you've not implemented it, an incorrect result can be far, far worse than a NotSupportedException.
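
The implementation side of that argument is a one-liner per member; a sketch of the shape (not RavenDB's actual source, and the etag return type is assumed from the 1.x client):

    using System;

    public class ShardedDocumentStore // sketch of the shape, not the real class
    {
        // Returning any one shard's URL would look plausible and be
        // wrong, so fail loudly instead of guessing.
        public string Url
        {
            get { throw new NotSupportedException("A sharded document store has no single URL."); }
        }

        public Guid? GetLastWrittenEtag() // return type assumed from the 1.x client
        {
            // A single shard's etag would silently misrepresent the cluster.
            throw new NotSupportedException("GetLastWrittenEtag is not supported for sharded stores (yet).");
        }
    }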

Tests are designed to enforce the expectations made of the code. Those expectations can be set by interfaces, but can be supplemented by traditional documentation or other out-of-code communications.

In this case, if such communications have established that these methods are not supported, then the tests correctly enforce that (and are "good"). By failing when one of these methods is implemented, the developer is reminded that the expectations of the code have changed, and out-of-code communication artifacts must be updated to reflect that.

There should have been separate tests for each of those functions, positioned where you would have naturally written the tests for the real, working functionality. That way, it wouldn't have shown up on the build server; it would have shown up as soon as you considered working on the feature that wasn't supported.
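
Concretely: instead of one test asserting NotSupportedException for Url, DatabaseCommands, GetLastWrittenEtag, and Defer all at once, each assertion could live in the file where the real tests for that feature will eventually go (NUnit again; the file names are illustrative, and shardedStore is an assumed fixture field):

    // In ShardedUrlTests.cs, where the real Url tests would live:
    [Test]
    public void Url_is_not_supported_yet()
    {
        Assert.Throws<NotSupportedException>(() => { var _ = shardedStore.Url; });
    }

    // In ShardedEtagTests.cs, where the real etag tests would live:
    [Test]
    public void GetLastWrittenEtag_is_not_supported_yet()
    {
        Assert.Throws<NotSupportedException>(() => { shardedStore.GetLastWrittenEtag(); });
    }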

Glomming everything together means you have a submarine test, which, no matter what it's testing, is bad.

Had these tests been closer to where you were expecting them, you most likely wouldn't have thought to write this post.