I have a dataset that contains a string key field and up to 50 keywords associated with it. Once the data has been inserted into the database there will be very few writes (INSERTs); the workload will mostly be queries for one or more keywords.

I have read "Tagsystems: performance tests" which is MySQL based and it seems 2NF appears to be a good method for implementing this, however I was wondering if anyone had experience with doing this with SQL Server 2008 and very large datasets.

I am likely to have 1 million key fields initially, each of which could have up to 50 keywords.

Would a structure of

keyfield, keyword1, keyword2, ... , keyword50

be the best solution, or would two tables

keyid, keyfield        (key table)
   | 1
   |
   | M
keyid, keyword         (keyword table)

be a better idea, given that my queries will mostly be looking for results that match one or more keywords?
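To make the second option concrete, the two-table layout I have in mind would be something like the following (table names, column types and lengths are just guesses for illustration):

CREATE TABLE keyfields (
    keyid    INT IDENTITY(1,1) PRIMARY KEY,
    keyfield NVARCHAR(255) NOT NULL
);

CREATE TABLE keywords (
    keyid   INT NOT NULL REFERENCES keyfields (keyid),
    keyword NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_keywords PRIMARY KEY (keyid, keyword)
);

-- Supports lookups by keyword; keyid is carried along automatically
-- because it is part of the clustered primary key.
CREATE INDEX IX_keywords_keyword ON keywords (keyword);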

Normalized is probably your better bet, but only a simulated workload will tell for sure. You're comparing 50 increasingly sparse indexes of 1 million rows each vs 1 index of 50 million rows. I suspect that if I were a genius at MS writing an algorithm to search one index, I would pick up the values I was looking for as I went along in one pass.
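Just to illustrate the one-pass idea: with the normalized pair of tables and a single index on keyword, a multi-keyword search can be answered from that one index, roughly like this (table names and keyword values are only placeholders from the sketch in the question):

-- Keyfields carrying ANY of the requested keywords.
SELECT DISTINCT kf.keyfield
FROM keyfields AS kf
JOIN keywords AS kw ON kw.keyid = kf.keyid
WHERE kw.keyword IN ('foo', 'bar', 'baz');

-- Keyfields carrying ALL of the requested keywords.
SELECT kf.keyfield
FROM keyfields AS kf
JOIN keywords AS kw ON kw.keyid = kf.keyid
WHERE kw.keyword IN ('foo', 'bar', 'baz')
GROUP BY kf.keyid, kf.keyfield
HAVING COUNT(DISTINCT kw.keyword) = 3;  -- 3 = number of keywords requested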

But if there are 50 indexes, I'd have to scan 50 indexes.

Also, in the denormalized schema, the 1st column will have a high-quality index, while the 50th column will have low selectivity and will probably result in scans rather than index lookups.
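For contrast, the same "any of these keywords" search against the single wide table has to test every keyword column, roughly like this (again just a sketch with placeholder names):

-- The predicate touches all 50 keyword columns, so the optimizer either
-- combines many separate indexes or falls back to scanning the table.
SELECT keyfield
FROM widetable
WHERE keyword1 = 'foo'
   OR keyword2 = 'foo'
   OR keyword3 = 'foo'
   -- ... and so on through ...
   OR keyword50 = 'foo';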