Assuming the text blocks are relatively small compared to the word list, and each text block is processed only once, I suggest putting the words from the word list into a hash table. You can then perform a hash lookup for each word in the text block to find out whether the word list contains it.
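As a rough sketch of the hash-table approach, here is how it might look in Python, where the built-in `set` is a hash table. The function name `words_in_block`, the sample data, and the simple regex tokenizer are assumptions made for illustration; real text may need smarter tokenization.

```python
import re

def words_in_block(text_block, word_set):
    """Return the distinct words of text_block that appear in word_set."""
    # Assumption: a word is a run of letters or apostrophes, compared in lowercase.
    tokens = re.findall(r"[a-z']+", text_block.lower())
    return {token for token in tokens if token in word_set}

word_list = ["apple", "banana", "cherry"]
word_set = set(word_list)  # built once; membership tests are O(1) on average

found = words_in_block("I ate an Apple and a banana.", word_set)
print(sorted(found))
```

The cost is one pass over the word list to build the set, then one average constant-time lookup per word in the block.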

If you have to process the text blocks multiple times, I suggest inverting the text blocks. Inverting means creating, for each word, a list of all the text blocks that contain that word.
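A minimal sketch of such an inverted index in Python might look like this. The name `build_inverted_index`, the sample blocks, and the tokenizer are illustrative assumptions; blocks are identified here simply by their position in the input list.

```python
import re
from collections import defaultdict

def build_inverted_index(text_blocks):
    """Map each word to the set of block indices in which it occurs."""
    index = defaultdict(set)
    for block_id, text in enumerate(text_blocks):
        # Assumption: same simple lowercase tokenizer for every block.
        for token in re.findall(r"[a-z']+", text.lower()):
            index[token].add(block_id)
    return index

blocks = ["the cat sat", "the dog ran", "a cat and a dog"]
index = build_inverted_index(blocks)

# Use .get so that looking up an unknown word does not add an empty entry.
print(sorted(index.get("cat", set())))
print(sorted(index.get("bird", set())))
```

Once the index is built, checking each word from the word list is a single lookup, no matter how many times the blocks are queried.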

In still other situations, it might be helpful to generate a bit vector for each text block, with one bit per word in the word list indicating whether that word is contained in the text block.
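One way to sketch the bit-vector idea in Python is to use an arbitrary-precision `int` as the vector. The helper names `make_bit_vector` and `words_from_vector`, the whitespace tokenizer, and the choice of bit positions (the word's index in the word list) are assumptions for illustration.

```python
def make_bit_vector(text_block, word_positions):
    """Set bit i when the word at position i of the word list occurs in the block."""
    bits = 0
    for token in text_block.lower().split():
        position = word_positions.get(token)
        if position is not None:
            bits |= 1 << position
    return bits

def words_from_vector(bits, word_list):
    """Decode a bit vector back into the words it represents."""
    return [word for i, word in enumerate(word_list) if (bits >> i) & 1]

word_list = ["apple", "banana", "cherry"]
word_positions = {word: i for i, word in enumerate(word_list)}

vec_a = make_bit_vector("banana cherry banana", word_positions)
vec_b = make_bit_vector("apple cherry", word_positions)

print(words_from_vector(vec_a, word_list))
print(words_from_vector(vec_a & vec_b, word_list))  # words common to both blocks
```

The appeal of this representation is that comparing blocks becomes bitwise arithmetic: AND gives the words two blocks share, OR gives the words in either.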

Build up a trie of your words, and then use that to find which words are in the text.

You can build a graph and use it as a state machine. When you process the i-th character of your input word, Ci, you try to move to the i-th level of the graph by checking whether your previous node, the one reached via Ci-1, has a child node linked to Ci.

If you are on a leaf (or, more generally, on a node marked as the end of a word) when you process the last character of your input word, and only in that case, your input is in your corpus.

This method is more complicated to implement than a single Dictionary or Hashtable, but it can use less memory when many words share common prefixes, because each shared prefix is stored only once.
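The character-by-character walk described above is essentially a trie lookup. Here is a small Python sketch of it; the class and function names (`TrieNode`, `build_trie`, `trie_contains`) are made up for this example. It uses an explicit end-of-word flag rather than relying on leaves, so that a word that is a prefix of another word (such as "car" and "cart") is still recognized.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next node
        self.is_word = False  # True if a word from the list ends at this node

def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_word = True
    return root

def trie_contains(root, word):
    """Follow one edge per character; succeed only on an end-of-word node."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word

root = build_trie(["car", "cart", "dog"])
text = "the dog sat in the cart"
print([w for w in text.split() if trie_contains(root, w)])
```

Note that dictionary-based nodes like these are convenient but not compact; whether a trie actually saves memory compared with a hash table depends on how the nodes are represented and how much prefix sharing the word list has.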

The Boyer-Moore string search algorithm should work. Depending on the size of the block of text and the number of words in it, you might want to use the block's words as the keys to search the word list instead (are there more words in the list than in the block?). Also, you probably want to remove any duplicates from both lists.
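For illustration, here is a sketch of the Boyer-Moore-Horspool variant, a common simplification of Boyer-Moore that keeps only the bad-character shift table. It is not the full Boyer-Moore algorithm, and the function name `horspool_find` and the sample data are assumptions made for this example.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Shift for each character of the pattern except the last one:
    # its distance from the pattern's end (the rightmost occurrence wins).
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Skip ahead based on the text character aligned with the pattern's end.
        pos += shift.get(text[pos + m - 1], m)
    return -1

text = "the quick brown fox"
words = {"brown", "fox", "cat"}  # a set, so duplicates are already removed
print(sorted(w for w in words if horspool_find(text, w) != -1))
```

Keep in mind that this is a substring search: it would also report "cat" inside "concatenate", so whole-word matching needs an extra check of the characters around each match. It also scans the text once per word, which is why the relative sizes of the word list and the block matter.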