It's time to tackle the set cover problem. We want a set of ngrams that match all the strings. Obviously, if we pick an ngram from each of the strings we want to cover, we'll have a solution. For instance,

But you can see that we won't be able to go much beyond this because there are just too many combinations.
We can cut down on the intermediate partial solutions by noticing that many of them are redundant. We don't need to keep partial solutions that cannot possibly lead to a shortest final solution. The various partial solutions each (potentially) match different sets of words. We only need keep the shortest solution for each different set of matched words. Furthermore, if a solution's matches are a superset of another's matches, and the other is the same length or longer, then the solution is dominated by the other and will always be at least the length of the longer.

The cover procedure takes a table that maps values to the keys that cover them. If we can reduce the size of that table without changing the solution, we'll run faster. If there are two entries in the table such that the keys of one are a superset of the keys of the other, we can discard the superset: the smaller of the two entries will be in the solution, and any key that matches the smaller one will automatically match the larger one as well.
Also, if two values have the same set of keys that match them, we need only include one of the values in the table.