Representative Protein Database

Hi, I have a large database (11,000 or so) of predicted structures for 1d3b.
What is notable about them is that the member of this distribution most similar
to the native structure is 8.7 A RMSD away from it, while the structures themselves
have nearest neighbors within the distribution at 3-7 A RMSD. Since these
structures were generated from independent conformational searches on independent
computers (with a 32 bit random seed), this leads me to believe that the folding
algorithm itself is basically sound. However, one would immediately suspect the
potential function used to drive the search is off.
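For concreteness, the nearest-neighbor statistic above can be computed with a standard Kabsch superposition RMSD. This is only a sketch, not my actual pipeline: the function names are mine, and the brute-force all-pairs loop is O(n^2), so at 11,000 decoys you would want clustering or a pruning strategy instead.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Optimal-superposition RMSD between two (N, 3) coordinate sets."""
    P = P - P.mean(axis=0)                    # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                               # covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # optimal rotation
    return float(np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1))))

def nearest_neighbor_rmsd(decoys):
    """For each decoy, the RMSD to its closest other decoy in the set."""
    n = len(decoys)
    d = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = kabsch_rmsd(decoys[i], decoys[j])
    return d.min(axis=1)
```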
While this is probably the case, the plot thickens here. The "energy" of a monomer
of the native structure is about -80 arbitrary units. The structures generated
by conformational search range from -120 to -190 arbitrary units. However, if
I calculate the energy of the monomer in the context of its natural multimeric
complex, that energy becomes -125 to -155 such units, or slightly better than
the average of what I could produce by conformational search.
This leads me to suspect that I have a normalization problem. The isolated monomer
must endeavor to construct an independent hydrophobic core during conformational
search. This leads to a super-compact conformation with approximately the same
number of contacts as the native conformation in the context of its natural
multimer.
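One way to probe this normalization hypothesis would be to divide the raw score by the contact count, so a super-compact decoy is not rewarded merely for burying more pairs. A minimal sketch, where the CA-based contact definition, the 8 A cutoff, the |i-j| >= 3 sequence-separation filter, and the function names are all assumptions of mine, not a standard recipe:

```python
import numpy as np

def contact_count(ca_coords, cutoff=8.0):
    """Count residue pairs (sequence separation >= 3) whose CA atoms
    lie within the cutoff distance."""
    n = len(ca_coords)
    d = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=3)            # skip near-sequential pairs
    return int(np.sum(d[i, j] < cutoff))

def energy_per_contact(energy, ca_coords, cutoff=8.0):
    """Normalize a raw score by contact count, so compactness alone
    cannot push a decoy below the native score."""
    c = contact_count(ca_coords, cutoff)
    return energy / c if c else 0.0
```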
Which leads to my question. How big does a protein need to be before it can form
a stable hydrophobic core (without any disulfides) that can exist long-term in
solution? I want to extract a database of such proteins both to generate
energy function parameters and to use as targets for conformational search. I
would like to limit the size of my proteins so that conformational search is
manageable.
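Whatever the size threshold turns out to be, the extraction step itself is mechanical once per-chain metadata is in hand. A hypothetical filter along these lines is all I have in mind; the field names, the 40-80 residue window, and the cysteine-free criterion are placeholders, not recommendations:

```python
def select_targets(entries, min_len=40, max_len=80):
    """Keep small monomeric chains with no cysteines (hence no
    possibility of disulfides).  'entries' is a list of dicts with
    hypothetical fields 'length', 'num_cys', and 'n_chains'."""
    return [e for e in entries
            if e["n_chains"] == 1                     # deposited as a monomer
            and min_len <= e["length"] <= max_len     # small enough to search
            and e["num_cys"] == 0]                    # disulfide-free
```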
Any ideas?
Scott