GRANT: Machine-Learning Scoring Functions

Random Ideas Until You Get Organized

Consider different scoring functions for full ligands vs. fragments. Even just identifying fragments with high ligand efficiency and then searching ligand databases for molecules that contain those fragments, followed by docking, could be effective.
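The ligand-efficiency filter in that pipeline can be sketched in a few lines. Everything here is illustrative: the threshold, the score convention (more negative = better, as in docking scores in kcal/mol), and the fragment names are assumptions, not fixed choices.

```python
def high_efficiency_fragments(frag_scores, max_le=-0.3):
    """Filter docked fragments by ligand efficiency (LE).

    frag_scores: dict mapping fragment name -> (predicted score, heavy-atom count).
    LE = score / heavy atoms; more negative is better.
    The -0.3 threshold is a placeholder, not a recommendation.
    """
    keepers = []
    for frag, (score, n_heavy) in frag_scores.items():
        if score / n_heavy <= max_le:
            keepers.append(frag)
    return keepers
```

The surviving fragments would then seed a substructure search of a ligand database before the docking step.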

Benchmark training times. It’s not always good to train longer.
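One way to benchmark this is to log per-epoch validation loss and ask where early stopping would have halted. A minimal sketch, independent of any particular training framework (the patience value is an assumption):

```python
def best_stop_epoch(val_losses, patience=5):
    """Given per-epoch validation losses, return (stop_epoch, best_epoch):
    the epoch at which early stopping with the given patience would halt,
    and the best epoch whose weights it would restore.
    Training past best_epoch only overfits."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

Comparing stop_epoch across models gives a concrete training-time benchmark rather than a fixed epoch budget.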

You can visualize the importance of different parts of a molecule by removing them and rescoring. You can do it on an atom-by-atom basis, or you can fragment the molecule and do it that way. David Koes thinks fragmentation might be better; he wonders whether the effect is truly additive.
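The removal-and-rescore idea is an occlusion-style attribution. A minimal sketch, where `score_fn` stands in for any trained scoring function (hypothetical here) and "fragments" could equally be single atoms:

```python
def fragment_importance(fragments, score_fn):
    """Occlusion attribution: score the full fragment set, then rescore
    with each fragment removed. The score drop is that fragment's
    importance."""
    full = score_fn(fragments)
    importance = {}
    for i, frag in enumerate(fragments):
        ablated = fragments[:i] + fragments[i + 1:]
        importance[frag] = full - score_fn(ablated)
    return importance
```

The additivity question has a direct check: if the per-fragment importances sum to the full score, the model is behaving additively over fragments; a large residual indicates interaction effects.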

Look up clustered cross-validation. Clustering on the receptor sequence alone should be sufficient; clustering on the ligand is unnecessary, because even when an identical ligand binds two very different receptors, those are essentially independent data points.
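The key mechanical point is that whole receptor clusters, not individual complexes, get assigned to folds, so no cluster ever spans train and test. A self-contained sketch (the sequence clustering itself, e.g. at some identity cutoff, is assumed to have been done elsewhere; `cluster_of` just looks up the result):

```python
import random
from collections import defaultdict

def clustered_folds(examples, cluster_of, n_folds=3, seed=0):
    """Yield (train, test) splits where each receptor-sequence cluster
    lands entirely in one fold. cluster_of maps an example to its
    cluster ID."""
    groups = defaultdict(list)
    for ex in examples:
        groups[cluster_of(ex)].append(ex)
    cluster_ids = sorted(groups)
    random.Random(seed).shuffle(cluster_ids)
    folds = [[] for _ in range(n_folds)]
    for i, cid in enumerate(cluster_ids):
        folds[i % n_folds].extend(groups[cid])  # whole cluster -> one fold
    for i in range(n_folds):
        test = folds[i]
        train = [ex for j in range(n_folds) if j != i for ex in folds[j]]
        yield train, test
```

Scikit-learn's GroupKFold does the same job if a dependency is acceptable.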

For David Koes, accuracy improved substantially when he created arbitrary rotations of his data. I believe this is only relevant to convolutional neural networks, however.
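Rotation augmentation only needs uniformly sampled 3-D rotations applied to the complex's coordinates. A stdlib-only sketch using Shoemake's random-quaternion method (the function names are mine, not from any particular codebase):

```python
import math
import random

def random_rotation_matrix(rng):
    """Uniform random 3-D rotation via a random unit quaternion
    (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    w = math.sqrt(1 - u1) * math.sin(2 * math.pi * u2)
    x = math.sqrt(1 - u1) * math.cos(2 * math.pi * u2)
    y = math.sqrt(u1) * math.sin(2 * math.pi * u3)
    z = math.sqrt(u1) * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def rotate(coords, R):
    """Apply rotation matrix R to a list of [x, y, z] points."""
    return [[sum(R[i][k] * p[k] for k in range(3)) for i in range(3)]
            for p in coords]
```

Applying a fresh rotation each epoch makes sense for grid-based CNN inputs, which are not rotation invariant; a descriptor-based model built on invariant features would gain nothing from it, consistent with the note above.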

Test different subsets rigorously: the effect of PDB structure resolution, AutoDock Vina atom types vs. plain elements (as he does), and so on.

Using Gaussians for atomic positions, rather than just point positions, is also an excellent idea. Combine them with a quadratic function so that the density goes to exactly zero beyond a certain distance, perhaps some multiple of the van der Waals radius.
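One concrete way to do this, sketched below under assumed choices: a Gaussian out to the van der Waals radius, then a quadratic continuation matched to the Gaussian's value and slope at that radius and constructed to hit exactly zero at 1.5x the radius (both the Gaussian width and the 1.5 cutoff factor are illustrative parameters, not the only option).

```python
import math

def atom_density(d, vdw_r):
    """Atomic density as a function of distance d from the nucleus.

    Gaussian exp(-2 (d/r)^2) for d <= r, then a quadratic in x = d/r
    with coefficients solved so that it matches the Gaussian's value
    and derivative at x = 1 and equals zero at x = 1.5; zero beyond.
    """
    x = d / vdw_r
    if x <= 1.0:
        return math.exp(-2.0 * x * x)
    if x <= 1.5:
        # e^-2 * (4x^2 - 12x + 9): value e^-2 and slope -4e^-2 at x = 1,
        # exactly zero at x = 1.5.
        return math.exp(-2.0) * (4.0 * x * x - 12.0 * x + 9.0)
    return 0.0
```

The compact support means each atom touches only a bounded set of grid points, which keeps gridding cheap, and the matched value and slope at the joint avoid a kink in the density (and hence in gradients taken through it).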