Using machine learning to score potential drug candidates may offer an advantage over traditional, imprecise scoring functions because the parameters and model structure can be learned from the data. However, such models may lack interpretability, are often overfit to their training data, and are not generalizable to drug targets and chemotypes absent from that data. Benchmark datasets are prone to artificial enrichment and analogue bias because certain scaffolds are overrepresented in experimentally determined active sets. Spatial statistics can be used to quantify a dataset's topology and to better understand potential biases. Dataset clumping, a combination of self-similarity among actives and separation from decoys in chemical space, is associated with overoptimistic virtual screening results. This talk explores data, methods, and potential data biases relevant to computational drug binding predictions.

Registration is required. Email CCTS@uky.edu by Monday, October 8, 2018. If you require special physical arrangements to attend this program, please call 859-323-8545.