Bottom Line:
We used this strategy to identify properties that make certain proteins more likely to cause harmful effects when targeted; such proteins usually have domains commonly found throughout the human proteome. This approach enabled us to distinguish between approved and problematic drugs with an accuracy of 60-70%. Moreover, our approach can be applied as soon as candidate drugs are available, as demonstrated with predictions for more than 5000 experimental drugs.

Abstract:
Identifying promising compounds during the early stages of drug development is a major challenge for both academia and the pharmaceutical industry. The difficulties are even more pronounced when we consider multi-target pharmacology, where the compounds often target more than one protein, or multiple compounds are used together. Here, we address this problem by using machine learning and network analysis to process sequence and interaction data from human proteins to identify promising compounds. We used this strategy to identify properties that make certain proteins more likely to cause harmful effects when targeted; such proteins usually have domains commonly found throughout the human proteome. Additionally, since currently marketed drugs hit multiple targets simultaneously, we combined the information from individual proteins to devise a score that quantifies the likelihood of a compound being harmful to humans. This approach enabled us to distinguish between approved and problematic drugs with an accuracy of 60-70%. Moreover, our approach can be applied as soon as candidate drugs are available, as demonstrated with predictions for more than 5000 experimental drugs. These resources are available at http://sourceforge.net/projects/psin/.

Figure 2: (A) Although most targets of approved drugs are exclusive to that category, the problematic targets are almost entirely covered by the approved category. Numbers in parentheses indicate the number of singleton proteins in the PSIN. (B) Approved and problematic drugs have different numbers of reported targets. While most problematic drugs have only one reported target, approved drugs have several, identified either by the community after the drug is marketed or by companies as part of the drug-approval process. (C) Burt's constraint was proposed in a sociological context to study positions of advantage for individuals in a group. In this simple example, if the nodes are individuals, on the left no node can negotiate or bargain with the others, since they all have alternative connections. On the right, however, a structural hole exists and Node 1 is in a better position, since the other two nodes may not be aware of each other's existence; hence, Node 1 is less "constrained" than the other two. In a protein similarity context, proteins with low constraint values are generally those with several common domains, located between different protein families. In contrast, proteins with large constraint values are the peripheral nodes, with a few domains shared among only a few other proteins.
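The two toy networks described in panel (C) can be checked numerically. The following is a minimal sketch, assuming the standard formulation of Burt's constraint on an undirected, unweighted graph, where each node spreads its ties equally over its neighbors; the function name `burt_constraint` and the edge lists are illustrative and are not taken from the authors' code, and the PSIN itself may use weighted edges.

```python
from collections import defaultdict


def burt_constraint(edges):
    """Burt's constraint for every node of an undirected, unweighted graph.

    Assumption: each tie has equal weight, so the proportion of node i's
    ties invested in neighbor j is 1 / degree(i).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    def p(i, j):
        # Proportion of i's ties invested in j (0 if i and j are not linked).
        return 1.0 / len(neighbors[i]) if j in neighbors[i] else 0.0

    constraint = {}
    for i in neighbors:
        total = 0.0
        for j in neighbors[i]:
            # Indirect investment in j through i's other contacts q.
            indirect = sum(p(i, q) * p(q, j) for q in neighbors[i] if q != j)
            total += (p(i, j) + indirect) ** 2
        constraint[i] = total
    return constraint


# Left network of panel (C): a closed triangle, no structural hole.
closed = burt_constraint([(1, 2), (1, 3), (2, 3)])

# Right network of panel (C): Node 1 bridges Nodes 2 and 3.
open_triad = burt_constraint([(1, 2), (1, 3)])

print(closed)
print(open_triad)
```

In the closed triangle all three nodes receive the same constraint, whereas in the open triad Node 1 receives a lower value than Nodes 2 and 3, matching the caption's statement that Node 1 is less constrained.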

Mentions:
We observed that the targets of approved and problematic drugs largely overlapped (Figure 2A), and there were more reported targets in the combined databases for the approved drugs than for the problematic drugs (Figure 2B). The larger number of reported targets for approved drugs reflects the strict requirements of regulatory agencies: before going to market, companies must provide detailed reports about modes of action, and after a compound is released, researchers from academia often report additional targets.
