Background and Objective: The emergence of cancer genomics has broadened the use of protein-protein interaction networks for identifying protein targets for drug development. The objective of the present study was to analyse breast cancer pathway genes to determine potential drug targets. Methodology: A breast cancer subnetwork was constructed from the pathway genes, and potential targets in the breast cancer pathway were identified using node (gene) deletion analysis. Widely used centrality measures, such as betweenness and closeness centrality, play a major role in network robustness. Deleting the genes with the highest centrality values may result in significant disruption of the network. Significantly mutated genes, which assist in selecting a precise target, were determined using the z-score. Results: On deleting the top 10 genes with the highest betweenness centrality, significant (p<0.05) changes were observed in both the shortest path length (L) and the clustering coefficient (C) as compared to the intact breast cancer subnetwork. Of these top 10 genes, two had positive significant mutation values. Conclusion: These genes were identified as NOTCH1 (NOTCH family of proteins) and epidermal growth factor receptor (EGFR family); they are therefore possible targets for drug therapy.

Cancer is a group of diseases characterized by abnormal, uncontrolled cell growth that can spread to other parts of the body. Mutations in growth-regulating genes, as well as other abnormal changes in genes, may result in cancer. The term "breast cancer" refers to a malignant tumor that has developed from cells in the breast. Breast cancer usually begins either in the cells of the lobules, which are the milk-producing glands, or in the ducts, the passages that drain milk from the lobules to the nipple. Less commonly, breast cancer can begin in the stromal tissues, which include the fatty and fibrous connective tissues of the breast. Over time, cancer cells can invade nearby healthy breast tissue and make their way into the underarm lymph nodes, small organs that filter out foreign substances in the body. If cancer cells get into the lymph nodes, they have a pathway into other parts of the body. The most common type of breast cancer is ductal carcinoma, which begins in the cells of the ducts1. Ductal carcinoma in situ is a condition in which abnormal cells are found in the lining of the ducts but have not spread outside the duct. Breast cancer that has spread from where it began in the ducts or lobules to surrounding tissues is called invasive breast cancer. In inflammatory breast cancer, the breast looks red and swollen and feels warm because the cancer cells block the lymph vessels in the skin. Several recent review articles describe updates in the diagnosis and treatment of breast cancer2-4.

All known or predicted protein interactions in an organism can be summarized as a protein network5. In yeast, protein function and cellular localization can be uncovered using large-scale properties of the protein-protein network6. Topological and dynamic features of protein networks can be inferred by systematic mapping of protein interactions, which can also be used to illuminate the mechanisms and development of diseases7. Protein-protein interaction networks have also been used to predict the outcome of breast cancer patients, and it was shown that the human interactome may serve as an indicator of outcome8.

In the present study, a protein-protein interaction network of the human genome was constructed. The list of genes in the breast cancer pathway from the KEGG database was then used to construct the breast cancer subnetwork. The NetworkAnalyzer plug-in was used to analyse the network on the basis of its clustering coefficient (C) and shortest path length (L). To determine the robustness of the network, deletion analysis was performed on the subnetwork. The top 10 genes with the highest hub, betweenness centrality and closeness centrality values were used in the deletion analysis, with betweenness centrality chosen as the primary parameter. The mutation count of each gene was obtained from dbSNP and the significantly mutated genes were identified using the z-score. The purpose of the study is to analyse the breast cancer pathway genes using a protein-protein interaction network and to determine the potential target genes that are responsible for disease progression.

MATERIALS AND METHODS

The human interactome network was obtained in December 2016 from BioGRID, a freely accessible unified database of physical and genetic interactions that is useful for the analysis of protein networks9. Cytoscape (3.4.0) was used to construct the human interactome network. Cytoscape is open source software used to integrate biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework10. The breast cancer pathway genes were obtained from KEGG DISEASE, a database containing higher order functional information linking disease genes, pathways, drugs and diagnostic markers11. There are 130 genes involved in the breast cancer pathway, as shown in Table 1. To determine which genes in the breast cancer pathway undergo frequent mutations, the dbSNP database was used12. The mutation count of each gene was normalized by dividing it by the length of that gene.

Statistical analysis: The statistically significant (p<0.05) mutated genes were identified using the z-score of the normalized gene mutation count13:

\[ z = \frac{X - \mu}{\sigma} \]

where X is the normalized mutation count for the gene, μ is the mean of the normalized mutation counts and σ is the corresponding standard deviation.
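The normalization and z-score steps can be illustrated with a minimal sketch. The gene names, counts and lengths below are hypothetical toy values, not data from the study, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev

def normalized_counts(mutations, lengths):
    """Divide each gene's mutation count by its gene length."""
    return {gene: mutations[gene] / lengths[gene] for gene in mutations}

def z_scores(values):
    """Return z = (X - mu) / sigma for every gene."""
    mu = mean(values.values())
    sigma = pstdev(values.values())  # assumption: population standard deviation
    return {gene: (x - mu) / sigma for gene, x in values.items()}

# Hypothetical toy data (not taken from dbSNP)
mutations = {"GENE_A": 30, "GENE_B": 10, "GENE_C": 20}
lengths = {"GENE_A": 10, "GENE_B": 10, "GENE_C": 10}

z = z_scores(normalized_counts(mutations, lengths))
# Genes whose normalized mutation count lies above the mean
positive = [gene for gene, score in z.items() if score > 0]
```

In this toy case only GENE_A has a positive z-score, which mirrors how genes with positive z-scores are singled out later in the analysis.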

RESULTS

The human interactome obtained from the BioGRID database contains 19634 nodes and 270970 edges, as shown in Fig. 1a. Nodes are shown in blue and edges in grey. The grid layout was selected to represent the human protein-protein interaction network. To determine the breast cancer subnetwork, the breast cancer pathway entry was selected from the KEGG database.

A total of 130 genes were obtained, and Cytoscape version 3.4.0 was used to construct a subnetwork of breast cancer proteins, as shown in Fig. 1b. The breast cancer subnetwork contains 130 nodes and 1538 edges.
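One way to picture the subnetwork construction is as an induced subgraph: only interactions whose two partners both belong to the pathway gene list are retained. The sketch below assumes this interpretation; the study itself used Cytoscape, and the function name, the removal of self-loops and the toy edge list are illustrative assumptions.

```python
def induced_subnetwork(edges, genes):
    """Keep only interactions whose two endpoints are both pathway genes."""
    genes = set(genes)
    adjacency = {gene: set() for gene in genes}  # isolated genes are kept as nodes
    for u, v in edges:
        if u in genes and v in genes and u != v:  # assumption: self-loops are dropped
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency

# Hypothetical toy interactome and pathway gene list
interactome_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "A")]
pathway_genes = ["A", "B", "C"]

sub = induced_subnetwork(interactome_edges, pathway_genes)
n_nodes = len(sub)
n_edges = sum(len(neighbours) for neighbours in sub.values()) // 2
```

The adjacency-set representation makes node and edge counts easy to check against reported values such as the 130 nodes and 1538 edges of the breast cancer subnetwork.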

Basic network parameters were calculated using NetworkAnalyzer. For the breast cancer subnetwork, the shortest path length (L) was 2.85 and the clustering coefficient (C) was 0.286. For the deletion analysis, the 10 genes with the highest hub, betweenness centrality and closeness centrality values were considered. On deleting the 10 genes with the highest hub values, the shortest path length was 4.177 and the clustering coefficient was 0.188. Thus L increased whereas C decreased, which suggests that the hubs play a major role in the robustness of the network. When the 10 genes with the highest closeness centrality were deleted, L was 3.547 and C was 0.251. These values did not deviate much from those of the intact breast cancer subnetwork, although L increased.
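The deletion analysis can be sketched in a few lines of Python. This is not the NetworkAnalyzer implementation; it assumes that L is averaged over connected node pairs only, that nodes with fewer than two neighbours contribute a clustering coefficient of 0, and that a hub is ranked by its number of neighbours (degree). The wheel-shaped toy graph and all function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """Mean shortest path length over all connected node pairs."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

def clustering_coefficient(adj):
    """Average local clustering; nodes with < 2 neighbours count as 0."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # every edge between two neighbours is seen twice, hence the division
        links = sum(1 for v in nbrs for w in adj[v] if w in nbrs) / 2
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0

def delete_nodes(adj, removed):
    """Return a copy of the network without the removed nodes."""
    removed = set(removed)
    return {u: nbrs - removed for u, nbrs in adj.items() if u not in removed}

def top_k_by_degree(adj, k):
    """The k nodes with the most neighbours (assumed meaning of 'hubs')."""
    return sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k]

# Hypothetical toy network: a ring of six genes, all linked to one hub H
ring = ["G1", "G2", "G3", "G4", "G5", "G6"]
toy = {g: set() for g in ring + ["H"]}
for i, g in enumerate(ring):
    nxt = ring[(i + 1) % len(ring)]
    for u, v in ((g, nxt), (g, "H")):
        toy[u].add(v)
        toy[v].add(u)

hubs = top_k_by_degree(toy, 1)
L_before, C_before = characteristic_path_length(toy), clustering_coefficient(toy)
pruned = delete_nodes(toy, hubs)
L_after, C_after = characteristic_path_length(pruned), clustering_coefficient(pruned)
```

In the toy network, removing the single hub raises L and lowers C, which is the same qualitative pattern reported for the hub deletion in the breast cancer subnetwork.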

After deletion of the 10 genes with the highest betweenness centrality, the shortest path length was 3.915 and the clustering coefficient was 0.187. Thus C was lower and L was higher than in the intact breast cancer subnetwork. According to the Kolmogorov-Smirnov test, the differences in L and C were statistically significant (p<0.05).
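A two-sample Kolmogorov-Smirnov comparison can be sketched as follows. The text does not state which samples were compared, so the sketch assumes per-node values (for example, local clustering coefficients) before and after deletion. The p-value uses the common asymptotic series approximation rather than an exact computation, and the sample values are hypothetical.

```python
import math

def ks_two_sample(a, b):
    """Return (D, p) for a two-sample Kolmogorov-Smirnov test (asymptotic p)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # step both empirical distribution functions past the value x
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.3:
        # the series converges poorly here and the tail probability is ~1
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical per-node values before and after deletion
before = [1, 2, 3, 4, 5]
after = [6, 7, 8, 9, 10]

d_stat, p_value = ks_two_sample(before, after)
d_same, p_same = ks_two_sample(before, before)
```

For samples this small an exact test would be preferable; the asymptotic form is used here only to keep the sketch short.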

Fig. 2: Clustering coefficient values for the breast cancer subnetwork and after removal of the 10 genes with the highest hub, betweenness centrality and closeness centrality values. Deletion analysis indicates that the top 10 genes are essential for communication within the breast cancer pathways

Betweenness centrality was then chosen as the parameter for comparing genes, because the clustering coefficient (Fig. 2) was lower and the shortest path length (Fig. 3) was higher when the top 10 genes by betweenness centrality were deleted than when the top 10 genes by closeness centrality were deleted.
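The two centrality measures compared here can be computed with a short sketch. It uses Brandes' algorithm for unnormalized betweenness on an unweighted, undirected network and defines closeness as the reciprocal of the mean distance to reachable nodes. Both definitions are assumptions about what the analysis software reports, and the five-gene chain is a hypothetical toy network.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm, unweighted and undirected, without normalization."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each unordered pair was counted from both endpoints
    return {v: b / 2 for v, b in bc.items()}

def closeness_centrality(adj):
    """Reciprocal of the mean hop distance to all reachable nodes."""
    cc = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        cc[s] = (len(dist) - 1) / total if total else 0.0
    return cc

# Hypothetical toy network: a chain A - B - C - D - E
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
         "D": {"C", "E"}, "E": {"D"}}

bc = betweenness_centrality(chain)
cc = closeness_centrality(chain)
top_betweenness = max(bc, key=bc.get)
```

In the chain, the middle gene C lies on the most shortest paths and is also closest to all others, so it ranks first under both measures; in larger networks the two rankings can differ, which is why the choice of parameter matters.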

The SNPs for all the genes in the breast cancer subnetwork were obtained and normalized by gene length. The statistically significant central genes were determined using the z-score. Fig. 4 shows the top 10 genes with the highest betweenness centrality values and their respective z-scores. Among these 10 genes, the two with positive z-scores were selected; their NCBI Gene IDs are 4851 (NOTCH family of proteins) and 1956 (EGFR family).

DISCUSSION

The protein-protein interaction network of breast cancer proteins was constructed from the KEGG pathway database. The network showed the characteristics of both a scale-free network and a hierarchical network. Node (gene) deletion in a network may help to understand the robustness and attack tolerance of the network. When the 10 most central genes were removed, the shortest path length increased and the clustering coefficient decreased. The deletion analysis therefore indicates that the breast cancer pathway network is vulnerable to the targeted removal of central genes.

The study suggests that NOTCH1 (NOTCH family of proteins) and EGFR are the most effective targets for drug development aimed at disrupting the breast cancer protein-protein interaction subnetwork. These genes have high betweenness centrality. Betweenness centrality measures how often a protein lies on the shortest paths between other proteins in the network. It has been shown that highly connected vertices in protein interaction networks are often functionally important and that the deletion of such vertices is related to lethality14.

Hence, the deletion analysis involving these two genes shows a relatively low clustering coefficient and a high shortest path length. To select targets more precisely from among the proteins with the top 10 betweenness centrality values, positive significant mutation values are taken into account. It has also been argued that network analysis can be used in general to infer novel functions and to quantify the positional importance of a protein in a disease-associated pathway15. The significant mutation value is useful for finding protein targets that are themselves susceptible to a high number of mutations per gene length. Thus, the targets play a central role in the disruption of the breast cancer network while also carrying a significant mutation value, which makes them ideal candidates for drug development.

It has been found that targeting NOTCH signalling may be of therapeutic value in breast cancer, as it is overexpressed and highly activated16. A recent review suggested that NOTCH signalling pathways play a vital role in the development and progression of breast cancer17. EGFR has been extensively studied in breast cancer and is known to be overexpressed in triple negative breast cancer (TNBC). Growth factors such as EGFR, c-kit or p53 mutation status and several proliferative mechanisms like mitogen-activated protein kinase (MAPK) and protein kinase components of the extracellular signal-regulated kinases (ERK) pathway have been indicated as possible determinants of sensitivity to chemotherapy in TNBC7. TNBC is strongly associated with EGFR expression. Yet, the greatest benefit of tandem high-dose chemotherapy was shown among TNBCs but not in the small subgroup of EGFR-positive tumors, indicating the need for additional targeted therapies in this fraction7.

Complex network analyses have also been used to identify target genes in breast cancer. A gene regulatory network comprising transcription factors and target genes was used to identify the regulatory circuits governing breast cancer18. Structural changes in the gene regulatory network have also been analyzed using network analysis of breast cancer genes19. Genes associated with the disease were determined using the centrality of breast cancer networks20. Another study in cancer genomics gives an overview of pathway and network analysis techniques in tumour biology21.

CONCLUSION

The breast cancer subnetwork was created from the genes that take part in the breast cancer pathway, and the mutations were mapped to their corresponding genes. Deletion analysis was carried out for the top 10 genes with the highest hub, betweenness centrality and closeness centrality values, respectively. Among the three network parameters, deletion by betweenness centrality produced a major deviation from the intact breast cancer network. From the genes with the top 10 betweenness centrality values, those with positive significant mutation values were selected. Further analysis shows that the two selected genes have been reported as breast cancer associated genes. It is suggested that the NOTCH family of proteins and the EGFR family genes could be used as targets for drug development.

SIGNIFICANCE STATEMENTS

This study identifies potential drug targets for breast cancer, which may help the cancer research community to develop new drugs for the treatment of cancer. It will help researchers to identify the receptors that are crucial for disease progression and will extend knowledge of target identification in areas that many researchers have not been able to explore. Thus, a new approach that combines node (gene) deletion analysis with significantly mutated genes may help to identify new drug targets for breast cancer.

ACKNOWLEDGMENT

The authors thank the management of Vellore Institute of Technology for providing the computational facility required for this study.