Abstract

Breast cancer is the most common cancer in women in both developed and less developed countries, and it poses a considerable threat to human health. Developing effective targeted therapies against breast cancer therefore requires a deep understanding of its underlying molecular mechanisms. Deep transcriptional sequencing has been reported to provide an efficient genomic assay for gaining insight into diseases and may prove useful in the study of breast cancer. In this study, ChIP-Seq data from normal and breast cancer samples were compared, and differential peaks were identified based on fold enrichment (with P-values obtained via t-tests). Protein-protein interaction (PPI) network analysis was then carried out, after which the highly connected genes were screened and studied and the most promising ones were selected. The biological pathways involved were then identified. Our findings regarding potential breast cancer-related genes enhance the understanding of the disease and provide prognostic information, in addition to standard tumor prognostic factors, for future research.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1

ChIP-Seq analysis workflow used in this study. Diagrammatic representation of the steps performed for ChIP-Seq data analysis.

Figure 3

Comparison of average peak signal intensity of MCF10A, MCF7 and MDA-MB-231. (A) Average signal (μ) and standard deviation (1σ, 2σ) for MCF10A, with intensity on the y-axis, for 2006 regions. The 20000 bp surrounding each region, segmented into 400 bins, are represented on the x-axis. (B) Average signal and standard deviation for MCF7, with intensity on the y-axis, for 2589 regions. The 20000 bp surrounding each region, segmented into 400 bins, are represented on the x-axis. (C) Average signal and standard deviation for MDA-MB-231, with intensity on the y-axis, for 2945 regions. The 20000 bp surrounding each region, segmented into 400 bins, are represented on the x-axis. (D) Overlay of the average peaks for all three samples.
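The binning described in this caption can be illustrated with a minimal sketch. It assumes only what the caption states: a 20000 bp window per region split into 400 equal bins, which gives 50 bp per bin. The function names, the toy data, and the use of the population standard deviation are assumptions made for illustration and are not taken from the study's pipeline.

```python
from statistics import mean, pstdev

WINDOW_BP = 20000             # window surrounding each region (from the caption)
N_BINS = 400                  # number of bins (from the caption)
BIN_BP = WINDOW_BP // N_BINS  # 50 bp per bin


def bin_signal(per_base_signal):
    """Average a per-base signal of length WINDOW_BP into N_BINS bins."""
    if len(per_base_signal) != WINDOW_BP:
        raise ValueError("expected one value per base in the window")
    return [
        mean(per_base_signal[i * BIN_BP:(i + 1) * BIN_BP])
        for i in range(N_BINS)
    ]


def profile(regions):
    """Return the per-bin mean and standard deviation across all regions."""
    binned = [bin_signal(r) for r in regions]
    columns = list(zip(*binned))          # one tuple of values per bin
    mu = [mean(c) for c in columns]
    sigma = [pstdev(c) for c in columns]  # population SD (an assumption)
    return mu, sigma


# Toy data (hypothetical): two regions with constant signal 1.0 and 3.0.
toy_regions = [[1.0] * WINDOW_BP, [3.0] * WINDOW_BP]
mu, sigma = profile(toy_regions)
```

Plotting `mu` together with bands at one and two `sigma` against the bin index would give a profile of the kind shown in panels A to C.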

Figure 4

Comparison of enriched regions between MCF10A, MCF7 and MDA-MB-231. (A, B and C) 2D histograms of the –log (p-value) on the x-axis and the enrichment value (FE) on the y-axis for MCF10A (normal), MCF7 and MDA-MB-231, respectively. The x- and y-axes were each segmented into 75 bins, and the number of regions within each bin was counted. (D, E and F) Histograms of the enrichment parameter found in MCF10A (normal), MCF7 and MDA-MB-231 peaks, respectively. The values on the x-axis were segmented into 50 bins and presented on a log scale with a base of 10. (G) Density plot of the distribution of selected parameters for MCF10A (normal), MCF7 and MDA-MB-231, respectively. The y-axis was segmented into 49 bins, and the number of regions within each bin is represented by a color code. The color coding is shown in the bottom corner of the plot. (H) Venn diagram of common locations shared between MCF10A, MCF7 and MDA-MB-231.

Figure 5

Distribution of ChIP regions over chromosomes for MCF7. (A) The blue bars represent the percentages of the whole tiled or mappable regions in the chromosomes (genome background), and the red bars represent the percentages of the whole ChIP. These percentages are also marked next to the bars. P-values for the significance of the relative enrichment of ChIP regions with respect to the genome background are shown in parentheses next to the percentages of the red bars. (B) Relative enrichment of ChIP regions in important genomic features, such as promoters, immediate downstream regions of genes, and gene bodies, with respect to the genome background. (C) Pie chart showing how ChIP regions are distributed over important genomic features.

Figure 6

Motif analysis and distribution of sites for MCF7 obtained from RSAT. (A) Distribution of sites, with the position relative to the sequence center on the x-axis and the number of sites on the y-axis. (B) Number of sites predicted per peak, with the number of sites on the x-axis and the number of peaks on the y-axis. n and n_dcum stand for occurrences and decreasing cumulative occurrences (inclusive), respectively. (C) Forward and reverse sequence logos for LEF1 obtained using RSAT.

Figure 7

Distribution of ChIP regions over chromosomes for MDA-MB-231. (A) The blue bars represent the percentages of the whole tiled or mappable regions in the chromosomes (genome background), and the red bars represent the percentages of the whole ChIP. These percentages are also marked next to the bars. P-values for the significance of the relative enrichment of ChIP regions with respect to the genome background are shown in parentheses next to the percentages of the red bars. (B) Relative enrichment of ChIP regions in important genomic features, such as promoters, immediate downstream regions of genes, and gene bodies, with respect to the genome background. (C) Pie chart showing how ChIP regions are distributed over important genomic features.

Figure 8

Motif analysis and distribution of sites for MDA-MB-231 obtained from RSAT. (A) Distribution of sites, with the position relative to the sequence center on the x-axis and the number of sites on the y-axis. (B) Number of sites predicted per peak, with the number of sites on the x-axis and the number of peaks on the y-axis. n and n_dcum stand for occurrences and decreasing cumulative occurrences (inclusive), respectively. (C) Forward and reverse sequence logos for MZF1 obtained using RSAT.

Figure 9

Network analysis of the top 12 genes with their key interactions for MCF7. (A) Genes are represented as nodes, and edges represent interactions between genes. The top genes are highlighted in yellow. (B) The top 12 genes, with gene names on the x-axis and the number of neighbors on the y-axis.

Figure 10

Network analysis of the top 11 genes with their key interactions for MDA-MB-231. (A) Genes are represented as nodes, and edges represent interactions between genes. The top genes are highlighted in yellow. (B) The top 11 genes, with gene names on the x-axis and the number of neighbors on the y-axis.
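Ranking genes by their number of neighbors, as plotted in panel B of the network figures, can be sketched as follows. This is an illustrative example only: it assumes the network is available as an undirected list of gene pairs, and the function names and toy gene labels are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def neighbor_counts(edges):
    """Count distinct neighbors per gene in an undirected interaction list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def top_genes(edges, k):
    """Return the k genes with the most neighbors, ties broken by name."""
    counts = neighbor_counts(edges)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy edge list (hypothetical gene labels, not results from the study).
toy_edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G3", "G2"),  # duplicate pair is counted once
]
ranked = top_genes(toy_edges, 2)
```

Using sets means a pair listed in both orientations contributes a single neighbor, which matches the usual definition of node degree in an undirected network.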

Figure 11

The estrogen signaling pathway. CREB (highlighted in red) is among the most prominent oncotargets involved in its signaling.

Figure 12

The ARF/p53 pathway. ZBTB7A (highlighted in red) is a part of the oncogenic activators responsible for triggering the ARF pathway.