Past Research

As a Master's student, I studied how evolutionary and functional information about proteins is encoded in the local conservation of amino-acid residues.
I examined the sequence conservation and position of protein family signatures (motifs) to annotate protein sequences and to facilitate the analysis of their domains.
I developed a bioinformatics tool for detecting remote homology, which can arise through circular permutation and discontinuous domains and is therefore difficult to identify.
The tool is also helpful for detecting small-domain proteins, which have few conserved residues.

A vast amount of information is available on the physico-chemical characteristics of amino acids, and it can lead to a greater understanding of protein sequences.
I used wavelet transforms to decompose protein sequences represented numerically by different amino-acid property indices (such as polarity and accessible surface area).
Because the numerical representation of a protein sequence correlates significantly with its biological activity, common motifs are expected to be observable in the wavelet spectrum.
The decomposed signals (i.e. the numerically represented residues of a protein) are then up-sampled, and similarity search techniques are used to identify similar regions across all the proteins at multiple scales.
I explored substrate specificity among SAM-methyltransferases with low sequence identity using this signal representation of protein sequences, and the results indicate that wavelet transform techniques are promising for motif detection.
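
The encoding and decomposition steps described above can be illustrated with a minimal sketch. It is not the tool itself: it assumes the Kyte-Doolittle hydropathy index as a stand-in for the property indices (the work used indices such as polarity and accessible surface area), a plain Haar wavelet, and zero padding for odd-length signals. The function names and the toy sequence are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values, used here only as an example property
# scale (assumption: any per-residue index could be substituted).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=HYDROPATHY):
    """Turn a protein sequence into a numeric signal, one value per residue."""
    return [scale[residue] for residue in sequence.upper()]


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal.

    Returns (approximation, detail) coefficients, each half as long.
    """
    root2 = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Multi-level Haar decomposition.

    Returns the coarsest approximation and a list of detail coefficients,
    finest scale first. Odd-length signals are zero-padded (illustrative choice).
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:
            approx.append(0.0)
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Toy sequence, chosen arbitrarily for illustration.
signal = encode("MKTAYIAKQR")
approx, details = haar_decompose(signal, levels=2)
print(len(approx), [len(d) for d in details])
```

Each detail level captures variation of the property signal at one scale. In a motif search, coefficients from different proteins would then be compared scale by scale with a similarity measure of choice.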

The overall objective of my doctoral thesis is to analyse and understand the intricate network of protein interactions inside the cell.
Proteins are molecular machines that interact and communicate to perform different cellular functions.
Research efforts in molecular and cellular biology have enabled the detection of molecular interactions on a large scale, and the experimental results generated by high-throughput studies are archived in various public databases.
In this study, statistical and computational approaches are used to integrate information from relatively heterogeneous data sources (public databases) derived from high-throughput experiments.
The integrated data are then used to explore the relationships within interacting protein pairs.
A graph-based network model is used to determine protein relationships based on Gene Ontology (GO) biological processes, molecular functions and cellular components.
An application of the network study is demonstrated using ovarian tumour samples.
Gene expression data from The Cancer Genome Atlas (TCGA) were collected to encode functional attributes in a Boolean logic framework, in order to identify candidate genes for prognosis and therapy risk assessment in human disease.
The differentially expressed genes were then validated in a co-expression network derived from ovarian samples deposited in the Gene Expression Omnibus (GEO).
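
As a rough illustration of the graph-based idea above, the sketch below scores each interacting protein pair by the overlap of its GO annotations. The Jaccard index is an assumed stand-in, since the similarity measure used in the thesis is not stated here. The protein names, annotation sets and function names are toy placeholders, not real data.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index of two annotation sets; 0.0 when both are empty."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_interactions(interactions, annotations):
    """Attach a GO-overlap score to every edge of an interaction network.

    interactions: iterable of (protein, protein) pairs (the graph edges)
    annotations:  dict mapping protein -> set of GO term identifiers
    """
    scores = {}
    for p, q in interactions:
        scores[(p, q)] = jaccard(annotations.get(p, ()), annotations.get(q, ()))
    return scores


# Toy network: P1, P2, P3 and their annotations are made up for illustration.
annotations = {
    "P1": {"GO:0006915", "GO:0005634"},
    "P2": {"GO:0006915", "GO:0005515"},
    "P3": {"GO:0005515"},
}
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]

scores = score_interactions(interactions, annotations)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In practice the three GO aspects (biological process, molecular function, cellular component) would likely be scored separately, and a measure that accounts for the ontology hierarchy may be preferable to a flat set overlap.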

Designed and developed by: Gaurav Kumar
Last modified: Tuesday, May 20th 2012
Comments to the webmaster: gaurav@gauravkumar.org