My research interests are broadly in the
areas of network science, cheminformatics & bioinformatics,
graph querying and mining, and databases (recent papers).

Network Science

Network science is an emerging
scientific discipline that examines the interconnections among
diverse physical or engineered networks, information networks,
biological networks, cognitive and semantic networks, and social
networks. This field seeks to discover the common principles that
govern network behavior, along with algorithms and tools for analyzing it.
The National Research Council defines Network Science as "the
study of network representations of physical, biological, and
social phenomena leading to predictive models of these
phenomena." My group is developing methodologies, algorithms,
and implementations needed for scalable, dynamic, and resilient
networks. Specific problems include querying composite networks,
modeling dynamic networks, sentiment analysis, analysis of
content and user behavior, discovering unusual patterns, and
sampling in composite networks.

Graph Querying and Mining

A number of scientific endeavors are
generating data that can be modeled as graphs: high-throughput
genome analysis, screening of chemical compounds, social
networks, and ecological networks and food webs. Mining
and analyzing these annotated and probabilistic graphs is
crucial for advancing scientific research, for accurately
modeling and analyzing existing systems, and for engineering
new systems. The goal of this research project is to develop a
set of scalable querying and mining tools for graph databases by
integrating techniques from the fields of databases,
bioinformatics, machine learning, and algorithms.
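
As a rough illustration of what a graph database query looks like, the sketch below answers a containment query: given a small labeled query graph, return every database graph that contains it as a subgraph. It is a minimal backtracking matcher written for this page, not the group's actual tooling; the names `LabeledGraph`, `contains_subgraph`, and `query_database`, and the toy heavy-atom molecule graphs, are all hypothetical. A scalable system would put indexing and pruning in front of this exhaustive check.

```python
from typing import Dict, List, Set, Tuple


class LabeledGraph:
    """Undirected graph with one string label per node (hypothetical helper)."""

    def __init__(self, labels: Dict[int, str], edges: List[Tuple[int, int]]):
        self.labels = labels
        self.adj: Dict[int, Set[int]] = {v: set() for v in labels}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def contains_subgraph(target: LabeledGraph, query: LabeledGraph) -> bool:
    """Return True if `query` maps injectively into `target`,
    preserving node labels and every query edge (naive backtracking)."""
    q_nodes = list(query.labels)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(q_nodes):
            return True  # every query node has been placed
        q = q_nodes[len(mapping)]
        used = set(mapping.values())
        for t in target.labels:
            if t in used or target.labels[t] != query.labels[q]:
                continue
            # Each already-mapped neighbor of q must land on a neighbor of t.
            if all(mapping[n] in target.adj[t]
                   for n in query.adj[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]  # undo and try the next candidate
        return False

    return extend({})


def query_database(db: Dict[str, LabeledGraph], query: LabeledGraph) -> List[str]:
    """Names of all database graphs that contain the query graph."""
    return [name for name, g in db.items() if contains_subgraph(g, query)]


# Toy database: heavy-atom skeletons only (assumed example data).
db = {
    "ethanol": LabeledGraph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    "dimethyl_ether": LabeledGraph({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),
    "ethane": LabeledGraph({0: "C", 1: "C"}, [(0, 1)]),
}
c_o_bond = LabeledGraph({0: "C", 1: "O"}, [(0, 1)])
c_c_bond = LabeledGraph({0: "C", 1: "C"}, [(0, 1)])

print(query_database(db, c_o_bond))
print(query_database(db, c_c_bond))
```

The matcher is exponential in the worst case, which is exactly why scalable querying needs techniques beyond this brute-force baseline.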

Bioinformatics

Intensive investigations over several decades have revealed the
functions of many individual genes, proteins, and pathways. There
has been an explosion of data of widely diverse types, arising
from genome-wide characterization of transcriptional profiles,
protein-protein interactions, genomic structure, genetic
phenotype, gene interactions, gene expression, and proteomics. We
are developing techniques that can integrate and analyze data from
multiple sources and models efficiently. One research thrust
quantifies phenotypic variation using image analysis and pattern
recognition tools, develops a causal model for gene regulatory
processes, and validates the model experimentally. In another
research thrust, high resolution images of molecules and cells are
being analyzed for understanding complex systems such as
localization of specific neuron types, branching patterns of
dendritic trees, and localization of molecules at the subcellular
level. These efforts are being augmented by a unique distributed
digital library of bio-molecular image data. Such searchable
databases will make the data easier to interpret, leading to a
more complete and integrated understanding of cellular
structure, function, and regulation.

Data Mining in Chemoinformatics and Drug Discovery

Increased availability of large repositories
of chemical compounds and other biochemical data has created new
challenges and opportunities for data mining in chemical
informatics and drug discovery: identification of active
substructures and compounds, prediction of physicochemical
properties and structure-activity relationships, diversity
analysis of compound collections, drug repurposing, and pathway
mining for identification of network fragments responsible for
disease progression. My group has developed several graph-based
and 3D-based methods for such analyses. These ideas are being
pursued by Acelot, Inc., a
local drug discovery startup.