Abstract

Clustering is one of the most important techniques for grouping genes with similar expression patterns into a small number of meaningful, homogeneous clusters. Gene expression data has special characteristics that make clustering it a challenging research problem, and clustering such data has many applications. When clustering is applied to genes, it is called gene clustering. Hard clustering places each gene in exactly one cluster and tends to converge to a local optimum. Soft clustering places each gene in every cluster with some membership value. Because hard clustering converges to a local optimum, an evolutionary computation technique such as swarm clustering is required to search for the global optimum. This chapter studies swarm clustering techniques for gene expression data: Particle Swarm Clustering K-Means, Cuckoo Search Clustering, Cuckoo Search Clustering with Lévy flight, harmony search, Fuzzy PSO, and Ant Colony Optimization based Clustering. Evaluation measures for clustering gene expression data are also discussed.

1. Introduction

The development of DNA microarray technology for examining gene expression has opened a new era in the exploration of living systems, the sources of disease, and drug development (He & Hui, 2009). Clustering is concerned with representing a new cancer or disease as a new class. It involves analyzing a given set of gene expression profiles with the goal of discovering subgroups that share common features, and grouping specimens based on the similarity of their expression profiles with regard to the genes represented on the array (Tarca, Romero, & Draghici, 2006). Clustering microarray gene expression data helps in understanding gene functions, gene regulation, and cellular processes (Daxin, Chaun, & Aidong, 2004). Genes in the same cluster exhibit similar expression patterns and are likely to be co-regulated. Clustering gene expression data focuses on finding new biological classes or refining existing ones (Gregory & Pablo, 2003). Gene groups enable researchers to predict the functional role or regulatory control of a novel gene. Likewise, similarity in the expression patterns of tissue samples collected from various people, including healthy persons and people affected by cancer, helps in the effective classification of unknown samples, which in turn can lead to early diagnosis of diseases (Marcilio, Ivan, Daniel, Teresa, & Alaxander, 2008). According to Jiang et al. (2004), elucidating the patterns hidden in gene expression data offers a tremendous opportunity for enhanced understanding of functional genomics. In cancer studies (Golub et al., 1999; Alon et al., 1999; Spellman et al., 1998; Eisen, Spellman, Brown, & Botstein, 1998; Wen et al., 1998), gene expression signatures for both cell types and biological processes have been successfully identified by clustering (Alizadeh et al., 2000).
GenClust, a gene-based clustering approach capable of identifying clusters and sub-clusters of arbitrary shape in any gene expression dataset, was proposed by Sauravjyoti & Dhruba (2010). A novel harmony search K-Means hybrid algorithm for clustering gene expression datasets was proposed by Abdul, Sebastian, & Madhu (2013). Fuzzy C-Means (Bezdek, 1981) and Genetic Algorithms (Bandyopadhyay, Mukhopadhyay, & Maulik, 2007; Maulik, Mukhopadhyay, & Bandyopadhyay, 2009) have been used effectively in clustering gene expression data. Lu, Lu, Fotouhi, Deng, & Brown (2004) applied the Fast Genetic K-means Algorithm (FGKA) to cluster genes.
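The difference between the hard assignment used by K-Means style methods and the soft memberships used by Fuzzy C-Means can be illustrated with a minimal sketch. It assumes the cluster centroids are already known and uses the standard Fuzzy C-Means membership formula with fuzzifier m; the function names (`hard_assign`, `soft_memberships`), the toy two-condition expression profile, and the centroid values are hypothetical and chosen only for illustration, not taken from the chapter.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_assign(profile, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    distances = [euclidean(profile, c) for c in centroids]
    return distances.index(min(distances))


def soft_memberships(profile, centroids, m=2.0):
    """Soft clustering: Fuzzy C-Means membership of one profile in every cluster.

    m > 1 is the fuzzifier; the returned memberships sum to 1.
    """
    distances = [euclidean(profile, c) for c in centroids]
    # A profile lying exactly on a centroid belongs fully to that cluster
    # (shared equally if several centroids coincide with it).
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical toy data: one gene measured under two conditions, two centroids.
centroids = [[0.0, 0.0], [4.0, 0.0]]
gene = [1.0, 0.0]

label = hard_assign(gene, centroids)            # nearest centroid only
memberships = soft_memberships(gene, centroids)  # roughly 0.9 and 0.1 for m = 2
```

In this toy case the gene lies at distance 1 from the first centroid and 3 from the second, so hard clustering assigns it to the first cluster alone, while the soft approach keeps a small membership in the second cluster as well.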