This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

DNA-based species identification is challenging when the reference database is deficient for a particular species and when the online reference repository contains many falsely labelled sequences. A 650 base pair fragment of the cytochrome c oxidase subunit I (COI) gene was sequenced from 10 samples each of Puntius chola and Puntius sophore from Uttarakhand, India. In the NCBI database, all samples were identified as the targeted species, but some other sequences (n=3) tagged as different species were also found. We used four different DNA sequence-based computational methods, which together allow successful species identification where a single method may lead to misidentification. We therefore propose the use of several computational methods when assigning species identity, to avoid false identification. These methods also enabled us to identify the three sequences in the NCBI database that were tagged as different species but were actually sequences of P. chola and P. sophore.

Species identification relies mostly on online reference databases when identification through morphology is not possible, as with samples used in wildlife forensics such as bones, skins, hair and fish scales, which are used for different purposes (TRAFFIC, 2011). Various mitochondrial primers have been used for species identification (Ward et al., 2005; Guha et al., 2006). Apart from these markers, the COI gene has been widely used and was selected as the barcode region for members of the animal kingdom (Hebert et al., 2003a; Ward et al., 2005; Hajibabaei et al., 2006a). DNA barcoding has been carried out for a large number of species (Hogg and Hebert, 2004; Marshall, 2005). Although this gene has sufficient variation to distinguish species, sharing of barcode sequences has been found between a few congeneric species, largely among taxa that are known to hybridize (Zou et al., 2012). The genus Puntius, for example, contains more than 130 valid species, with many hidden species or species complexes (Jayaram, 1991; Pethiyagoda et al., 2012), which makes it difficult to identify species using a single method applied to a DNA database. Relying on the online reference repository in this way sometimes gives false results (Koski and Golding, 2001; van Velzen et al., 2012). In addition, to reduce the chance of false positives, different statistical methods are used in species identification, e.g., BLAST, phylogenetic and character-based methods, as these methods have been applied for accurate species identification (Sarkar et al., 2002; Galan et al., 2012; van Velzen et al., 2012). The present study aims to identify the two species of Puntius using different computational methods, i.e., distance-based, phylogenetic (topology), similarity-based and character attribute (CA) methods, since a single method may sometimes give false results (van Velzen et al., 2012; Negi et al., 2016).

1 Results

Puntius chola (n=9) and P. sophore (n=10) were sequenced successfully for a 650 bp fragment of the COI gene. All samples yielded good-quality sequences and were identified with 100% similarity as P. chola and P. sophore, respectively, in NCBI BLAST, in which three ambiguous sequences tagged as different species were also found. A total of 8 parsimony-informative sites were observed (Table 1). The sequences generated in the present study showed one pure character attribute for P. chola, a T at position 5676 (Table 1). The sequence of Puntius fraseri showed only a single base difference, a T to C change at position 5675. All the species used for identification showed sequence divergence between 0.000 and 0.120. The P. sophore samples were also used for identification and showed a sequence divergence of 12% from P. chola, whereas the mean sequence divergence between P. fraseri and P. chola was only 0.01 (1%) (Table 2).
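The divergence values reported above are proportions of differing sites between aligned sequences. As a minimal sketch, assuming an uncorrected p-distance (the substitution model used in the study is not stated) and using hypothetical toy fragments rather than data from this study, a pairwise divergence can be computed as follows:

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: uncorrected p-distance. Sites where either sequence
    has a gap ('-') or an ambiguous base ('N') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # ignore gaps and ambiguous calls
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned toy fragments (not sequences from this study)
toy = {
    "sample_A": "ATGCATGCAT",
    "sample_B": "ATGCATGCAC",
    "sample_C": "ATGTATGAAC",
}

names = sorted(toy)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(p_distance(toy[first], toy[second]), 3))
```

A corrected model such as Kimura 2-parameter would give slightly larger values at higher divergences; the uncorrected form is used here only because it is the simplest to follow.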

In the phylogenetic identification, all sequences of P. chola clustered in a single clade or haplogroup with 100% bootstrap support, which further split into intraspecific clades of different sequences (Figure 1). In addition, two sequences with wrong tags were identified: one P. chola sequence labelled as P. conchonius (JN965201.1) and one P. sophore sequence labelled as P. chola (JX260945.1).

Table 1 Character attributes observed in the different species of the genus Puntius; nucleotide positions are numbered from the complete genome of Pethia ticto

Note: “.” represents sequence similarity; species names in bold are wrong species tags; PCHD = Pethia chola; PSB1 = Puntius sophore

Table 2 Pairwise sequence divergence among the different species of Puntius

Figure 1 Phylogenetic identification using a neighbour-joining tree constructed with 1000 bootstrap replicates and P. sophore as the outgroup in the MEGA 6 software package

2 Discussion

The minimum sequence divergence threshold generally used in fish species identification (2%; Ward et al., 2009) was not met by the P. fraseri sequence; more sequences of this species, together with more known reference samples, are therefore needed for clarification. Puntius fraseri is listed as endangered by the IUCN and is endemic to the Western Ghats of India, where it is found in the Darna River, an upper tributary of the Godavari river system (Dahanukar, 2013).

The P. chola sequence (JX260945.1) shows only 1% sequence divergence from P. sophore, which falls within the range of within-species divergence; the authenticity of this sequence needs to be clarified with suitable reference sequences, as false sequences may often have been submitted (Negi et al., 2016; present study). However, low inter-species sequence divergence has also been found in some other fish that are morphologically identified as separate species (Mabragana et al., 2011). Similarly, further studies have observed low interspecific genetic divergence in some other species of Puntius (John et al., 2013; Pereira et al., 2013; Kannan et al., 2014).
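The threshold reasoning in this discussion can be illustrated with a short sketch. It assumes an uncorrected p-distance and the 2% threshold of Ward et al. (2009), and lists every pair of records that carry different species labels yet diverge by less than the threshold. The record names, labels and sequences are hypothetical toy data, and the function names are illustrative only.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites, skipping gaps and Ns."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a not in "-N" and b not in "-N"]
    return sum(a != b for a, b in pairs) / len(pairs)


def cross_label_pairs_below(records, threshold=0.02):
    """Return (id_1, id_2, distance) for differently labelled records
    whose divergence is below the threshold.

    records: list of (record_id, species_label, aligned_sequence).
    """
    flagged = []
    for (id_1, label_1, seq_1), (id_2, label_2, seq_2) in combinations(records, 2):
        if label_1 == label_2:
            continue  # only cross-species pairs are of interest
        distance = p_distance(seq_1, seq_2)
        if distance < threshold:
            flagged.append((id_1, id_2, distance))
    return flagged


# Hypothetical 100 bp toy alignment
base = "ACGT" * 25
ref_x = base
# 12 substitutions in the first 12 sites -> 12% from ref_x
ref_y = base[:12].translate(str.maketrans("ACGT", "CATG")) + base[12:]
# one substitution -> 1% from ref_x, although labelled as species_Y
query_y = "C" + base[1:]

records = [
    ("X_ref", "species_X", ref_x),
    ("Y_ref", "species_Y", ref_y),
    ("Y_query", "species_Y", query_y),
]

for id_1, id_2, distance in cross_label_pairs_below(records):
    print(f"{id_1} vs {id_2}: {distance:.2%} divergence despite different labels")
```

A flagged pair does not by itself say which label is wrong, or whether the two species simply have low interspecific divergence; it only marks records that need checking against reliable reference sequences.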

3 Conclusion

The present study reaches two important conclusions. First, the NCBI sequence database contains some ambiguous sequences of P. conchonius and P. chola that may have been submitted with wrong species tags, which can produce false positives or false negatives (Koski and Golding, 2001). Second, the only P. fraseri sequence available in NCBI is very close to P. chola, with a sequence divergence of 1%. As this species is categorized as endangered, care is needed when assigning species identity for forensic purposes, since many Puntius species are in the aquarium trade.

4 Materials and Methods

Fin clip tissue samples of P. chola (n=10) and P. sophore (n=10) were collected from the Kho River (29.73N 78.52E), Kotdwar, Uttarakhand, India. Genomic DNA was extracted from these samples using a QIAamp blood and tissue kit according to the manufacturer’s protocol (Qiagen, Germany). A 650 bp fragment of the mitochondrial COI gene was amplified from the DNA (Folmer et al., 1994). Polymerase chain reaction amplification was carried out in a 15 µl reaction volume containing 1.5× PCR buffer, 2.5 mM MgCl2, 200 µM dNTP, 0.4 µM of each primer, 0.5 U Taq gold polymerase (MBI Fermentas) and 40 ng of genomic DNA. The PCR thermal cycling parameters were an initial denaturation at 94°C for 2 minutes, then denaturation at 94°C for 45 seconds, annealing at 45°C for 1 minute and extension at 72°C for 1 minute, with one cycle of final extension at 72°C for 20 minutes. PCR amplification was checked by loading 4 µl of the reaction mixture on a 2% (w/v) agarose gel. Amplified PCR products were then processed for cycle sequencing with the respective forward primers using a master mix of the composition suggested by Applied Biosystems. The products were then sequenced on an ABI 3130 Genetic Analyzer.

5 Data Analysis

The quality of the sequences was checked in Sequencher 4.7 (Gene Codes, USA). The obtained sequences were edited and cleaned, and 550 bp of clean sequence was used. The sequences were aligned using ClustalW multiple alignment (CMA) in BioEdit v. 7.0.9.0 (Hall, 1999) for further analysis. The generated sequences were compared with published sequences of the COI gene and verified in NCBI. Phylogenetic identification was done using a neighbour-joining (NJ) tree with 1000 bootstrap replicates in the MEGA 6 software package (Tamura et al., 2013). Character attributes were identified manually to find the specific nucleotides in the data sets. To obtain character attributes, all the partial sequences were aligned with the complete genome of Pethia conchonius (NC022856), and only those characters (nucleotides) specific to one species were noted, following Sarkar et al. (2002). Sequence divergence was calculated to identify sequences that fell below the threshold level used to define the species boundary (Ward et al., 2009), and sequences showing such low divergence were examined for character attributes (specific nucleotides) that can differentiate them. False negatives, i.e., sequences of the target species that are named as a different species, were also identified (van Velzen et al., 2012; Negi et al., 2016).
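Character attributes were identified manually in this study. The following minimal sketch only illustrates the underlying idea of a pure character attribute in the sense of Sarkar et al. (2002): an alignment column where every sequence of the target species shares one nucleotide that no sequence of any other species carries. The function name, the `offset` parameter (a hypothetical way of converting alignment columns to reference-genome coordinates) and the toy sequences are assumptions for illustration.

```python
def pure_character_attributes(groups, target, offset=0):
    """Find columns where the target species is fixed for a nucleotide
    that is absent from all other species.

    groups: dict mapping species label -> list of aligned sequences.
    offset: hypothetical shift added to the 1-based column number,
            e.g. to express positions in reference-genome coordinates.
    Returns a list of (position, nucleotide) tuples.
    """
    target_seqs = groups[target]
    other_seqs = [seq for label, seqs in groups.items()
                  if label != target for seq in seqs]
    length = len(target_seqs[0])
    if any(len(seq) != length for seq in target_seqs + other_seqs):
        raise ValueError("all sequences must be aligned to the same length")

    attributes = []
    for column in range(length):
        target_bases = {seq[column] for seq in target_seqs}
        if len(target_bases) != 1:
            continue  # target species is variable at this site
        base = next(iter(target_bases))
        if base in "-N":
            continue  # gaps and ambiguous calls are not diagnostic
        if all(seq[column] != base for seq in other_seqs):
            attributes.append((column + 1 + offset, base))
    return attributes


# Hypothetical toy alignment (not data from this study)
groups = {
    "species_X": ["ACGTAC", "ACGTAC"],
    "species_Y": ["ACGCAC", "ACGCAT"],
}

print(pure_character_attributes(groups, "species_X"))
print(pure_character_attributes(groups, "species_X", offset=100))
```

In the toy data only column 4 qualifies for species_X: column 6 is excluded because one species_Y sequence carries the same base there.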

Acknowledgements

The authors would like to thank the Dean, Life Sciences, and the HOD of the Department for their consistent support during the study. The authors also thank the Director and Dean, Wildlife Institute of India, for encouraging this work, and express their gratitude to the researchers who shared their valuable knowledge with us.