Neuraminidases, also called sialidases, catalyze the hydrolysis of terminal sialic acid residues from the newly formed virions and from the host cell receptors.[3] Sialidase activities include assistance in the mobility of virus particles through the respiratory tract mucus and in the elution of virion progeny from the infected cell.[4][5]

Swiss-Prot lists 137 types of neuraminidase from various species as of October 18, 2006.[6] Nine subtypes of influenza neuraminidase are known; many occur only in various species of duck and chicken. Subtypes N1 and N2 have been positively linked to epidemics in man, and strains with N3 or N7 subtypes have been identified in a number of isolated deaths.[citation needed]

Influenza neuraminidase is a mushroom-shaped projection on the surface of the influenza virus. It has a head consisting of four co-planar and roughly spherical subunits, and a hydrophobic region that is embedded within the interior of the virus's membrane. It comprises a single polypeptide chain that is oriented in the opposite direction to the hemagglutinin antigen. The polypeptide is a single chain of six conserved polar amino acids, followed by hydrophilic, variable amino acids. β-Sheets predominate as the secondary level of protein conformation.

The recent emergence of oseltamivir- and zanamivir-resistant human influenza A(H1N1) carrying the H274Y mutation has emphasized the need for suitable expression systems to obtain large quantities of highly pure, stable recombinant neuraminidase. One approach uses two separate artificial tetramerization domains, from yeast and Staphylothermus marinus, that facilitate the formation of catalytically active neuraminidase homotetramers and allow for secretion of FLAG-tagged proteins and further purification.[7]

The enzymatic mechanism of influenza virus sialidase has been studied by Taylor et al. and is shown in Figure 1. The catalytic process has four steps. The first step involves the distortion of the α-sialoside from a 2C5 chair conformation (the lowest-energy form in solution) to a pseudoboat conformation when the sialoside binds to the sialidase. The second step leads to an oxocarbocation intermediate, the sialosyl cation. The third step is the formation of Neu5Ac, initially as the α-anomer. The fourth step is mutarotation and release as the more thermodynamically stable β-Neu5Ac.[8]

There are two major proteins on the surface of influenza virus particles. One is the lectin haemagglutinin, with three relatively shallow sialic acid-binding sites; the other is the enzyme sialidase, whose active site lies in a pocket. Because low-molecular-weight inhibitors can make multiple favorable interactions within this relatively deep active site, and because the hydrolysis of sialosides offers approachable routes to designing transition-state analogues, sialidase is a more attractive anti-influenza drug target than haemagglutinin.[9] Once the X-ray crystal structures of several influenza virus sialidases became available, structure-based inhibitor design was applied to discover potent inhibitors of this enzyme.[10]

Many Neu5Ac2en-based compounds have been synthesized and tested for their influenza virus sialidase inhibitory potential. For example, the 4-substituted Neu5Ac2en derivatives (Figure 3) were designed by von Itzstein and coworkers: 4-amino-Neu5Ac2en (Compound 1), which showed two orders of magnitude better inhibition of influenza virus sialidase than Neu5Ac2en5, and 4-guanidino-Neu5Ac2en (Compound 2), known as zanamivir, which is now marketed as a drug for the treatment of influenza.[12] A series of amide-linked C9-modified Neu5Ac2en derivatives has been reported by Magesh and colleagues as NEU1 inhibitors.[13]

1.
Glycoside hydrolase
–
Glycoside hydrolases assist in the hydrolysis of glycosidic bonds in complex sugars. Together with glycosyltransferases, glycosidases form the catalytic machinery for the synthesis and breakage of glycosidic bonds. Glycoside hydrolases are found in all domains of life. In prokaryotes, they are found both as intracellular and extracellular enzymes that are largely involved in nutrient acquisition. One of the important occurrences of glycoside hydrolases in bacteria is the enzyme beta-galactosidase. Deficiency in specific lysosomal glycoside hydrolases can lead to a range of lysosomal storage disorders that result in developmental problems or death. Glycoside hydrolases are found in the intestinal tract and in saliva, where they degrade complex carbohydrates such as lactose and starch. In the gut they are found as glycosylphosphatidyl-anchored enzymes on endothelial cells. The enzyme O-GlcNAcase is involved in removal of N-acetylglucosamine groups from serine and threonine residues in the cytoplasm and nucleus of the cell. The glycoside hydrolases are also involved in the biosynthesis and degradation of glycogen in the body. Glycoside hydrolases are classified into EC 3.2.1 as enzymes catalyzing the hydrolysis of O- or S-glycosides. They can also be classified according to the outcome of the hydrolysis reaction, or as exo- or endo-acting, depending on whether they act at the end or in the middle of a chain, respectively. Glycoside hydrolases may also be classified by sequence- or structure-based methods. Sequence-based classifications are among the most powerful methods for suggesting function for newly sequenced enzymes for which function has not been biochemically demonstrated. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of more than 100 different families.
This classification is available on the CAZy web site. The database provides a series of regularly updated sequence-based classifications that allow reliable prediction of mechanism, active site residues and possible substrates. The online database is supported by CAZypedia, an encyclopedia of carbohydrate-active enzymes. Based on three-dimensional structural similarities, the families have been classified into clans of related structure. Recent progress in glycosidase sequence analysis and 3D structure comparison has allowed the proposal of a hierarchical classification of the glycoside hydrolases. In the catalytic mechanism, two residues are involved, which are usually enzyme-borne carboxylates: one acts as a nucleophile and the other as an acid/base. In the first step the nucleophile attacks the anomeric centre, resulting in the formation of a glycosyl enzyme intermediate.

2.
Enzyme
–
Enzymes /ˈɛnzaɪmz/ are macromolecular biological catalysts. Enzymes accelerate, or catalyze, chemical reactions. The molecules at the beginning of the process upon which enzymes may act are called substrates, and the enzyme converts these into different molecules, called products. Almost all metabolic processes in the cell need enzymes in order to occur at rates fast enough to sustain life. The set of enzymes made in a cell determines which metabolic pathways occur in that cell. The study of enzymes is called enzymology. Enzymes are known to catalyze more than 5,000 biochemical reaction types. Most enzymes are proteins, although a few are catalytic RNA molecules. Enzymes' specificity comes from their unique three-dimensional structures. Like all catalysts, enzymes increase the rate of a reaction by lowering its activation energy. Some enzymes can make their conversion of substrate to product occur many millions of times faster. An extreme example is orotidine 5'-phosphate decarboxylase, which allows a reaction that would otherwise take millions of years to occur in milliseconds. Chemically, enzymes are like any catalyst and are not consumed in chemical reactions. Enzymes differ from most other catalysts by being much more specific. Enzyme activity can be affected by other molecules: inhibitors are molecules that decrease enzyme activity, and many drugs and poisons are enzyme inhibitors. An enzyme's activity decreases markedly outside its optimal temperature and pH. Some enzymes are used commercially, for example, in the synthesis of antibiotics. French chemist Anselme Payen was the first to discover an enzyme, diastase. Louis Pasteur later wrote that alcoholic fermentation is an act correlated with the life and organization of the yeast cells, not with the death or putrefaction of the cells.
In 1877, German physiologist Wilhelm Kühne first used the term enzyme. The word enzyme was used later to refer to nonliving substances such as pepsin, and the word ferment was used to refer to chemical activity produced by living organisms. Eduard Buchner submitted his first paper on the study of yeast extracts in 1897. In a series of experiments at the University of Berlin, he found that sugar was fermented by yeast extracts even when there were no living yeast cells in the mixture. He named the enzyme that brought about the fermentation of sucrose zymase. In 1907, he received the Nobel Prize in Chemistry for his discovery of cell-free fermentation. Following Buchner's example, enzymes are usually named according to the reaction they carry out. The biochemical identity of enzymes was still unknown in the early 1900s. James B. Sumner showed that the enzyme urease was a protein and crystallized it. Sumner, John Howard Northrop and Wendell Meredith Stanley were awarded the 1946 Nobel Prize in Chemistry. The discovery that enzymes could be crystallized eventually allowed their structures to be solved by X-ray crystallography. The high-resolution structure of lysozyme marked the beginning of the field of structural biology. An enzyme's name is often derived from its substrate or the chemical reaction it catalyzes, with the word ending in -ase.

3.
Glycosidic bond
–
In chemistry, a glycosidic bond or glycosidic linkage is a type of covalent bond that joins a carbohydrate molecule to another group, which may or may not be another carbohydrate. A glycosidic bond is formed between the hemiacetal or hemiketal group of a saccharide and the hydroxyl group of some compound such as an alcohol. A substance containing a glycosidic bond is a glycoside. Glycosidic bonds of the form discussed above are known as O-glycosidic bonds, in reference to the glycosidic oxygen that links the glycoside to the aglycone or reducing end sugar. By analogy, one also considers S-glycosidic bonds, where the oxygen of the bond is replaced with a sulfur atom. In the same way, N-glycosidic bonds have the glycosidic oxygen replaced with nitrogen. Substances containing N-glycosidic bonds are known as glycosylamines. C-glycosyl bonds have the oxygen replaced by a carbon; the term C-glycoside is considered a misnomer by IUPAC and is discouraged. All of these modified glycosidic bonds have different susceptibility to hydrolysis. One distinguishes between α- and β-glycosidic bonds based on the relative stereochemistry of the anomeric position and the stereocenter furthest from C1 in the saccharide. An α-glycosidic bond is formed when both carbons have the same stereochemistry, whereas a β-glycosidic bond occurs when the two carbons have different stereochemistry. One complicating issue is that the alpha and beta conformations were originally defined based on the orientation of the major constituents in a Haworth projection. In this case, for D-sugars, a beta conformation would see the major constituent at each carbon drawn above the plane of the ring; for L-sugars, the definitions would then, necessarily, reverse. This is worth noting as these older definitions still permeate the literature. Pharmacologists often join substances to glucuronic acid via glycosidic bonds in order to increase their water solubility; this is known as glucuronidation.
Many other glycosides have important physiological functions. Nüchter et al. have shown a new approach to Fischer glycosylation. Employing a microwave oven equipped with refluxing apparatus in a reactor with pressure bombs, Nüchter et al. were able to achieve 100% yield of α-. This method can be performed on a multi-kilogram scale (Ondruschka and W. Lautenschläger, Synthetic Communications 31, 1277). In the method of Vishal Y. Joshi et al., D-glucose is first protected by forming the peracetate by addition of acetic anhydride in acetic acid, followed by addition of hydrogen bromide, which brominates at the 5-position. On addition of the alcohol ROH and lithium carbonate, the OR replaces the bromine. It was suggested by Joshi et al. that lithium acts as the nucleophile that attacks the carbon at the 5-position and that, through a transition state, the alcohol is substituted for the bromine group. Glycoside hydrolases are enzymes that break glycosidic bonds. They typically can act either on α- or on β-glycosidic bonds, but not on both.

4.
Viral neuraminidase
–
Viral neuraminidase is a type of neuraminidase found on the surface of influenza viruses that enables the virus to be released from the host cell. Neuraminidases are enzymes that cleave sialic acid groups from glycoproteins and are required for influenza virus replication. When influenza virus replicates, it attaches to the interior cell surface using hemagglutinin, a molecule found on the surface of the virus that binds to sialic acid groups. Sialic acids are found on various glycoproteins at the host cell surface. In order for the virus to be released from the cell, neuraminidase must enzymatically cleave the sialic acid groups from host glycoproteins. The cleavage of these groups is therefore an integral part of influenza replication. A single hemagglutinin-neuraminidase protein can combine neuraminidase and hemagglutinin functions, such as in mumps virus. The enzyme helps viruses to be released from a host cell. Influenza virus membranes contain two glycoproteins, hemagglutinin and neuraminidase. While the hemagglutinin on the surface of the virion is needed for infection, its presence inhibits release of the particle after budding. Viral neuraminidase cleaves terminal neuraminic acid residues from glycan structures on the surface of the infected cell. This promotes the release of progeny viruses and the spread of the virus from the host cell to uninfected surrounding cells. Neuraminidase also cleaves sialic acid residues from viral proteins, preventing aggregation of viruses. Neuraminidase has been targeted in structure-based enzyme inhibitor design programmes that have resulted in the production of two drugs, zanamivir and oseltamivir. Administration of neuraminidase inhibitors is a treatment that limits the severity of viral infections. On February 27, 2005, a 14-year-old Vietnamese girl was documented to be carrying an H5N1 influenza virus strain that was resistant to the drug oseltamivir.
The drug is used to treat patients who have contracted influenza; however, the Vietnamese girl, who had received a prophylactic dose, was found to be non-responsive to the medication. Amid growing fears of an avian flu pandemic, scientists began to look for a cause of resistance to the Tamiflu medication. The cause was determined to be a substitution at position 274 in the virus's neuraminidase protein. A new class of inhibitors that covalently attach to the enzyme has shown activity against drug-resistant virus in vitro. In ideal circumstances, influenza virus neuraminidase should act on the type of receptor the virus hemagglutinin binds to. It is not quite clear how the virus manages to function when there is no close match between the specificities of NA and HA. Neuraminidase enzymes can have endo- or exo-glycosidase activity, and are classified as EC 3.2.1.29.

5.
Human genome
–
The human genome is the complete set of nucleic acid sequence for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. Human genomes include both protein-coding DNA genes and noncoding DNA. Haploid human genomes, which are contained in germ cells, consist of three billion DNA base pairs, while diploid genomes have twice the DNA content. The Human Genome Project produced the first complete sequences of human genomes, beginning with a first draft sequence. The human genome was the first of all vertebrates to be completely sequenced. As of 2012, thousands of human genomes have been completely sequenced, and many more have been mapped at lower levels of resolution. The resulting data are used worldwide in biomedical science, anthropology, forensics, and other fields. There is a widely held expectation that genomic studies will lead to advances in the diagnosis and treatment of diseases, and to new insights in many fields of biology, including human evolution. Although the sequence of the genome has been determined by DNA sequencing, it is not yet fully understood. There are an estimated 19,000 to 20,000 human protein-coding genes. In June 2016, scientists formally announced HGP-Write, a plan to synthesize the human genome. The total length of the genome is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, plus the X chromosome and, in males only, the Y chromosome. These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the mitochondrial DNA, a small circular molecule present in each mitochondrion. Basic information about these molecules and their content, based on a reference genome that does not represent the sequence of any specific individual, is provided in the following table.
Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers. Variations are unique DNA sequence differences that have been identified in the individual human genome sequences analyzed by Ensembl as of December 2016. The number of identified variations is expected to increase as further personal genomes are sequenced and analyzed. In addition to the gene content shown in this table, a large number of non-expressed functional sequences have been identified throughout the human genome. Small non-coding RNAs are RNAs of as many as 200 bases that do not have protein-coding potential. These include microRNAs (miRNAs) and small nuclear RNAs (snRNAs). Long non-coding RNAs are RNA molecules longer than 200 bases that do not have protein-coding potential. Although the human genome has been sequenced for all practical purposes, gaps remain. A recent study noted more than 160 euchromatic gaps, of which 50 were closed. There are still numerous gaps in the heterochromatic parts of the genome, which are much harder to sequence due to numerous repeats and other intractable sequence features. The content of the genome is commonly divided into coding and noncoding DNA sequences.
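The length estimate described in this entry is simple arithmetic: the number of base pairs multiplied by 0.34 nanometers per base pair. A minimal Python sketch of that calculation follows. The function name `dna_length_m` is hypothetical, and the input is the approximate three-billion-base-pair haploid genome size quoted above, not a figure for any particular chromosome.

```python
NM_PER_BASE_PAIR = 0.34  # rise per base pair, as stated in the text


def dna_length_m(base_pairs: int) -> float:
    """Return the estimated stretched-out length of a DNA molecule in meters."""
    length_nm = base_pairs * NM_PER_BASE_PAIR
    return length_nm * 1e-9  # convert nanometers to meters


# Approximate haploid genome size quoted in the text (about 3 billion bp).
haploid_length = dna_length_m(3_000_000_000)
print(f"{haploid_length:.2f} m")
```

Under these assumptions, one haploid copy of the genome works out to roughly one meter of DNA.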

6.
NEU1
–
Sialidase 1, also known as NEU1, is a mammalian lysosomal neuraminidase enzyme which in humans is encoded by the NEU1 gene. The enzyme cleaves terminal sialic acid residues from substrates such as glycoproteins. In the lysosome, this enzyme is part of a complex together with beta-galactosidase. Mutations in this gene, and the resulting deficiencies in the human enzyme, can lead to sialidosis, a rare lysosomal storage disease. Sialidase has also been shown to enhance recovery from spinal cord contusion injury when injected in rats. NEU1 has been shown to interact with Cathepsin A.

7.
NEU3
–
Sialidase-3 is an enzyme that in humans is encoded by the NEU3 gene. This gene product belongs to a family of enzymes which remove sialic acid residues from glycoproteins. It is localized in the membrane, and its activity is specific for gangliosides. It may play a role in modulating the ganglioside content of the lipid bilayer. NEU3 has been shown to interact with Grb2.

8.
BRENDA
–
BRENDA is an information system representing one of the most comprehensive enzyme repositories. It is a resource that comprises molecular and biochemical information on enzymes that have been classified by the IUBMB. Every classified enzyme is characterized with respect to its catalyzed biochemical reaction. Kinetic properties of the corresponding reactants are described in detail. BRENDA contains enzyme-specific data manually extracted from scientific literature and additional data derived from automatic information retrieval methods such as text mining. It provides a user interface that allows convenient and sophisticated access to the data. BRENDA was founded in 1987 at the former German National Research Centre for Biotechnology in Braunschweig and was published as a series of books. Its name was originally an acronym for the Braunschweig Enzyme Database. From 1996 to 2007, BRENDA was located at the University of Cologne. There, BRENDA developed into a publicly accessible enzyme information system. In 2007, BRENDA returned to Braunschweig. Currently, BRENDA is maintained and further developed at the Department of Bioinformatics. A major update of the data in BRENDA is performed twice a year. Besides the upgrade of its content, improvements of the interface are also incorporated into the BRENDA database. The latest update was performed in January 2015. Database: The database contains more than 40 data fields with enzyme-specific information on more than 7000 EC numbers that are classified according to the IUBMB. Currently, BRENDA contains manually annotated data from over 140,000 different scientific articles. Each enzyme entry is clearly linked to at least one literature reference, to its source organism, and, where available, to the protein sequence of the enzyme. An important part of BRENDA is represented by the more than 110,000 enzyme ligands. The term ligand is used in this context for all low-molecular-weight compounds which interact with enzymes.
These include metabolites of primary metabolism, co-substrates and cofactors, among others. The origin of these molecules ranges from naturally occurring antibiotics to synthetic compounds that have been synthesized for the development of drugs or pesticides. Furthermore, cross-references to external resources such as sequence and 3D-structure databases are provided. Extensions: Since 2006, the data in BRENDA has been supplemented with information extracted from the literature by a co-occurrence-based text mining approach. For this purpose, four text-mining repositories, FRENDA, AMENDA, DRENDA and KENDA, were introduced. These text-mining results were derived from the titles and abstracts of all articles in the literature database PubMed. Data access: There are several tools to access the data in BRENDA.

9.
MetaCyc
–
The MetaCyc database contains extensive information on metabolic pathways and enzymes from many organisms. MetaCyc is also used in metabolic engineering and metabolomics research. MetaCyc contains extensive data on individual enzymes, describing their subunit structure, cofactors, activators and inhibitors, and substrate specificity. MetaCyc data on reactions includes predicted atom mappings that describe the correspondence between atoms in the reactant compounds and the product compounds. It also provides enzyme mini-reviews and literature references. MetaCyc data on metabolites includes chemical structures, predicted Gibbs free energies of formation, and links to external databases.

10.
Protein Data Bank
–
The Protein Data Bank is a crystallographic database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. The PDB is overseen by an organization called the Worldwide Protein Data Bank (wwPDB). The PDB is a key resource in areas of structural biology. Most major scientific journals, and some funding agencies, now require scientists to submit their structure data to the PDB. Many other databases use protein structures deposited in the PDB. For example, SCOP and CATH classify protein structures, while PDBsum provides a graphic overview of PDB entries using information from other sources, such as Gene Ontology. By 1971, one of Meyer's programs, SEARCH, enabled researchers to access information from the database to study protein structures offline. SEARCH was instrumental in enabling networking, thus marking the beginning of the PDB. Upon Hamilton's death in 1973, Tom Koetzle took over direction of the PDB for the subsequent 20 years. In January 1994, Joel Sussman of Israel's Weizmann Institute of Science was appointed head of the PDB. In October 1998, the PDB was transferred to the Research Collaboratory for Structural Bioinformatics; the new director was Helen M. Berman of Rutgers University. In 2003, with the formation of the wwPDB, the PDB became an international organization. The founding members are PDBe, RCSB, and PDBj. Each of the four members of wwPDB can act as a deposition and data processing center. The data processing refers to the fact that wwPDB staff review and annotate each submitted entry. The data are automatically checked for plausibility. The PDB database is updated weekly, as is the PDB holdings list. As of 14 March 2017, the breakdown of current holdings is as follows: 103,514 structures in the PDB have a structure factor file, 9,057 structures have an NMR restraint file, and 2,826 structures have a chemical shifts file.
For structures determined by NMR, the final conformation of the protein is obtained by solving a distance geometry problem. A few proteins are determined by cryo-electron microscopy. The significance of the structure factor files, mentioned above, is that, for PDB structures determined by X-ray diffraction that have a structure factor file, the electron density map may be viewed. The data of such structures is stored on the electron density server. Since 2007, the rate of accumulation of new protein structures appears to have plateaued. The file format initially used by the PDB was called the PDB file format. This original format was restricted by the width of computer punch cards to 80 characters per line. Around 1996, the macromolecular Crystallographic Information File format, mmCIF, which is an extension of the CIF format, started to be phased in.
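To illustrate what an 80-character, fixed-width line means in practice, here is a minimal Python sketch that writes and then reads back one coordinate record by column position. The column layout is an assumption based on the commonly documented legacy `ATOM` record, the helper names `ATOM_FORMAT` and `parse_atom_line` are hypothetical, and the atom itself is made-up data, not taken from any PDB entry.

```python
# Assumed legacy ATOM layout (1-based columns): 1-6 record, 7-11 serial,
# 13-16 atom name, 18-20 residue, 22 chain, 23-26 residue number,
# 31-54 x/y/z, 55-66 occupancy and B-factor, 77-78 element, 79-80 charge.
ATOM_FORMAT = (
    "{record:<6}{serial:>5} {name:<4}{alt_loc:1}{res_name:>3} {chain:1}"
    "{res_seq:>4}{i_code:1}   {x:>8.3f}{y:>8.3f}{z:>8.3f}"
    "{occupancy:>6.2f}{temp_factor:>6.2f}          {element:>2}{charge:>2}"
)


def parse_atom_line(line: str) -> dict:
    """Extract a few fields from a fixed-width ATOM line by slicing columns."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "element": line[76:78].strip(),
    }


# Made-up example atom, formatted into a single fixed-width line.
line = ATOM_FORMAT.format(
    record="ATOM", serial=1, name=" CA ", alt_loc=" ", res_name="ALA",
    chain="A", res_seq=1, i_code=" ", x=11.104, y=6.134, z=-6.504,
    occupancy=1.0, temp_factor=0.0, element="C", charge="",
)
atom = parse_atom_line(line)
print(len(line), atom["name"], atom["x"], atom["y"], atom["z"])
```

Because every field lives at a fixed column, no delimiter is needed, which is also why the format cannot grow beyond the fixed line width.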

11.
PubMed
–
PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine at the National Institutes of Health maintains the database as part of the Entrez system of information retrieval. From 1971 to 1997, MEDLINE online access to the MEDLARS Online computerized database had been primarily through institutional facilities, such as university libraries. PubMed, first released in January 1996, ushered in the era of private, free, home- and office-based MEDLINE searching. The PubMed system was offered free to the public in June 1997, when MEDLINE searches via the Web were demonstrated, in a ceremony, by Vice President Al Gore. Information about the journals indexed in MEDLINE, and available through PubMed, is found in the NLM Catalog. As of 5 January 2017, PubMed has more than 26.8 million records going back to 1966, selectively to the year 1865, and very selectively to 1809. About 500,000 new records are added each year. As of the same date, 13.1 million of PubMed's records are listed with their abstracts. In 2016, NLM changed the system so that publishers are able to directly correct typos. Simple searches on PubMed can be carried out by entering key aspects of a subject into PubMed's search window. When a journal article is indexed, numerous article parameters are extracted and stored as structured information. Such parameters include Article Type, Secondary identifiers, and Language. The publication type parameter enables many special features. Clinical queries, for example, can generate small sets of robust studies with considerable precision. Since July 2005, the MEDLINE article indexing process extracts important identifiers from the article abstract and puts those in a field called Secondary Identifier. The secondary identifier field is used to store accession numbers for various databases of molecular sequence data, gene expression or chemical compounds.
For clinical trials, PubMed extracts trial IDs for the two largest trial registries, ClinicalTrials.gov and the International Standard Randomized Controlled Trial Number Register. A reference which is judged particularly relevant can be marked and related articles can be identified. If relevant, several studies can be selected and related articles to all of them can be generated using the Find related data option. The related articles are then listed in order of relatedness. To create these lists of related articles, PubMed compares words from the title and abstract of each citation, as well as the MeSH headings assigned, using a powerful word-weighted algorithm. The related articles function has been judged to be so precise that some researchers suggest it can be used instead of a full search. A strong feature of PubMed is its ability to automatically link to MeSH terms and subheadings. For example, bad breath links to halitosis, and heart attack to myocardial infarction. Where appropriate, these MeSH terms are automatically expanded, that is, they include more specific terms. Terms like nursing are automatically linked to the MeSH term or the subheading Nursing. This important feature makes PubMed searches automatically more sensitive and avoids false-negative hits by compensating for the diversity of medical terminology. The My NCBI area can be accessed from any computer with web access. An earlier version of My NCBI was called PubMed Cubby.

12.
National Center for Biotechnology Information
–
The National Center for Biotechnology Information is part of the United States National Library of Medicine, a branch of the National Institutes of Health. The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation sponsored by Senator Claude Pepper. The NCBI houses a series of databases relevant to biotechnology and biomedicine and is an important resource for bioinformatics tools and services. Major databases include GenBank for DNA sequences and PubMed, a database for the biomedical literature. Other databases include the NCBI Epigenomics database. All these databases are available online through the Entrez search engine. NCBI is directed by David Lipman, one of the authors of the BLAST sequence alignment program. He also leads a research program, including groups led by Stephen Altschul, David Landsman, Eugene Koonin, John Wilbur, and Teresa Przytycka. NCBI is listed in the Registry of Research Data Repositories, re3data.org. NCBI has had responsibility for making available the GenBank DNA sequence database since 1992. GenBank coordinates with individual laboratories and other databases such as those of the European Molecular Biology Laboratory. Since 1992, NCBI has grown to provide other databases in addition to GenBank. The NCBI assigns a unique identifier to each species of organism. The NCBI has software tools that are available by WWW browsing or by FTP. For example, BLAST is a sequence similarity searching program that can do sequence comparisons against the GenBank DNA database in less than 15 seconds. The NCBI Bookshelf is a collection of freely accessible, downloadable books. Some of the books are online versions of previously published books, while others, such as Coffee Break, are written and edited by NCBI staff. BLAST is an algorithm used for calculating sequence similarity between biological sequences such as nucleotide sequences of DNA and amino acid sequences of proteins.
BLAST is a tool for finding sequences similar to the query sequence within the same organism or in different organisms. It searches the query sequence on NCBI databases and servers and posts the results back to the browser in the chosen format. Input sequences to BLAST are mostly in FASTA or GenBank format, while output can be delivered in a variety of formats such as HTML and XML. HTML is the output format for NCBI's web page. Entrez is both an indexing and a retrieval system, with data from various sources for biomedical research.
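Since FASTA is named here as the usual input format, a short Python sketch of reading it may help. It assumes the common convention that each record starts with a `>` header line followed by one or more sequence lines. The function `parse_fasta` and the two example records are hypothetical illustrations, not part of any NCBI tool.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()  # header text without the ">" marker
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Two made-up records: one nucleotide, one protein.
example = """>seq1 hypothetical example record
ATGGCC
TTAGGA
>seq2 another made-up record
MKTAYIAK
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence))
```

A real pipeline would more likely rely on an established parser; this sketch only shows how little structure the format imposes.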

13.
Sialic acid
–
Sialic acid is a generic term for the N- or O-substituted derivatives of neuraminic acid, a monosaccharide with a nine-carbon backbone. It is also the name for the most common member of this group, Sialic acids are found widely distributed in animal tissues and to a lesser extent in other organisms, ranging from plants and fungi to yeasts and bacteria, mostly in glycoproteins and gangliosides. That is because it seems to have appeared late in evolution, however, it has been observed in Drosophila embryos and other insects and in the capsular polysaccharides of certain strains of bacteria. In humans the brain has the highest sialic acid concentration, where these acids play an important role in neural transmission, in general, the amino group bears either an acetyl or a glycolyl group, but other modifications have been described. The hydroxyl substituents may vary considerably, acetyl, lactyl, methyl, sulfate, the term sialic acid was first introduced by Swedish biochemist Gunnar Blix in 1952. The sialic acid family includes 43 derivatives of the nine-carbon sugar neuraminic acid, the numbering of the sialic acid structure begins at the carboxylate carbon and continues around the chain. The configuration that places the carboxylate in the position is the alpha-anomer. The alpha-anomer is the form that is found when sialic acid is bound to glycans, however, in solution, it is mainly in the beta-anomeric form. Sialic acid is synthesized by glucosamine 6 phosphate and acetyl CoA through a transferase and this becomes N-acetylmannosamine-6-P through epimerization, which reacts with phosphoenolpyruvate producing N-acetylneuraminic-9-P. This compound is synthesized in the nucleus of the animal cell, in bacterial systems, sialic acids are biosynthesized by an aldolase enzyme. The enzyme uses a derivative as a substrate, inserting three carbons from pyruvate into the resulting sialic acid structure. These enzymes can be used for synthesis of sialic acid derivatives. 
Sialic acid-rich glycoproteins bind selectin in humans and other organisms. Metastatic cancer cells often express a high density of sialic acid-rich glycoproteins. This overexpression of sialic acid on surfaces creates a negative charge on cell membranes. The charge creates repulsion between cells and helps these late-stage cancer cells enter the blood stream. Many bacteria also use sialic acid in their biology, although this is usually limited to bacteria that live in association with higher animals. Many of these incorporate sialic acid into cell surface features such as their lipopolysaccharide and capsule. Other bacteria simply use sialic acid as a nutrient source, as it contains both carbon and nitrogen and can be converted to fructose-6-phosphate, which can then enter central metabolism. Sialic acid-rich oligosaccharides on the glycoconjugates found on surface membranes help keep water at the surface of cells. The sialic acid-rich regions contribute to creating a negative charge on the cells' surfaces. Since water is a polar molecule with partial positive charges on both hydrogen atoms, it is attracted to cell surfaces and membranes.

14.
UniProt
–
UniProt is a freely accessible database of protein sequence and functional information, with many entries derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, Switzerland, maintains the ExPASy servers that are a resource for proteomics tools. In 2002, EBI, SIB, and PIR joined forces as the UniProt consortium. Each consortium member is heavily involved in protein database maintenance and annotation. Before the consortium was formed, EBI and SIB together produced the Swiss-Prot and TrEMBL databases. These databases coexisted with differing protein sequence coverage and annotation priorities. Swiss-Prot aimed to provide reliable protein sequences associated with a high level of annotation. Recognizing that sequence data were being generated at a pace exceeding Swiss-Prot's ability to keep up, TrEMBL was created to provide automated annotations for proteins not in Swiss-Prot. Meanwhile, PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein sequences and curated families. The consortium members pooled their resources and expertise, and launched UniProt in December 2003. UniProt provides four core databases, including UniProtKB, UniParc, and UniRef. The UniProt Knowledgebase (UniProtKB) is a protein database partially curated by experts, consisting of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. As of 19 March 2014, release 2014_03 of UniProtKB/Swiss-Prot contains 542,782 sequence entries. UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence database. It combines information extracted from literature and biocurator-evaluated computational analysis. The aim of UniProtKB/Swiss-Prot is to provide all known relevant information about a particular protein.
Annotation is regularly reviewed to keep up with current scientific findings. The manual annotation of an entry involves detailed analysis of the protein sequence and of the scientific literature. Sequences from the same gene and the same species are merged into the same database entry. Differences between sequences are identified, and their cause documented. A range of sequence analysis tools is used in the annotation of UniProtKB/Swiss-Prot entries. Computer predictions are manually evaluated, and relevant results selected for inclusion in the entry. These predictions include post-translational modifications, transmembrane domains and topology, signal peptides, domain identification, and protein family classification. Relevant publications are identified by searching databases such as PubMed. The full text of each paper is read, and information is extracted and added to the entry.

15.
Entrez
–
The name Entrez was chosen to reflect the spirit of welcoming the public to search the content available from the NLM. Entrez Global Query is a search and retrieval system that provides access to all databases simultaneously with a single query string. Entrez can efficiently retrieve related sequences, structures, and references. The Entrez system can provide views of gene and protein sequences and chromosome maps. Some textbooks are available online through the Entrez system. The Entrez front page provides, by default, access to the global query. All databases indexed by Entrez can be searched via a single query string, supporting boolean operators and search term tags to limit parts of the search statement to particular fields. This returns a unified results page that shows the number of hits for the search in each of the databases. Entrez also provides a similar interface for searching each particular database and for refining search results. The Limits feature allows the user to narrow a search through a web forms interface. The History feature gives a numbered list of recently performed queries. Results of previous queries can be referred to by number and combined via boolean operators. Search results can be saved temporarily in a Clipboard. Users with a MyNCBI account can save queries indefinitely and also choose to have updates with new search results e-mailed for saved queries of most databases. Entrez is widely used in the field of biotechnology as a reference tool for students and professionals alike. The databases searched by Entrez include PubMed, which covers biomedical literature citations and abstracts, including Medline articles from journals. In addition to using the search engine forms to query the data in Entrez, NCBI provides the Entrez Programming Utilities (eUtils) for more direct access to query results. The eUtils are accessed by posting specially formed URLs to the NCBI server. There was also an eUtils SOAP interface, which was terminated in July 2015.
In 1991, Entrez was introduced in CD form. In 1993, a client-server version of the software provided connectivity with the internet. In 1994, NCBI established a website, and Entrez was a part of the initial release. In 2001, the Entrez bookshelf was released, and in 2003 the Entrez Gene database was developed.
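As a rough illustration of what a "specially formed URL" for the Entrez Programming Utilities can look like, the sketch below builds an ESearch query string without sending it. The helper name `build_esearch_url` and the example search term are hypothetical; the base address and the `db`, `term`, and `retmax` parameters follow the publicly documented E-utilities conventions, which should be checked against the current NCBI documentation before use.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed from the public documentation).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for one Entrez database (hypothetical helper).

    db     -- Entrez database name, e.g. "pubmed"
    term   -- query string; boolean operators such as AND are allowed
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example: a boolean query against PubMed. Only the URL is built here;
# nothing is sent to the NCBI server.
url = build_esearch_url("pubmed", "neuraminidase AND zanamivir", retmax=5)
print(url)
```

The URL is only constructed, not fetched, so the sketch can be run offline; an actual request would also need to respect NCBI's usage guidelines.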

16.
Locus (genetics)
–
A locus in genetics is the position on a chromosome. Each chromosome carries many genes; the estimated number of human haploid protein-coding genes is 19,000-20,000. A variant of the similar DNA sequence located at a given locus is called an allele. The ordered list of known loci for a particular genome is called a gene map. Gene mapping is the process of determining the locus for a biological trait. The chromosomal locus of a gene might be written 3p22.1. Here 3 means chromosome 3 and p means the p-arm, while 22 refers to region 2, band 2. This is read as two two, not as twenty-two, so the entire locus is read as three P two two point one. The cytogenetic bands are counted from the centromere out toward the telomeres. A range of loci is specified in a similar way. For example, the locus of gene OCA1 may be written 11q1.4-q2.1, meaning it is on the long arm of chromosome 11. The ends of a chromosome are labeled pter and qter. A centisome is defined as 1% of a chromosome length.
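The reading of a locus such as 3p22.1 can be sketched as a small parser. This is a minimal illustration under simplifying assumptions: the function name `parse_locus` and the regular expression are hypothetical, and the pattern only accepts the simple form chromosome + arm + one region digit + one band digit + optional sub-band, not ranges or the pter/qter labels.

```python
import re

# chromosome (1-22, X or Y), arm (p or q), region digit, band digit,
# and an optional sub-band after the decimal point.
LOCUS_RE = re.compile(
    r"^(?P<chrom>[1-9]|1\d|2[0-2]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<region>\d)(?P<band>\d)"
    r"(?:\.(?P<sub>\d+))?$"
)


def parse_locus(locus):
    """Split a simple cytogenetic locus string into its parts."""
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError(f"not a recognised locus: {locus!r}")
    return {
        "chromosome": match.group("chrom"),
        "arm": match.group("arm"),
        # Region and band are separate digits: "22" is region 2, band 2.
        "region": int(match.group("region")),
        "band": int(match.group("band")),
        "sub_band": match.group("sub"),  # None when no sub-band is given
    }


print(parse_locus("3p22.1"))
```

Keeping region and band as separate fields mirrors the convention that 22 is read as "two two" rather than "twenty-two".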

17.
Chromosome 6 (human)
–
Chromosome 6 is one of the 23 pairs of chromosomes in humans. People normally have two copies of this chromosome. Chromosome 6 spans more than 170 million base pairs and represents between 5.5 and 6% of the total DNA in cells. It contains the Major Histocompatibility Complex, which contains over 100 genes related to the immune response. Identifying genes on each chromosome is an active area of genetic research. Because researchers use different approaches to genome annotation, their predictions of the number of genes on each chromosome vary. In January 2017, two estimates differed by 16%, with one estimate giving 3,000 genes and the other giving 2,516 genes. In 2003, the entirety of chromosome 6 was manually annotated for proteins, resulting in the identification of 1,557 genes. The human leukocyte antigen lies on chromosome 6, and encodes cell-surface antigen-presenting proteins among other functions.
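The 16% figure can be checked with a short calculation. The sketch below assumes the difference is expressed relative to the larger estimate, which reproduces the quoted value; the source does not state which denominator was used, and the function name `relative_difference` is illustrative only.

```python
def relative_difference(estimate_a, estimate_b):
    """Difference between two gene-count estimates, relative to the larger one.

    The choice of the larger estimate as denominator is an assumption made
    to match the quoted percentage.
    """
    larger = max(estimate_a, estimate_b)
    smaller = min(estimate_a, estimate_b)
    return (larger - smaller) / larger


# Chromosome 6: 3,000 genes versus 2,516 genes.
percent = relative_difference(3000, 2516) * 100
print(round(percent))  # 16
```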

18.
Chromosome 2 (human)
–
Chromosome 2 is one of the 23 pairs of chromosomes in humans. People normally have two copies of this chromosome. Chromosome 2 is the second-largest human chromosome, spanning more than 242 million base pairs and representing almost 8% of the total DNA in human cells. Identifying genes on each chromosome is an active area of genetic research. Because researchers use different approaches to genome annotation, their predictions of the number of genes on each chromosome vary. In January 2017, two estimates differed by 12%, with one estimate giving 3,862 genes and the other giving 3,399 genes. Chromosome 2 contains the HOXD homeobox gene cluster. All members of Hominidae except humans, Neanderthals, and Denisovans have 24 pairs of chromosomes. Humans have only 23 pairs of chromosomes. Human chromosome 2 is a result of an end-to-end fusion of two ancestral chromosomes. The evidence for this includes the following. First, chromosome 2 corresponds to two ape chromosomes: the closest human relative, the chimpanzee, has near-identical DNA sequences to human chromosome 2, but they are found in two separate chromosomes. The same is true of the more distant gorilla and orangutan. Second, there is a vestigial centromere: normally a chromosome has just one centromere, but in chromosome 2 there are remnants of a second centromere in the q21.3–q22.1 region. Third, there are vestigial telomeres, which are normally found only at the ends of a chromosome.

19.
Chromosome 11 (human)
–
Chromosome 11 is one of the 23 pairs of chromosomes in humans. Humans normally have two copies of this chromosome. Chromosome 11 spans about 135 million base pairs and represents between 4 and 4.5 percent of the total DNA in cells. Identifying genes on each chromosome is an active area of genetic research. Because researchers use different approaches to genome annotation, their predictions of the number of genes on each chromosome vary. In January 2017, two estimates differed insignificantly, with one estimate giving 2,920 genes and the other giving 2,893 genes. At 21.5 genes per megabase, chromosome 11 is one of the most gene-rich chromosomes. More than 40% of the 856 olfactory receptor genes in the human genome are located in 28 single-gene and multi-gene clusters along this chromosome.
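The gene density figure follows from dividing a gene count by the chromosome length in megabases. The sketch below uses the two gene-count estimates and the approximate length of 135 million base pairs given above; the function name `genes_per_megabase` is illustrative, and because the length is only approximate the results bracket the quoted 21.5 rather than matching it exactly.

```python
def genes_per_megabase(gene_count, length_bp):
    """Gene density: number of genes per million base pairs."""
    return gene_count / (length_bp / 1_000_000)


CHROMOSOME_11_LENGTH_BP = 135_000_000  # "about 135 million base pairs"

for estimate in (2920, 2893):
    density = genes_per_megabase(estimate, CHROMOSOME_11_LENGTH_BP)
    print(estimate, round(density, 1))
# 2920 21.6
# 2893 21.4
```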

20.
Zanamivir
–
Zanamivir is a medication used to treat and prevent influenza caused by influenza A and B viruses. It is a neuraminidase inhibitor and was developed by the Australian biotech firm Biota Holdings. It was licensed to Glaxo in 1990 and approved in the US in 1999. In 2006, it was approved for prevention of influenza A and B. Zanamivir was the first neuraminidase inhibitor commercially developed. It is currently marketed by GlaxoSmithKline under the trade name Relenza as a powder for oral inhalation. Zanamivir is used for the treatment of infections caused by influenza A and influenza B viruses. It decreases the risk of getting symptomatic, but not asymptomatic, influenza. As of 2009, no influenza had shown any signs of resistance in the US. Since then, genes expressing resistance to zanamivir have been found in Chinese people infected with avian influenza A H7N9 during treatment with zanamivir. In otherwise-healthy individuals, benefits overall appear to be small. Zanamivir shortens the duration of symptoms of influenza-like illness by less than a day. In children with asthma there was no effect on the time to first alleviation of symptoms. Whether it affects the risk of needing to be hospitalized or the risk of death is not clear. There is no proof that zanamivir reduced hospitalizations or pneumonia and other complications of influenza, such as bronchitis, middle ear infection, and sinusitis. Zanamivir did not reduce the risk of self-reported, investigator-mediated pneumonia or radiologically confirmed pneumonia in adults. The effect on pneumonia in children was also not significant. Low to moderate evidence indicates it decreases the risk of getting influenza by 1% to 12% in those exposed. There was also no evidence of a reduction in the risk of person-to-person spread of the influenza virus. The evidence for a benefit in preventing influenza is weak in children.
A meta-analysis from 2011 found that zanamivir resistance had been rarely reported. Antiviral resistance can emerge during or after treatment with antivirals in certain people. In 2013, genes expressing resistance to zanamivir were found in Chinese patients infected with avian influenza A H7N9. Dosing is limited to the inhalation route. This restricts its usage, as treating asthmatics could induce bronchospasms. In 2006 the Food and Drug Administration found that breathing problems, including deaths, were reported in some patients after the initial approval of Relenza. Most of these patients had asthma or chronic obstructive pulmonary disease. Relenza therefore was not recommended for treatment or prophylaxis of seasonal influenza in individuals with asthma or chronic obstructive pulmonary disease. In 2009 the zanamivir package insert contained precautionary information regarding the risk of bronchospasm in patients with respiratory disease. In adults there was no increased risk of reported adverse events in trials. There was little evidence of the harms associated with the treatment of children with zanamivir.

21.
Influenza A virus subtype H1N1
–
Influenza A virus subtype H1N1 is the subtype of influenza A virus that was the most common cause of human influenza in 2009, and is associated with the 1918 outbreak known as the Spanish Flu. It is an orthomyxovirus that contains the glycoproteins haemagglutinin and neuraminidase. For this reason, influenza A viruses are described as H1N1, H1N2, etc., depending on the type of H or N antigens they express with metabolic synergy. Haemagglutinin causes red blood cells to clump together and binds the virus to the infected cell. Neuraminidases are a type of glycoside hydrolase enzyme that help to move the virus particles through the infected cell and assist in budding from the host cells. Some strains of H1N1 are endemic in humans and cause a small fraction of all influenza-like illness. H1N1 strains caused a small percentage of all human flu infections in 2004–2005. Other strains of H1N1 are endemic in pigs and in birds. In June 2009, the World Health Organization declared the new strain of swine-origin H1N1 a pandemic. This strain is often called swine flu by the public media. This novel virus spread worldwide and had caused about 17,000 deaths by the start of 2010. On August 10, 2010, the World Health Organization declared the H1N1 influenza pandemic over. Swine influenza is a respiratory disease of pigs that is caused by the influenza A virus. Influenza viruses that are found in swine are known as swine influenza viruses (SIV). The known SIV strains include influenza C and the subtypes of influenza A known as H1N1, H1N2, H3N1, and H3N2. Pigs can also become infected with the H4N6 and H9N2 subtypes. Swine influenza virus is common throughout pig populations worldwide. Transmission of the virus from pigs to humans is not common and does not always lead to human influenza, often resulting only in the production of antibodies in the blood.
If transmission does cause human influenza, it is called zoonotic swine flu or a variant virus. People with regular exposure to pigs are at increased risk of swine flu infection. The meat of an infected animal poses no risk of infection when properly cooked. During the mid-20th century, identification of influenza subtypes became possible, allowing accurate diagnosis of transmission to humans. Since then, only 50 such transmissions have been confirmed. These strains of swine flu rarely pass from human to human. The recommended time of isolation is about five days. The Spanish flu is thought to be one of the deadliest pandemics in human history. The 1918 flu caused a large number of deaths, possibly due to it causing a cytokine storm in the body. The Spanish flu virus infected cells, leading to overstimulation of the immune system via release of cytokines into the lung tissue. This leads to leukocyte migration towards the lungs, causing destruction of lung tissue. This makes it difficult for the patient to breathe. Other countries suppressed the news in order to protect morale.

22.
Yeast
–
Yeasts are eukaryotic, single-celled microorganisms classified as members of the fungus kingdom. The yeast lineage originated hundreds of millions of years ago, and 1,500 species are currently identified. They are estimated to constitute 1% of all described fungal species. Yeast sizes vary greatly, depending on species and environment, typically measuring 3–4 µm in diameter. Most yeasts reproduce asexually by mitosis, and many do so by the asymmetric division process known as budding. Yeasts, with their single-celled growth habit, can be contrasted with molds. Fungal species that can take both forms are called dimorphic fungi. The yeast species Saccharomyces cerevisiae is a centrally important model organism in modern cell biology research, and is one of the most thoroughly researched eukaryotic microorganisms. Researchers have used it to gather information about the biology of the eukaryotic cell. Other species of yeasts, such as Candida albicans, are opportunistic pathogens. Yeasts have recently been used to generate electricity in microbial fuel cells, and to produce ethanol for the biofuel industry. Yeasts do not form a taxonomic or phylogenetic grouping. The budding yeasts are classified in the order Saccharomycetales, within the phylum Ascomycota. The word yeast comes from Old English gist, gyst, and from the Indo-European root yes-, meaning boil, foam, or bubble. Yeast microbes are probably one of the earliest domesticated organisms. Archaeologists digging in Egyptian ruins found early grinding stones and baking chambers for yeast-raised bread, as well as drawings of 4,000-year-old bakeries and breweries. In 1680, Dutch naturalist Anton van Leeuwenhoek first microscopically observed yeast, but at the time did not consider them to be living organisms. Researchers were doubtful whether yeasts were algae or fungi, but in 1837 Theodor Schwann recognized them as fungi.
In 1857, French microbiologist Louis Pasteur proved in the paper Mémoire sur la fermentation alcoolique that alcoholic fermentation was conducted by living yeasts and not by a chemical catalyst. Pasteur showed that by bubbling oxygen into the yeast broth, cell growth could be increased. By the late 18th century, two yeast strains used in brewing had been identified, Saccharomyces cerevisiae and S. carlsbergensis. S. cerevisiae has been sold commercially by the Dutch for bread-making since 1780. In 1825, a method was developed to remove the liquid so the yeast could be prepared as solid blocks. The industrial production of yeast blocks was enhanced by the introduction of the filter press in 1867. In 1872, Baron Max de Springer developed a process to create granulated yeast. Yeasts are chemoorganotrophs, as they use organic compounds as a source of energy. Carbon is obtained mostly from hexose sugars, such as glucose and fructose, or disaccharides such as sucrose and maltose. Some species can metabolize pentose sugars such as ribose, as well as alcohols. Yeast species either require oxygen for aerobic cellular respiration or are anaerobic, but also have aerobic methods of energy production.

23.
Protein
–
Proteins are large biomolecules, or macromolecules, consisting of one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, and responding to stimuli. A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides, containing fewer than 20–30 residues, are rarely considered to be proteins and are commonly called peptides, or sometimes oligopeptides. The individual amino acid residues are bonded together by peptide bonds. The sequence of amino acid residues in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the code specifies 20 standard amino acids. Sometimes proteins have non-peptide groups attached, which can be called prosthetic groups or cofactors. Proteins can also work together to achieve a particular function, and they often associate to form stable protein complexes. Once formed, proteins only exist for a certain period of time and are then degraded and recycled by the cell's machinery through the process of protein turnover. A protein's lifespan is measured in terms of its half-life and covers a wide range. Proteins can exist for minutes or years, with an average lifespan of 1–2 days in mammalian cells. Abnormal or misfolded proteins are degraded more rapidly, either because they are targeted for destruction or because they are unstable. Like other biological macromolecules such as polysaccharides and nucleic acids, proteins are essential parts of organisms. Many proteins are enzymes that catalyse biochemical reactions and are vital to metabolism. Proteins also have structural or mechanical functions, such as actin and myosin in muscle and the proteins in the cytoskeleton. Other proteins are important in cell signaling, immune responses, cell adhesion, and the cell cycle.
In animals, proteins are needed in the diet to provide the essential amino acids that cannot be synthesized. Digestion breaks the proteins down for use in the metabolism. Methods commonly used to study structure and function include immunohistochemistry, site-directed mutagenesis, X-ray crystallography, and nuclear magnetic resonance. Most proteins consist of linear polymers built from series of up to 20 different L-α-amino acids. All proteinogenic amino acids possess common structural features, including an α-carbon to which an amino group, a carboxyl group, and a variable side chain are bonded. Only proline differs from this structure, as it contains an unusual ring to the N-end amine group. The amino acids in a chain are linked by peptide bonds. Once linked in the chain, an individual amino acid is called a residue, and the linked series of carbon, nitrogen, and oxygen atoms is known as the main chain or protein backbone. The peptide bond has two resonance forms that contribute some double-bond character and inhibit rotation around its axis, so that the alpha carbons are roughly coplanar.

The energies of the stages of a chemical reaction. Uncatalysed (dashed line), substrates need a lot of activation energy to reach a transition state, which then decays into lower-energy products. When enzyme catalysed (solid line), the enzyme binds the substrates (ES), then stabilizes the transition state (ES‡) to reduce the activation energy required to produce products (EP) which are finally released.

The human genome is the complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome …

Graphical representation of the idealized human diploid karyotype, showing the organization of the genome into chromosomes. This drawing shows both the female (XX) and male (XY) versions of the 23rd chromosome pair. Chromosomes are shown aligned at their centromeres. The mitochondrial DNA is not shown.

TSC SNP distribution along the long arm of chromosome 22 (from http://snp.cshl.org/ ). Each column represents a 1 Mb interval; the approximate cytogenetic position is given on the x-axis. Clear peaks and troughs of SNP density can be seen, possibly reflecting different rates of mutation, recombination and selection.

G-banding ideogram of human chromosome 6 in resolution 850 bphs. Band length in this diagram is proportional to base-pair length. This type of ideogram is generally used in genome browsers (e.g. Ensembl, UCSC Genome Browser).

G-banding ideogram of human chromosome 11 in resolution 850 bphs. Band length in this diagram is proportional to base-pair length. This type of ideogram is generally used in genome browsers (e.g. Ensembl, UCSC Genome Browser).