This workflow discovers proteins in plain text. It is built around the AIDA 'Named Entity Recognize' web service by Sophia Katrenko (a service based on LingPipe) and filters the proteins out of that service's output. The Named Entity Recognize service uses the pre-learned genomics model, named 'MedLine', to find genomics concepts in plain text.
MedLine
This workflow filters protein_molecule-labeled terms from an input string (list). The result is a tagged list of proteins (disregarding false positives in the input).
Internal information:
This workflow is a copy of 'filter_protein_molecule_MR3' used for the NBIC poster (now in Archive).
<protein_molecule>\w*</protein_molecule>
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
org.embl.ebi.escience.scuflworkers.java.FilterStringList
org.embl.ebi.escience.scuflworkers.java.StringStripDuplicates
(?=<protein_molecule>)|(?<=</protein_molecule>)
text/xml
This workflow contains the 'Named Entity Recognize' web service from the AIDA toolbox, created by Sophia Katrenko. It can be used to discover entities of a certain type (determined by 'learned_model') in documents provided in a lucene output format.
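The split, filter and strip-duplicates steps named above can be sketched outside Taverna as follows. This is a minimal Python sketch (3.7 or later, where re.split accepts zero-width matches), not the workflow's actual implementation. It assumes that the lookaround expression is the one used by SplitByRegex, that the \w* expression is the one used by FilterStringList, and that the filter requires the whole string to match. The function names are hypothetical.

```python
import re

# Assumption: this lookaround expression is the SplitByRegex pattern.
# It splits just before every opening tag and just after every closing tag.
SPLIT_PATTERN = re.compile(r"(?=<protein_molecule>)|(?<=</protein_molecule>)")

# Assumption: this expression is the FilterStringList pattern.
FILTER_PATTERN = re.compile(r"<protein_molecule>\w*</protein_molecule>")


def split_by_regex(text):
    """Cut the tagged text into pieces at the tag boundaries."""
    return SPLIT_PATTERN.split(text)


def filter_string_list(items):
    """Keep only the pieces that are, in full, one tagged protein term."""
    return [item for item in items if FILTER_PATTERN.fullmatch(item)]


def strip_duplicates(items):
    """Drop repeated entries while keeping first-seen order."""
    return list(dict.fromkeys(items))


def filter_proteins(tagged_text):
    """Run the three steps in sequence on one tagged string."""
    return strip_duplicates(filter_string_list(split_by_regex(tagged_text)))


if __name__ == "__main__":
    tagged = (
        "The <protein_molecule>EED</protein_molecule> protein binds "
        "<protein_molecule>HDAC</protein_molecule>; "
        "<protein_molecule>EED</protein_molecule> is a PcG protein."
    )
    for protein in filter_proteins(tagged):
        print(protein)
```

Under the full-match assumption, \w* accepts only a single run of word characters, so a tagged term containing a space or a hyphen would be dropped by the filter step in this sketch.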
Known issues:
The output of NErecognize contains concepts with / characters, which breaks the XML. For post-processing its results it is better to use string manipulation than XML manipulation.
The output is per document, which means entities will be redundant if they occur in more than one document.
'lucene' for lucene's XML output (NER done on 'content' field only)
'text' for plain text
text
NElist
http://ws.adaptivedisclosure.org/axis/services/NERecognizerService?wsdl
NErecognize
Model to discover a set of specific concepts; e.g. the prelearned model named 'MedLine' will make the service discover genomics concepts.
plain text, example:
Polycomb-group (PcG) proteins form multimeric protein complexes, which are involved in maintaining the transcriptional repressive state of genes over successive cell generations. Components of PcG complexes and their mutual interactions have been identified and analysed through extensive genetic and biochemical analyses. Molecular mechanisms underlying PcG-mediated repression of gene activity, however, have remained largely unknown. Previously we reported the existence of two distinct human PcG protein complexes. The EED/EZH protein complex contains the embryonic ectoderm development (EED) and enhancer of zeste 2 (EZH2; refs 9,10) PcG proteins. The HPC/HPH PcG complex contains the human polycomb 2 (HPC2; ref. 11), human polyhomeotic (HPH), BMI1 (ref. 13 ) and RING1 (refs 14, 15) proteins. Here we show that EED (refs 4, 5, 6, 7, 8) interacts, both in vitro and in vivo, with histone deacetylase (HDAC) proteins. This interaction is highly specific because the HDAC proteins do not interact with other vertebrate PcG proteins. We further find that histone deacetylation activity co-immunoprecipitates with the EED protein. Finally, the histone deacetylase inhibitor trichostatin A (ref. 17) relieves transcriptional repression mediated by EED, but not by HPC2, a human homologue of polycomb. Our data indicate that PcG-mediated repression of gene activity involves histone deacetylation. This mechanistic link between two distinct, global gene repression systems is accomplished through the interaction of HDAC proteins with a particular PcG protein, EED.
text/rdf
text/xml
Entities discovered in documents provided in lucene output format.
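The two known issues listed earlier (labels containing / that break the XML, and entities repeated across documents) can be handled with plain string processing, roughly as sketched below. This is an illustrative Python sketch only. The sample string, the label 'DNA_domain/region' and all function names are hypothetical and not taken from real NErecognize output.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical output fragment: the second label contains a '/' character.
SAMPLE = (
    "<doc><protein_molecule>EED</protein_molecule> binds a "
    "<DNA_domain/region>promoter</DNA_domain/region></doc>"
)

# An opening tag, tag-free content, and a closing tag with the same label.
# The label may contain '/' as long as it does not start with one.
ENTITY_PATTERN = re.compile(r"<([^<>/][^<>]*)>([^<>]*)</\1>")


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def extract_entities(text):
    """Return (label, term) pairs using string matching instead of XML parsing."""
    return ENTITY_PATTERN.findall(text)


def merge_documents(per_document_entities):
    """Flatten per-document entity lists and drop repeats, keeping first-seen order."""
    merged = {}
    for entities in per_document_entities:
        for entity in entities:
            merged.setdefault(entity, None)
    return list(merged)


if __name__ == "__main__":
    print(is_well_formed(SAMPLE))
    print(extract_entities(SAMPLE))
    print(merge_documents([extract_entities(SAMPLE), extract_entities(SAMPLE)]))
```

The pattern deliberately skips elements that contain other tags, so a wrapper element such as the hypothetical doc element above is ignored and only the innermost labeled terms are returned.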