at OpenHelix

Tag Archives: comparative genomics

Usually when we think about comparative genomics data, we are thinking about genomes that are pretty well sequenced, and we want to look at that data with a variety of tools and algorithms. But this past week we saw a question about less-well-sequenced genomes, and we thought it was an interesting inquiry. The question was: is there a web site that displays comparative karyotype data? So we went looking. And we found Chromhome.

Chromhome has a very straightforward interface. You choose a target species. You choose the probe species. You click paint, and you get a look at the chromosome-level homology. When the painting was performed with actual probes and reported in the literature, that data is provided. At the time the paper was published, this consisted of more than 100 data sets.

There is also the opportunity to see inferred painting. I’ll let the Chromhome paper authors describe that strategy:

If species A and species B are mapped on species N, then it is possible to deduce some of the chromosomal arrangements of A on B or B on A with respect to the arrangements of N chromosomes. Many of the species in Chromhome have been mapped on human chromosomes using chromosome painting. It is therefore possible to infer homologies between two species each of which have been hybridized with human probes.
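The inference the authors describe can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Chromhome's actual implementation: the function name `infer_homologies`, the dictionary layout, and the toy chromosome assignments are all made up for the example. It pairs a chromosome of species A with a chromosome of species B whenever both were painted by at least one common reference (N, here human) chromosome.

```python
from collections import defaultdict


def infer_homologies(a_on_n, b_on_n):
    """Infer candidate homologies between species A and species B.

    Each argument maps a chromosome name to the set of reference (N)
    chromosomes whose probes hybridized to it. Hypothetical data layout.
    """
    inferred = defaultdict(set)
    for a_chr, a_refs in a_on_n.items():
        for b_chr, b_refs in b_on_n.items():
            shared = a_refs & b_refs  # reference chromosomes painted on both
            if shared:
                inferred[(a_chr, b_chr)] |= shared
    return dict(inferred)


# Toy, invented painting results (not real data).
species_a = {"A1": {"HSA1", "HSA2"}, "A2": {"HSA3"}}
species_b = {"B1": {"HSA2"}, "B2": {"HSA3", "HSA4"}}

result = infer_homologies(species_a, species_b)
for (a_chr, b_chr), refs in sorted(result.items()):
    print(a_chr, b_chr, sorted(refs))
```

A shared reference chromosome only suggests a candidate homology. Whole-chromosome paints cannot tell whether the two species carry the same segment of that reference chromosome, which is presumably why the authors say only "some of" the arrangements can be deduced.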

So if this type of comparative genomics may be of interest to you, check out Chromhome.

(btw, there is an interesting photo you might want to check out; it’s copyrighted, so I won’t post it here. There’s an interesting story there: how our illustrations of Neanderthals have evolved over the years to be more ‘humanizing’ as we learn that they made tools, had culture and now… are part of our ancestry…)

I am itching to go play there and see what I can see, as I am sure many scientists are. It’s also fascinating to be in this world of huge amounts of data coming quickly. I think a lot of paradigms will be shifting for a while.

This next post in our continuing semi-regular Guest Post series is from Eric Lyons, of CoGe at the University of California, Berkeley. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

Thanks both for the prior CoGe post (editor’s note: a tip of the week on CoGe) and the invitation to write a bit about CoGe. Since most people are probably not familiar with CoGe, let me begin with how it is designed:

CoGe’s architecture and philosophy: Solve a problem once

CoGe is a web-based platform for comparative genomics and consists of many interconnected web-based tools. The entire system is hooked up to a database that can store any version of any genome in any state of assembly from any organism (currently ~9000 genomes from ~8000 organisms). Each of CoGe’s tools is designed to do one task (e.g. search and display information about a genome, compare two genomes and generate syntenic dotplots, search any number of genomes for similar sequence, manage a list of genes, etc.), and the tools are linked to one another. This means that there is no predefined analysis workflow. Instead, people can begin exploring a genome of interest, compare it to what they want, find something interesting, explore that, find something else, explore that, etc. People anywhere in the world can perform computationally intense analyses by clicking a few buttons on a web page and letting our servers crunch away on whatever genomes we have currently loaded in our system. Since each tool is web-based, links are used to move from tool to tool, which creates an easy way to save an analysis for future work or to send it to a colleague. This also has the benefit that as we develop new tools to solve a specific problem, we can generalize the solution, plug it into CoGe’s database, and connect it to its pre-existing tool set. Overall, this gives us an easy way to expand CoGe’s functionality.

VISTA recently added VISTA Point which combines capabilities of the three tools currently available at the site – VISTA Gateway, VISTA Browser, and Text Browser. VISTA Point makes analyzing multiple and pairwise genome alignments and extracting relevant numerical data much more straightforward.

“As we continually update and improve the VISTA suite of tools, it is critical we give users the training they need to use the tools efficiently and effectively,” said Inna Dubchak, Principal Investigator for the VISTA project, “Our history with OpenHelix has proven that their tutorial suite is an excellent and cost-effective method for us to provide that training.”

The online narrated tutorial (www.openhelix.com/vista), which runs in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders. The approximately 60 minute tutorial highlights and explains the features and functionality needed to start using VISTA effectively. The tutorial can be used by new users to introduce them to VISTA, or by previous users to view new features and functionality or simply as a reference tool to understand specific features.
In addition to the tutorial, VISTA users can also access useful training materials including the animated PowerPoint slides used as a basis for the tutorial, suggested script for the slides, slide handouts, and exercises. This can save teachers and professors a tremendous amount of time and effort in creating classroom content.

In addition to the VISTA tutorial, OpenHelix offers nearly 90 tutorial suites on some of the most powerful and popular bioinformatics and genomics tools available on the web. Some of the tutorial suites are freely available through support from the resource providers. The whole catalog of tutorial suites is available through a subscription. Users can view the tutorials and download the free materials at www.openhelix.com.

About VISTA and LBNL
The VISTA family of tools has been developed and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. This project was originally supported by the Programs for Genomic Applications grant from the NHLBI/NIH and is currently supported by the Office of Biological and Environmental Research, Office of Science, US Department of Energy.

Lawrence Berkeley National Laboratory (Berkeley Lab) has been a leader in science and engineering research for more than 70 years. Located on a 200 acre site in the hills above the University of California’s Berkeley campus, adjacent to the San Francisco Bay, Berkeley Lab holds the distinction of being the oldest of the U.S. Department of Energy’s National Laboratories. The Lab is managed by the University of California, operating with an annual budget of more than $500 million and a staff of about 3,800 employees, including more than 500 students.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.

Our first guest post in our new semi-regular Guest Post series is from Inna Dubchak, principal investigator at the LBNL/JGI group, developers of the VISTA comparative genomics resource (who sponsor a tutorial, free to the users). If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

I would like to give you a heads up on some new VISTA updates and ongoing development!

Updates: As you probably know from this blog, a new, still free VISTA tutorial is available now. We have introduced a lot of updates to these tools: we built new programs, improved the existing ones, and entirely changed the design of the site to make it more up-to-date and convenient.

The main addition to the site, VISTA Point, combines the capabilities of the three tools currently available at the site (VISTA Gateway, VISTA Browser, and Text Browser), which are usually used step-by-step. VISTA Point makes analyzing multiple and pairwise genome alignments and extracting relevant numerical data much more straightforward, and it is easy to update, expand and extend with new programs.

Soon: We are actively working on visualizing synteny at scales ranging from whole-genome alignment to the conservation of individual genes, with seamless navigation across different levels of resolution. In our upcoming VISTA-Dot tool we used the concept of two-dimensional “dot-plots”, historically employed in the analysis of local alignment, and an interactive Google-map-like interface to visualize whole-genome alignments. You will be able to display and analyze large-scale duplication in plants in one click! It can also be useful in genome assembly and finishing. Another addition coming in the near future, VISTA SyntenyViewer, presents a novel interface as three cross-navigable panels representing different scales of the alignment.
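For readers unfamiliar with the dot-plot concept mentioned above, here is a minimal Python sketch of the general idea. It is an assumption-laden toy, not VISTA-Dot's method: the function names `dotplot_matches` and `render`, the exact k-mer matching rule, and the example sequences are all invented for illustration. Each dot marks a position where a short word (k-mer) in one sequence also occurs in the other; diagonal runs of dots indicate conserved stretches, and parallel diagonals indicate duplications.

```python
def dotplot_matches(seq_x, seq_y, k=3):
    """Return (i, j) pairs where the k-mer starting at seq_x[i]
    equals the k-mer starting at seq_y[j]."""
    # Index every k-mer of seq_y by its start positions.
    index = {}
    for j in range(len(seq_y) - k + 1):
        index.setdefault(seq_y[j:j + k], []).append(j)
    # Look up every k-mer of seq_x in that index.
    matches = []
    for i in range(len(seq_x) - k + 1):
        for j in index.get(seq_x[i:i + k], []):
            matches.append((i, j))
    return matches


def render(seq_x, seq_y, matches):
    """Draw a text dot plot: columns follow seq_x, rows follow seq_y."""
    grid = [["." for _ in seq_x] for _ in seq_y]
    for i, j in matches:
        grid[j][i] = "*"
    return "\n".join("".join(row) for row in grid)


# Toy sequences: seq_x contains seq_y twice, mimicking a duplication.
seq_x = "ACGTACGT"
seq_y = "ACGT"
hits = dotplot_matches(seq_x, seq_y, k=4)
print(render(seq_x, seq_y, hits))
```

In this toy case the two dots in the top row mark the start of each copy of `ACGT` in the longer sequence. Whole-genome tools work from precomputed alignments rather than exact word matches, so treat this only as a picture of what the two axes mean.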

Attention: do not forget to use our whole-genome capabilities – Whole-genome VISTA to align sequence of any quality, from draft to finished, up to 10MB long, and Whole Genome rVISTA to evaluate which transcription factor binding sites (TFBS) are over-represented in upstream regions in a group of genes.

The homepage has undergone a very nice redesign. Much of the underlying functionality and use of the VISTA browser and other tools is similar (though updated of course). We understand also that there will be upcoming updates to some tools and the addition of others. Look for that here :D.

Also, we’ve updated our tutorial to reflect the new site and functions. As before, this tutorial is free to users and sponsored by VISTA. Check it out.

So I’m all excited about the genome festival that I’m seeing, related to the publication of the new sequence version of corn. You can access the main paper in Science, and there’s a very neat diagram in figure 1 that is like looking across time at the sequence data and into the corn nebula. But the thing that cracked me up was this line from the abstract:

Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome.

That means 85% of corn isn’t corn!! And what business do those elements have messing with the genomes?? I am told all the time that messing with plant genomes is wrong and unnatural. Heh.

For full coverage of the big news today I’ll point you to James and the Giant Corn (appropriately enough) who seems to be the CNN (Corn News Network) of 24-hour coverage of many aspects of the work.

I spent my morning looking over the PLoS Maize Special Collection papers, including the intriguing appetizer: 10 Reasons to be Tantalized by the B73 Maize Genome. But I spent longer looking at the CNVs and PAVs paper. I’ve been thinking about CNVs a lot lately, and was interested to see this covered in a non-mammalian species.

Figure 1 is a nice example of how to use VISTA for effective displays in comparative genomics. (If you haven’t used VISTA before you might check out our sponsored free tutorial on that–we are currently working with the VISTA team to update that with their new features too.)

There’s a really striking segment of chromosome 6 that appears to be present in one of the strains they examine and absent in the other (illustrated in figure 4). And it looks like it has genes that are expressed and active in the B73 strain. The ongoing investigation of that is pretty intriguing as well.

The structural variations are not evenly distributed across the genomes. Some places have large occurrences, and some are untouched. It’s clear that just in these two strains there’s a lot more structural diversity than in other species that have been examined:

In the human, rat, dog, mouse, macaque and chimpanzee genomes the average number of CNVs between two individuals is between 15 and 75 [43]–[48]. A high resolution study of eight human genomes [49] revealed only several hundred insertions and deletions, including CNV and PAV sequences, in the comparison of any two human genomes. In contrast, even after very stringent filtering we identified >3,700 CNV or PAV sequences that represent at least 2,000 events between these two maize genomes.

Emphasis mine. Plants are so much more flexible, apparently….

This is going to lead to some neat clues on heterosis (or hybrid vigor) as the research proceeds with these new tools. What a great time to be a plant scientist. There are some very exciting projects coming along with the tools of genomics.

What I couldn’t locate was any reference to a CNV database (like DGV or CHOP CNV) where you can examine the whole set. I’ll dig through the supplement data to see if I can find out more on that. But I wanted to get this post out to celebrate the very nice work and collection of papers on this project. Congrats to the teams involved!

Today’s tip of the week introduces a new (to us) tool for genomic comparisons. We came across this tool reading a blog post at James and the Giant Corn (great blog) about a figure from his research proposal. See, there are reasons to read blogs :D. The tool he uses to create this figure and analysis is GeVo at CoGe which has several useful tools in addition to GeVo. In today’s tip of the week, we’ll take a quick look at James’ figure at GeVo and introduce CoGe. Check them out, they look like quite useful tools. (and while you’re at it, check out James’ blog. Tidbits like this and interesting discussions make it well worth it.)

It almost seems like a genome sequence now exists for nearly every living thing. Whether it’s a fruit fly, hedgehog, or the duck-billed platypus, the genomics research world has produced enormous amounts of DNA sequence. How do we make sense of all of these data? The key is in comparisons… In this webinar, Eric Green will present an overview of the utility of comparative sequence analyses and show how these comparisons shed light on how genomes work and how these studies are relevant to human health.

Today’s tip looks at one example of how to view the same genomic data across several databases simply by browsing. You can download the data from analysis tools and databases in several formats and use that in others, and someday we’ll do a tip on that. But today’s tip shows you that many databases link out to one another, allowing you to view data in one context and then another simply by clicking a link. We are going to start by looking at comparative genomic data in VISTA (there’s a much more in-depth tutorial on VISTA here, free), then link out to the UCSC Genome Browser (free tutorial) to view the data there, and then head off to Ensembl (tutorial, subscription).