Mining Genetic Data at Korea's Polar Research Institute

Exploring next-gen sequencing methods at an institute dedicated to polar research. Part of our ongoing series about Charles River’s sabbatical program. You can find more examples of sabbaticals here.

Most of us have heard of the Human Genome Project, a massive effort to map the roughly 20,500 genes that genetically define who we are. But unless you spend your time analyzing DNA, as Charles River scientist Sunhee Hong does every day, you probably aren’t familiar with the methods that scientists used to sequence all those human genes. Sunhee, a phylogeneticist at our Accugenix site in Newark, Delaware, spends much of her time making sure that the microbes clients send to her facility for analysis are, genetically speaking, properly classified and curated, so she knows both the benefits and the limitations of current sequencing methods. That is why she decided to take a four-week sabbatical and travel back to her homeland to take a course at the Korea Polar Research Institute, an institution funded by the Korean government. You might wonder what polar research has to do with genetic sequencing. Read on for this Q&A with Sunhee, which answers that and other questions about her sabbatical experience in May.

What kind of research does the Korea Polar Research Institute (KOPRI) do?

The major research pillars of KOPRI focus on investigating changes in polar climates and ecosystems in an effort to respond to the ever-changing global environment. This also includes pursuing future avenues through research and development in the Arctic and Antarctic regions and contributing to environmental protection and sustainable development for these sensitive regions of the world. To accomplish this, the institute runs a number of ongoing projects. One such project involves understanding the function of genes and other genome components of polar organisms to elucidate adaptation and evolution. This work builds a scientific foundation for studying polar organisms using various -omics approaches, investigating climate change mechanisms, and understanding environmental changes in the Arctic permafrost region. Furthermore, there is a focus on surveying biodiversity and reconstructing evolutionary history for seabirds, lichens, and some microalgae.

When I first visited KOPRI in 2013 to discuss a possible collaboration, I didn’t really have an opportunity to tour the facility. This time, however, I was thrilled that they took me on a full tour of the main building. It was fascinating! It wasn’t just a large and impressive facility, but also very well designed for both research and education. One of the more notable parts of the tour was an exhibition center for the many children who visit to learn about life and research activities in the polar regions.

Would you ever like to travel to the North or South Pole?

What an interesting question! While I was taking my tour of KOPRI, I learned that the Arctic and Antarctica are somewhat similar but there are many differences as well. For example, penguins live only in Antarctica while polar bears are found only in the Arctic. This is because Antarctica (South Pole) is a continent and is covered with an immense ice sheet. The Arctic (North Pole) region, on the other hand, is located mainly in the Arctic Ocean and includes several large islands. Because of this, it is assumed that penguins evolved from birds that settled in the region while polar bears migrated from the mainland. After learning about these differences, I feel that travelling to Antarctica would be much more challenging and so, if given the chance, I would love to visit the South Pole.

Does the Institute use these sequencing methods to analyze specimens collected in the polar regions?

In short, yes. The laboratory, known as The Microbial Ecology Laboratory, is an incredibly active research lab with many projects both completed and ongoing. One such project focuses on the isolation and characterization of microorganisms in the polar regions, and they’ve actually isolated two new strains whose genomes have already been sequenced and published. Here are the titles of those studies: “Complete Genome Sequence of Cryobacterium arcticum Strain PAMC 27867, Isolated from a Sedimentary Rock Sample in Northern Victoria Land, Antarctica” and “Complete genome sequence of Pedobacter cryoconitis PAMC 27485, a CRISPR-Cas system-containing psychrophile isolated from Antarctica.” If you would like to read more about these studies, please visit https://www.ncbi.nlm.nih.gov/pubmed/27015980.

Did you find it difficult to analyze genome sequence data?

It wasn’t as difficult as I anticipated, although the first few classes proved to be a little difficult because I’m not very familiar with the Linux operating system. Due to the massive size of genome sequence data, the majority of the software designed for analysis is written for Linux users. After the initial learning curve, though, it wasn’t so difficult. Another overwhelming aspect for beginners is that there are so many software options available that it can be difficult to decide which utilities will give you the information you’re looking for. Luckily, the instructor was very knowledgeable and gave the impression that he’d tried almost all of them, so he was able to provide us with specific insight into which applications would best suit each of our individual research needs.

Was there any language barrier that came into play when you were taking your course?

The class was taught in Korean, but many of the terms have no Korean translation, so our instructor used both Korean and English. Korean is my mother tongue and I also speak English, so it wasn’t really an issue for me.

How did you feel about going back to Korea after being in the US for such a long time?

Upon returning to Korea, there were many things that felt very new to me. For example, since I was last there, many new buildings and roadways had been constructed, so it was a little difficult to find the places I wanted to get to. The public transport system had changed dramatically, so I was initially confused when trying to get from place to place. The hardest thing for me, though, was that English words have been introduced into the Korean language, with new terms being coined frequently. So if I was asking for help, sometimes a random English word would come up in conversation that took me a little time to figure out. Despite the new things I encountered, people were still very friendly and the food was fantastic.

What is next-gen sequencing (NGS)?

The most straightforward explanation comes from a journal article called Next Generation Sequencing Technologies: A Short Review, authored by Tarek Hamed Attia and Maysaa Abdallah Saeed and published in February 2016 in the Journal of Next Generation Sequencing & Applications. In it, they state: “Next-generation sequencing (NGS) is a type of DNA sequencing technology that uses parallel sequencing of multiple small fragments of DNA to determine sequence. In contrast to Sanger sequencing, the speed of sequencing and amounts of DNA sequence data generated with NGS, which is considered a ‘high-throughput technology’, are exponentially greater and are produced at significantly reduced costs.” In short, NGS is massively parallel sequencing, which generates an enormous amount of sequence data at a time. NGS has multiple templates and multiple reactions with one primer, while Sanger sequencing has one template, one reaction, and one primer.

How was the course structured?

The course started with an introduction to NGS technology, then focused on two specific NGS platforms, Illumina and PacBio. Our instructor taught us the theory behind both of these sequencing technologies, including library construction, signal detection, and the properties of the data each one produces. Once we had an appropriate foundation in the basic concepts of NGS, we learned how to run quality control (QC) on sequence data and trim it. To do this, we used raw sequence data extracted from a public database and ran the appropriate software for QC and trimming. After the high-quality genome sequence data were obtained, we began to assemble the genomes and annotate the genes. The last three classes covered genome sequence analysis, which included whole-genome alignment, phylogenetic trees, Venn diagrams, comparative genome mapping, and genome trees based on average nucleotide identity (ANI).
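For readers curious what QC and trimming look like in practice, here is a minimal Python sketch of the general idea. It is not the software used in the course, which is not named here. It assumes the common Phred+33 quality encoding, and the function names, the quality cutoff of 20, and the minimum read length of 5 are all illustrative choices rather than recommended settings.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(sequence, quality_line, min_quality=20):
    """Drop bases from the 3' end until the last base meets min_quality."""
    scores = phred_scores(quality_line)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_line[:end]


def passes_qc(quality_line, min_length=5, min_mean_quality=20):
    """Keep a trimmed read only if it is long enough and good on average."""
    scores = phred_scores(quality_line)
    if len(scores) < min_length:
        return False
    return sum(scores) / len(scores) >= min_mean_quality


# Toy read (hypothetical data): 'I' encodes Phred 40, '#' encodes Phred 2.
seq, qual = trim_3prime("ACGTACGTAC", "IIIIIIII##")
print(seq, qual, passes_qc(qual))  # ACGTACGT IIIIIIII True
```

Real trimming tools do considerably more, such as removing adapter sequences and scanning with sliding windows, but the principle is the same: low-confidence bases are removed before assembly so that they do not introduce errors downstream.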

What are your final thoughts? Did you find this course useful and did it meet your expectations?

Actually, this course exceeded my expectations. The instructor was not only knowledgeable about his field but also very thoughtful, which made for a comfortable environment where I could ask as many questions as I wanted. I think this was helped, in part, by the fact that the class was relatively small. One of the best aspects of this course was the sheer amount of experience the instructor had. This was especially important because it allowed him to give us strong explanations and the key considerations in determining which software components would work best for us and our research needs. Furthermore, he used a lot of scientific data to support what he was teaching us, which made it very easy to decide what to look for now and in the future. All in all, what I learned on my sabbatical has been immensely helpful, and I am very glad to have had the opportunity to experience this course.

Charles River is committed to providing its employees with opportunities for growth. To learn how you can join the team, visit www.criver.com. Also, read more stories about Charles River’s sabbaticals.