Data handling made easy

Dr Ann-Sofie Albrekt is currently focusing on two key areas: new biomarkers in cancer studies, and research on allergens

In 2005, a large EU-funded research project was launched to develop and optimise in vitro test strategies that could reduce or replace animal testing for sensitisation studies. By using a multi-disciplinary approach, this study is helping to address skin and lung sensitisation by focusing on the impact of compounds on cellular-molecular interactions, which play a central role in the development and elicitation of many allergies.

Because no in vitro tests or test strategies are yet available to assess the potential of chemical compounds to induce allergies, the aim of the project is to develop alternatives to the animal tests currently used for the risk assessment of potential skin or lung sensitisers.

The project, known as Sens-it-iv, combines both private and public research institutions, as well as several industrial and societal interest organisations. One of the key partners involved with the project is the European Centre for the Validation of Alternative Methods (ECVAM) at the Joint Research Centre. The presence of ECVAM ensures a clear focus on tests and testing strategies that can be validated, which is a prerequisite for eventual regulatory acceptance.

Dr Ann-Sofie Albrekt is currently working in this exciting area, based at Lund University's Immunotechnology Department, which is headed by Professor Carl Borrebaeck, a sub-co-ordinator of Sens-it-iv. As an internationally renowned centre for research and education, Lund University is highly regarded for research in fields such as nanotechnology, translational cancer research and stem cell biology.

Dr Albrekt is currently focusing on two key areas: new biomarkers in cancer studies, and research on allergens.

"Worldwide, more and more people are suffering from allergies, which means this area has become an important health concern," she says. "As a scientist, I am interested to find out why otherwise harmless compounds can often elicit an adverse immune response in humans."

Dr Albrekt is currently using sophisticated data analysis software, called Qlucore Omics Explorer, to help her to get the most value out of the data being produced by this research.

"Although gene expression studies are proving invaluable to the study of allergens, the amount of data produced by these experiments is enormous," she says. "As a result, it is impossible to derive any real biological meaning from these findings unless sophisticated data algorithms are used to help interpret this data effectively."

For this reason, most software designed for use in this area has focused on the ability to handle increasingly vast amounts of data. As a result, the role of the researcher has been largely set aside and the data analysis has been passed on to bioinformaticians and biostatisticians. However, this model has drawbacks, since it is typically the scientists who know the most about the biology.

"Some data analysis applications can be complicated and difficult to use, even for specialist statisticians, so it is very important to find software that has been developed by scientists, for scientists," Dr Albrekt says. "I am not a statistician, and yet I found the Qlucore software very easy to use, and without the need for any manuals or training. We started using the software straight away, and the fact that it is highly intuitive means that we were actually able to learn by using it."

Sophisticated bioinformatics software now enables scientists to analyse very large data sets by a combination of statistical methods and visualisation techniques such as Heatmaps and Principal Component Analysis (PCA). With the benefit of instant user feedback on all actions, as well as an intuitive user interface that can present all data in 3D, scientists studying allergens or other aspects of biology can now easily analyse the data in real-time, on their computer screen.

Modern data analysis software enables researchers to use this approach with extremely large data sets - more than 100 million data points - on a standard PC. This kind of software can also take advantage of annotations and other links connected with the data being studied, as well as a number of statistical functions, such as false discovery rates and p-values.
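As a rough illustration of what a false discovery rate calculation involves, the following Python sketch applies the widely used Benjamini-Hochberg adjustment to a list of p-values. It is not taken from Qlucore Omics Explorer, whose internal methods are not described here; the function name and the example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                       # indices that sort p ascending
    # Scale each sorted p-value by n / rank (rank starts at 1)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)   # cap at 1 and restore order
    return adjusted

# Hypothetical p-values for four variables (for example, four genes)
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
print(q_values)
```

Variables whose adjusted value falls below a chosen threshold, such as 0.05, would then be kept for further study.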

As such, the research being conducted at the Department of Immunotechnology represents a significant breakthrough in how modern data analysis is being performed. Less than 10 years ago, researchers were able to work only with analysis methods that provided information about single genes. In recent years, the number of information points per subject has grown to hundreds of thousands, thanks to important technological advances in this area.

Most recently, perhaps within the past two years, the overall performance of data analysis software has been optimised significantly. According to Dr Albrekt, modern data analysis software can be used to transform high dimensional data down to lower dimensions, which can then be plotted in three dimensions on a computer screen and rotated manually or automatically, for examination by eye.
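The reduction from many dimensions to three can be sketched with Principal Component Analysis, which was mentioned earlier. The Python example below is a minimal illustration using NumPy and randomly generated data. It is an assumption about how such a projection might be computed, not a description of how Qlucore Omics Explorer works, and the function name and data are hypothetical.

```python
import numpy as np

def pca_project(data, n_components=3):
    """Project samples (rows) onto the first principal components."""
    centred = data - data.mean(axis=0)            # centre each variable
    # Singular value decomposition; the rows of vt are the principal axes
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T        # coordinates of each sample
    explained = (s ** 2) / np.sum(s ** 2)         # fraction of variance per axis
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data set: 20 samples by 500 variables, in two groups of 10
data = rng.normal(size=(20, 500))
data[10:, :50] += 3.0     # the second group differs in the first 50 variables

scores, explained = pca_project(data)
print(scores.shape)       # one row per sample, three coordinates each
print(explained)          # share of total variance captured by each axis
```

Each row of `scores` gives the three coordinates at which a sample would be drawn in a 3D plot. In this made-up example the two groups separate along the first axis.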

These instant visualisation techniques are combined with powerful statistical methods and filters, all of which are handled with a mouse-click. Different colours can make this analysis easier, as each sub-group can be labelled with its own unique colour. The view of the data can therefore be changed in an instant, so that researchers are looking only at the specific sub-group that interests them at any given moment. It is easy to add or remove data as necessary, without having to restart and re-analyse the entire data set.

"When you are looking at such a large amount of genetic data, there is bound to be a number of confounding factors that distort the data," says Dr Albrekt. "The ability to remove this noise is very important for researchers to be sure that they are working with the most reliable data. Advanced data analysis software like Qlucore Omics Explorer makes it easier to make a qualified judgment about the amount of noise present, so that researchers can see true patterns."

With key actions and plots now displayed within a fraction of a second, scientists can perform the research they want and find the results they need instantly - without the wait. This approach has opened up new ways of working with the analysis and has helped to bring the biologists back into the analysis phase.

According to Dr Albrekt, when performing her own research, she typically begins her workflow by coding any interesting factors (and confounding factors) into a single file. She then imports the data and looks at the pattern of samples to search for both anticipated and non-anticipated sub-patterns.

At this point, Dr Albrekt can begin to examine the sub-patterns using the coded factors that she identified earlier, for example by using the application's colour function and/or its function for eliminating factors. She can then look for any significant differences by using statistical tests.

"We can test the robustness of these findings by using kNN visualisation, randomisation and permutation tools," Dr Albrekt explains. "That way, we can make a decision on which variables to trust, and then annotate any significant variables that we have found and export them for functional analysis using another software tool."

With the freedom, speed and flexibility provided by this approach, it is now possible to evaluate and test a number of different scenarios and hypotheses in a very short time, and to understand fully the data being examined.
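The permutation testing that Dr Albrekt mentions can be illustrated with a short Python sketch. The idea is to shuffle the group labels many times and count how often a difference at least as large as the observed one arises by chance. This is a generic example with made-up measurements and a hypothetical function name, not the procedure implemented in Qlucore Omics Explorer.

```python
import numpy as np

def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    observed = abs(values[labels].mean() - values[~labels].mean())
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(labels)        # reassign group labels at random
        diff = abs(values[shuffled].mean() - values[~shuffled].mean())
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (count + 1) / (n_permutations + 1)

# Hypothetical expression values for one variable in two groups of five samples
values = [5.1, 4.9, 5.3, 5.0, 5.2, 1.0, 1.2, 0.9, 1.1, 0.8]
labels = [True] * 5 + [False] * 5
p = permutation_p_value(values, labels)
print(p)
```

A small value suggests that the difference between the groups is unlikely to be an accident of how the samples were labelled, which is one way of deciding which variables to trust.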

This technique makes it possible for researchers to combine very large quantities of data, and to conduct analysis in ways that were simply not possible before.

"In our studies, we are dealing with very large amounts of data, sometimes between 10 and 100 million data points, which we tend to view as graphics. With other software, these graphics would take a long time to appear, but with the latest data analysis tools, the information is presented instantly," Dr Albrekt says. "As a result, we can be much more creative with our theories, as we can easily test any number of hypotheses in rapid succession."

Although Dr Albrekt is currently using data analysis software to study gene expression microarray data, other researchers have used it to study protein array data, miRNA data and RT-PCR data. The software has also been used to analyse protein data from 2D gels and image analysis data, and in fact can be used with any multivariate data set of up to 1,000 samples and 100,000 variables, or 1,000 variables and 100,000 samples.

The latest technological advances are making it easier for researchers to compare the vast quantity of genomic data generated, to test different hypotheses and explore alternative scenarios within seconds. Not only that, the latest data analysis software is helping scientists to regain control of the analysis and to realise the true potential of gene expression profiling.

According to Dr Albrekt, her efforts will continue to focus on the Sens-it-iv allergen studies and the cancer research within CREATE Health, a strategic centre for translational cancer research.

"In terms of the work we are doing for Sens-it-iv, I feel confident that a successful project outcome will contribute to a reduction in the number of animals required for safety testing and the establishment of more accurate tools for product development," she says.