A step closer to genome editing with AI

DNA editing at the cellular level is a reality. Researchers are editing out genes that might cause disease in mice and editing in genes that can produce high-yield, drought-resistant crops. Genome editing is of great interest for preventing and treating human diseases, but scientists are still working to determine whether the approach is safe and effective for use in people. It is being explored in research on a wide variety of diseases. So far, scientists have used it to reduce the severity of genetic deafness in mice, create mushrooms that don’t brown as easily, and edit bone marrow cells in mice to treat sickle-cell anemia.

“Genomics is the study of the function and information encoded in DNA sequences of living cells. A detailed understanding of the relationship between genetic variations and cell function can facilitate the development of new cures and treatments for various diseases,” said Dr. Hasan Al Marzouqi, Assistant Professor in the Department of Electrical and Computer Engineering.

Genome editing is a technology that gives scientists the ability to change an organism’s DNA by adding, removing, or altering genetic material at particular locations in the genome. Several approaches to genome editing have been developed, the best known being CRISPR-Cas9, short for clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein 9 (Cas9). It is faster, cheaper, more accurate, and more efficient than other existing genome editing methods, and was adapted from a naturally occurring genome editing system in bacteria. The bacteria capture snippets of DNA from invading viruses and use them to create DNA segments known as CRISPR arrays. These arrays allow the bacteria to “remember” the viruses, so if the viruses attack again, the bacteria produce RNA segments from the arrays to target the viruses’ DNA. The bacteria then use Cas9 or a similar enzyme to cut the DNA apart, disabling the virus.

In the lab, researchers create a small piece of RNA with a short “guide” sequence that attaches to a specific target sequence of DNA in a genome. The RNA also binds to the Cas9 enzyme; as in bacteria, the modified RNA is used to recognize the DNA sequence, and the Cas9 enzyme cuts the DNA at the targeted location. Once the DNA is cut, the researchers use the cell’s own DNA repair machinery to add or delete pieces of genetic material, or to make changes to the DNA by replacing an existing segment with a customized DNA sequence.
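The targeting step described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of any lab protocol: it assumes the commonly used Cas9 variant that needs an "NGG" motif (the PAM) directly after the target and cuts about three bases before it, details the article does not go into. The function names find_cut_sites and delete_bases are hypothetical, chosen only for this example.

```python
from typing import List


def find_cut_sites(genome: str, guide_rna: str) -> List[int]:
    """Return cut positions for exact matches of the guide in the genome.

    Assumption (not from the article): a match only counts if it is
    followed by an NGG motif, and the cut falls 3 bases before that motif.
    Each returned index is the first base to the right of the cut.
    """
    genome = genome.upper()
    # The guide is RNA; convert U to T to compare it with DNA text.
    target = guide_rna.upper().replace("U", "T")
    sites = []
    start = genome.find(target)
    while start != -1:
        end = start + len(target)
        pam = genome[end:end + 3]
        if len(pam) == 3 and pam.endswith("GG"):
            sites.append(end - 3)
        start = genome.find(target, start + 1)
    return sites


def delete_bases(genome: str, cut: int, length: int) -> str:
    """Mimic a repair outcome that deletes `length` bases starting at the cut."""
    return genome[:cut] + genome[cut + length:]


if __name__ == "__main__":
    guide = "GAUCGAUCGGAUCCGAUUAG"  # made-up 20-base guide sequence
    dna = "TTAC" + "GATCGATCGGATCCGATTAG" + "CGG" + "AAC"
    for cut in find_cut_sites(dna, guide):
        print(cut, delete_bases(dna, cut, 2))
```

Real guides tolerate some mismatches and can bind either DNA strand, which this exact-match sketch ignores.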

“Precise genome editing holds great potential for significantly improving the way we treat and understand diseases,” said Dr. Al Marzouqi. “Modern genome editing tools like CRISPR-based systems require the design of a guide RNA sequence (gRNA) that binds to an area of interest within the DNA. Guide sequences vary considerably in efficacy and can cause undesired outcomes. Reducing these negative effects paves the way for applying genome editing in humans.”

Why aren’t we already using CRISPR-Cas9 to treat human diseases? While relatively simple and powerful, the technique isn’t perfect. Recent studies have shown that this approach to gene editing can inadvertently wipe out and rearrange large stretches of DNA, or even trigger cancer. The risks and uncertainties around CRISPR modification are extremely high, and most of the scientific community believes experiments in humans are premature.

“The first step in a genome editing experiment is to choose a guide RNA based on on-target and off-target effect predictions,” explained Dr. Al Marzouqi. “Off-target score, an indicator of specificity, estimates the off-target effects which are the number of changes occurring at locations away from the desired target. On-target scores estimate on-target efficiency and are frequently measured using the percentage of indels (insertions or deletions) induced at target sequences. The unpredictable nature of these effects is a major obstacle that prevents the use of genome editing systems in humans.”
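The two measurements Dr. Al Marzouqi describes can be illustrated with a rough Python sketch. The on-target figure is the percentage of sequencing reads that carry an indel. For the off-target side, the sketch counts genome positions that resemble the guide, which is only a crude proxy: published off-target scores weight mismatches in more elaborate ways. The function names and the three-mismatch threshold are assumptions made for this example.

```python
def indel_percentage(reads_with_indel: int, total_reads: int) -> float:
    """On-target efficiency as the percentage of reads showing an indel."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * reads_with_indel / total_reads


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def count_off_target_sites(genome: str, guide: str, target_start: int,
                           max_mismatches: int = 3) -> int:
    """Count sites, other than the intended one, that are close to the guide.

    Assumption: a site is a potential off-target if it differs from the
    guide at no more than `max_mismatches` positions.
    """
    n = len(guide)
    count = 0
    for i in range(len(genome) - n + 1):
        if i == target_start:
            continue  # skip the intended target itself
        if mismatches(genome[i:i + n], guide) <= max_mismatches:
            count += 1
    return count


if __name__ == "__main__":
    print(indel_percentage(45, 300))
    print(count_off_target_sites("ACGTACGAACGT", "ACGT", 0, max_mismatches=1))
```

A guide with a high indel percentage and few similar sites elsewhere in the genome would be the more attractive candidate under this simplified view.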

The human genome comprises about 20,000 genes and more than 3 billion base pairs of the genetic building blocks: adenine, guanine, cytosine and thymine. Trawling through those 3 billion base pairs to find repeating sections is time-consuming, and progress was long stalled by the complexity and sheer volume of the data that needed to be evaluated. With advances in artificial intelligence and machine learning applications, researchers are better able to interpret and act on genomic data.

Rational design rules and modern deep learning techniques, such as residual neural networks, long short-term memory (LSTM) layers and attention layers, will be used to design new prediction systems with improved performance. In addition, machine learning interpretation models will be used to explain the behavior of the developed models and provide insight into their predictions.
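Before a neural network of this kind can process a guide sequence, the letters have to be turned into numbers. The article does not say how the researchers encode their inputs; one-hot encoding, shown below as a minimal Python sketch, is a common choice and is assumed here purely for illustration. The name one_hot_encode is hypothetical.

```python
from typing import List

BASES = "ACGT"  # column order used for the encoding


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA sequence as one row per base, one column per letter.

    Each row contains a single 1 in the column of the matching base.
    """
    rows = []
    for base in sequence.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


if __name__ == "__main__":
    for row in one_hot_encode("GATTACA"):
        print(row)
```

A 20-base guide becomes a 20-by-4 grid of zeros and ones, which is the kind of input that convolutional, LSTM or attention layers can work with.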

Some of the latest research focuses on resolving the issue of off-target effects, which occur when the tools mistakenly act on the wrong gene because it looks similar to the target gene. AI can help accelerate our understanding of how and why this happens, and help prevent it.

Collating data points from CRISPR experiments and feeding them into machine learning algorithms will further improve an AI system’s accuracy. Predicting guide performance for different target sequences makes it possible to adjust the total number of guides needed, which maximizes results, lowers costs and minimizes undesired side effects.
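As a rough illustration of how predictions could reduce the number of guides tested, the Python sketch below filters candidates by a predicted off-target count and keeps the few with the highest predicted on-target score. The scores, the threshold and the function name select_guides are all invented for this example and do not come from the research described here.

```python
from typing import List, Tuple

# (guide name, predicted on-target score, predicted off-target site count)
Candidate = Tuple[str, float, int]


def select_guides(candidates: List[Candidate], max_off_target: int,
                  k: int) -> List[str]:
    """Keep guides within the off-target limit, then take the top k by score."""
    eligible = [c for c in candidates if c[2] <= max_off_target]
    eligible.sort(key=lambda c: c[1], reverse=True)
    return [name for name, _, _ in eligible[:k]]


if __name__ == "__main__":
    pool = [("g1", 0.9, 5), ("g2", 0.7, 0), ("g3", 0.8, 1), ("g4", 0.4, 0)]
    print(select_guides(pool, max_off_target=1, k=2))
```

Here the highest-scoring guide is dropped because of its predicted off-target activity, so only the two best remaining guides would go forward to the lab.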

“Deep learning techniques are currently state of the art in numerous machine learning tasks. They achieve remarkable results across many fields including genetic data analysis,” said Dr. Al Marzouqi.

Machine learning helps identify patterns within genetic data sets, and computer models can then predict an individual’s odds of developing a disease or responding to an intervention, or where DNA might be altered to remove disease susceptibility. AI could be used to decode the genome and determine the best medication for an individual, or to predict the impact of a gene mutation. CRISPR-Cas9 techniques could also be developed to eradicate certain bacteria or viruses such as HIV.

“In the future, we plan to extend our investigation to other problems involving the use of machine learning and genomics like protein structure determination and gene expression inference,” said Dr. Al Marzouqi.

Understanding and manipulating the human genome is a daunting task and thus far, breakthroughs have relied on human capabilities. Artificial intelligence is launching us into the future of genome editing.