AI training: A Backward Cat Picture is Still a Cat Picture

Genes make up only a small portion of the human genome. Between them lie large regions of DNA that tell cells when, where, and how much each gene should be used. The regulatory motifs in these regions act as biological instruction manuals. That sounds complicated, and it is.

The instructions for gene regulation are written in a complex code that scientists are trying to decipher with artificial intelligence. Deep neural networks (DNNs), which excel at detecting patterns in vast datasets, are being used to learn the rules of DNA regulation. DNNs are also at the heart of well-known AI technologies such as ChatGPT. Thanks to a new tool developed by Cold Spring Harbor Laboratory Assistant Professor Peter Koo, genome-analyzing DNNs can now be trained with far more data than can be obtained through experiments alone.

“With DNNs, the mantra is the more data, the better,” says Koo. “We really need to see a variety of genomes for these models to learn robust motif signals. However, in certain cases, biology is the limiting factor, since we can’t produce more data than exists inside the cell.”

If an AI model learns from only a small number of examples, it may misinterpret how a regulatory motif affects gene activity. The problem is that certain motifs are rare: very few examples of them exist in nature.

To address this limitation, Koo and his colleagues created EvoAug, a new way of augmenting the data used to train DNNs. EvoAug was inspired by an overlooked dataset: evolution. The process begins by generating artificial DNA sequences that closely resemble real sequences found in cells. The sequences are altered in the same ways that genetic mutations have naturally altered the genome over the course of evolution.
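The idea of generating near-copies of real sequences can be illustrated with a minimal Python sketch. It is not EvoAug's actual code or API: the names `mutate` and `augment`, the 5% mutation rate, and the restriction to single-base substitutions are all assumptions made for illustration. Each synthetic sequence reuses the label of the real sequence it came from, which reflects the assumption that most changes leave function intact.

```python
import random

BASES = "ACGT"

def mutate(seq, rate=0.05, rng=None):
    """Return a copy of seq with random single-base substitutions.

    Hypothetical helper: each position is replaced by a different base
    with probability `rate` (an assumed value, not taken from the article).
    """
    if rng is None:
        rng = random.Random()
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick any base other than the current one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def augment(seq, label, n_copies=3, rate=0.05, seed=0):
    """Create n_copies mutated variants of seq, each keeping the original label."""
    rng = random.Random(seed)
    return [(mutate(seq, rate, rng), label) for _ in range(n_copies)]

if __name__ == "__main__":
    # One real sequence with an illustrative activity label of 1.
    for variant, label in augment("ACGTTGCAACGTTGCA", label=1, rate=0.1):
        print(variant, label)
```

Substitutions are only one kind of evolutionary change; insertions, deletions, and other alterations could be added in the same style.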

The models are then trained to detect regulatory motifs using the new sequences, under one important assumption: that the great majority of changes will not disrupt the sequences’ function. Koo compares this kind of data augmentation to training image-recognition software with mirror images of the same animal. The computer learns that a backward cat picture is still a cat picture.

Koo acknowledges that some DNA alterations do affect function. For that reason, EvoAug includes a second training stage that uses only real biological data. This guides the model “back to the biological reality of the dataset,” according to Koo.
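The two training stages can be pictured as a simple schedule: first several passes over mutated sequences, then passes over the untouched biological data. The Python sketch below only lays out that ordering and trains no model. The function name `two_stage_schedule`, the epoch counts, and the choice to use only augmented sequences in the first stage are assumptions for illustration, not details reported about EvoAug.

```python
def two_stage_schedule(real_data, augment_fn, aug_epochs=2, finetune_epochs=1):
    """Yield (stage, epoch, examples) tuples for a two-stage training run.

    real_data  : list of (sequence, label) pairs from experiments
    augment_fn : function mapping a sequence to an altered sequence
    Hypothetical sketch: a real pipeline would pass each yielded set of
    examples to a model's training step.
    """
    # Stage 1: train on altered sequences, reusing the original labels.
    for epoch in range(aug_epochs):
        examples = [(augment_fn(seq), label) for seq, label in real_data]
        yield "augmented", epoch, examples
    # Stage 2: fine-tune on the real biological data only.
    for epoch in range(finetune_epochs):
        yield "finetune", epoch, list(real_data)

if __name__ == "__main__":
    data = [("ACGTAC", 1), ("TTGGCC", 0)]
    # Stand-in augmentation for the demo: swap the first base for "A".
    swap_first = lambda seq: "A" + seq[1:]
    for stage, epoch, examples in two_stage_schedule(data, swap_first):
        print(stage, epoch, examples)
```

Keeping the second stage free of synthetic sequences is what lets it correct for any alterations that did change function.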

Koo’s team found that models trained with EvoAug outperform those trained only on biological data. As a result, scientists may soon gain a better understanding of the regulatory DNA that writes the rules of life itself. Ultimately, this could lead to a completely new understanding of human health.
