Scientists have discovered that powerful algorithms used by Netflix, Amazon, and Facebook can ‘predict’ the biological language of cancer and neurodegenerative diseases such as Alzheimer’s. Big data collected over decades of research was fed into a computer language model to see if artificial intelligence can outperform humans in making advanced discoveries.
Researchers at the University of Cambridge’s St John’s College discovered that machine-learning technology could decipher the “biological language” of cancer, Alzheimer’s, and other neurodegenerative diseases. Their groundbreaking research was published today (April 8, 2021) in the scientific journal PNAS and could be used in the future to ‘correct the grammatical mistakes inside cells that cause disease.’
Professor Tuomas Knowles, the paper’s lead author and a Fellow at St John’s College, stated: “The incorporation of machine-learning technology into research into neurodegenerative diseases and cancer is a game changer. Ultimately, the goal will be to use artificial intelligence to develop targeted drugs that significantly alleviate symptoms or prevent dementia from occurring at all.”
When Netflix recommends a series to watch or Facebook recommends someone to friend, the platforms use powerful machine-learning algorithms to make educated guesses about what people will do next. Voice assistants such as Alexa and Siri can even recognize specific people and ‘talk’ back to you.
Dr. Kadi Liis Saar, the paper’s first author and a Research Fellow at St John’s College, used similar machine-learning technology to train a large-scale language model to investigate what happens when proteins inside the body go wrong, causing disease. “The human body is home to thousands and thousands of proteins, and scientists don’t yet know what many of them do,” she said. “We asked a neural network-based language model to learn protein language.
“We specifically asked the program to learn the language of shapeshifting biomolecular condensates (protein droplets found in cells), which scientists desperately need to understand in order to crack the code of biological function and malfunction that causes cancer and neurodegenerative diseases like Alzheimer’s. We discovered that it could learn what scientists had already discovered about protein language over decades of research without being explicitly told.”
Proteins are large, complex molecules that play a variety of important roles in the human body. They do the majority of the work in cells and are required for the structure, function, and regulation of the body’s tissues and organs – antibodies, for example, are proteins that protect the body.
Alzheimer’s disease, Parkinson’s disease, and Huntington’s disease are three of the most common neurodegenerative diseases, but scientists believe there are hundreds more. In Alzheimer’s disease, which affects 50 million people worldwide, proteins go rogue, form clumps, and kill healthy nerve cells. A healthy brain has a quality control system that effectively eliminates these potentially dangerous protein masses, known as aggregates.
Scientists now believe that some disordered proteins form liquid-like droplets of protein called condensates, which lack a membrane and freely merge with one another. Protein condensates, unlike protein aggregates, can form and reform, and are frequently compared to blobs of shapeshifting wax in lava lamps.
“Protein condensates have recently attracted a lot of attention in the scientific world because they control key events in the cell such as gene expression – how our DNA is converted into proteins – and protein synthesis – how cells make proteins,” Professor Knowles explained.
“Any flaws associated with these protein droplets can result in diseases such as cancer. This is why incorporating natural language processing technology into research into the molecular origins of protein malfunction is critical if we are to be able to correct the grammatical errors that cause disease inside cells.”
“We fed the algorithm all of the data held on the known proteins so it could learn and predict the language of proteins in the same way these models learn about human language and WhatsApp knows how to suggest words for you to use,” Dr. Saar explained. “Then we could question it about the specific grammar that causes only some proteins to form condensates inside cells. It’s a difficult problem, and solving it will help us learn the rules of the language of disease.”
Machine-learning technology is evolving at a rapid pace as a result of increased data availability, increased computing power, and technological advances that have resulted in more powerful algorithms.
Further application of machine learning could revolutionize future cancer and neurodegenerative disease research. Discoveries could go beyond what scientists currently know and speculate about diseases, and possibly even beyond what the human brain can understand without the assistance of machine learning.
Dr. Saar elaborated: “Machine learning can be free of the constraints of what researchers believe are the targets for scientific exploration, which means that new connections will be discovered that we haven’t even considered yet. It’s actually quite exciting.” The network that was created is now freely available to researchers all over the world, allowing more scientists to work on advancements.