Scientists Successfully Process Complex Solar Data using a Machine Learning Tool

Machine learning is the process of using computers to detect patterns in massive datasets and then make predictions based on what the computer learns from those patterns. It is a specific and limited type of artificial intelligence. Full artificial intelligence, by contrast, entails machines that can perform tasks associated with human and intelligent animal minds, such as perception, learning, and problem solving.

For space scientists analyzing vast datasets from increasingly powerful space instrumentation, big data has become a significant challenge. To address this, a team from the Southwest Research Institute (SwRI) created a machine learning tool that efficiently labels large, complex datasets, allowing deep learning models to sift through them and identify potentially hazardous solar events. The new labeling tool can be used or adapted to solve other problems involving large datasets.

A machine learning model is a program that can recognize patterns in, or make decisions about, data it has not seen before. In natural language processing, for example, machine learning models can parse previously unseen sentences or word combinations and correctly recognize the intent behind them. A machine learning model can also be taught to recognize objects in images, such as cars or dogs. A model learns to perform such tasks by being ‘trained’ on a large dataset.

As space instrument packages collect increasingly complex data in ever-increasing volumes, scientists are finding it more difficult to process the data and identify relevant trends. Machine learning (ML), in which algorithms learn from existing data to make decisions or predictions that weigh more information at once than a human can, is becoming an important tool for processing large, complex datasets. However, supervised ML techniques require humans to first label the training data, which is often a monumental task.

“Labeling data with meaningful annotations is a critical step in supervised machine learning. Labeling datasets, on the other hand, is tedious and time-consuming,” said Dr. Subhamoy Chatterjee, a postdoctoral researcher at SwRI who specializes in solar astronomy and instrumentation and is the lead author of a paper about these findings published in the journal Nature Astronomy. “New research demonstrates how convolutional neural networks (CNNs) trained on crudely labeled astronomical videos can be used to improve data labeling quality and breadth while reducing the need for human intervention.”

Image: Scientists demonstrate machine learning tool to efficiently process complex solar data

By extracting and learning complex patterns, deep learning techniques can automate the processing and interpretation of large amounts of complex data. The SwRI team applied them to videos of the solar magnetic field to identify areas on the solar surface where strong, complex magnetic fields emerge; such regions are the main precursors of space weather events.
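The article does not describe the network architecture. As a rough illustration of what "extracting patterns" from a video means, the sketch below implements a single-channel 3D convolution (strictly a cross-correlation, as in most deep-learning libraries), the operation a video CNN stacks many times with learned kernels. This is not the SwRI model: the `conv3d_valid` function, the toy three-frame input and the hand-set temporal-difference kernel are all hypothetical, and a real network would learn its kernels from labeled data.

```python
from typing import List

# A video is indexed as video[t][y][x]; a kernel as kernel[dt][dy][dx].
Volume = List[List[List[float]]]


def conv3d_valid(video: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation, stride 1, no padding ("valid").

    Each output value is the weighted sum of a small space-time block of the
    input, so a kernel can respond to features that change across frames.
    """
    n_t, n_y, n_x = len(video), len(video[0]), len(video[0][0])
    k_t, k_y, k_x = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(n_t - k_t + 1):
        frame = []
        for y in range(n_y - k_y + 1):
            row = []
            for x in range(n_x - k_x + 1):
                acc = 0.0
                for dt in range(k_t):
                    for dy in range(k_y):
                        for dx in range(k_x):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            frame.append(row)
        out.append(frame)
    return out


# Hypothetical toy input: three 2x2 frames in which flux appears at one pixel.
toy_video: Volume = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 5.0], [0.0, 0.0]],
    [[0.0, 9.0], [0.0, 0.0]],
]
# Hand-set kernel spanning 2 frames x 1 x 1 pixel: frame[t+1] - frame[t],
# i.e. "how much flux appeared between consecutive frames".
growth_kernel: Volume = [[[-1.0]], [[1.0]]]

response = conv3d_valid(toy_video, growth_kernel)
print(response)  # [[[0.0, 5.0], [0.0, 0.0]], [[0.0, 4.0], [0.0, 0.0]]]
```

The kernel only "lights up" where flux grows from one frame to the next, which is the kind of spatiotemporal feature a trained video CNN can pick up without hand-crafted tracking.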

“We trained CNNs with crude labels, manually verifying only our disagreements with the machine,” explained co-author Dr. Andrés Muñoz-Jaramillo, a SwRI solar physicist with machine learning expertise. “The algorithm was then retrained with the corrected data, and the process was repeated until we were all in agreement. While most flux emergence labeling is done by hand, this iterative interaction between the human and the ML algorithm reduces manual verification by 50%.”
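The human-in-the-loop cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's code: `iterative_relabel`, `train_threshold`, `human_review` and the one-dimensional toy data are hypothetical, and a trivial threshold classifier stands in for the CNN. The sketch also assumes that once a human has verified a label it is treated as final, so each sample is reviewed at most once and the loop ends when the model no longer disagrees with any unverified label.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Model = Callable[[float], int]                 # sample -> predicted label
Trainer = Callable[[Dict[float, int]], Model]  # labeled data -> trained model
Reviewer = Callable[[float], int]              # sample -> human-verified label


def iterative_relabel(
    samples: Sequence[float],
    crude_labels: Dict[float, int],
    train: Trainer,
    review: Reviewer,
    max_rounds: int = 10,
) -> Tuple[Dict[float, int], Set[float]]:
    """Train on crude labels, have a human check only disagreements, retrain, repeat."""
    labels = dict(crude_labels)
    verified: Set[float] = set()
    for _ in range(max_rounds):
        model = train(labels)
        # Only unverified samples where the model disagrees are sent to a human.
        to_check: List[float] = [
            s for s in samples if s not in verified and model(s) != labels[s]
        ]
        if not to_check:
            break  # the model agrees with every label not yet verified
        for s in to_check:
            labels[s] = review(s)
            verified.add(s)
    return labels, verified


def train_threshold(labels: Dict[float, int]) -> Model:
    """Toy stand-in for CNN training: pick the 1-D threshold with the fewest label errors."""
    candidates = sorted(labels) + [math.inf]

    def errors(t: float) -> int:
        return sum(int(x >= t) != y for x, y in labels.items())

    best = min(candidates, key=errors)
    return lambda x: int(x >= best)


def human_review(x: float) -> int:
    """Hypothetical reviewer: returns the true label (1 when x >= 5)."""
    return int(x >= 5)


# Hypothetical toy data: the true label is 1 when x >= 5, but two crude labels are wrong.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
crude = {x: int(x >= 5) for x in samples}
crude[2.0], crude[7.0] = 1, 0

final_labels, reviewed = iterative_relabel(samples, crude, train_threshold, human_review)
print(sorted(reviewed))  # [2.0, 7.0]
```

In this toy run the model trained on the crude labels still finds the right boundary, so a human checks only the 2 samples it disputes rather than all 8; the real savings depend on how noisy the crude labels are and how well the model generalizes.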

Iterative labeling approaches such as active learning can save significant time, reducing the cost of making big data ML-ready. Furthermore, SwRI scientists leveraged the trained ML algorithm to provide an even richer and more useful database by gradually masking the videos and looking for the moment at which the algorithm changes its classification.
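One possible reading of "gradually masking the videos" is sketched below: blank out every frame after the first k and find the smallest k at which the classifier first reports an emerging region. The article does not say how the masking was done, so this is an assumption; `mask_after`, `earliest_detection`, `is_emerging` and the one-number-per-frame toy video are hypothetical. The binary search further assumes that revealing more frames never turns a positive classification back to negative; without that assumption, a linear scan over k is the safe choice.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

F = TypeVar("F")  # one video frame (e.g. a 2-D magnetogram); any type works here


def mask_after(video: Sequence[F], k: int, blank: F) -> List[F]:
    """Keep frames 0..k-1 and replace the rest with `blank`, preserving video length."""
    return list(video[:k]) + [blank] * (len(video) - k)


def earliest_detection(
    video: Sequence[F], classify: Callable[[List[F]], bool], blank: F
) -> Optional[int]:
    """Smallest number of unmasked frames for which `classify` returns True.

    Assumes monotonicity (once True, revealing more frames keeps it True),
    which lets a binary search use O(log n) classifier calls instead of n.
    """
    n = len(video)
    if n == 0 or not classify(list(video)):
        return None  # nothing detected even with the full video visible
    lo, hi = 1, n  # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if classify(mask_after(video, mid, blank)):
            hi = mid
        else:
            lo = mid + 1
    return lo


def is_emerging(frames: List[float]) -> bool:
    """Hypothetical stand-in classifier: flags emergence once any visible frame exceeds 10."""
    return max(frames) > 10.0


# Hypothetical toy video: one "unsigned flux" value per frame.
flux = [0.0, 1.0, 3.0, 8.0, 12.0, 15.0, 14.0]

print(earliest_detection(flux, is_emerging, blank=0.0))  # 5
```

Here the classification flips once 5 frames are visible, so the emergence is first "seen" in frame index 4. Attaching such a time to each labeled video is one way a trained classifier could enrich a database beyond a simple yes/no label.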

“We developed an end-to-end, deep-learning approach for classifying videos of magnetic patch evolution without explicitly supplying segmented images, tracking algorithms, or other handcrafted features,” said SwRI co-author Dr. Derek Lamb, who specializes in the evolution of magnetic fields on the Sun’s surface. “This database will be essential in the development of new methodologies for forecasting the emergence of complex regions conducive to space weather events, potentially increasing the amount of time we have to prepare for space weather.”