Even with Limited Training Data, Machine Learning Models Can Produce Consistent Results

The amount and quality of the training data that machine learning models are exposed to frequently influence their performance. In general, more diverse and representative data improves a model’s reliability and generalization. Under certain conditions, however, machine learning models can produce reasonable results with limited training data.

Researchers have figured out how to create dependable machine-learning models that can understand complex equations in real-world scenarios while using far less training data than is typically expected.

Researchers from the University of Cambridge and Cornell University discovered that machine learning models for partial differential equations – a class of physics equations that describe how things in the natural world evolve in space and time – can produce reliable results even when given limited data. Their findings, published in the Proceedings of the National Academy of Sciences, could help researchers build more time- and cost-effective machine learning models for applications like engineering and climate modeling.

Before they can start returning accurate results, most machine learning models require a large amount of training data. To train the model, a human will traditionally annotate a large volume of data, such as a set of images.

“Using humans to train machine learning models is effective, but it’s also time-consuming and expensive,” said first author Dr. Nicolas Boullé of the Isaac Newton Institute for Mathematical Sciences. “We’re interested to know exactly how little data we actually need to train these models and still get reliable results.”

Other researchers have been able to train machine learning models with a small amount of data and achieve excellent results, but how they did so is unclear. Boullé and his Cornell University co-authors, Diana Halikias and Alex Townsend, focused on partial differential equations (PDEs) for their research.

“PDEs are like the building blocks of physics: they can help explain the physical laws of nature, such as how the steady state is held in a melting block of ice,” said Boullé, who is an INI-Simons Foundation Postdoctoral Fellow. “Since they are relatively simple models, we might be able to use them to make some generalizations about why these AI techniques have been so successful in physics.”

The researchers discovered that PDEs that model diffusion have a structure that can be used to create AI models. “Using a simple model, you might be able to enforce some of the physics that you already know into the training data set to get better accuracy and performance,” Boullé said in a statement.

The researchers developed an efficient algorithm for predicting the solutions of PDEs under different conditions by exploiting the short- and long-range interactions at play. This enabled them to build mathematical guarantees into the model and to determine how much training data was required to produce a robust model.
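The article does not spell out the algorithm, so the following is not the authors' method. It is a minimal Python sketch, under stated assumptions, of how structure in long-range interactions can cut the amount of training data needed. The assumptions: a one-dimensional steady-state diffusion (Poisson) problem, -u'' = f on (0, 1) with u = 0 at both ends, discretized by finite differences; a symmetric solution operator G mapping a forcing f to a solution u; and the hypothetical helper names make_poisson_solver and learn_long_range_block. In this toy problem, the part of G that links the right half of the domain to the left half has rank one, so a handful of forcing-solution pairs is enough to recover it.

```python
import numpy as np


def make_poisson_solver(n):
    """Solver for -u'' = f on (0, 1), u(0) = u(1) = 0, on n interior grid points.

    Stands in for the "experiment" that turns a forcing f into a solution u.
    """
    h = 1.0 / (n + 1)
    # Standard second-order finite-difference Laplacian (tridiagonal, symmetric).
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return lambda F: np.linalg.solve(A, F)


def learn_long_range_block(solve, n, k, rng):
    """Recover the block of the solution operator that maps forcings on the
    right half of the domain to responses on the left half, using 2*k
    forcing-solution pairs (a randomized low-rank approximation).
    """
    m = n // 2
    # k random forcings supported on the right half; observe the left half.
    F_right = np.zeros((n, k))
    F_right[m:] = rng.standard_normal((n - m, k))
    Y = solve(F_right)[:m]          # Y = G12 @ Omega
    Q, _ = np.linalg.qr(Y)          # orthonormal basis containing range(G12)
    # k further forcings supported on the left half; observe the right half.
    # Because G is symmetric, this yields G21 @ Q = G12.T @ Q.
    F_left = np.zeros((n, k))
    F_left[:m] = Q
    Z = solve(F_left)[m:]
    return Q @ Z.T                  # Q @ Q.T @ G12, a rank-<=k approximation


n, k = 200, 3
rng = np.random.default_rng(0)
solve = make_poisson_solver(n)
G12_learned = learn_long_range_block(solve, n, k, rng)

# Reference only: the full operator, which costs n solves instead of 2*k.
G = solve(np.eye(n))
G12_true = G[: n // 2, n // 2:]
rel_err = np.linalg.norm(G12_learned - G12_true) / np.linalg.norm(G12_true)
print(f"training pairs used: {2 * k}, relative error: {rel_err:.1e}")
```

Here six forcing-solution pairs stand in for the 200 that a naive column-by-column reconstruction of the operator would need. The short-range part of the operator (the blocks near the diagonal) does not have this low-rank structure and would need separate treatment, which this sketch leaves out.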

“It depends on the field, but for physics, we found that you can actually do a lot with a very limited amount of data,” Boullé said. “It’s surprising how little data is required to produce a reliable model. We can use the mathematical structure of these equations to make the models more efficient.”

According to the researchers, their techniques will enable data scientists to open the ‘black box’ of many machine learning models and design new ones that can be interpreted by humans, though further research is required.

“We need to make sure that models are learning the right things, but machine learning for physics is an exciting field – there are lots of interesting maths and physics questions that AI can help us answer,” said Boullé.