Microsoft Improves Its AI Translations with Z-Code

Microsoft today announced an upgrade to its translation services that promises markedly improved translations across a large number of language pairings, thanks to new machine learning models. Developed under Project Z-Code, which employs a “sparse Mixture of Experts” approach, the new models scored between 3% and 15% better than the company’s previous models in blind evaluations.

Z-Code is part of Microsoft’s larger XYZ-Code initiative, which aims to develop more powerful and useful AI systems by merging text, vision, and audio models across different languages. The term “mixture of experts” isn’t new, but it’s particularly relevant in the context of translation. At its most basic level, the system divides a task into several subtasks and delegates them to smaller, more specialized models known as “experts.” Based on its own predictions, the model then decides which subtask to route to which expert. In simplified terms, you can think of it as a single model that incorporates many more specialized models.
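The routing idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not Microsoft’s implementation: a learned gate scores the experts for a given input, and only the top-scoring expert(s) are evaluated. All sizes and the use of simple linear layers as “experts” are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
num_experts, d_in, d_out, top_k = 4, 8, 8, 1

# Each "expert" here is just a linear map; a real system would use full networks.
experts = [rng.normal(size=(d_in, d_out)) for _ in range(num_experts)]
gate_w = rng.normal(size=(d_in, num_experts))

def moe_forward(x):
    """Route input x to the top_k experts chosen by the gate."""
    logits = x @ gate_w
    # Softmax over experts yields routing probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sparse routing: only the top_k experts are actually evaluated.
    chosen = np.argsort(probs)[-top_k:]
    out = np.zeros(d_out)
    for i in chosen:
        out += probs[i] * (x @ experts[i])
    return out, chosen

y, used = moe_forward(rng.normal(size=d_in))
print(used)  # indices of the experts that actually ran
```

The key property is that the cost of a forward pass scales with `top_k`, not with `num_experts`, which is what makes very large expert pools practical.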

“We’re making incredible progress with Z-Code because we’re leveraging both transfer learning and multitask learning from monolingual and multilingual data to create a state-of-the-art language model that we believe has the best combination of quality, performance, and efficiency that we can provide to our customers,” said Xuedong Huang, Microsoft technical fellow and Azure AI chief technology officer. The result is a new system that can translate directly between ten languages, eliminating the need for multiple separate systems.

Microsoft has recently begun using Z-Code models to power additional AI features, such as entity recognition, text summarization, custom text classification, and key phrase extraction. However, this is the first time it has employed the approach for a translation service. Translation models are notoriously large, making them difficult to deploy in a production setting.

However, the Microsoft team opted for a “sparse” approach, which activates only a limited number of model parameters per task rather than the entire system. “In the same way that it’s cheaper and more effective to just heat your house at the times of day when you need it and in the places that you routinely use, rather than keeping a furnace running full blast all of the time,” the team writes in today’s announcement.
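The efficiency argument can be made concrete with some back-of-the-envelope arithmetic. The figures below are assumptions chosen for illustration, not Microsoft’s published numbers: a model with 64 experts and a top-2 routing policy activates only a small fraction of its total parameters on any given input.

```python
# Illustrative numbers (assumptions, not Microsoft's published figures):
# 64 experts of 100M parameters each, plus 200M shared (always-on) parameters.
num_experts = 64
params_per_expert = 100_000_000
shared_params = 200_000_000

total = shared_params + num_experts * params_per_expert  # 6.6B parameters
# A sparse forward pass activates only the top-2 experts per input:
active = shared_params + 2 * params_per_expert           # 400M parameters

print(f"active fraction: {active / total:.1%}")  # prints "active fraction: 6.1%"
```

Under these assumed numbers, each request touches about 6% of the model, which is the sense in which the “heat only the rooms you use” analogy holds.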