Facebook launches an AI that can translate between 100 languages
Facebook has open-sourced an AI model that can translate between any pair of 100 languages without first translating into English as an intermediate step.
The system, called M2M-100, is currently just a research project, but it could eventually be used to translate posts for Facebook users, who publish content in more than 160 languages.
Facebook AI researcher Angela Fan said in a blog post that AI researchers have for years been trying to build a single universal model capable of understanding all languages across different tasks.
She added: "A single model that supports all languages and dialects can help us serve more people better, keep translations up to date, and create new experiences for billions of people equally. This work brings us closer to that goal."
The model was trained on a dataset of 7.5 billion sentence pairs covering over 100 languages, mined from the web.
The researchers focused on the most commonly requested translation directions, avoiding rare pairs such as Sinhala-Javanese, and divided the languages into 14 groups based on linguistic, geographic and cultural similarities.
This approach was chosen because people who speak languages within the same group are more likely to need translations between them.
The first group, for example, includes widely spoken Indian languages such as Hindi, Bengali and Marathi; within each group, all possible language pairs were mined.
The different language groups were then connected through a small number of "bridge" languages. In the Indian group, Hindi, Bengali and Tamil serve as bridge languages for the Indo-Aryan languages.
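The pair-mining strategy described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Facebook's actual pipeline: the group names, language codes and bridge choices are toy assumptions, and the real system mines sentence pairs rather than just enumerating directions.

```python
from itertools import permutations

# Toy language groups (ISO codes) and their assumed bridge languages.
groups = {
    "indo_aryan": ["hi", "bn", "mr", "ta"],  # Hindi, Bengali, Marathi, Tamil
    "romance": ["fr", "es", "it"],           # French, Spanish, Italian
}
bridges = {"indo_aryan": ["hi", "bn", "ta"], "romance": ["fr"]}

def mine_directions(groups, bridges):
    """Return the set of (source, target) directions to mine data for:
    every ordered pair within a group, plus cross-group pairs that
    go through designated bridge languages only."""
    directions = set()
    # All ordered pairs within each group.
    for langs in groups.values():
        directions.update(permutations(langs, 2))
    # Across groups, connect only through bridge languages.
    bridge_langs = [lang for bs in bridges.values() for lang in bs]
    directions.update(permutations(bridge_langs, 2))
    return directions

directions = mine_directions(groups, bridges)
```

With these toy groups, Hindi-Bengali (same group) and Hindi-French (both bridges) are mined, while Marathi-Spanish is not, since neither language is a bridge and they sit in different groups.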
The team then mined training data for all of these language combinations, producing a dataset of 7.5 billion parallel sentences covering 2,200 translation directions.
For language pairs lacking high-quality translation data, the researchers used a technique called back-translation to create synthetic translations that supplement the mined data.
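The idea behind back-translation can be shown with a minimal sketch: monolingual sentences in the target language are translated back into the source language by a reverse-direction model, and each synthetic source sentence is paired with its genuine target sentence. The `translate` function below is a toy dictionary stand-in, not a real model; the language codes and words are illustrative assumptions.

```python
def translate(text, src, tgt):
    # Stand-in for a trained reverse-direction model (toy lookup here):
    # maps a Javanese greeting to a Sinhala one for illustration only.
    toy = {("jv", "si"): {"sugeng": "ayubowan"}}
    return toy.get((src, tgt), {}).get(text, text)

def back_translate(monolingual_target, src, tgt):
    """Build synthetic (source, target) training pairs from monolingual
    target-language text: translate each target sentence back into the
    source language, then pair the synthetic source with the real target."""
    synthetic_pairs = []
    for target_sentence in monolingual_target:
        synthetic_source = translate(target_sentence, tgt, src)
        synthetic_pairs.append((synthetic_source, target_sentence))
    return synthetic_pairs

pairs = back_translate(["sugeng"], src="si", tgt="jv")
```

The key design point is that the target side of each synthetic pair is genuine human-written text, so the forward model learns to produce fluent output even though the source side is machine-generated.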
The combination of these techniques produced M2M-100, the first model able to translate between 100 languages without relying on English data.
Fan said: "When translating, say, Chinese to French, most English-centric multilingual models train on Chinese-to-English and English-to-French data, because English training data is the most widely available."
She added: "Our model trains directly on Chinese-to-French data to better preserve meaning."