Given the speed at which machine translation (MT) is advancing, one topic that excites us at TBSJ is the future of this technology. And that is one of the key reasons why our MT experts, Paul O’Hare, CTO and co-founder, and Yury Sharshov, chief scientist, attended MT Summit 2021—a global biennial event for academic, industry, and government MT specialists. Many speakers reflected on how MT has developed and what the future might hold.
There is no doubt that MT has come a long way. First proposed in 1949 by a researcher exploring successes in code breaking during World War II, rule-based MT saw early, but limited, use in the 1960s and 1970s. Rapid advances in statistical machine translation (SMT) during the 1990s increased government investment in MT after the turn of the century, and access to much more powerful computers, particularly in the last decade, has helped to make neural machine translation (NMT) an indispensable part of modern life.
Extensive research in SMT also drove a shift from one-size-fits-all generic MT to the first specialized engines, and with this, post-editing became a crucial part of the translation process. The technology has now reached the stage where it can learn from humans and other resources with the assistance of natural language processing (NLP).
At TBSJ, those resources include experienced teams of specialists with in-depth linguistic and technological know-how, as well as computer-aided solutions such as Leveraged AI, our self-developed approach that incorporates translation memory and bespoke AI engines for each client’s exclusive use.
“Machine translation today is based on a specific and very popular type of neural network architecture, which is able to perform many-to-many multilingual translations. The most interesting feature of current MT technology is how easy it is to adopt and integrate with translation tools compared to just a few years ago,” says Yury.
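To make the many-to-many idea concrete: such multilingual engines typically steer the output language with a special target-language token prepended to the input, so a single model serves every language pair. The toy sketch below is purely illustrative — the lookup table stands in for a trained neural network, and the token names are hypothetical.

```python
# Toy illustration (not a real MT model): a many-to-many multilingual
# engine selects the output language via a prepended language token,
# so one "model" handles every target language.
TOY_PHRASEBOOK = {
    ("hello", "__fr__"): "bonjour",
    ("hello", "__ja__"): "こんにちは",
    ("thank you", "__fr__"): "merci",
    ("thank you", "__ja__"): "ありがとう",
}

def translate(text: str, target_lang_token: str) -> str:
    """Mimic a many-to-many model: the same engine serves all target
    languages, steered only by the language token."""
    return TOY_PHRASEBOOK.get((text, target_lang_token), text)

print(translate("hello", "__fr__"))  # bonjour
print(translate("hello", "__ja__"))  # こんにちは
```

In a real system the lookup would of course be a neural network, but the interface — one engine, many language pairs, selected by a token — is the same.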
It has also become much easier to train MT engines and adopt open-source tools and technologies in the past few years. In the next few, we expect to see these engines getting smaller and lighter, to the extent that they can be run effectively on mobile phones or any other device with a central processing unit, instead of requiring expensive graphics processing units. As part of MT Summit’s government track, Gerardo Cervantes and Christian Schlesiger of the U.S. Army Research Laboratory spoke about how existing NLP tools and technologies (such as automatic speech recognition, MT, and speech synthesis) can now be put to use on Android devices.
A hot topic at the summit was multi-modal machine translation, where engines are trained using non-text content so that they can better interpret context. In one of the five keynote sessions, Professor Lucia Specia of Imperial College London showed how her team trained engines with a combination of text and associated pictures. These engines were then given text to translate along with an associated picture. Even when words were intentionally removed from the source text, the engine could fill in the gaps using the picture and produce a usable translation.
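The gap-filling experiment above can be sketched in miniature. In this hedged toy version, the "image" is just a list of object labels standing in for real visual features, and the fill rule is a deliberately naive heuristic rather than anything from the keynote:

```python
# Toy sketch of multi-modal gap filling: when words are masked out of
# the source sentence, information extracted from an associated image
# supplies the missing content.
def fill_gaps(tokens, image_labels):
    """Replace each <mask> token with the next detected image label
    (toy heuristic: labels are consumed in detection order)."""
    labels = iter(image_labels)
    return [next(labels, "<unk>") if t == "<mask>" else t for t in tokens]

sentence = ["a", "<mask>", "sitting", "on", "a", "<mask>"]
detected = ["cat", "sofa"]  # pretend output of an image-recognition model
print(" ".join(fill_gaps(sentence, detected)))  # a cat sitting on a sofa
```

A real multi-modal engine fuses visual and textual features inside the network rather than patching tokens afterwards, but the intuition — the picture supplies what the text omits — is the same.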
Because NMT engines do not “understand” the sentences they are translating, never mind entire paragraphs or documents, they struggle with context. Multi-modal MT is an approach aimed at improving this. At TBSJ, we are investigating how various pieces of multi-modal information can be given to our engines in order to produce more complete English translations when the Japanese source text does not contain all of the required information. We hope to improve how our equity research engines handle pronouns, dates, and omitted subjects. We are also looking into how this approach could improve consistency of defined terms in legal translation.
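One practical way to supply that missing information is to pass document-level metadata (company name, report period, and so on) alongside each sentence, so gaps the Japanese source leaves implicit can be completed in the English output. The sketch below is a hypothetical illustration of that idea, not TBSJ's actual pipeline — the field names and the placeholder convention are assumptions:

```python
# Hedged sketch: Japanese often omits subjects and dates that English
# requires. Here, an engine's draft output contains placeholders for
# information absent from the source, and document metadata fills them.
from string import Template

def complete_translation(draft: str, metadata: dict) -> str:
    """Fill placeholders left for information missing from the source;
    unknown placeholders are left untouched (safe_substitute)."""
    return Template(draft).safe_substitute(metadata)

metadata = {"company": "Example Corp", "period": "Q3 FY2021"}
draft = "$company raised its guidance for $period."  # draft with gaps
print(complete_translation(draft, metadata))
# Example Corp raised its guidance for Q3 FY2021.
```

Using `safe_substitute` means a placeholder with no matching metadata survives intact for a human post-editor to resolve, rather than raising an error.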
“There is, of course, a limit to what can be achieved here,” explains Paul. “AI would have a hard time dealing with a monolingual reference file provided by a client or searching the web for the correct reading of a person’s name. That said, translators can find it frustrating to fix simple mistakes (simple for a human, not for a machine) over and over again, and we hope to achieve some improvements in this area with the multi-modal approach.”
Becoming aware of context (and metadata) is a critical step in the evolution of MT as it moves into the next frontier: responsive MT, meaning engines that can adapt to user feedback and integrate easily with other environments. “At this level, language service providers face new specific steps and tasks,” says Yury. “TBSJ is aware of new developments and is adapting internal processes to efficiently use the state-of-the-art power of MT.”
In one of the other keynotes—“The Road to Infinity”—Jane Nemcova, former managing director at Moravia and Lionbridge, presented on how the latest innovations in MT have revealed what many of us in the language industry have known all along: “language” is a complicated thing. Natural language has baffled the world of AI, and it has become clear that the expertise required to totally understand natural language is, in fact, infinite.
Jane spoke about AI having now squarely taken root in every aspect of our lives and posed the question of whether machines will eventually replace linguists. She proposed that we should turn our focus towards the needs of the market as well as our obligations to promote the human side of language in preserving culture and individuality.
At TBSJ, MT technology is not meant to replace translators but to make them more efficient. Rather, the objective is to free language and translation specialists from tasks where MT excels so they can fully apply their unique linguistic expertise.