<?xml version="1.0" encoding="UTF-8" ?><!-- generator=Zoho Sites --><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><atom:link href="https://blog.icefire.ca/blogs/tag/machine-translation/feed" rel="self" type="application/rss+xml"/><title>The PointFire Blog - The PointFire Blog for Multilingual SharePoint #Machine Translation</title><description>The PointFire Blog - The PointFire Blog for Multilingual SharePoint #Machine Translation</description><link>https://blog.icefire.ca/blogs/tag/machine-translation</link><lastBuildDate>Sun, 13 Jul 2025 12:16:18 -0700</lastBuildDate><generator>http://zoho.com/sites/</generator><item><title><![CDATA[Is GPT better at translating than translation engines?]]></title><link>https://blog.icefire.ca/blogs/post/is-gpt-better-at-translating-than-translation-engines</link><description><![CDATA[<img align="left" hspace="5" src="https://blog.icefire.ca/languages.png"/>Generative Pre-trained Transformer (GPT) systems are not designed to be translation engines. So it is surprising that they succeed so well at doing simple translations. Some articles have claimed that they can translate better than existing translation engines. 
How true are those claims?]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_tDbLseMrQ1atvh_reNkSWg" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_UVV3AohDQdGTvwDADeeliw" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"> [data-element-id="elm_UVV3AohDQdGTvwDADeeliw"].zprow{ border-radius:1px; } </style><div data-element-id="elm_BTUzPodyT4CO85UyifeZEA" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"> [data-element-id="elm_BTUzPodyT4CO85UyifeZEA"].zpelem-col{ border-radius:1px; } </style><div data-element-id="elm_f3ptpRwjce2pziid40quFg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_f3ptpRwjce2pziid40quFg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Generative Pre-trained Transformer (GPT) systems are not designed to be translation engines.&nbsp; So it is surprising that they succeed so well at doing simple translations.&nbsp; Some articles have claimed that they can translate better than existing translation engines.&nbsp; How true are those claims?</p></div></div>
</div><div data-element-id="elm_wwrYpJJnyynTEBW1VCGNaw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_wwrYpJJnyynTEBW1VCGNaw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;">Most of those claims are based on testing with a few sentences chosen by the author, few language pairs, and a qualitative scoring of how good the translation is.&nbsp; However, more systematic evaluations with large samples of more types of text, more languages, and more objective quality scoring by machines and humans tell a different story.</span><br></p></div>
</div><div data-element-id="elm_46LFSwHNzt6gdr8LjAh8NQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_46LFSwHNzt6gdr8LjAh8NQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The best comprehensive evaluation was done by scientists at Microsoft Research, unsurprising because they are among the leaders in both machine translation and GPT models.&nbsp; The brief summary is that while GPT models have competitive quality when translating usual sentences from a major (see high-resource below) language to English, they are less good at other types of translation.</p></div></div>
</div><div data-element-id="elm_N7pEJJlKArL38OQeyi1R3Q" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_N7pEJJlKArL38OQeyi1R3Q"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The evaluation uses three GPT systems that are known for translation quality, and compares them with neural machine translation engines (NMT), either Microsoft Azure API or the best-performing commercial systems or research prototype.&nbsp; Quality scoring uses either algorithmic scoring or human evaluation.</p></div></div>
</div><div data-element-id="elm_hPgLpVu7xvZTX8EVf8QoUg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_hPgLpVu7xvZTX8EVf8QoUg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><a href="https://arxiv.org/pdf/2302.09210.pdf" title="How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation&nbsp;" rel="">How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation</a>&nbsp;<br></p><div><br></div></div></div>
</div><div data-element-id="elm_ivvprsnnnhFmrXoNaiJMfw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_ivvprsnnnhFmrXoNaiJMfw"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><p>Languages and language direction</p></div></h3></div>
<div data-element-id="elm_OYh0FJo0e1jQwh74F41YlA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_OYh0FJo0e1jQwh74F41YlA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>In most languages and most language directions, MS Azure Translator and other NMT engines outperform GPT on most measures of quality.&nbsp; However, GPT does have the ability to improve after being given a few examples of correct translations in the prompt, and with five such examples it outperforms NMT in some language directions.&nbsp; This is the case for translations to English from German, Chinese and Japanese, languages for which there are a lot of examples in the GPT training set.&nbsp; These are called “high-resource” languages.&nbsp; On the other hand, it does not do particularly well for low-resource languages like Czech or Icelandic, or for English to other languages.&nbsp; GPT’s training set had less text in those languages.&nbsp; In the chart below, orange is GPT and blue is NMT.&nbsp; The lines are algorithmic evaluation and the bars are human evaluation.</p></div></div>
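<div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>To make the idea of “giving GPT a few examples of correct translations” concrete, here is a minimal sketch in Python of how such a prompt might be assembled.&nbsp; The function name <code>build_few_shot_prompt</code>, the prompt wording and the sample sentences are all assumptions made for illustration; they are not the template used in the Microsoft Research evaluation.</p>
<pre>
```python
def build_few_shot_prompt(examples, source_sentence, source_lang, target_lang):
    """Assemble a translation prompt that shows worked examples first."""
    lines = [f"Translate from {source_lang} to {target_lang}."]
    for source, target in examples:
        lines.append(f"{source_lang}: {source}")
        lines.append(f"{target_lang}: {target}")
    # The final entry leaves the target side empty for the model to complete.
    lines.append(f"{source_lang}: {source_sentence}")
    lines.append(f"{target_lang}:")
    return "\n".join(lines)


# Hypothetical example pairs; a real test would use vetted translations.
examples = [
    ("Guten Morgen.", "Good morning."),
    ("Wie geht es dir?", "How are you?"),
]
prompt = build_few_shot_prompt(examples, "Das Wetter ist schön.", "German", "English")
print(prompt)
```
</pre>
<p>The resulting text would be sent to the model as a single prompt.&nbsp; The examples do not re-train the model; they only steer it for that one request.</p></div></div>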
</div><div data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg"] .zpimage-container figure img { width: 624px !important ; height: 398px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg"] .zpimage-container figure img { width:624px ; height:398px ; } } @media (max-width: 767px) { [data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg"] .zpimage-container figure img { width:624px ; height:398px ; } } [data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/languages.png" width="624" height="398" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_Trbv_VsvOl0UReU8UqOXpA" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_Trbv_VsvOl0UReU8UqOXpA"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><p>Sentence-level vs. multi-sentence</p></div></div></h3></div>
<div data-element-id="elm_4PSNPeNPm7jG976XCnkNSQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_4PSNPeNPm7jG976XCnkNSQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Those experiments above are for sentence-level translations.&nbsp; For multi-sentence translations, those with more context that can be found in other sentences, GPT improves relative to NMT.&nbsp; Not enough to beat the best NMT systems, but sometimes enough to match or beat the normal Azure API.&nbsp; That is not very surprising: Azure Translator was optimized for sentence-level translation, while GPT is trained for multi-sentence context, up to thousands of words.&nbsp; Other Azure translation APIs like Document Translator and Custom Translator are better at longer context windows, but this is not what was tested here.&nbsp;</p></div></div>
</div><div data-element-id="elm_13pYmoMB8Ri4gFMkS-ersw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_13pYmoMB8Ri4gFMkS-ersw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Neural Machine Translation models have another big advantage over GPT: they can be re-trained for a particular domain rather than with general domains, and this significantly improves their quality score for that domain.&nbsp; For example, by giving it many training examples of automotive documents, Azure Custom Translator (a re-trainable version of Azure Translator) can increase its translation quality for documents in the automotive domain by a large factor.</p></div></div>
</div><div data-element-id="elm_MiC-NQB9r4oyIjEzaBo6HQ" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_MiC-NQB9r4oyIjEzaBo6HQ"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><p>Fluency and alignment</p></div></div></h3></div>
<div data-element-id="elm_OnDZyqsMG88uqB5qjbGWOQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_OnDZyqsMG88uqB5qjbGWOQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Looking at other measures of performance gives a better understanding of how GPT and NMT differ.&nbsp; Measuring fluency, essentially how natural the sentence is and how similar it is to other sentences out in the world, tells you something about the quality of the prose.&nbsp; GPT is more fluent in English; it sounds more natural.&nbsp; That doesn’t mean that it is a more accurate translation, far from it.&nbsp; In fact, GPT has a greater tendency to add words, concepts, and punctuation that do not correspond to the original, or to omit some.&nbsp; So it’s good prose, but it’s not necessarily what the original said.&nbsp; For example, it does better at figures of speech, by not translating them literally, but it does not necessarily replace them with a term that means exactly the same thing.&nbsp; It usually does not wander far from the original with completely made-up things, but it’s often not quite correct.&nbsp; However, GPT does sometimes hallucinate words or concepts that were not in the original.<br></p><div><div style="color:inherit;"><p><br></p><p><a href="https://arxiv.org/abs/2305.16806" title="Do GPTs Produce Less Literal Translations?" rel="">Do GPTs Produce Less Literal Translations?</a>&nbsp;</p></div></div></div></div>
</div><div data-element-id="elm_em5DmpwzIvEOv7gi25W58w" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_em5DmpwzIvEOv7gi25W58w"] .zpimage-container figure img { width: 463px !important ; height: 127px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_em5DmpwzIvEOv7gi25W58w"] .zpimage-container figure img { width:463px ; height:127px ; } } @media (max-width: 767px) { [data-element-id="elm_em5DmpwzIvEOv7gi25W58w"] .zpimage-container figure img { width:463px ; height:127px ; } } [data-element-id="elm_em5DmpwzIvEOv7gi25W58w"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/idiom.png" width="463" height="127" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_xBGZxVb8PgHqZB19limLNw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_xBGZxVb8PgHqZB19limLNw"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><p>Translationese</p></div></div></h3></div>
<div data-element-id="elm_2e2-K-qRJowsZrrZcmfFOA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_2e2-K-qRJowsZrrZcmfFOA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>A literal and faithful translation is sometimes required, but that often leads to what is called “translationese”.&nbsp; This refers to a set of common issues with text generated by human translators.&nbsp; Translationese can refer to excessive precision or wordiness, or excessive vagueness in translated text, or syntax that is uncommon in the target language.&nbsp; What translators are compensating for is the fact that different languages are specific about different things.&nbsp; For example, English has the term “uncle”, which does not differentiate between paternal and maternal uncle or uncles by blood or marriage, but many other languages are much more specific.&nbsp; Translating from those other languages to English, Translationese would not say “uncle” but might say “maternal uncle by marriage”, a term that is unusual in English, but which avoids losing information that was contained in the original.&nbsp; In terms of translation quality, humans who are not translators might rate the translation with “uncle” higher because it sounds more natural, but translators would rate the awkward translation higher because it is more accurate.</p><p><br></p><p><span style="color:inherit;"><a href="https://arxiv.org/abs/2104.07623" title="Sometimes We Want Translationese" rel="">Sometimes We Want Translationese</a></span><br></p></div></div>
</div><div data-element-id="elm_1v3UqmUWI0PY_kDgHNpFEg" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_1v3UqmUWI0PY_kDgHNpFEg"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><p>Design and training of GPT and NMT</p></div></div></h3></div>
<div data-element-id="elm_TrxEYA97NeXggnNFOH0fDw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_TrxEYA97NeXggnNFOH0fDw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p>There are big differences between the texts used to train GPT engines and the texts used to train NMT engines.&nbsp; GPT engines are trained on unilingual text found on the internet, mostly in English.&nbsp; For any sequence of words, GPT learns the most likely next word.&nbsp; NMT engines are trained on curated professionally translated sentences, pairs of original sentences and their translations.&nbsp; For all the curation, these data sets are often noisy and include incorrect translations that set back the training.&nbsp; For any sentence within a document in the source language, NMT predicts the translated sentence.&nbsp; This is part of the reason why NMT learns to produce translationese and GPT does not: it’s in the training set.</p></div></div></div>
</div><div data-element-id="elm_C4pvHsnS1vHjjdqyrrTqOw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_C4pvHsnS1vHjjdqyrrTqOw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The design of the two types of models is also different. The “T” in GPT stands for “Transformer”.&nbsp; The Transformer is an attention-based neural network model that, when looking at a word within a sentence or even longer text, determines which other words are the most relevant ones to pay attention to.&nbsp; NMT also uses Transformer models.&nbsp; However, there are big differences.&nbsp; One is that GPT uses decoder-only models, while NMT uses encoder-decoder models.&nbsp; What does that mean?&nbsp; Decoder-only models focus on the output, the next word to be spit out.&nbsp; Encoder-decoder models try to extract features from the input before feeding them to the part of the model that predicts the output.&nbsp; They focus separately on the input and on the output, and they try to be robust to small changes in the input.</p></div></div>
</div><div data-element-id="elm_ZFFoET6GwZJ40lJm9kkBrA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_ZFFoET6GwZJ40lJm9kkBrA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>GPT only outputs one word at a time.&nbsp; It starts with the text of the prompt plus other information in the input, then outputs a single word.&nbsp; Then it adds that word to the end of the prompt in the input and puts this new input through again to get the next word.&nbsp; It is unidirectional, that is to say when it is generating text it only looks at the previous words that are already generated, it doesn’t consider what it will say next because it hasn’t said it yet. Like a lot of humans, GPT is more concerned with what it wants to say next than with what you’re saying. &nbsp;NMT is bidirectional.&nbsp; It considers the rest of the sentence and the next sentence in both the source and the translation while it is generating the text.&nbsp; It generates entire sentences at once.</p></div></div>
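<div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The word-by-word loop described above can be sketched in a few lines of Python.&nbsp; This is only a toy: the lookup table <code>NEXT_WORD</code> and the functions <code>predict_next_word</code> and <code>generate</code> are hypothetical stand-ins, whereas a real GPT model works on sub-word tokens and computes probabilities over the whole preceding text.&nbsp; What the sketch does show is the shape of the loop: predict one word, append it to the input, and run the longer input through again.</p>
<pre>
```python
# Hypothetical lookup table standing in for a trained model: given the
# last word, it returns the single most likely next word.
NEXT_WORD = {
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "[end]",
}


def predict_next_word(words):
    """Toy predictor: it can only look at words that already exist."""
    return NEXT_WORD.get(words[-1], "[end]")


def generate(prompt_words, max_new_words=10):
    """Generate text one word at a time, feeding each output back in."""
    words = list(prompt_words)
    for _ in range(max_new_words):
        next_word = predict_next_word(words)
        if next_word == "[end]":
            break
        # Append the new word, then feed the longer input through again.
        words.append(next_word)
    return words


print(" ".join(generate(["the"])))
```
</pre>
<p>Note that the predictor never sees a word that has not been produced yet, which is the sense in which generation is unidirectional.</p></div></div>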
</div><div data-element-id="elm_yIisgdweYhX48cbBcwa4wg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_yIisgdweYhX48cbBcwa4wg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Because it is a generative model, GPT is biased towards what is usual.&nbsp; If the original text in the other language is commonplace and expected, then GPT will find good ways to express that text in English in ways that are commonplace and expected, because that is what it is trained to do.&nbsp; If the original says something that is unexpected or expresses it in unexpected ways, GPT’s translation is likely to replace it with something more usual using some of the same words.&nbsp; GPT does well at translation essentially because most things that require translation are predictable and unoriginal.</p></div></div>
</div><div data-element-id="elm_XU06fCYNGFWKXyoER9rTUw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_XU06fCYNGFWKXyoER9rTUw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>NMTs have a whole bag of tricks to deal with translation tasks that GPT does not, including specialized knowledge about the structure of languages, and tricks to deal with numbers, capitalization, and non-standard spacing correctly and efficiently.&nbsp; They are also trained to preserve information.&nbsp; You know that trick that people sometimes use, translating a sentence to another language then back to English so they can laugh at the result?&nbsp; NMTs include that round-trip in their training, to make sure that none of the meaning gets lost in the translation.&nbsp; Other tricks include having the neural network teach another neural network how to translate, detecting errors in the training data, and other tricks that address common translation errors.&nbsp; There are also tricks to reduce gender bias, a problem that still plagues GPT.</p></div></div>
</div><div data-element-id="elm_36dwuF6_ONEOJ5UGuuNZhw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_36dwuF6_ONEOJ5UGuuNZhw"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><div><p>Computing power required</p></div></div></div></h3></div>
<div data-element-id="elm_968TT3XKXbL3DR-Ht9w_3g" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_968TT3XKXbL3DR-Ht9w_3g"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Current Azure Translator NMT uses models of about 50 million parameters, which can run 4 language pairs in a Docker container on a host having a 2-core CPU with 2 GB memory.&nbsp; Even the next generation of NMTs, <a href="https://blog.icefire.ca/blogs/post/how-to-get-a-higher-level-of-machine-translation-quality" title="Z-code MoE" rel="">Z-code MoE</a>, which have 100 languages (10,000 language pairs) in a single model, can fit on a single GPU even though they have billions or hundreds of billions of parameters.&nbsp; These are sizes for querying; what is required for training is much bigger.&nbsp; OpenAI has not disclosed how many parameters GPT-4 has; its predecessor, GPT-3, has 175 billion.&nbsp; Training such models requires hundreds of thousands of CPUs and tens of thousands of GPUs, but to query them, it looks like a single cluster of 8 GPUs and a dozen or two CPUs is what is required.&nbsp; Microsoft is very good at shrinking by orders of magnitude the size of machines required to run AI models so direct comparison is difficult, but NMTs deliver translations at much lower computational cost.&nbsp; Microsoft’s DeepSpeed library in particular increases speed and reduces latency by a large factor.</p><p><br></p><p>The computing power required also has a potential impact on security.&nbsp; NMTs, even the bigger potential NMTs, can be run on a single processor, while GPT requires many processors.&nbsp; Using GPT you are probably sharing hardware with strangers, while for NMT it is possible to have dedicated resources.&nbsp; Because of its autoregressive design, where the output is fed back into the input, GPT probably has some static storage of your data, while NMT can be architected with a pipeline where neither the input nor output text is ever 
stored.&nbsp; I don't know how it is implemented by anyone, but I notice that for Azure, NMT has a no-trace option by default while GPT limited access previews do not.&nbsp; Because of ethical concerns, data is probably retained for abuse monitoring.&nbsp; I'm sure the security is good, but the architecture reduces the options for security.</p></div></div>
</div><div data-element-id="elm_azlB0lmtdPdOlibqoksLHQ" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_azlB0lmtdPdOlibqoksLHQ"] .zpimage-container figure img { width: 1110px ; height: 713.35px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_azlB0lmtdPdOlibqoksLHQ"] .zpimage-container figure img { width:723px ; height:464.64px ; } } @media (max-width: 767px) { [data-element-id="elm_azlB0lmtdPdOlibqoksLHQ"] .zpimage-container figure img { width:415px ; height:266.70px ; } } [data-element-id="elm_azlB0lmtdPdOlibqoksLHQ"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/performance.jpg" width="415" height="266.70" loading="lazy" size="fit" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_DOtJj3rG9dVwXmZBaX0S3A" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_DOtJj3rG9dVwXmZBaX0S3A"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true">Conclusion</h2></div>
<div data-element-id="elm_XFr0lnW8TS4_CtxFNGC9gg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_XFr0lnW8TS4_CtxFNGC9gg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The blanket claim that GPT is better at translation is not generally true.&nbsp; However, GPT is surprisingly good, considering that translation is a task that it was neither designed nor trained for.&nbsp; It is unexpected that it is sometimes equal to or better than the highly specialized NMTs.&nbsp; There is a fair bit of work being done on hybrid systems that combine the accuracy and specialized training of NMT with the fluency of GPT and will deliver the best of both. The next generation of NMT (see <span style="color:inherit;"><a href="https://blog.icefire.ca/blogs/post/how-to-get-a-higher-level-of-machine-translation-quality" title="How to get a higher level of machine translation quality" rel="">How to get a higher level of machine translation quality</a></span><span style="color:inherit;">) will also allow the model to transfer language knowledge obtained from one language to other related languages, and in that way vastly improve the quality for low-resource languages such as southern Slavic languages.&nbsp; That innate knowledge of what is common to languages in the same family can then be used to improve the quality of both NMT and GPT.</span></p><div><br></div><div><div style="color:inherit;"><p><a href="https://arxiv.org/abs/2309.11674" title="A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models" rel="">A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models</a>&nbsp;<br></p><div><br></div></div></div></div></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Fri, 06 Oct 2023 09:30:00 -0400</pubDate></item><item><title><![CDATA[How to get a higher level of machine translation quality]]></title><link>https://blog.icefire.ca/blogs/post/how-to-get-a-higher-level-of-machine-translation-quality</link><description><![CDATA[<img align="left" hspace="5" src="https://blog.icefire.ca/Z-code-model-improvement.png"/>What can you do to ensure the highest level of quality in Machine Translation of pages and documents in Microsoft SharePoint? This blog post details all the steps to follow and different problems and their solutions]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_AjkHLZCBQ-iNaI_HQ5KNSw" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_LKRYFxFTSZaqvDcBJt5BIg" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_vL5D9nZ1S6SfIHmapowXZA" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_bFHzWy19Tt-eRiB1PPc9PQ" data-element-type="text" class="zpelement zpelem-text "><style></style><div class="zptext zptext-align-center " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p style="text-align:left;"><span style="font-size:16px;">Clients often ask me what they can do to ensure the highest level of quality in Machine Translation of pages and documents.&nbsp; This blog post applies particularly to users of PointFire Translator, but since the translation technology that we use is similar to that of other products that follow the leading edge of machine translation technology, most of the advice below applies to other machine translation scenarios.&nbsp; If you 
are using the machine translation of PointFire on premise or SharePoint on premise, that is much older technology, and the advice would be a bit different.</span></p><p style="text-align:left;font-size:11pt;"><br></p><p style="text-align:left;"><span style="font-size:16px;">The first thing that you need to do is determine your tolerance for errors in different types of documents and the time and money you are willing to spend to correct the errors.&nbsp; Not all documents need to be translated with the same level of quality.&nbsp; Some of them are fine if you get the gist, while others must be very faithful to the original.</span></p><p style="text-align:left;font-size:11pt;"><br></p><p style="text-align:left;"><span style="font-size:16px;">A good human translator will charge about 10-50 cents per word depending on the languages and the type of text.&nbsp; Azure Translator text API will charge about 0.02 cents per word, Azure Translator document API will charge about 0.04 cents and Azure Custom Translator will charge 0.1 cents.&nbsp; These are round figures, but the cost savings from machine translation are considerable.</span></p><p style="text-align:left;font-size:11pt;"><br></p><p style="text-align:left;"><span style="font-size:16px;">How accurate are the translations from machine translation?&nbsp; That is a complex subject, but in the past decade we have gone from statistical machine translation, which was still barely better than gibberish except for languages from the same family, to the current situation where translation for many language pairs, including English to Chinese and back, has reached what is called “human parity”, to the next generation of machine translation based on Z-code MoE models (more on that below), with dramatic improvements in many language pairs.</span></p></div></div></div>
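<div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>To put the per-word prices quoted earlier in this section side by side, here is a small worked calculation in Python.&nbsp; The prices are the round figures from the text; the 1,000-word document length and the function name <code>cost_in_dollars</code> are assumptions chosen only for illustration, and actual Azure pricing is set per character and changes over time.</p>
<pre>
```python
# Approximate prices in cents per word, taken from the round figures above.
CENTS_PER_WORD = {
    "human translator (low)": 10.0,
    "human translator (high)": 50.0,
    "Azure Translator text API": 0.02,
    "Azure Translator document API": 0.04,
    "Azure Custom Translator": 0.1,
}


def cost_in_dollars(word_count, cents_per_word):
    """Convert a per-word price in cents into a total in dollars."""
    return word_count * cents_per_word / 100


word_count = 1000  # assumed document length, for illustration only
for option, price in CENTS_PER_WORD.items():
    print(f"{option}: ${cost_in_dollars(word_count, price):,.2f}")
```
</pre>
<p>For a document of that length, the human translation comes to roughly $100 to $500, while the machine options range from about 20 cents to $1.</p></div></div>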
</div><div data-element-id="elm_r2KnMymHcmv4TCkr0ahWYg" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_r2KnMymHcmv4TCkr0ahWYg"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="font-family:&quot;Work Sans&quot;;font-size:20px;font-weight:700;color:rgb(11, 27, 45);">What does &quot;human parity&quot; mean?</span><br></h2></div>
<div data-element-id="elm_O_PakDiCy0zZw1xOa7r8ww" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_O_PakDiCy0zZw1xOa7r8ww"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Let us stand back and try to understand what the term “human parity” means and what it doesn’t.&nbsp;</p><p>First, the humans in question are not professional translators.&nbsp; Professional human translators are still better.&nbsp; The tests that justified the use of that term came from an annual competition where both humans and machines translated sentences from news stories.&nbsp; Journalistic style is relatively simple, and the machines had trained on other news stories, so that makes it easier.&nbsp; The two sets of translations, human and machine, were then scored independently for quality.&nbsp; Microsoft Azure Translator obtained as good a score as the average human.&nbsp; But remember, the humans in this comparison were probably bilingual computer science students, as were the evaluators.&nbsp; If the testing had been done with professional translators doing the translation and doing the rating, it would not have been a tie, as other studies have shown.&nbsp; “Human parity” is a good benchmark, but it is far from perfection.&nbsp; There are still errors.</p></div></div>
</div><div data-element-id="elm_t27sTJ2gip6GuGFciEXZAA" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_t27sTJ2gip6GuGFciEXZAA"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="font-family:&quot;Work Sans&quot;;font-weight:700;color:rgb(11, 27, 45);font-size:20px;">What are these human parity errors and what can we do about them?</span><br></h2></div>
<div data-element-id="elm_3ADNPzqpJYqFU8fVvYejTQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_3ADNPzqpJYqFU8fVvYejTQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p><span style="font-size:16px;">Different language pairs will have different types of errors.&nbsp; Again, depending on the types of errors you are getting and how badly you need to correct them, there are different things that you can do.&nbsp; Take negation errors for example.&nbsp; These are easily noticeable by users.&nbsp; A sentence that contains a negative in the source language may be incorrectly missing the negation in the target language, or vice-versa, or may have incorrect scoping: does the not/nicht/kein apply to one adjective, to the whole clause, or to the verb?&nbsp; Some languages handle negation differently, and it is not necessarily clear to the translation engine what is being negated.&nbsp; It is a particular problem between English and either Russian, Lithuanian, or German.&nbsp; Part of the reason is that training sets used to create and tune translation engines don't contain a lot of negation.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">Negation is only one category of errors among many others, but a very noticeable one.&nbsp; Linguistic and statistical measures of quality will say the translation is very, very close, it’s only off by one word, but the client will say no it’s not, it’s as wrong as it could be, it’s the opposite.&nbsp; It is a very simple error to correct, and it requires very little editing time.&nbsp; Humans and machines have very different evaluation criteria.</span></p><p><br></p></div></div></div>
</div><div data-element-id="elm_uJ16oPTKVh6tmjrZkMZ32g" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_uJ16oPTKVh6tmjrZkMZ32g"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><p><span style="font-family:&quot;Work Sans&quot;;font-size:20px;font-weight:700;color:rgb(11, 27, 45);">Four things you can do to improve translation quality in general</span></p></div></h2></div>
<div data-element-id="elm_xx8Pxbuy6wb9y0ZIKV-qMg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_xx8Pxbuy6wb9y0ZIKV-qMg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><ol><li style="text-align:left;">Implement a&nbsp;<b>post-editing</b>&nbsp;step that looks particularly for certain kinds of common translation errors before publishing</li><li style="text-align:left;">Switch to&nbsp;<b>Azure Custom Translator</b>&nbsp;and train it on a corpus of your own documents and phrases so that it learns your vocabulary and your style</li><li style="text-align:left;">Participate in the by-invitation pilot of the&nbsp;<b>Z-Code MoE</b>&nbsp;translation engine</li><li style="text-align:left;">Change the&nbsp;<b>style guide of source documents</b>&nbsp;so that they are written in a way that is easier to translate</li></ol></div></div>
</div><div data-element-id="elm_KeCel1nQmnQ4vVLUeMUA-w" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_KeCel1nQmnQ4vVLUeMUA-w"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="font-family:&quot;Work Sans&quot;;font-size:16px;font-weight:700;color:rgb(11, 27, 45);"><span>1.</span>&nbsp;<b><span>Post editing</span></b></span><br></h2></div>
<div data-element-id="elm_6Hvk1SJb4Hu2Rra3s8f7UA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_6Hvk1SJb4Hu2Rra3s8f7UA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p><span style="font-size:16px;">Post editing is the simplest and most robust solution.&nbsp; It means making changes to the translated page or document.&nbsp; It should be implemented whether or not you implement some of the other solutions.&nbsp; Machine translation is not perfect, so <u>limited</u> post-editing is good practice.&nbsp; It is still significantly cheaper than professional translation or editing and may be cheaper than Azure Custom Translator.&nbsp; PointFire Translator by default saves all documents, pages, and items as drafts, and someone should revise that draft before it is published.&nbsp; This advice is not what most people want to hear, they want a technology solution because they don’t have the in-house expertise to translate, and would like the technology to solve the problem without involving them.&nbsp; Sorry.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;text-decoration-line:underline;">Who should do this?</span></p><p><span style="font-size:16px;">The obvious answer may not be the best one.&nbsp; “We have someone in the office who speaks both languages”.&nbsp; No matter how clever they are, efficiently editing documents in their other language may not be in their skill set.&nbsp; And you have to make sure that someone’s ethnicity or cultural background does not mean that they get extra tasks that do not contribute to their career progression or career goals.</span></p><p><span style="font-size:16px;">What is this “limited editing” that I mentioned?&nbsp; That is actually very challenging even for professional editors and translators.&nbsp; You want to limit the time that is spent in 
this editing step, otherwise you will end up spending more on post-editing than you would have for professional translators.&nbsp; Remember, professional translators have access to machine translation too.&nbsp; Today’s translators are already only charging you for machine translation plus the cost of post-editing, and they are very good and quick at doing this.&nbsp; They have glossaries, translation memory databases, and previous translations at their fingertips and they do this all day. The cost advantage that you have over them comes from the fact that they’re perfectionists and you’re not.&nbsp;</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">Limited post-editing means you correct some errors that are show stoppers and easy to correct, like negation errors, and let other errors slide.&nbsp; You find poor phrasing, not quite the right word, or errors in agreement between the noun and the adjective?&nbsp; These are errors that do not require reading the original text in a different language in order to correct, and everyone reading it knows what the author meant to say.&nbsp; And those errors are more time-consuming to correct.&nbsp; I know it’s hard to see an error on a page that you are editing and leave it there, and not everyone agrees with this advice, but this is how you control costs.&nbsp; Given the current quality of machine translation, for most documents you will not even hit the edit button, you will click on publish if you can resist the temptation to fiddle with the text until it’s perfect.&nbsp; Different types of documents require different edit rules. 
If the text is describing a procedure that must be exact, spend more time ensuring that it is exact.&nbsp; It’s like a food processing assembly line.&nbsp; There are some acceptable defects, but also some that cannot be tolerated no matter the cost.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">As we will see below, certain errors are difficult to prevent by other methods.&nbsp; Post editing is the best way to correct them, but if too much post-editing is required or too many errors are missed, professional translators may be a safer or less costly alternative.</span></p></div></div></div>
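<div class="zptext zptext-align-left " data-editor="true"><p><span style="font-size:16px;">One way to keep post-editing limited is to let a script point the reviewer at segments that deserve a look.&nbsp; The Python sketch below flags a segment when the source and the translation contain a different number of negation words.&nbsp; It is a crude heuristic and only an illustration: the word lists are hypothetical and far from complete, the function names are made up, and it is not a PointFire or Azure feature.&nbsp; Languages that use double negation, or translations that rephrase a negative as a positive, will produce false alarms.</span></p>
<pre>
```python
import re

# Hypothetical, deliberately tiny lists of negation markers per language.
# A real list would need a linguist's review for each language pair.
NEGATION_MARKERS = {
    "en": {"not", "no", "never", "cannot", "neither", "nor"},
    "de": {"nicht", "kein", "keine", "keinen", "keinem", "keiner", "nie", "niemals"},
}


def count_negations(text, lang):
    """Count tokens that look like negation markers in the given language."""
    # Normalize the curly apostrophe so that contractions stay in one token.
    normalized = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[\w']+", normalized)
    markers = NEGATION_MARKERS[lang]
    return sum(
        1
        for token in tokens
        if token in markers or (lang == "en" and token.endswith("n't"))
    )


def needs_review(source, target, source_lang="en", target_lang="de"):
    """Flag a segment when source and target disagree on the negation count."""
    return count_negations(source, source_lang) != count_negations(target, target_lang)


if __name__ == "__main__":
    source = "Do not restart the server."
    print(needs_review(source, "Starten Sie den Server nicht neu."))
    print(needs_review(source, "Starten Sie den Server neu."))
```
</pre>
<p><span style="font-size:16px;">A flag only means that a human should read that sentence against the original.&nbsp; Segments that are not flagged can still be wrong, for example when the negation is present but attached to the wrong word.</span></p></div>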
</div><div data-element-id="elm_7z63fPqCuIWe9VQM-lAlGw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_7z63fPqCuIWe9VQM-lAlGw"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="color:rgb(11, 27, 45);font-family:&quot;Work Sans&quot;;font-size:16px;font-weight:700;">2. Custom Translator</span></h2></div>
<div data-element-id="elm_9Rd0WWQOyVmS-enwOTGzcg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_9Rd0WWQOyVmS-enwOTGzcg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p><span style="font-size:16px;">While Azure Translator uses the same translation engine for everyone, <u>Azure Custom Translator</u> is an alternative engine that you can re-train on your documents and vocabulary.&nbsp; It can increase translation accuracy by a few points, particularly for specialized domains.&nbsp; You’re in the automotive industry?&nbsp; Tell Custom Translator and your score will already improve over the default engine.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">Unfortunately, re-training a translation model, the main component of the translation engine, is not very simple and it requires some knowledge of linguistics and statistics, plus a lot of data and data cleaning, to get the advantage of having your own model.&nbsp; The new preview interface of Azure Custom Translator is a big improvement, but it still needs some work.&nbsp; I may blog more in the future about how to use it, but here is a brief introduction.</span></p></div></div></div>
</div><div data-element-id="elm_al7R8ERiPoaZm3Ox0k3YfQ" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_al7R8ERiPoaZm3Ox0k3YfQ"] .zpimage-container figure img { width: 606px !important ; height: 502px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_al7R8ERiPoaZm3Ox0k3YfQ"] .zpimage-container figure img { width:606px ; height:502px ; } } @media (max-width: 767px) { [data-element-id="elm_al7R8ERiPoaZm3Ox0k3YfQ"] .zpimage-container figure img { width:606px ; height:502px ; } } [data-element-id="elm_al7R8ERiPoaZm3Ox0k3YfQ"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Microsoft%20Azure%20Custom%20Translator.png" width="606" height="502" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_Rw5kfiVGt5242iWqM0spQw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_Rw5kfiVGt5242iWqM0spQw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">First you need to have at least 10,000 professionally translated sentences or terms, in a supported format and following the naming conventions.&nbsp; This is the learning set.&nbsp; You can upload parallel documents in both languages, and it has some tools to help you align the sentences, but it’s not as good as you’d think and you have to revise carefully.&nbsp; If an extra line is inserted, every sentence after that will be mis-aligned.&nbsp; You also have to remove the sentences that are not quite the same.&nbsp; Translators often split or combine sentences when they translate because it may sound more natural that way, but it will confuse the translation engine training, so it is best to remove those.&nbsp; Similarly, some translations, although correct, are not a good example for a machine to train on. 
For example, if the text is talking about pangrams,&nbsp;sentences that use all 26 letters, “the quick brown fox jumps over the lazy dog” can be translated to “Portez ce vieux whisky au juge blond qui fume” (Carry this old whiskey to the blonde judge who smokes).&nbsp; While it’s correct in this context, having a model learn from this translation would probably interfere with the model’s ability to translate other text about canids.&nbsp; Training sets have to be curated to be most effective.&nbsp; How about idioms, expressions that do not translate literally?&nbsp; Those are good, training sets should include them so that it will know later how to translate those in a way that is not literal.</span></p><p><span style="font-size:16px;"><br></span></p><div style="color:inherit;"><p><span>Besides those 10,000 sentence pairs, you will also need a testing set and a tuning set.&nbsp; To simplify the process a bit, neural network models train on a training set, but you also need a set of sentences to test on.&nbsp; If you over-train a neural network translation model, it will become very good with the sentences it has seen, but at some point it becomes worse at the sentences that it hasn’t seen.&nbsp; It is memorizing the training sentences, but not generalizing.&nbsp; You have to stop the training when it gets to the highest quality score, and before it gets worse.&nbsp; The measure of quality that it is optimizing is called the <a href="https://docs.microsoft.com/en-us/azure/cognitive-services/translator/custom-translator/what-is-bleu-score" title="BLEU score" target="_blank" rel="">BLEU score</a>.&nbsp; Again simplifying a bit, the BLEU score looks at the translation that the model produced and compares it to the translation that a professional translator provided. 
The score is a purely numerical comparison.&nbsp; For each sequence of 4 words in your translation, does that sequence of 4 words appear in the reference professional translation?&nbsp; How about for sequences of 3, 2, or 1 words?&nbsp; The score has other factors as well, but that is the essence.</span></p><p><span style="font-size:16px;text-decoration-line:underline;"><br></span></p><p></p><div style="color:inherit;"><div style="color:inherit;"><p><span style="font-size:16px;text-decoration-line:underline;">A couple of notes about the BLEU score:</span></p><ul><li><span>Professional translators don’t get perfect marks because one translator can phrase things differently from other translators.&nbsp; They will get maybe 50 out of 100.</span></li><li><span>BLEU score is not good at punishing long-range errors, things that are more than 4 words apart like negation for instance</span></li><li><span>It doesn’t care about meaning or synonyms. If you use the word “large” and the translator used the word “big”, you get the same score as if you had used “small” or even “but” or “blarg”</span></li></ul><div><div style="color:inherit;"><div style="color:inherit;"><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">For better or for worse that score is what Azure Custom Translator is trying to maximize.&nbsp; It starts out with a good general model, and then it modifies it using the training data that you provide.&nbsp; The “tuning set” of training data is especially crucial.&nbsp; It has to be very representative of the translations that you will carry out later, and a lot of the quality of the model depends on having the correct distribution and range of documents in this set.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">How much more translation accuracy you can get by using Custom Translator depends on the language pair and on the type of text.&nbsp; For example, English-German already has 
very high translation quality without re-training, so the improvement may be smaller than for other languages.&nbsp; Version 1 of Custom Translator improved the accuracy of English-German over the normal Azure Translator by a few points of BLEU score, much more in the automotive field with a medium number of documents. “Medium” in this case means 50-100 thousand professionally translated sentences.</span></p></div></div></div></div></div></div></div></div>
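<div class="zptext zptext-align-left " data-editor="true"><p><span style="font-size:16px;">The word-sequence counting behind the BLEU score, as described above, can be sketched in a few lines of Python.&nbsp; This is a simplified, single-sentence version written for illustration, with made-up function names.&nbsp; It is not the code that Azure Custom Translator runs: real BLEU is computed over a whole test set, with its own tokenization rules, and the naive whitespace split used here is an assumption.</span></p>
<pre>
```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference.

    Each reference n-gram can only be matched as often as it occurs there.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def simple_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch on a 0-100 scale, without smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # The brevity penalty punishes candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * geo_mean * brevity


if __name__ == "__main__":
    reference = "the house has a big garden behind it"
    for word in ("large", "small", "blarg"):
        candidate = f"the house has a {word} garden behind it"
        print(word, round(simple_bleu(candidate, reference), 1))
```
</pre>
<p><span style="font-size:16px;">The three candidates in the example receive the same score, because the measure only sees that one word differs from the reference.&nbsp; It has no notion that “large” is a synonym of “big” while “small” reverses the meaning.</span></p></div>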
</div><div data-element-id="elm_7bhcNtLxLD5l9eR9tTdtCw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_7bhcNtLxLD5l9eR9tTdtCw"] .zpimage-container figure img { width: 800px ; height: 390.24px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_7bhcNtLxLD5l9eR9tTdtCw"] .zpimage-container figure img { width:500px ; height:243.90px ; } } @media (max-width: 767px) { [data-element-id="elm_7bhcNtLxLD5l9eR9tTdtCw"] .zpimage-container figure img { width:500px ; height:243.90px ; } } [data-element-id="elm_7bhcNtLxLD5l9eR9tTdtCw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-large zpimage-tablet-fallback-large zpimage-mobile-fallback-large hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Machine-translation-bleu-score.png" width="500" height="243.90" loading="lazy" size="large" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_TDYUdk6t8COGKQYMU2Q2Sw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_TDYUdk6t8COGKQYMU2Q2Sw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;">You can find more details <a href="https://www.microsoft.com/en-us/translator/blog/2018/05/07/customtranslator/" title="here" target="_blank" rel="">here</a></span><br></p><p><br></p><div style="color:inherit;"><p><span style="font-size:16px;">Version 2 of the Custom Translator, which includes “human parity” models that PointFire uses, improved even further particularly for languages where Version 1 had not performed as well like Korean and Hindi, although less so for languages where it was already good like German.</span></p></div></div>
</div><div data-element-id="elm_HCYIDBrOuHTLavvs39uS1g" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_HCYIDBrOuHTLavvs39uS1g"] .zpimage-container figure img { width: 800px ; height: 451.13px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_HCYIDBrOuHTLavvs39uS1g"] .zpimage-container figure img { width:500px ; height:281.95px ; } } @media (max-width: 767px) { [data-element-id="elm_HCYIDBrOuHTLavvs39uS1g"] .zpimage-container figure img { width:500px ; height:281.95px ; } } [data-element-id="elm_HCYIDBrOuHTLavvs39uS1g"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-large zpimage-tablet-fallback-large zpimage-mobile-fallback-large hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/translator-human-parity-evaluation.png" width="500" height="281.95" loading="lazy" size="large" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_4DiNdbzmzranzhB7Auye1Q" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_4DiNdbzmzranzhB7Auye1Q"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p>More information <a href="https://www.microsoft.com/en-us/translator/blog/2020/08/05/custom-translator-v2-is-now-available/" title="here" target="_blank" rel="">here</a></p><p><br></p><div style="color:inherit;"><p><span style="font-size:16px;">Depending on the language and the field, particularly how your specialized vocabulary differs from the vocabulary that Microsoft used in its training, typically government documents crawled from the web, you can get small to significant improvements in translation quality by training your own model in Azure Custom Translator. However, it requires significant investment in time and resources to set up, and Microsoft charges 4-5 times what it charges for the regular translation engine.&nbsp; It is not for everyone.</span></p></div></div>
</div><div data-element-id="elm_SMVGEGtgKib5mh1B0WwI4g" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_SMVGEGtgKib5mh1B0WwI4g"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="font-family:&quot;Work Sans&quot;;font-size:16px;font-weight:700;color:rgb(11, 27, 45);"><span>3.</span>&nbsp;<b><span>Z-code MoE models</span></b></span><br></h2></div>
<div data-element-id="elm_QkyHWUBroZTYPmGc55bSAw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_QkyHWUBroZTYPmGc55bSAw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">Z-code is part of Microsoft’s larger ambitious “XYZ-code” initiative.&nbsp; It takes advantage of new technology for massive, massive neural networks of a scale barely envisioned a few years ago, billions or even hundreds of billions of parameters, made possible with the Microsoft DeepSpeed library <a href="https://www.deepspeed.ai/">https://www.deepspeed.ai/</a>.&nbsp; It’s thousands of times bigger than current translation models.&nbsp; Rather than separately training individual models for English-French, English-Hungarian, English-Chinese, etc., it trains one massive model that learns about all language pairs at once.&nbsp; In order to avoid duplication, the model teaches itself about features that are common to families of languages, and features that are common to all written human languages.&nbsp; That way, even if it was not given enough examples from a particular language pair, it can extrapolate from examples in other related languages, and most likely get it right.&nbsp; It is called MoE (mixture of experts) because it incorporates specialized competing smaller models, called “experts”, each of which may propose an answer but each of which specializes in one type of problem, and it has another component that is trained to be good at determining which of the experts will be right under different circumstances.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;"></span></p><div style="color:inherit;"><p><span style="font-size:16px;">Those engines went live in March 2022 and are available by invitation. &nbsp;They will probably be generally available some time next year. 
&nbsp;They are particularly good at languages for which there is a smaller training corpus, for example southern Slavic languages like Slovenian, Bosnian, and Bulgarian.&nbsp; If your language pair is among the ones that get a significant improvement in quality, it’s worth trying to get an invitation.&nbsp; At the moment, this cannot be combined with Custom Translator.</span></p></div></div></div>
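<div class="zptext zptext-align-left " data-editor="true"><p><span style="font-size:16px;">The “mixture of experts” idea described above, several specialized experts plus a component that decides which one to trust, can be illustrated with a toy Python sketch.&nbsp; It says nothing about how Z-code is actually built: the class name, the two-number feature vectors, and the hand-set gate weights are all assumptions made for the example, and in a real model both the gate and the experts are neural networks whose weights are learned.&nbsp; The sketch uses the variant where only the highest-scoring expert runs.</span></p>
<pre>
```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot(a, b):
    """Dot product of two equal-length lists of numbers."""
    return sum(x * y for x, y in zip(a, b))


class ToyMixtureOfExperts:
    """Top-1 routing: a gate scores every expert and only the best one runs."""

    def __init__(self, gate_weights, experts):
        self.gate_weights = gate_weights  # one weight vector per expert
        self.experts = experts            # one callable per expert

    def route(self, features):
        """Return the index of the chosen expert and all gate probabilities."""
        probs = softmax([dot(weights, features) for weights in self.gate_weights])
        best = max(range(len(probs)), key=probs.__getitem__)
        return best, probs

    def __call__(self, features):
        best, _ = self.route(features)
        return self.experts[best](features), best


if __name__ == "__main__":
    # Hand-set weights standing in for a trained gate (hypothetical values).
    moe = ToyMixtureOfExperts(
        gate_weights=[[2.0, 0.0], [0.0, 2.0]],
        experts=[lambda f: "expert A answers", lambda f: "expert B answers"],
    )
    print(moe([1.0, 0.0]))
    print(moe([0.0, 1.0]))
```
</pre>
<p><span style="font-size:16px;">Because only the selected expert does any work for a given input, a model built this way can have a very large total number of parameters without every parameter being used on every sentence.</span></p></div>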
</div><div data-element-id="elm_zbYnt6IXGFBNDbX1lUkSEw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_zbYnt6IXGFBNDbX1lUkSEw"] .zpimage-container figure img { width: 800px ; height: 448.95px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_zbYnt6IXGFBNDbX1lUkSEw"] .zpimage-container figure img { width:500px ; height:280.59px ; } } @media (max-width: 767px) { [data-element-id="elm_zbYnt6IXGFBNDbX1lUkSEw"] .zpimage-container figure img { width:500px ; height:280.59px ; } } [data-element-id="elm_zbYnt6IXGFBNDbX1lUkSEw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-large zpimage-tablet-fallback-large zpimage-mobile-fallback-large hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Z-code-model-improvement.png" width="500" height="280.59" loading="lazy" size="large" data-lightbox="true"/></picture></span><figcaption class="zpimage-caption zpimage-caption-align-left"><span class="zpimage-caption-content">https://www.microsoft.com/en-us/research/blog/microsoft-translator-enhanced-with-z-code-mixture-of-experts-models/</span></figcaption></figure></div>
</div><div data-element-id="elm_OQwd_SnAb6N02BZB_ruPHg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_OQwd_SnAb6N02BZB_ruPHg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">In the diagram above, the percent improvement seems to be improvement in BLEU score.&nbsp; According to other research, this class of models is particularly good for correcting negation errors in German.&nbsp; Note that 2) and 3) cannot be used together.</span></p></div></div>
</div><div data-element-id="elm_7aVIN8NpeAfTGVU-IdT5tA" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_7aVIN8NpeAfTGVU-IdT5tA"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="font-family:&quot;Work Sans&quot;;font-size:16px;font-weight:700;color:rgb(11, 27, 45);">4.&nbsp;<b>Write the original with a style that is easier to translate</b></span><br></h2></div>
<div data-element-id="elm_vcQN4VKKdoH9ht-q7JII5Q" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_vcQN4VKKdoH9ht-q7JII5Q"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p><span style="font-size:16px;">This is a bit strange to say, but English is not a good source language to translate from.&nbsp; A lot of its grammar is vague or ambiguous.&nbsp; For example, it often recycles words to mean something else.&nbsp; Well known examples include “A good pharmacist dispenses with accuracy”. “Dispense” and “dispense with” are very different concepts that use the same words.&nbsp; It is obvious to humans from context, but how is a computer to know?&nbsp; “Bill gave the dog water and Sue the cat food”.&nbsp; Here the conjunction reuses the verb give, but the subject, direct object and indirect object require an educated guess. Is it a single object made up of one noun (cat) describing the other noun (food), or two different objects of the same verb?&nbsp; English does not have declensions and is stingy with the prepositions that other languages use to make such things clear.&nbsp; English is infamous for its sequences of nouns where the reader or computer must determine what describes what.&nbsp; For example, we can figure out “airport long term car park courtesy vehicle pickup point” from our knowledge of airports and car rentals, but can a computer figure out which word describes which other word or group of words enough to translate it?&nbsp; Machine translation would have more luck with the less common but still correct English phrasing “pickup point for the courtesy vehicles of the airport’s long-term car park”.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">But before you decide to write all your original documents in Lithuanian rather than English to improve translation quality, 
know that it is possible to write in English in a way that is easier to translate.</span></p><p><span>Microsoft publishes some style guides with sections on how to author documents so that machine translation has fewer errors and is easier to understand.&nbsp; You can find them <a href="https://www.microsoft.com/en-us/language/StyleGuides" title="here" target="_blank" rel="">here</a>.</span></p><p style="font-size:11pt;"><br></p><p><span style="font-size:16px;">Style guides for machine translation are similar to style guides for writing for ESL audiences, where some English language constructs that are potentially ambiguous for non-native speakers are avoided.&nbsp; Within the Microsoft style guides are some tips about writing so that machine translation will have higher quality.</span></p><p style="font-size:11pt;"><a href="https://docs.microsoft.com/en-us/style-guide/global-communications/writing-tips"><span style="font-size:16px;">https://docs.microsoft.com/en-us/style-guide/global-communications/writing-tips</span></a></p><p style="font-size:11pt;"><br></p><p><span style="font-size:16px;">Some of the tips seem to have been written at the time of earlier versions of machine translation which had problems that are less common now, but there is no harm in reducing ambiguity.&nbsp; The tips include:</span></p><ul><li><span>Use articles. Does “Empty container” mean “Empty the container” or “The empty container”?&nbsp; Articles (determiners) make it explicit.</span></li><li><span>Reduce chains of modifiers. Instead of “well thought-out Windows migration project plan” say “a project plan to migrate Windows that's well thought out”</span></li><li><span>Keep adjectives and adverbs close to the words they modify. Pay particular attention to the placement of “only”.</span></li><li><span>Use simple sentence structures. 
Write sentences that use standard word order (that is, subject + verb + object) whenever possible.</span></li><li><span>Use words ending in –ing carefully. A word ending in –ing can be a verb, an adjective, or a noun. Use the sentence structure and optional words to clarify the role of the –ing word.</span></li><li><span>Use words ending in –ed carefully. A word ending in -ed can be a modifier or part of a verb phrase. Use the sentence structure and optional words to clarify the role of the –ed word.</span></li><ul><li><span>Add an article (a, an, the, this) before or after the –ed word. “They have <b>an</b> added functionality.”</span></li><li><span>Add a form of the verb be. “Configure limits for the backup that are based on the amount of storage space available.”</span></li></ul></ul><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">I would probably add a few more tips.&nbsp; Be careful with the scoping of negation and of adjectives/adverbs (often a word can be one or the other).&nbsp; Keep the negation or the adjective close to what it modifies, whether a word, a noun phrase, a clause or the verb, and try to phrase it so it is not easily mistaken.&nbsp; An example of negation scoping is “All that glitters is not gold”.&nbsp; Is it saying that everything that glitters is made of something other than gold?&nbsp; It might, but it is probably asserting that the set of all items that glitter does not coincide with the set of gold items. 
It depends on whether it means {all that glitters} is-not gold, or {all that glitters} is not-gold.&nbsp; The “not” could be associated with “is” or with “gold”.&nbsp; I am tempted to wander into a digression into predicate logic or set theory, but that would take us away from the scoping problem.&nbsp; Not all who wander are lost 😉</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">Also be careful with words that can have several senses.&nbsp; Instead look for a different word with fewer senses.</span></p></div></div></div>
</div><div data-element-id="elm_nLGIKkQiuryf0KgItbzUtw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_nLGIKkQiuryf0KgItbzUtw"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><p style="font-size:11pt;"><b><span style="font-size:16px;font-family:&quot;Work Sans&quot;;color:rgb(11, 27, 45);">Cultural differences</span></b></p></div></h2></div>
<div data-element-id="elm_sb6NuRzw4hxPJlZcs4IvTw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_sb6NuRzw4hxPJlZcs4IvTw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p><span style="font-size:16px;">Some translation issues are not easily solved with these strategies.&nbsp; This is because in addition to communicating across languages, you may be communicating across cultures.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">There are translations that are correct but are hard to understand or insensitive for cultural reasons.&nbsp; Some of them you can change before translation takes place, others can be caught during post-editing, but there is another alternative.</span></p><p><span style="font-size:16px;">For example, if you talk about hitting a home run, or about pulling the trigger, or call two things kissing cousins, these are cultural references that may be common in your country but may be hard to understand elsewhere even with a correct translation.&nbsp; More recent translation engines are better at common idioms, but it’s still risky.&nbsp; And it’s not just American English that draws from its culture.&nbsp; For example, Japanese has a term “mikka bouzu” meaning “three-day monk”.&nbsp; Some knowledge of Buddhism and Japanese culture is required to understand this is someone who gives up too easily.&nbsp; German has a word “Deppenleerzeichen” meaning “Idiot’s space”, a derogatory term for putting spaces between words that the German language normally sticks together in a large compound word.&nbsp; Some knowledge of German syntax is required.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">Some more difficult cultural issues are better caught in post-editing.&nbsp; Some correct translations are to be avoided because they sound 
like slogans used by political parties or extremist groups.&nbsp; Catch me some day in person and I might tell you about some big yikes.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">Another issue has to do with tone.&nbsp; A perfectly correct translation of communication written in German may sound inappropriate or even rude when translated to Japanese.&nbsp; That is because the German culture is comfortable with directness while Japanese employees are more comfortable with more indirect phrasing.&nbsp; Some re-training of the model using professionally translated text where the tone of business communication has been adjusted to be culturally appropriate is useful.&nbsp; However, in some cultures such as Korean, verb forms and vocabulary depend on relative status of the writer and the reader, whether peer or subordinate, or on age difference.&nbsp; This is called honorifics or register.&nbsp; Similarly, many languages have differences based on the gender of the person being addressed or of the person who is speaking or writing.&nbsp; This is context that the computer does not have, so its phrasing may violate cultural norms.</span></p><p><span style="font-size:16px;"><br></span></p><p><span style="font-size:16px;">When it comes to these cultural norms for things like tone, post-editing is the only way to fix these errors, but correcting these types of errors can be time consuming and inconsistent with the “limited editing” that was recommended.&nbsp; Another alternative is to educate the readers of the translations about the cultural differences and explain that messages from that country may be expressed differently based on their different culture, and that no offense is intended.&nbsp; Educating the readers about cultural differences may be more cost effective than adapting all translations to the different cultural context.</span></p></div></div></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Thu, 04 Aug 2022 14:03:43 -0400</pubDate></item><item><title><![CDATA[The end of the Machine Translation Service in SharePoint Online ]]></title><link>https://blog.icefire.ca/blogs/post/machine-translation-service-end-sharepoint</link><description><![CDATA[<img align="left" hspace="5" src="https://blog.icefire.ca/SharePoint-variations.png"/>Microsoft is stopping the machine translation service for SharePoint Online. Here's what will stop working and what to do about it]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_NcZ2kw5aQD26WQHyPoXx9Q" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_LdWyJS8bReG0orREjVtdzg" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"> [data-element-id="elm_LdWyJS8bReG0orREjVtdzg"].zprow{ border-radius:1px; } </style><div data-element-id="elm_OP6NvOPXQrKjwXkJsb3X7g" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"> [data-element-id="elm_OP6NvOPXQrKjwXkJsb3X7g"].zpelem-col{ border-radius:1px; } </style><div data-element-id="elm_Pl218yj6Qfm1g-aoOrJwRA" data-element-type="text" class="zpelement zpelem-text "><style></style><div class="zptext zptext-align-center " data-editor="true"><div style="color:inherit;"><p style="text-align:left;"><span style="font-size:16px;">Microsoft has announced that the Machine Translation Service (MTS) in SharePoint Online will stop working at the end of July 2022.&nbsp; Are you using it, and if so what can you do about it, to prepare for its retirement?</span></p></div></div>
</div><div data-element-id="elm_pqCSr0Kk1y8gPe_TZ9zF1g" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_pqCSr0Kk1y8gPe_TZ9zF1g"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><h1><span style="font-size:20px;font-family:&quot;Work Sans&quot;;font-weight:700;color:rgb(0, 0, 0);">What is the Machine Translation Service?</span></h1></div></h2></div>
<div data-element-id="elm_Jok5RLlkXmsXuzLTXifvnA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_Jok5RLlkXmsXuzLTXifvnA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">MTS is a back-end SharePoint service application that works in SharePoint Online and in SharePoint 2013, 2016, 2019, and SE on premise.&nbsp; It is able to carry out machine translation of several different file types including Office files and html pages, and of plain text.&nbsp; Because it is built into SharePoint, using it is free.&nbsp; Since it is a back-end service, you have to look at the different front-end functionality that may be using it.</span></p></div></div>
</div><div data-element-id="elm_7spNKQw-MaH8zNF01yZ1ow" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_7spNKQw-MaH8zNF01yZ1ow"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><h1><span style="font-size:20px;font-family:&quot;Work Sans&quot;;font-weight:700;color:rgb(0, 0, 0);">Where is it available in SharePoint?</span></h1></div></h2></div>
<div data-element-id="elm_x9QYTOVmGSDLgUP3dH5tdA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_x9QYTOVmGSDLgUP3dH5tdA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">There are three places where the Machine Translation Service is available for you to use in SharePoint: translation of Variations, translation of Term Sets, and translation via the API or PowerShell.&nbsp; Let’s look at these one by one.</span></p></div></div>
</div><div data-element-id="elm_arS8vTTB-bMYVkyPU9usQA" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_arS8vTTB-bMYVkyPU9usQA"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="font-family:&quot;Work Sans&quot;;font-size:20px;color:rgb(0, 0, 0);font-weight:500;">1. Translation of Variations</span><br></h2></div>
<div data-element-id="elm_EjI4H2XFDnOdeeU4TwIS7w" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_EjI4H2XFDnOdeeU4TwIS7w"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">If you have been using Variations on Classic sites for a while, you may realize that it used to be possible to set up machine translation for a Variation label, which could machine translate pages either automatically or on demand.&nbsp; This uses MTS as a back-end.&nbsp; In September 2018, the user interface to machine translate Variation labels or classic pages was removed from SharePoint Online.&nbsp; The functionality itself persisted, that is to say if you had previously set up a Variation label with automatic translation before then, MTS continued translating any new pages automatically, but you simply couldn’t set it up for any new Variation labels.</span></p></div></div>
</div><div data-element-id="elm_UfDb8MFKNChiJulB7GiLaw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_UfDb8MFKNChiJulB7GiLaw"] .zpimage-container figure img { width: 372px !important ; height: 124px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_UfDb8MFKNChiJulB7GiLaw"] .zpimage-container figure img { width:372px ; height:124px ; } } @media (max-width: 767px) { [data-element-id="elm_UfDb8MFKNChiJulB7GiLaw"] .zpimage-container figure img { width:372px ; height:124px ; } } [data-element-id="elm_UfDb8MFKNChiJulB7GiLaw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/SharePoint-variations.png" width="372" height="124" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_Z3gozh4NXF_pw9hlGDdHDw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_Z3gozh4NXF_pw9hlGDdHDw"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><h2><span style="font-family:&quot;Work Sans&quot;;font-size:20px;font-weight:500;color:rgb(0, 0, 0);">2. Translation of Term Sets</span></h2></div></h2></div>
<div data-element-id="elm_RNzQOijfT0VhN9j68dbGLA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_RNzQOijfT0VhN9j68dbGLA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p style="font-size:11pt;">If you have created a term set in the term store, and you have set it up to have working languages in addition to the default language, then you have the option to machine translate the terms in a term set on demand, using the MTS.</p><p style="font-size:11pt;"><br></p><p style="font-size:11pt;">In the SharePoint Admin Centre, under Content Services, select “Term store”</p></div></div>
</div><div data-element-id="elm_HmzjwGcGUEuJfmP5UZ0GhQ" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_HmzjwGcGUEuJfmP5UZ0GhQ"] .zpimage-container figure img { width: 272px !important ; height: 108px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_HmzjwGcGUEuJfmP5UZ0GhQ"] .zpimage-container figure img { width:272px ; height:108px ; } } @media (max-width: 767px) { [data-element-id="elm_HmzjwGcGUEuJfmP5UZ0GhQ"] .zpimage-container figure img { width:272px ; height:108px ; } } [data-element-id="elm_HmzjwGcGUEuJfmP5UZ0GhQ"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/SharePoint-term-store.png" width="272" height="108" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_PZw0WpPpvWb3wpjLry_T4w" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_PZw0WpPpvWb3wpjLry_T4w"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p style="font-size:11pt;">At any point, you can add working languages to a term store.</p></div></div>
</div><div data-element-id="elm_ezyulvm-SfGbEennuOvDXg" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_ezyulvm-SfGbEennuOvDXg"] .zpimage-container figure img { width: 624px !important ; height: 452px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_ezyulvm-SfGbEennuOvDXg"] .zpimage-container figure img { width:624px ; height:452px ; } } @media (max-width: 767px) { [data-element-id="elm_ezyulvm-SfGbEennuOvDXg"] .zpimage-container figure img { width:624px ; height:452px ; } } [data-element-id="elm_ezyulvm-SfGbEennuOvDXg"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/SharePoint-Term-store1.png" width="624" height="452" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_UfMUTQ9uicxV1WPMSaK8ug" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_UfMUTQ9uicxV1WPMSaK8ug"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">Inside the term store, individual term sets provide the option to create translations to the other working languages.&nbsp; Under Translation, select Manage</span></p></div></div>
</div><div data-element-id="elm_qWGzTNOgU2SXFDrJIOxZZA" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_qWGzTNOgU2SXFDrJIOxZZA"] .zpimage-container figure img { width: 624px !important ; height: 442px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_qWGzTNOgU2SXFDrJIOxZZA"] .zpimage-container figure img { width:624px ; height:442px ; } } @media (max-width: 767px) { [data-element-id="elm_qWGzTNOgU2SXFDrJIOxZZA"] .zpimage-container figure img { width:624px ; height:442px ; } } [data-element-id="elm_qWGzTNOgU2SXFDrJIOxZZA"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/Sharepoint-term-store2.png" width="624" height="442" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_ytCMOLWdZfhQR8ZICSTlMw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_ytCMOLWdZfhQR8ZICSTlMw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">This will bring up the translation options.</span></p></div></div>
</div><div data-element-id="elm_RPV8ez56Vab8EZ7tC1aqJA" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_RPV8ez56Vab8EZ7tC1aqJA"] .zpimage-container figure img { width: 588px !important ; height: 418px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_RPV8ez56Vab8EZ7tC1aqJA"] .zpimage-container figure img { width:588px ; height:418px ; } } @media (max-width: 767px) { [data-element-id="elm_RPV8ez56Vab8EZ7tC1aqJA"] .zpimage-container figure img { width:588px ; height:418px ; } } [data-element-id="elm_RPV8ez56Vab8EZ7tC1aqJA"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/SharePoint-translation-options.png" width="588" height="418" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm__h7pEd3XEOx7L0c751tk5g" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm__h7pEd3XEOx7L0c751tk5g"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">One of the options is Machine translation.&nbsp; Press Start.&nbsp; You will then have to select one language to translate to.&nbsp; You can come back later and choose the other languages one by one.</span></p></div></div>
</div><div data-element-id="elm_T1208Ghu89AKx4LV1Dicsw" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_T1208Ghu89AKx4LV1Dicsw"] .zpimage-container figure img { width: 380px !important ; height: 386px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_T1208Ghu89AKx4LV1Dicsw"] .zpimage-container figure img { width:380px ; height:386px ; } } @media (max-width: 767px) { [data-element-id="elm_T1208Ghu89AKx4LV1Dicsw"] .zpimage-container figure img { width:380px ; height:386px ; } } [data-element-id="elm_T1208Ghu89AKx4LV1Dicsw"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/SharePoint-machine-translation.png" width="380" height="386" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_SGeuXrjDxFb1liVqUvaKEA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_SGeuXrjDxFb1liVqUvaKEA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">Translation is often quite slow.&nbsp; It is using the MTS in the back-end.&nbsp; You may have to wait a while and if you are the first person to use it that day, it might time out before the service has a chance to fully start up and you may have to start again.</span></p></div></div>
</div><div data-element-id="elm_7sknmqdKIJAwjoH3Kvbn1g" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_7sknmqdKIJAwjoH3Kvbn1g"] .zpimage-container figure img { width: 304px !important ; height: 156px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_7sknmqdKIJAwjoH3Kvbn1g"] .zpimage-container figure img { width:304px ; height:156px ; } } @media (max-width: 767px) { [data-element-id="elm_7sknmqdKIJAwjoH3Kvbn1g"] .zpimage-container figure img { width:304px ; height:156px ; } } [data-element-id="elm_7sknmqdKIJAwjoH3Kvbn1g"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="left" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/SharePoint-machine-translation2.png" width="304" height="156" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_YhE5fvBfXNyYhh3cktSMKA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_YhE5fvBfXNyYhh3cktSMKA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p style="font-size:11pt;">Once the term set is translated, the text will be fully translated to the other languages and the translated version of the term set is (often) what will be shown to the user based on their language.</p></div></div>
</div><div data-element-id="elm_YcYIOxzgEZV8OXdxJQLgtA" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_YcYIOxzgEZV8OXdxJQLgtA"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><h2><span style="font-family:&quot;Work Sans&quot;;font-size:20px;font-weight:500;color:rgb(0, 0, 0);">3. Translation via the API or PowerShell</span></h2></div></h2></div>
<div data-element-id="elm_dy1Q430wqoJiNUgL4qK7eA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_dy1Q430wqoJiNUgL4qK7eA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><span style="font-size:16px;">Without going into too much detail, the translation capability of the MTS is available via CSOM or REST.&nbsp; That means it can be called from C#, JavaScript, or PowerShell.&nbsp; The MTS API lets you translate either a short text or a supported document or even an entire folder or library, either using a stream or a SharePoint file URL.&nbsp; In the back-end it is using MTS.&nbsp; It is possible that one of the customizations or products that you use is calling this API.&nbsp; If you have custom code or apps that generate some translation, there is a good chance that they are using the API to call MTS.</span></p></div></div>
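<p><span style="font-size:16px;">If you are not sure whether your own customizations call the MTS API, one rough way to find out is to search your source code for the names that the API exposes.&nbsp; The Python sketch below is only an illustration: the marker names (the <code>Microsoft.Office.TranslationServices</code> namespace and the <code>SyncTranslator</code> and <code>TranslationJob</code> types) are the ones I would expect to see in CSOM or REST calls, and the list of file extensions is an assumption that you should adapt to your own code base.&nbsp; The function name <code>find_mts_calls</code> is hypothetical.</span></p>

```python
import re
from pathlib import Path

# Assumed markers of MTS usage; extend or trim this list for your own code.
MTS_MARKERS = [
    r"Microsoft\.Office\.TranslationServices",
    r"\bSyncTranslator\b",
    r"\bTranslationJob\b",
]
PATTERN = re.compile("|".join(MTS_MARKERS))

# Assumed source file types: C#, JavaScript, TypeScript, PowerShell.
EXTENSIONS = {".cs", ".js", ".ts", ".ps1", ".psm1"}


def find_mts_calls(root):
    """Return (path, line number, line text) for every line that mentions MTS."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

<p><span style="font-size:16px;">Calling <code>find_mts_calls("path/to/your/solution")</code> returns a list that you can review by hand.&nbsp; A text search like this cannot see inside compiled third-party products, so an empty result does not prove that nothing you use depends on MTS.</span></p>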
</div><div data-element-id="elm_18BjRiIWeh_3MvI2uAqviw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_18BjRiIWeh_3MvI2uAqviw"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><h1><span style="font-size:20px;font-family:&quot;Work Sans&quot;;font-weight:700;color:rgb(0, 0, 0);">What will no longer work?</span></h1></div></h2></div>
<div data-element-id="elm_NhywRpuiZ6JzWHQ-5LyVBQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_NhywRpuiZ6JzWHQ-5LyVBQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><div style="color:inherit;"><p><span>On March 7, 2022, Microsoft announced that the Machine Translation Service in SharePoint Online will be retired at the end of July 2022.&nbsp; Existing automated translation in Variations will stop working and the APIs will be retired.&nbsp; Calls to the APIs will result in an error.&nbsp; The full announcement is here:</span></p><p><span><a href="https://devblogs.microsoft.com/microsoft365dev/end-of-service-for-sharepoint-online-machine-translation-service-and-apis/" title="https://devblogs.microsoft.com/microsoft365dev/end-of-service-for-sharepoint-online-machine-translation-service-and-apis/" rel="">https://devblogs.microsoft.com/microsoft365dev/end-of-service-for-sharepoint-online-machine-translation-service-and-apis/</a></span></p><p><br></p><p><span>This means that in August 2022, existing translation of Variations, which had survived the 2018 deprecation, will stop working.&nbsp; New pages will be copied to the Variation label but not translated.&nbsp; Any customizations or apps that you use which use the MTS API will start returning errors.</span></p><p><span><br></span></p><p><span>The announcement is not explicit that the machine translation of term sets will also be discontinued, but even if it is not retired in July you should be prepared to see it disappear soon like all other uses of the Machine Translation Service.&nbsp; If it is discontinued, then term sets that have already been translated will stay translated, but any new terms or term sets will have to be translated manually by humans.</span></p></div></div></div></div>
</div><div data-element-id="elm__toYXtkVYXAveoDrSg-IyQ" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm__toYXtkVYXAveoDrSg-IyQ"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><h1><span style="font-family:&quot;Work Sans&quot;;font-size:20px;font-weight:700;color:rgb(0, 0, 0);">What to do about it</span></h1></div></h2></div>
<div data-element-id="elm_1OSobI8iUQtowW0nV2B_Fw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_1OSobI8iUQtowW0nV2B_Fw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><div style="color:inherit;"><p><span style="font-size:16px;">Maybe it doesn’t apply to you and no action is required.</span></p><ul><li>If you don’t use the machine translation features described above nor any customization that uses the Machine Translation Service, don’t worry about it.&nbsp; It’s not the ability of Variations to display classic pages in different languages that is disappearing, it is the machine translation of those pages, a feature that you have not been able to set up for nearly 4 years.&nbsp; Similarly, it’s not the ability to display term set terms in different languages that is disappearing, it is the machine translation of those terms.</li></ul><ul><li>If you are using SharePoint on premise, the announcement does not affect you, it is only for SharePoint Online. 
MTS will continue to work on premise.</li></ul><ul><li>If you are using the multilingual page publishing feature of modern communication sites, you will similarly not be affected since SharePoint never offered machine translation for those pages in the first place.</li></ul><p><span><br></span></p><p><span>If you <u>are</u> using one of those features, then you will either start translating manually or find a different alternative.</span></p><p><span>The announcement suggests that you can use modern communication sites or Azure translation APIs as an alternative.&nbsp; Are they alternatives to the loss of machine translation?&nbsp; Modern communication sites do not offer machine translation at all, and Azure translation APIs do not support any modern pages.&nbsp; In fact, the majority of types of classic pages are also not supported by Azure translation APIs.&nbsp; For document translation, Azure Translation APIs support roughly the same document types as MTS, but they are not a simple replacement for the MTS API.&nbsp; They do not support streams or SharePoint library URLs.&nbsp; Instead, they support text strings of limited length or Azure blob storage containers.&nbsp; That means some extra complexity and security considerations if you are trying to port the code yourself, whereas the MTS API call was often a single line of code.</span></p><p><span><br></span></p><p><span>If you are using the MTS API to translate short lines of plain text rather than documents, then the Azure Translation APIs are a viable alternative.&nbsp; It will take some re-coding but a reasonable amount of it, since the Azure API for translating short strings is much simpler than the API for translating documents.</span></p><p><span><br></span></p><p><span>If you use PointFire products to translate your pages, documents, or user interface, then little will change.&nbsp; The <u>free</u> version of the app will need some additional configuration in order to use the Azure Translation 
API, making it a bit less free to you (you will have to pay Microsoft about $15 USD per million characters, plus a few cents for the Azure storage operations). The <u>paid</u> version of PointFire Translator has been using the Azure Translation API all along, and it has been parsing SharePoint documents and pages to extract strings so that you don’t need to use Azure blob storage.&nbsp; No change is required.</span></p><p><span><br></span></p><p>Additionally, <a href="http://pointfire.com" title="PointFire" rel="">PointFire</a> already supports not only Classic pages like Variations but also Communication sites, whether or not they use the Multilingual Page Publishing feature, and in fact every type of page, modern or classic, on any SharePoint site, as well as documents, lists, and metadata.</p></div>
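<p><span>To get a feel for what "a bit less free" means, here is a rough estimate in Python.&nbsp; It is only a sketch: it assumes the flat rate of about $15 USD per million characters quoted above, it assumes that characters are billed once per target language, and it ignores the few cents of Azure storage operations.&nbsp; The function name and the rate constant are made up for illustration, and actual Azure pricing depends on your tier and may have changed.</span></p>

```python
from decimal import Decimal

# Assumption: the approximate rate quoted in this post, not an official price list.
USD_PER_MILLION_CHARS = Decimal("15")


def estimate_translation_cost(char_count: int, target_languages: int = 1) -> Decimal:
    """Rough cost in USD of translating char_count source characters.

    Assumes each target language is billed separately for the full
    character count. Storage operations are not included.
    """
    billable_chars = Decimal(char_count) * target_languages
    cost = billable_chars * USD_PER_MILLION_CHARS / Decimal(1_000_000)
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # A hypothetical intranet with 250,000 characters, translated into two languages.
    print(estimate_translation_cost(250_000, target_languages=2))
```

<p><span>For most sites the total is small, but it is worth running the numbers on your own character counts before you switch.</span></p>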
</div></div></div></div><div data-element-id="elm_VaUz0n5O9TUt92jkPB7YZw" data-element-type="divider" class="zpelement zpelem-divider "><style type="text/css"> [data-element-id="elm_VaUz0n5O9TUt92jkPB7YZw"].zpelem-divider{ border-radius:1px; } </style><style></style><div class="zpdivider-container zpdivider-line zpdivider-align-center zpdivider-width100 zpdivider-line-style-solid "><div class="zpdivider-common"></div>
</div></div><div data-element-id="elm_c1b6uSAWIOYYtpw4R5VjKw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_c1b6uSAWIOYYtpw4R5VjKw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;">If you want to know how this end of service affects you, talk to us for a free consultation, with absolutely no commitment or need to use PointFire! We'll help you figure out your next move.</span><br></p></div>
</div><div data-element-id="elm_KwjvEtLLPvgm1ttLEGTeDw" data-element-type="button" class="zpelement zpelem-button "><style> [data-element-id="elm_KwjvEtLLPvgm1ttLEGTeDw"].zpelem-button{ font-family:'Work Sans'; font-size:16px; font-weight:500; border-radius:1px; } </style><div class="zpbutton-container zpbutton-align-left "><style type="text/css"> [data-element-id="elm_KwjvEtLLPvgm1ttLEGTeDw"] .zpbutton.zpbutton-type-primary{ background-color:#3C9BF3 !important; font-family:'Work Sans'; font-size:16px; font-weight:500; border-radius:100px; } </style><a class="zpbutton-wrapper zpbutton zpbutton-type-primary zpbutton-size-md zpbutton-style-none " href="mailto:sales@icefire.ca?subject=End%20of%20SharePoint%20Machine%20Translation"><span class="zpbutton-content">Let's talk</span></a></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Fri, 06 May 2022 11:35:31 -0400</pubDate></item><item><title><![CDATA[Inuktitut machine translation is here. Why that's a big deal.]]></title><link>https://blog.icefire.ca/blogs/post/inuktitut-machine-translation-is-here-why-that-s-a-big-deal1</link><description><![CDATA[<img align="left" hspace="5" src="https://blog.icefire.ca/INUKTITUT-1.png"/>Azure Translator Text now supports the Inuktitut language spoken in the Inuit area in the far north of North America.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm__uaIH0jcQxSnYhilZfwi7w" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_vZmFG8xlSwSw0UcWb4khyQ" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_w1VhFEDrQ_ShtiBhuJ438A" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_9ibTL2zXQt2JnqYYoa8gpw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_9ibTL2zXQt2JnqYYoa8gpw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-center " data-editor="true"><p style="text-align:left;"><span style="color:inherit;font-size:16px;">Azure Translator Text now&nbsp;<a href="https://news.microsoft.com/en-ca/2021/01/27/microsoft-introduces-inuktitut-to-microsoft-translator/">supports the Inuktitut language</a>&nbsp;spoken in the Inuit area in the far north of North America.</span></p><div style="text-align:left;font-size:14px;"><br></div><p><span style="color:inherit;font-size:16px;"></span></p><div style="text-align:left;"><span style="font-size:16px;">Over the years, I've received a lot of requests to provide machine 
translation for Inuktitut.&nbsp; Despite the tools that are available, particularly from Microsoft, to&nbsp;<a href="https://portal.customtranslator.azure.ai/">train your own neural translation engine&nbsp;</a>for an unsupported language using a corpus of translated documents, and a great&nbsp;<a href="https://www.aclweb.org/anthology/2020.lrec-1.312/">bilingual corpus</a>&nbsp;from the debates of the&nbsp;<a href="http://www.inuktitutcomputing.ca/NunavutHansard/info.php?lang=en">Nunavut Legislative Assembly</a>, I knew that this would not be possible.&nbsp; Other machine translation experts also agreed that it was beyond the state of the art.&nbsp; On a couple of occasions I applied, unsuccessfully, for funding to push beyond the state of the art and make this possible.&nbsp; Why is machine translation of Inuktitut so difficult?</span></div></div>
</div><div data-element-id="elm__jZB9Zi9F3KKYkuoh9q-3w" data-element-type="image" class="zpelement zpelem-image "><style> [data-element-id="elm__jZB9Zi9F3KKYkuoh9q-3w"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="size-original" data-size-mobile="size-original" data-align="left" data-tablet-image-separate="" data-mobile-image-separate="" class="zpimage-container zpimage-align-left zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/INUKTITUT-1.png" size="fit" data-lightbox="true" style="width:100%;padding:0px;margin:0px;"/></picture></span></figure></div>
</div><div data-element-id="elm_yEkm9WpDQhOR5O6BF7rYJA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_yEkm9WpDQhOR5O6BF7rYJA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="font-size:14px;"><div><span style="font-size:16px;">Inuktitut belongs to a class called &quot;polysynthetic languages&quot;.&nbsp; Most of the languages that you know are probably &quot;agglutinative&quot;.&nbsp; There are some root words which can be modified by changing the beginning or the end of the word.&nbsp; The root word is in the dictionary, but the words that are modified by adding or changing suffixes and prefixes are typically not, because everyone knows the rules.&nbsp; These agglutinative languages are part of a larger class called synthetic languages, which includes other simple rules for sticking words together, usually with a small set of rules that apply to one part of speech.&nbsp; For example, German can stick a lot of known nouns end-to-end to make a new word, but there is one root word and all the other words modify or narrow down its sense, and the resulting word behaves like a longer version of the base word and has the same part of speech.</span></div><div><br></div><div><span style="font-size:16px;">Inuktitut is polysynthetic.&nbsp; The combination rules are much more complex.&nbsp; There can be several root&nbsp;concepts and root&nbsp;words, and the resulting word can lose or change its part of speech because the full word is an entire sentence with subject, object, verb, adjective, even subordinate clauses, all contained in a big compound word. 
How you join words together can vary using complex rules about what comes before and after the join.&nbsp; A well known example is the word &quot;ᖃᖓᑕᓲᒃᑯᕕᒻᒨᕆᐊᖃᓛᖅᑐᖓ&quot; which means &quot;I'll have to go to the airport&quot;.&nbsp; Verbs, nouns, subject, object, they're all contained in the same word.</span></div><div><br></div><div><span style="font-size:16px;">Not all but most Native American languages are polysynthetic.&nbsp; Unlike with other languages, neural networks can't just have a dictionary and some rules and train the translation engine to see patterns of three or four words in a row that always translate to the same three or four words in another language.&nbsp; Almost all the neural translation engines I have seen are word-based.&nbsp; There are languages that are written without spaces, like Chinese, Japanese, Thai,&nbsp;and Korean, but they still have individual words and breaking them up into individual words is relatively simple.&nbsp; Not so with polysynthetic languages.</span></div><div><br></div><div><span style="font-size:16px;">I don't see any information about how Microsoft tackled the problem for Inuktitut.&nbsp; I am assuming that they used a tool to break down words into morphemes.&nbsp; What I would have used but didn't get funding for was the National Research Council's&nbsp;<a href="http://www.inuktitutcomputing.ca/Uqailaut/info.php">Uqailaut Inuktitut Morphological Analyzer</a>, but I don't know whether Microsoft did something similar.&nbsp; I am watching for any publications about it. 
There have been&nbsp;<a href="https://www.clsp.jhu.edu/workshops/19-workshop/neural-polysynthetic-language-modeling-leveraging-related-low-resource-languages-and-rule-based-resources/">some advances lately</a>&nbsp;in&nbsp;<a href="https://github.com/neural-polysynthetic-language-modelling">modeling and translating these languages</a>, so that is not the only approach.</span></div><div><br><span style="font-size:16px;">On the other hand, perhaps they trained a neural network to decompose into morphemes and vice-versa without a standalone processor.&nbsp; If that's the case, then the same techniques could be used for various other widespread but hard-to-translate polysynthetic languages from the Algonquian language family like Cree and Ojibwe, or Iroquoian languages like Mohawk, or Athabascan languages like Dene or Navajo, or Siouan languages like Dakotan.&nbsp; It's a game changer.</span></div><div><br></div><p><span style="color:inherit;font-size:16px;"></span></p><div><span style="font-size:16px;">Oh, and if you were curious, PointFire Translator now supports translation to and from Inuktitut on SharePoint sites; just use language code &quot;iu&quot;.&nbsp; Your browser should already support Canadian Aboriginal Syllabics.</span></div></div></div>
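<p><span style="font-size:16px;">If you would rather call Azure Translator directly, the same &quot;iu&quot; code goes in the &quot;to&quot; parameter of the Translator Text API v3.&nbsp; The Python sketch below is not how PointFire Translator does it; it only shows the general shape of a short-string request against the public global endpoint.&nbsp; The function names are made up for illustration, and the subscription key and region are placeholders that you would replace with the values from your own Azure Translator resource.</span></p>

```python
import json
import urllib.parse
import urllib.request

# Global endpoint of the Azure Translator Text API, version 3.0.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(text, to_lang, key, region, from_lang=None):
    """Build (but do not send) a request that translates one short string."""
    params = {"api-version": "3.0", "to": to_lang}
    if from_lang:
        # When "from" is omitted, the service auto-detects the source language.
        params["from"] = from_lang
    url = ENDPOINT + "?" + urllib.parse.urlencode(params)
    # The request body is a JSON array of objects, each with a "Text" property.
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # placeholder: your own key
        "Ocp-Apim-Subscription-Region": region,  # placeholder: your resource's region
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def translate(text, to_lang, key, region):
    """Send the request and return the first translation as a string."""
    request = build_translate_request(text, to_lang, key, region)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result[0]["translations"][0]["text"]
```

<p><span style="font-size:16px;">With a valid key, calling translate("I'll have to go to the airport", "iu", key, region) should come back in syllabics.&nbsp; Building the request separately from sending it makes it easy to inspect the URL and body before spending any characters.</span></p>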
</div><div data-element-id="elm_g1dAcLKU5Y6LQBEiqtUzMw" data-element-type="divider" class="zpelement zpelem-divider "><style type="text/css"> [data-element-id="elm_g1dAcLKU5Y6LQBEiqtUzMw"].zpelem-divider{ border-radius:1px; } </style><style></style><div class="zpdivider-container zpdivider-line zpdivider-align-center zpdivider-width100 zpdivider-line-style-solid "><div class="zpdivider-common"></div>
</div></div><div data-element-id="elm_eKoz0-Mq42FM9P3nm8VdFQ" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_eKoz0-Mq42FM9P3nm8VdFQ"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="font-size:16px;">Related Posts</span></h2></div>
<div data-element-id="elm_h1v9YM8MfXtbOBqesqlpTQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_h1v9YM8MfXtbOBqesqlpTQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;"><a href="https://blog.icefire.ca/blogs/post/Machine-Translation-in-PointFire-20131" target="_blank" rel="">Machine Translation in PointFire 2013</a></span><br></p><p><span style="color:inherit;"><a href="https://blog.icefire.ca/blogs/post/Why-Do-We-Need-PointFire-for-Multilingual-Collaboration-in-SharePoint-Part-II" title="Why Do We Need PointFire for Multilingual Collaboration in SharePoint? Part II" target="_blank" rel="">Why Do We Need PointFire for Multilingual Collaboration in SharePoint? Part II</a></span><a href="https://blog.icefire.ca/blogs/post/Why-Do-We-Need-PointFire-for-Multilingual-Collaboration-in-SharePoint-Part-II" title="Why Do We Need PointFire for Multilingual Collaboration in SharePoint? Part II" target="_blank" rel=""><br></a></p><p><span style="color:inherit;"><a href="https://blog.icefire.ca/blogs/post/Why-Do-We-Need-PointFire-for-Multilingual-Collaboration-in-SharePoint-Part-III" target="_blank" rel="">Why Do We Need PointFire for Multilingual Collaboration in SharePoint? Part III</a></span></p></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Fri, 05 Feb 2021 23:02:00 -0500</pubDate></item></channel></rss>