<?xml version="1.0" encoding="UTF-8" ?><!-- generator=Zoho Sites --><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><atom:link href="https://blog.icefire.ca/blogs/posts/feed" rel="self" type="application/rss+xml"/><title>The PointFire Blog - The PointFire Blog for Multilingual SharePoint , Blog</title><description>The PointFire Blog - The PointFire Blog for Multilingual SharePoint , Blog</description><link>https://blog.icefire.ca/blogs/posts</link><lastBuildDate>Wed, 16 Jul 2025 16:01:40 -0700</lastBuildDate><generator>http://zoho.com/sites/</generator><item><title><![CDATA[Is GPT better at translating than translation engines?]]></title><link>https://blog.icefire.ca/blogs/post/is-gpt-better-at-translating-than-translation-engines</link><description><![CDATA[<img align="left" hspace="5" src="https://blog.icefire.ca/languages.png"/>Generative Pre-trained Transformer (GPT) systems are not designed to be translation engines. So it is surprising that they succeed so well at doing simple translations. Some articles have claimed that they can translate better than existing translation engines. 
How true are those claims?]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_tDbLseMrQ1atvh_reNkSWg" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_UVV3AohDQdGTvwDADeeliw" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"> [data-element-id="elm_UVV3AohDQdGTvwDADeeliw"].zprow{ border-radius:1px; } </style><div data-element-id="elm_BTUzPodyT4CO85UyifeZEA" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"> [data-element-id="elm_BTUzPodyT4CO85UyifeZEA"].zpelem-col{ border-radius:1px; } </style><div data-element-id="elm_f3ptpRwjce2pziid40quFg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_f3ptpRwjce2pziid40quFg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Generative Pre-trained Transformer (GPT) systems are not designed to be translation engines.&nbsp; So it is surprising that they succeed so well at doing simple translations.&nbsp; Some articles have claimed that they can translate better than existing translation engines.&nbsp; How true are those claims?</p></div></div>
</div><div data-element-id="elm_wwrYpJJnyynTEBW1VCGNaw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_wwrYpJJnyynTEBW1VCGNaw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;">Most of those claims are based on testing with a few sentences chosen by the author, a few language pairs, and a qualitative scoring of how good the translation is.&nbsp; However, more systematic evaluations with large samples of more types of text, more languages, and more objective quality scoring by machines and humans tell a different story.</span><br></p></div>
</div><div data-element-id="elm_46LFSwHNzt6gdr8LjAh8NQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_46LFSwHNzt6gdr8LjAh8NQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The best comprehensive evaluation was done by scientists at Microsoft Research, which is unsurprising because they are among the leaders in both machine translation and GPT models.&nbsp; The brief summary is that while GPT models have competitive quality when translating typical sentences from a major (see “high-resource” below) language into English, they do less well at other types of translation.</p></div></div>
</div><div data-element-id="elm_N7pEJJlKArL38OQeyi1R3Q" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_N7pEJJlKArL38OQeyi1R3Q"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The evaluation uses three GPT systems that are known for translation quality, and compares them with neural machine translation (NMT) engines: either the Microsoft Azure API, the best-performing commercial systems, or research prototypes.&nbsp; Quality is scored either algorithmically or by human evaluation.</p></div></div>
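As an aside, the kind of algorithmic scoring mentioned here usually compares the machine output against reference translations. The sketch below is a deliberately simplified, sentence-level cousin of the BLEU metric (clipped n-gram precision with a brevity penalty); real evaluations also use neural scoring models, and this toy exists only to show the idea, not the metrics the paper actually reports.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision, as in BLEU: each candidate n-gram
    counts at most as often as it appears in the reference."""
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(cand.values()), 1)

def toy_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped 1..max_n-gram precisions, times a
    brevity penalty -- a toy, sentence-level cousin of corpus BLEU."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
hyp = "the cat is on the mat".split()
score = toy_bleu(hyp, ref)  # sqrt(5/6 * 3/5) ≈ 0.707
```

A perfect match scores 1.0 and every divergent n-gram pulls the score down, which is why such metrics reward literal overlap rather than fluency.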
</div><div data-element-id="elm_hPgLpVu7xvZTX8EVf8QoUg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_hPgLpVu7xvZTX8EVf8QoUg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p><a href="https://arxiv.org/pdf/2302.09210.pdf" title="How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation&nbsp;" rel="">How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation</a>&nbsp;<br></p><div><br></div></div></div>
</div><div data-element-id="elm_ivvprsnnnhFmrXoNaiJMfw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_ivvprsnnnhFmrXoNaiJMfw"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><p>Languages and language direction</p></div></h3></div>
<div data-element-id="elm_OYh0FJo0e1jQwh74F41YlA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_OYh0FJo0e1jQwh74F41YlA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>For most languages and language directions, MS Azure Translator and other NMT engines outperform GPT on most measures of quality.&nbsp; However, GPT does have the ability to improve after being given a few examples of correct translations, and to outperform NMT in some language directions when given five examples (five-shot prompting).&nbsp; This is the case for translations to English from German, Chinese and Japanese, languages for which there are a lot of examples in the GPT training set.&nbsp; These are called “high-resource” languages.&nbsp; On the other hand, it does not do particularly well for low-resource languages like Czech or Icelandic, or for English to other languages.&nbsp; GPT’s training set had less text in those languages.&nbsp; In the chart below, orange is GPT and blue is NMT.&nbsp; The lines are algorithmic evaluation and the bars are human evaluation.</p></div></div>
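Giving the model "a few examples of correct translations" is usually done with few-shot prompting: the example pairs are placed ahead of the sentence to translate, all in one prompt. The exact template the researchers used may differ; this sketch only shows the general shape of such a prompt.

```python
def few_shot_translation_prompt(examples, source_sentence,
                                src="German", tgt="English"):
    """Build a few-shot translation prompt: a handful of correct
    example translations, then the sentence to translate, ending
    with an open slot for the model to complete.
    (Illustrative format only; real prompt templates vary.)"""
    lines = [f"Translate {src} to {tgt}."]
    for src_text, tgt_text in examples:
        lines.append(f"{src}: {src_text}")
        lines.append(f"{tgt}: {tgt_text}")
    lines.append(f"{src}: {source_sentence}")
    lines.append(f"{tgt}:")  # the model continues from here
    return "\n".join(lines)

examples = [
    ("Guten Morgen.", "Good morning."),
    ("Wie geht es dir?", "How are you?"),
]
prompt = few_shot_translation_prompt(examples, "Das Wetter ist schön.")
```

With five example pairs instead of two, this is the "five-shot" setting in which GPT catches up to NMT on some high-resource directions.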
</div><div data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg"] .zpimage-container figure img { width: 624px !important ; height: 398px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg"] .zpimage-container figure img { width:624px ; height:398px ; } } @media (max-width: 767px) { [data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg"] .zpimage-container figure img { width:624px ; height:398px ; } } [data-element-id="elm_4GnKGtHJ6wbcwAK1NGEjCg"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/languages.png" width="624" height="398" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_Trbv_VsvOl0UReU8UqOXpA" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_Trbv_VsvOl0UReU8UqOXpA"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><p>Sentence-level vs. multi-sentence</p></div></div></h3></div>
<div data-element-id="elm_4PSNPeNPm7jG976XCnkNSQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_4PSNPeNPm7jG976XCnkNSQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The experiments above are for sentence-level translations.&nbsp; For multi-sentence translations, where additional context can be found in the surrounding sentences, GPT improves relative to NMT.&nbsp; Not enough to beat the best NMT systems, but sometimes enough to match or beat the normal Azure API.&nbsp; That is not very surprising: Azure Translator was optimized for sentence-level translation, while GPT is trained for multi-sentence context, up to thousands of words.&nbsp; Other Azure translation APIs like Document Translator and Custom Translator handle longer context windows better, but that is not what was tested here.&nbsp;</p></div></div>
</div><div data-element-id="elm_13pYmoMB8Ri4gFMkS-ersw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_13pYmoMB8Ri4gFMkS-ersw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Neural Machine Translation models have another big advantage over GPT: they can be re-trained on a particular domain rather than on general text, and this significantly improves their quality scores in that domain.&nbsp; For example, by giving it many training examples of automotive documents, Azure Custom Translator (a re-trainable version of Azure Translator) can increase its translation quality for documents in the automotive domain by a large factor.</p></div></div>
</div><div data-element-id="elm_MiC-NQB9r4oyIjEzaBo6HQ" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_MiC-NQB9r4oyIjEzaBo6HQ"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><p>Fluency and alignment</p></div></div></h3></div>
<div data-element-id="elm_OnDZyqsMG88uqB5qjbGWOQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_OnDZyqsMG88uqB5qjbGWOQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Looking at other measures of performance gives a better understanding of how the performance of GPT and NMT differs.&nbsp; Measuring fluency, essentially how natural a sentence sounds and how similar it is to other sentences out in the world, tells you something about the quality of the prose.&nbsp; GPT is more fluent in English; its output sounds more natural.&nbsp; That doesn’t mean that it is a more accurate translation, far from it.&nbsp; In fact, GPT has a greater tendency to add words, concepts, and punctuation that do not correspond to the original, or to omit some.&nbsp; So it’s good prose, but it’s not necessarily what the original said.&nbsp; For example, it does better at figures of speech, by not translating them literally, but it also does not necessarily replace them with a term that means exactly the same thing.&nbsp; It does not wander far from the original with completely made-up things, but it is often not quite correct.&nbsp; However, GPT does also hallucinate words or concepts that were not in the original.<br></p><div><div style="color:inherit;"><p><br></p><p><a href="https://arxiv.org/abs/2305.16806" title="Do GPTs Produce Less Literal Translations?" rel="">Do GPTs Produce Less Literal Translations?</a>&nbsp;</p></div></div></div></div>
</div><div data-element-id="elm_em5DmpwzIvEOv7gi25W58w" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_em5DmpwzIvEOv7gi25W58w"] .zpimage-container figure img { width: 463px !important ; height: 127px !important ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_em5DmpwzIvEOv7gi25W58w"] .zpimage-container figure img { width:463px ; height:127px ; } } @media (max-width: 767px) { [data-element-id="elm_em5DmpwzIvEOv7gi25W58w"] .zpimage-container figure img { width:463px ; height:127px ; } } [data-element-id="elm_em5DmpwzIvEOv7gi25W58w"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/idiom.png" width="463" height="127" loading="lazy" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_xBGZxVb8PgHqZB19limLNw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_xBGZxVb8PgHqZB19limLNw"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><p>Translationese</p></div></div></h3></div>
<div data-element-id="elm_2e2-K-qRJowsZrrZcmfFOA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_2e2-K-qRJowsZrrZcmfFOA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>A literal and faithful translation is sometimes required, but that often leads to what is called “translationese”.&nbsp; This refers to a set of common issues with text generated by human translators.&nbsp; Translationese can refer to excessive precision or wordiness, or excessive vagueness in translated text, or syntax that is uncommon in the target language.&nbsp; What translators are compensating for is the fact that different languages are specific about different things.&nbsp; For example, English has the term “uncle”, which does not differentiate between paternal and maternal uncles, or between uncles by blood and by marriage, but many other languages are much more specific.&nbsp; Translating from those other languages to English, translationese would not say “uncle” but might say “maternal uncle by marriage”, a term that is unusual in English, but which avoids losing information that was contained in the original.&nbsp; In terms of translation quality, humans who are not translators might rate the translation with “uncle” higher because it sounds more natural, but translators would rate the awkward translation higher because it is more accurate.</p><p><br></p><p><span style="color:inherit;"><a href="https://arxiv.org/abs/2104.07623" title="Sometimes We Want Translationese" rel="">Sometimes We Want Translationese</a></span><br></p></div></div>
</div><div data-element-id="elm_1v3UqmUWI0PY_kDgHNpFEg" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_1v3UqmUWI0PY_kDgHNpFEg"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><p>Design and training of GPT and NMT</p></div></div></h3></div>
<div data-element-id="elm_TrxEYA97NeXggnNFOH0fDw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_TrxEYA97NeXggnNFOH0fDw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><div style="color:inherit;"><p>There are big differences between the texts used to train GPT engines and the texts used to train NMT engines.&nbsp; GPT engines are trained on monolingual text found on the internet, mostly in English.&nbsp; For any sequence of words, GPT learns the most likely next word.&nbsp; NMT engines are trained on curated professionally translated sentences, pairs of original sentences and their translations.&nbsp; Despite the curation, these data sets are often noisy and include incorrect translations that hold back the training.&nbsp; For any sentence within a document in the source language, NMT predicts the translated sentence.&nbsp; This is part of the reason why NMT learns to produce translationese and GPT does not: it’s in the training set.</p></div></div></div>
</div><div data-element-id="elm_C4pvHsnS1vHjjdqyrrTqOw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_C4pvHsnS1vHjjdqyrrTqOw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The design of the two types of models is also different. The “T” in GPT stands for “Transformer”.&nbsp; A Transformer is an attention-based neural network model that, when looking at a word within a sentence or even longer text, determines which other words are the most relevant ones to pay attention to.&nbsp; NMT also uses Transformer models.&nbsp; However, there are big differences.&nbsp; One is that GPT uses Decoder models, while NMT uses Encoder-decoder models.&nbsp; What does that mean?&nbsp; Decoder models focus on the output, the next word to be spit out.&nbsp; Encoder-decoder models try to extract features from the input before feeding it to the part of the model that predicts the output.&nbsp; An encoder-decoder focuses separately on the input and on the output, and tries to be robust to small changes in the input.</p></div></div>
</div><div data-element-id="elm_ZFFoET6GwZJ40lJm9kkBrA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_ZFFoET6GwZJ40lJm9kkBrA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>GPT only outputs one word at a time.&nbsp; It starts with the text of the prompt plus other information in the input, then outputs a single word.&nbsp; Then it adds that word to the end of the prompt in the input and puts this new input through again to get the next word.&nbsp; It is unidirectional; that is to say, when it is generating text it only looks at the previous words that are already generated, and it doesn’t consider what it will say next because it hasn’t said it yet. Like a lot of humans, GPT is more concerned with what it wants to say next than with what you’re saying. &nbsp;NMT is bidirectional.&nbsp; It considers the rest of the sentence and the next sentence in both the source and the translation while it is generating the text.&nbsp; It generates entire sentences at once.</p></div></div>
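That word-at-a-time loop can be sketched in a few lines. Real GPT systems predict a probability distribution over tokens with a large neural network; the lookup table below is a hypothetical stand-in, and the feed-the-output-back-into-the-input loop is the only point being illustrated.

```python
# Toy stand-in for a next-word model: a table of single continuations.
# A real model returns probabilities over a whole vocabulary.
NEXT_WORD = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "<end>",
}

def generate(model, max_words=10):
    """Unidirectional, one-word-at-a-time generation: each new word is
    appended to the context and the whole context is fed back in to
    get the next word. Nothing to the 'right' is ever consulted."""
    context = ["<start>"]
    while len(context) < max_words:
        next_word = model.get(context[-1], "<end>")
        if next_word == "<end>":
            break
        context.append(next_word)  # output becomes part of the input
    return context[1:]

sentence = generate(NEXT_WORD)  # ['the', 'cat', 'sat', 'down']
```

An NMT encoder-decoder, by contrast, first encodes the entire source sentence, so every output word can attend to the whole input at once.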
</div><div data-element-id="elm_yIisgdweYhX48cbBcwa4wg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_yIisgdweYhX48cbBcwa4wg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Because it is a generative model, GPT is biased towards what is usual.&nbsp; If the original text in the other language is commonplace and expected, then GPT will find good ways to express that text in English in ways that are commonplace and expected, because that is what it is trained to do.&nbsp; If the original says something that is unexpected or expresses it in unexpected ways, GPT’s translation is likely to replace it with something more usual using some of the same words.&nbsp; GPT does well at translation essentially because most things that require translation are predictable and unoriginal.</p></div></div>
</div><div data-element-id="elm_XU06fCYNGFWKXyoER9rTUw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_XU06fCYNGFWKXyoER9rTUw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>NMTs have a whole bag of tricks to deal with translation tasks that GPT does not, including specialized knowledge about the structure of languages, and tricks to deal with numbers, capitalization, and non-standard spacing correctly and efficiently.&nbsp; They are also trained to preserve information.&nbsp; You know that trick that people sometimes use, translating a sentence to another language then back to English so they can laugh at the result?&nbsp; NMTs include that round-trip in their training, to make sure that none of the meaning gets lost in the translation.&nbsp; Other tricks include having one neural network teach another how to translate, detecting errors in the training data, and techniques that address common translation errors.&nbsp; There are also tricks to reduce gender bias, a problem that still plagues GPT.</p></div></div>
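The round-trip idea can be sketched as a simple check: translate out and back, then compare with the original. The two stub "engines" below are hypothetical placeholders so the sketch runs; a real pipeline would call actual translation engines, and real training objectives are far more sophisticated than this word-overlap score.

```python
def round_trip_score(sentence, forward, backward):
    """Translate a sentence to another language and back, then score
    how much of the original wording survived (0.0 to 1.0).
    A low score suggests meaning was lost in translation."""
    back = backward(forward(sentence))
    orig_words = set(sentence.lower().split())
    back_words = set(back.lower().split())
    if not orig_words:
        return 0.0
    return len(orig_words & back_words) / len(orig_words)

# Hypothetical stub "engines" for illustration only: tiny lookup
# tables that pass unknown sentences through unchanged.
to_french = lambda s: {"the cat is black": "le chat est noir"}.get(s, s)
to_english = lambda s: {"le chat est noir": "the cat is black"}.get(s, s)

score = round_trip_score("the cat is black", to_french, to_english)  # 1.0
```

Training against a signal like this (back-translation) is one way an engine learns to keep all of the original meaning rather than just producing fluent output.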
</div><div data-element-id="elm_36dwuF6_ONEOJ5UGuuNZhw" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_36dwuF6_ONEOJ5UGuuNZhw"].zpelem-heading { border-radius:1px; } </style><h3
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><div><div><div><p>Computing power required</p></div></div></div></h3></div>
<div data-element-id="elm_968TT3XKXbL3DR-Ht9w_3g" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_968TT3XKXbL3DR-Ht9w_3g"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>Current Azure Translator NMT uses models of about 50 million parameters, which can run 4 language pairs in a Docker container on a host with a 2-core CPU and 2 GB of memory.&nbsp; Even the next generation of NMTs, <a href="https://blog.icefire.ca/blogs/post/how-to-get-a-higher-level-of-machine-translation-quality" title="Z-code MoE" rel="">Z-code MoE</a>, which have 100 languages (10,000 language pairs) in a single model, can fit on a single GPU even though they have billions or hundreds of billions of parameters.&nbsp; These are sizes for querying; what is required for training is much bigger.&nbsp; GPT-4 is rumored to use as many as 100 trillion parameters, though its true size has not been published.&nbsp; Training requires hundreds of thousands of CPUs and tens of thousands of GPUs, but for querying, it looks like a single cluster of 8 GPUs and a dozen or two CPUs is enough.&nbsp; Microsoft is very good at shrinking by orders of magnitude the size of machines required to run AI models so direct comparison is difficult, but NMTs deliver translations at much lower computational cost.&nbsp; Microsoft’s DeepSpeed library in particular increases speed and reduces latency by a large factor.</p><p><br></p><p>The computing power required also has a potential impact on security.&nbsp; NMTs, even the bigger upcoming ones, can be run on a single processor, while GPT requires many processors.&nbsp; Using GPT you are probably sharing hardware with strangers, while for NMT it is possible to have dedicated resources.&nbsp; Because of its autoregressive architecture, where the output is fed back into the input, GPT probably has some static storage of your data, while NMT can be architected with a pipeline where neither the input nor the output text is ever stored.&nbsp; I don't know how it is implemented by anyone, but I notice that for Azure, NMT has a no-trace option by default while GPT limited-access previews do not.&nbsp; Because of ethical concerns, data is probably retained for abuse monitoring.&nbsp; I'm sure the security is good, but the architecture reduces the options for security.</p></div></div>
</div><div data-element-id="elm_azlB0lmtdPdOlibqoksLHQ" data-element-type="image" class="zpelement zpelem-image "><style> @media (min-width: 992px) { [data-element-id="elm_azlB0lmtdPdOlibqoksLHQ"] .zpimage-container figure img { width: 1110px ; height: 713.35px ; } } @media (max-width: 991px) and (min-width: 768px) { [data-element-id="elm_azlB0lmtdPdOlibqoksLHQ"] .zpimage-container figure img { width:723px ; height:464.64px ; } } @media (max-width: 767px) { [data-element-id="elm_azlB0lmtdPdOlibqoksLHQ"] .zpimage-container figure img { width:415px ; height:266.70px ; } } [data-element-id="elm_azlB0lmtdPdOlibqoksLHQ"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="" data-size-mobile="" data-align="center" data-tablet-image-separate="false" data-mobile-image-separate="false" class="zpimage-container zpimage-align-center zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/performance.jpg" width="415" height="266.70" loading="lazy" size="fit" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_DOtJj3rG9dVwXmZBaX0S3A" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_DOtJj3rG9dVwXmZBaX0S3A"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true">Conclusion</h2></div>
<div data-element-id="elm_XFr0lnW8TS4_CtxFNGC9gg" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_XFr0lnW8TS4_CtxFNGC9gg"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="color:inherit;"><p>The blanket claim that GPT is better at translation is not generally true.&nbsp; However, GPT is surprisingly good, considering that translation is a task that it was neither designed nor trained for.&nbsp; It is unexpected that it is sometimes equal to or better than the highly specialized NMTs.&nbsp; There is a fair bit of work being done on hybrid systems that combine the accuracy and specialized training of NMT with the fluency of GPT and will deliver the best of both. The next generation of NMT (see <span style="color:inherit;"><a href="https://blog.icefire.ca/blogs/post/how-to-get-a-higher-level-of-machine-translation-quality" title="How to get a higher level of machine translation quality" rel="">How to get a higher level of machine translation quality</a></span><span style="color:inherit;">) will also allow the model to transfer language knowledge obtained from one language to other related languages, and in that way vastly improve the quality for low-resource languages such as southern Slavic languages.&nbsp; That innate knowledge of what is common to languages in the same family can then be used to improve the quality of both NMT and GPT.</span></p><div><br></div><div><div style="color:inherit;"><p><a href="https://arxiv.org/abs/2309.11674" title="A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models" rel="">A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models</a>&nbsp;<br></p><div><br></div></div></div></div></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Fri, 06 Oct 2023 09:30:00 -0400</pubDate></item><item><title><![CDATA[Inuktitut machine translation is here. Why that's a big deal.]]></title><link>https://blog.icefire.ca/blogs/post/inuktitut-machine-translation-is-here-why-that-s-a-big-deal1</link><description><![CDATA[<img align="left" hspace="5" src="https://blog.icefire.ca/INUKTITUT-1.png"/>Azure Translator Text now supports the Inuktitut language spoken in the Inuit area in the far north of North America.]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm__uaIH0jcQxSnYhilZfwi7w" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_vZmFG8xlSwSw0UcWb4khyQ" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_w1VhFEDrQ_ShtiBhuJ438A" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_9ibTL2zXQt2JnqYYoa8gpw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_9ibTL2zXQt2JnqYYoa8gpw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-center " data-editor="true"><p style="text-align:left;"><span style="color:inherit;font-size:16px;">Azure Translator Text now&nbsp;<a href="https://news.microsoft.com/en-ca/2021/01/27/microsoft-introduces-inuktitut-to-microsoft-translator/">supports the Inuktitut language</a>&nbsp;spoken in the Inuit area in the far north of North America.</span></p><div style="text-align:left;font-size:14px;"><br></div><p><span style="color:inherit;font-size:16px;"></span></p><div style="text-align:left;"><span style="font-size:16px;">Over the years, I've received a lot of requests to provide machine 
translation for Inuktitut.&nbsp; Despite the tools that are available, particularly from Microsoft, to&nbsp;<a href="https://portal.customtranslator.azure.ai/">train your own neural translation engine&nbsp;</a>for an unsupported language using a corpus of translated documents, and a great&nbsp;<a href="https://www.aclweb.org/anthology/2020.lrec-1.312/">bilingual corpus</a>&nbsp;from the debates of the&nbsp;<a href="http://www.inuktitutcomputing.ca/NunavutHansard/info.php?lang=en">Nunavut legislative assembly</a>, I knew that this would not be possible.&nbsp; Other machine translation experts also agreed that it was beyond the state of the art.&nbsp; On a couple of occasions I applied, unsuccessfully, for funding to push beyond the state of the art to make this possible.&nbsp; Why is machine translation of Inuktitut so difficult?</span></div></div>
</div><div data-element-id="elm__jZB9Zi9F3KKYkuoh9q-3w" data-element-type="image" class="zpelement zpelem-image "><style> [data-element-id="elm__jZB9Zi9F3KKYkuoh9q-3w"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="size-original" data-size-mobile="size-original" data-align="left" data-tablet-image-separate="" data-mobile-image-separate="" class="zpimage-container zpimage-align-left zpimage-size-fit zpimage-tablet-fallback-fit zpimage-mobile-fallback-fit hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/INUKTITUT-1.png" size="fit" data-lightbox="true" style="width:100%;padding:0px;margin:0px;"/></picture></span></figure></div>
</div><div data-element-id="elm_yEkm9WpDQhOR5O6BF7rYJA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_yEkm9WpDQhOR5O6BF7rYJA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="font-size:14px;"><div><span style="font-size:16px;">Inuktitut belongs to a class called &quot;polysynthetic languages&quot;.&nbsp; Most of the languages that you know are probably &quot;agglutinative&quot;.&nbsp; There are some root words which can be modified by changing the beginning or the end of the word.&nbsp; The root word is in the dictionary, but the words that are modified by adding or changing suffixes and prefixes are typically not, because everyone knows the rules.&nbsp; These agglutinative languages are part of a larger class called synthetic languages, which includes other simple rules for sticking words together, usually with a small set of rules that apply to one part of speech.&nbsp; For example, German can stick a lot of known nouns end-to-end to make a new word, but there is one root word and all the other words are modifying or narrowing down the sense of one of the words, and the resulting word behaves like a longer version of the base word, and has the same part of speech.</span></div><div><br></div><div><span style="font-size:16px;">Inuktitut is polysynthetic.&nbsp; The combination rules are much more complex.&nbsp; There can be several root&nbsp;concepts and root&nbsp;words, and it can lose its part of speech or change it because the full word is an entire sentence with subject, object, verb, adjective, even subordinate clauses, all contained in a big compound word. 
How you join words together can vary according to complex rules about what comes before and after the join.&nbsp; A well-known example is the word &quot;ᖃᖓᑕᓲᒃᑯᕕᒻᒨᕆᐊᖃᓛᖅᑐᖓ&quot;, which means &quot;I'll have to go to the airport&quot;.&nbsp; Verbs, nouns, subject, object: they're all contained in the same word.</span></div><div><br></div><div><span style="font-size:16px;">Most, though not all, Native American languages are polysynthetic.&nbsp; Unlike with other languages, neural networks can't just have a dictionary and some rules and train the translation engine to spot patterns of three or four words in a row that always translate to the same three or four words in another language.&nbsp; Almost all the neural translation engines I have seen are word-based.&nbsp; There are languages that are written without spaces, like Chinese, Japanese, and Thai, but they still have individual words, and breaking text up into individual words is relatively simple.&nbsp; Not so with polysynthetic languages.</span></div><div><br></div><div><span style="font-size:16px;">I don't see any information about how Microsoft tackled the problem for Inuktitut.&nbsp; I am assuming that they used a tool to break words down into morphemes.&nbsp; What I would have used, but didn't get funding for, was the National Research Council's&nbsp;<a href="http://www.inuktitutcomputing.ca/Uqailaut/info.php">Uqailaut Inuktitut Morphological Analyzer</a>, but I don't know whether Microsoft did something similar.&nbsp; I am watching for any publications about it. 
There have been&nbsp;<a href="https://www.clsp.jhu.edu/workshops/19-workshop/neural-polysynthetic-language-modeling-leveraging-related-low-resource-languages-and-rule-based-resources/">some advances lately</a>&nbsp;in&nbsp;<a href="https://github.com/neural-polysynthetic-language-modelling">modeling and translating these languages</a>, so that is not the only approach.</span></div><div><br><span style="font-size:16px;">On the other hand, perhaps they trained a neural network to decompose words into morphemes and vice versa, without a standalone processor.&nbsp; If that's the case, then the same techniques could be used for various other widespread but hard-to-translate polysynthetic languages: Algonquian languages like Cree and Ojibwe, Iroquoian languages like Mohawk, Athabascan languages like Dene or Navajo, or Siouan languages like Dakotan.&nbsp; It's a game changer.</span></div><div><br></div><p><span style="color:inherit;font-size:16px;"></span></p><div><span style="font-size:16px;">Oh, and if you were curious, PointFire Translator now supports translation to and from Inuktitut on SharePoint sites; just use language code &quot;iu&quot;.&nbsp; Your browser should already support Canadian Aboriginal Syllabics.</span></div></div></div>
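Because Inuktitut is exposed through the standard Microsoft Translator language codes, a translation request only needs to name &quot;iu&quot; as the target language. Here is a minimal sketch of building such a request against the public Translator v3 REST API; the key and region headers are placeholders, and PointFire Translator handles all of this for you on SharePoint sites:

```python
# Sketch: build a Microsoft Translator v3 request targeting Inuktitut ("iu").
# The endpoint and query parameters follow the public Translator v3 reference;
# the subscription key and region below are placeholders, not real values.
import json
from urllib.parse import urlencode

ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_translate_request(text, to_lang="iu", from_lang="en"):
    """Return the URL and JSON body for a Translator v3 translate call."""
    params = urlencode({"api-version": "3.0", "from": from_lang, "to": to_lang})
    url = f"{ENDPOINT}?{params}"
    body = json.dumps([{"Text": text}])  # Translator expects a list of objects
    return url, body

url, body = build_translate_request("I'll have to go to the airport")
# POST `body` to `url` with headers:
#   Ocp-Apim-Subscription-Key: <your key>
#   Ocp-Apim-Subscription-Region: <your region>
#   Content-Type: application/json
```

The response is a JSON list with one `translations` entry per requested target language.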
</div><div data-element-id="elm_g1dAcLKU5Y6LQBEiqtUzMw" data-element-type="divider" class="zpelement zpelem-divider "><style type="text/css"> [data-element-id="elm_g1dAcLKU5Y6LQBEiqtUzMw"].zpelem-divider{ border-radius:1px; } </style><style></style><div class="zpdivider-container zpdivider-line zpdivider-align-center zpdivider-width100 zpdivider-line-style-solid "><div class="zpdivider-common"></div>
</div></div><div data-element-id="elm_eKoz0-Mq42FM9P3nm8VdFQ" data-element-type="heading" class="zpelement zpelem-heading "><style> [data-element-id="elm_eKoz0-Mq42FM9P3nm8VdFQ"].zpelem-heading { border-radius:1px; } </style><h2
 class="zpheading zpheading-style-none zpheading-align-left " data-editor="true"><span style="font-size:16px;">Related Posts</span></h2></div>
<div data-element-id="elm_h1v9YM8MfXtbOBqesqlpTQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_h1v9YM8MfXtbOBqesqlpTQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;"><a href="https://blog.icefire.ca/blogs/post/Machine-Translation-in-PointFire-20131" target="_blank" rel="">Machine Translation in PointFire 2013</a></span><br></p><p><span style="color:inherit;"><a href="https://blog.icefire.ca/blogs/post/Why-Do-We-Need-PointFire-for-Multilingual-Collaboration-in-SharePoint-Part-II" title="Why Do We Need PointFire for Multilingual Collaboration in SharePoint? Part II" target="_blank" rel="">Why Do We Need PointFire for Multilingual Collaboration in SharePoint? Part II</a></span><a href="https://blog.icefire.ca/blogs/post/Why-Do-We-Need-PointFire-for-Multilingual-Collaboration-in-SharePoint-Part-II" title="Why Do We Need PointFire for Multilingual Collaboration in SharePoint? Part II" target="_blank" rel=""><br></a></p><p><span style="color:inherit;"><a href="https://blog.icefire.ca/blogs/post/Why-Do-We-Need-PointFire-for-Multilingual-Collaboration-in-SharePoint-Part-III" target="_blank" rel="">Why Do We Need PointFire for Multilingual Collaboration in SharePoint? Part III</a></span></p></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Fri, 05 Feb 2021 23:02:00 -0500</pubDate></item><item><title><![CDATA[Supporting another 12 Languages]]></title><link>https://blog.icefire.ca/blogs/post/supporting-another-12-languages</link><description><![CDATA[In the most recent version of PointFire Translator (beta) we are introducing new or enhanced support for 12 new languages. Of these, four are languages ]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_I4kfcOuqTyqg8ZQakpsfyA" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_dGxewUuER8qoc71y-7sh3Q" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_reMN5opnREmReS8XdV_NmA" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_JNDU55P4SiSfa1A8PxPQ4A" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_JNDU55P4SiSfa1A8PxPQ4A"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-center " data-editor="true"><p style="text-align:left;"><span style="color:inherit;font-size:16px;">In the most recent version of PointFire Translator (beta) we are introducing new or enhanced support for 12 new languages.</span></p><div style="text-align:left;font-size:14px;"><br></div><div style="text-align:left;"><span style="font-size:16px;">Of these, four are languages that are supported by SharePoint.&nbsp; Irish and Kazakh languages are now supported for machine translation.&nbsp; That means if your SharePoint site supports Irish or Kazakh, PointFire Translator can now translate its pages, documents, and lists, and PointFire 365 will automatically filter and/or redirect as 
appropriate.&nbsp; If you want to translate the user interface, contact us; one of the steps is different for those languages than for other languages.</span></div><div style="text-align:left;font-size:14px;"><br></div><div style="text-align:left;"><span style="font-size:16px;">PointFire Translator now supports European Portuguese and Brazilian Portuguese as two separate languages.&nbsp; Before this, the same translation engine was used for both: a neutral Portuguese that was actually closer to the Brazilian version.</span></div><div style="text-align:left;font-size:14px;"><br></div><div style="text-align:left;"><span style="font-size:16px;">Several new languages have been added to PointFire Translator which are not supported by SharePoint, including&nbsp;Māori (New Zealand) and five languages from India and Pakistan: Marathi, Gujarati, Punjabi, Malayalam, and Kannada.&nbsp; PointFire Translator will happily translate to or from those non-SharePoint languages, but PointFire 365 will be unable to filter by those language codes.</span></div><div style="text-align:left;font-size:14px;"><br></div><p><span style="color:inherit;font-size:16px;"></span></p><div style="text-align:left;"><span style="font-size:16px;">All of those new languages have neural network engines behind them.&nbsp; Irish, Brazilian Portuguese, Marathi, Gujarati, and Māori have a customizable engine, meaning you can re-train it with your own documents to improve the translation quality.</span></div></div>
</div><div data-element-id="elm_y8SwSV3b4Fq5pPtEVLBR0w" data-element-type="image" class="zpelement zpelem-image "><style> [data-element-id="elm_y8SwSV3b4Fq5pPtEVLBR0w"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="size-original" data-size-mobile="size-original" data-align="left" data-tablet-image-separate="" data-mobile-image-separate="" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/KLINGON.png" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_o-2Ru26_CFgirnFBGAlGCQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_o-2Ru26_CFgirnFBGAlGCQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><div style="font-size:14px;"><div><span style="font-size:16px;">The other two languages, or rather one language and two scripts, are ones that PointFire Translator had supported before, but which had stopped working and been discontinued.&nbsp; In preparation for the&nbsp;<a href="https://www.collabsummit.space/en/">Galactic Collaboration Summit</a>&nbsp;we decided to dust off our Klingon translator.&nbsp; This is where we discovered that there had been an undocumented change to Microsoft's Klingon language codes.&nbsp; So we are happy to announce that we have reinstated&nbsp;Klingon (Latin script) and Klingon (pIqaD script).&nbsp; If you choose the pIqaD script, make sure that you download a font that supports it, and change the font on the document.&nbsp; This language only has a statistical translation engine, not a neural translation engine, so the quality is not very good.&nbsp; But to paraphrase Samuel Johnson, it is like a dog's walking on his hind legs: it is not done well, but you are surprised to find it done at all.</span></div><div><br></div><p><span style="color:inherit;font-size:16px;"></span></p><div><span style="font-size:16px;">If you're keeping count, that is 73 languages in total.</span></div></div></div>
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Wed, 17 Jun 2020 02:06:00 -0400</pubDate></item><item><title><![CDATA[Changes to Serbian language codes in SharePoint Online]]></title><link>https://blog.icefire.ca/blogs/post/Serbian-language-codes</link><description><![CDATA[One of the changes that went into general availability last week was the retiring of the &quot;sr-Latn-CS&quot; language code, Locale ID 2074. &quot;Se ]]></description><content:encoded><![CDATA[<div class="zpcontent-container blogpost-container "><div data-element-id="elm_Fqs-VD4oTe2CW6QXTpwNLQ" data-element-type="section" class="zpsection "><style type="text/css"></style><div class="zpcontainer-fluid zpcontainer"><div data-element-id="elm_4EfKKz1rSXqH3AtF8JTdtg" data-element-type="row" class="zprow zprow-container zpalign-items- zpjustify-content- " data-equal-column=""><style type="text/css"></style><div data-element-id="elm_D9O2riKJQZ6EUQqIhdJm_A" data-element-type="column" class="zpelem-col zpcol-12 zpcol-md-12 zpcol-sm-12 zpalign-self- "><style type="text/css"></style><div data-element-id="elm_Jh1h2QFORm6kJBlAewvNTA" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_Jh1h2QFORm6kJBlAewvNTA"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-center " data-editor="true"><p style="text-align:left;"><span style="color:inherit;font-size:16px;">One of the changes that went into general availability last week was the retiring of the &quot;sr-Latn-CS&quot; language code, Locale ID 2074.</span></p><div style="text-align:left;font-size:14px;"><br></div><div style="text-align:left;"><span style="font-size:16px;">&quot;Serbian (Cyrillic, Serbia)&quot; [LCID=10266, language code &quot;sr-Cyrl-RS&quot;] was renamed &quot;Serbian (Cyrillic)&quot; and&nbsp;</span></div><p><span style="color:inherit;font-size:16px;"></span></p><div style="text-align:left;"><span style="font-size:16px;">&quot;Serbian (Latin, Serbia)&quot; 
[LCID=9242, language code &quot;sr-Latn-RS&quot;] was renamed &quot;Serbian (Latin)&quot;.</span></div></div>
</div><div data-element-id="elm_Y8EwYFl59DN9cMQO2p6w7g" data-element-type="image" class="zpelement zpelem-image "><style> [data-element-id="elm_Y8EwYFl59DN9cMQO2p6w7g"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="size-original" data-size-mobile="size-original" data-align="left" data-tablet-image-separate="" data-mobile-image-separate="" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/SERBIAN.png" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_2vheVsAgZV9A7zfzWwXQKQ" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_2vheVsAgZV9A7zfzWwXQKQ"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;font-size:16px;">&quot;CS&quot; in&nbsp;sr-Latn-CS was the country code for the State Union of Serbia and Montenegro until it was dissolved in 2006 and Serbia and Montenegro became separate countries.&nbsp; &quot;RS&quot; is the code for the&nbsp;&quot;Republic of Serbia&quot;, not to be confused with Republika Srpska. I have never noticed any UI difference between sr-Latn-CS and sr-Latn-RS; they were probably identical.&nbsp;Many users in countries other than Serbia where Serbian is used, such as Montenegro or Bosnia and Herzegovina, tended to use the sr-Latn-CS code because it did not mention Serbia in the name.&nbsp; In case you're wondering: yes, the versions of Serbian used in Bosnia and in Montenegro are different from the one used in Serbia; in fact, there are a few more letters in the alphabet.&nbsp; However, they are not supported as distinct languages in SharePoint.</span><br></p></div>
</div><div data-element-id="elm_7qN-I7Q8XxK6q-NKeoVLtA" data-element-type="image" class="zpelement zpelem-image "><style> [data-element-id="elm_7qN-I7Q8XxK6q-NKeoVLtA"].zpelem-image { border-radius:1px; } </style><div data-caption-color="" data-size-tablet="size-original" data-size-mobile="size-original" data-align="left" data-tablet-image-separate="" data-mobile-image-separate="" class="zpimage-container zpimage-align-left zpimage-size-original zpimage-tablet-fallback-original zpimage-mobile-fallback-original hb-lightbox " data-lightbox-options="
                type:fullscreen,
                theme:dark"><figure role="none" class="zpimage-data-ref"><span class="zpimage-anchor" role="link" tabindex="0" aria-label="Open Lightbox" style="cursor:pointer;"><picture><img class="zpimage zpimage-style-none zpimage-space-none " src="/SERBIAN-VERSIONS.png" size="original" data-lightbox="true"/></picture></span></figure></div>
</div><div data-element-id="elm_TMU0DIxUq9meo70Ml_Xatw" data-element-type="text" class="zpelement zpelem-text "><style> [data-element-id="elm_TMU0DIxUq9meo70Ml_Xatw"].zpelem-text{ border-radius:1px; } </style><div class="zptext zptext-align-left " data-editor="true"><p><span style="color:inherit;font-size:16px;">The locale codes sr-Latn-RS and sr-Latn-BA&nbsp;still exist in Delve, labelled with the country name for display language and for locales, and if you choose one, there is a mapping done to the supported version of Serbian that uses the same script.</span><br></p></div>
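If you maintain tooling that still stores the retired code, you can normalize it the same way before any lookup. Here is a minimal sketch of that mapping, using the codes and LCIDs mentioned above; the helper name is illustrative, not part of any SharePoint API:

```python
# Map retired or country-specific Serbian language codes to the codes
# SharePoint still supports, keeping the same script. Illustrative only.
SERBIAN_CODE_MAP = {
    "sr-Latn-CS": "sr-Latn-RS",  # retired (LCID 2074) -> "Serbian (Latin)" (LCID 9242)
    "sr-Latn-BA": "sr-Latn-RS",  # Bosnia locale -> supported Latin-script Serbian
    "sr-Cyrl-RS": "sr-Cyrl-RS",  # "Serbian (Cyrillic)" (LCID 10266), unchanged
}

def normalize_serbian(code: str) -> str:
    """Return the supported Serbian language code for a possibly retired one."""
    return SERBIAN_CODE_MAP.get(code, code)
```

Any code not in the map, Serbian or otherwise, passes through unchanged.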
</div></div></div></div></div></div> ]]></content:encoded><pubDate>Mon, 01 Jun 2020 05:01:00 -0400</pubDate></item></channel></rss>