Avoid references losing their data (showing as plain-text "[1]") when added to the translation using MinT
Closed, Resolved, Public

Description

Translators from WikiProject Medicine have reported issues with references in Content Translation when MinT is used. The same issues do not occur when Google Translate is used.
The report is based on the translation of this page into Igbo. A shorter page has been created as a test case. You can use this quick link to start translating it.

When a paragraph whose references are reused across multiple sentences is added to the translation, a given reference is sometimes adapted correctly in some sentences but inserted as plain-text "[1]", without any template, in others. The template data is lost for those instances.
Given that the same reference is adapted correctly in some cases (which suggests that all conditions for adapting it are met), we may need to inspect why it fails to adapt when used in another sentence.

Notice the highlighted sentence in the screenshots below:

Using MinT
Screenshot 2024-04-26 at 12.35.07 2.png (268×1 px, 91 KB)
Using Google Translate
Screenshot 2024-04-26 at 12.35.55 2.png (270×1 px, 95 KB)

In other cases, the reference is added correctly, but a plain-text "[1]" is also added next to it. See the highlighted sentence below:

Using MinT
Screenshot 2024-04-26 at 12.38.18 2.png (248×1 px, 86 KB)
Using Google Translate
Screenshot 2024-04-26 at 12.40.39 2.png (235×1 px, 88 KB)

The expected result is for every instance of a reference to be added to the translation as a proper reference, with no plain-text "[1]".

Event Timeline

Pginer-WMF renamed this task from Avoid references losing their data when added to the translation, presented as plain-text "[1]" to Avoid references losing their data (showing as plain-text "[1]") when added to the translation using MinT. Apr 26 2024, 11:06 AM
Pginer-WMF triaged this task as Medium priority.

I was able to reproduce this and identify the pattern that causes the issue: repeated references. Only the first occurrence gets fixed in the MT output; from the second one onwards, the reference appears as plain text. A few months back I addressed this by keeping a search start position in the lookup logic, but it is not catching repetitions outside the sentence. I am exploring potential solutions.
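
To make the failure mode described above concrete, here is a minimal sketch in Python. It is not the MinT code: the function names, the flat-string input, and the idea of locating annotations with `str.find` are all assumptions made for illustration. It only shows why a lookup without a carried-over search start maps every repeated "[1]" to the first occurrence, and how a cursor kept across the whole paragraph (rather than per sentence) avoids that.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def locate_annotations_naive(translated: str, annotation_texts: List[str]) -> List[Span]:
    """Hypothetical lookup that always searches from the start of the text.

    Repeated annotations all resolve to the first occurrence, so later
    occurrences are never wrapped and stay as plain text.
    """
    spans = []
    for text in annotation_texts:
        start = translated.find(text)
        spans.append((start, start + len(text)) if start != -1 else (-1, -1))
    return spans


def locate_annotations(translated: str, annotation_texts: List[str]) -> List[Span]:
    """Hypothetical lookup that keeps a search start per annotation text.

    The cursor is kept for the whole paragraph, so a repeated annotation in
    a later sentence is matched after the previous match.
    """
    next_start: Dict[str, int] = {}
    spans = []
    for text in annotation_texts:
        start = translated.find(text, next_start.get(text, 0))
        if start == -1:
            spans.append((-1, -1))  # annotation text not found in the translation
            continue
        end = start + len(text)
        next_start[text] = end  # the next lookup for this text resumes here
        spans.append((start, end))
    return spans


translated = "A fact.[1] Another fact.[1]"
annotations = ["[1]", "[1]"]  # the same reference used in two sentences
naive_spans = locate_annotations_naive(translated, annotations)
fixed_spans = locate_annotations(translated, annotations)
```

In this sketch `naive_spans` contains the same span twice, which corresponds to the second "[1]" being left as plain text, while `fixed_spans` contains two distinct spans, one per occurrence.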

Change #1041542 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] html: Fix bug in repeated annotation translation

https://gerrit.wikimedia.org/r/1041542

Change #1041542 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] html: Fix bug in repeated annotation translation

https://gerrit.wikimedia.org/r/1041542

Change #1042541 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2024-06-12-111204-production

https://gerrit.wikimedia.org/r/1042541

Change #1042541 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2024-06-12-111204-production

https://gerrit.wikimedia.org/r/1042541

Mentioned in SAL (#wikimedia-operations) [2024-06-13T08:29:19Z] <kart_> Updated MinT to 2024-06-12-111204-production (T363563)

The issue no longer occurs after the fix. As illustrated below, all instances of the references are transferred to the translation:

ig.wikipedia.org_wiki_Special_ContentTranslation_from=en&to=ig&campaign=undefined&page=User%3ACXTests%2FT363563(Wiki Tablet) 2.png (271×1 px, 85 KB)

ig.wikipedia.org_wiki_Special_ContentTranslation_from=en&to=ig&campaign=undefined&page=User%3ACXTests%2FT363563(Wiki Tablet).png (283×1 px, 83 KB)