Ignore extra spaces from source text in the MinT test instance
Closed, Resolved, Public

Description

When using the MinT test instance, pasting content into the input field often leaves extra space characters at the end. These appear to affect the translation and produce unexpected content in the output. An example is shown below:

Screenshots from translate.wmcloud.org: (1) translation with some trailing spaces; (2) translation without extra spaces.

Notice in the first example how the selected source text (highlighted in green) produces a translation that ends with the text "Other " as an incomplete sentence. This does not happen when the extra spaces are removed.

The Lithuanian text used in the example is:

Paplitę daugiausiai Afrikos žemyne, tik nedidelė liūtų populiacijos dalis – šiaurinių liūtų porūšio azijinė populiacija gyvena Azijos žemyno Indostano pusiasalio šiaurvakariuose.

This ticket proposes that MinT internally trim the input text to remove leading and trailing spaces. We could also consider whether it is safe and beneficial to collapse runs of spaces between words (e.g., double spaces) into a single space character.

This way, translating a message should produce the same result regardless of how many extra spaces appear at its beginning or end.
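A minimal Python sketch of the proposed normalization, assuming the source text is cleaned once before it is sent to the model. `normalize_source` and its `collapse_inner` flag are hypothetical names for illustration; this is not the code from change #1051092. Leading and trailing whitespace is always stripped, while collapsing in-between spaces stays opt-in because the ticket leaves open whether it is safe.

```python
import re

# Runs of spaces or tabs between words. Newlines are deliberately excluded
# so that line or paragraph breaks in the source are left untouched.
_INNER_SPACES = re.compile(r"[ \t]+")


def normalize_source(text: str, collapse_inner: bool = False) -> str:
    """Hypothetical pre-translation cleanup (not MinT's actual code).

    Always strips leading/trailing whitespace. If collapse_inner is True,
    also replaces every run of spaces/tabs with a single space.
    """
    text = text.strip()
    if collapse_inner:
        text = _INNER_SPACES.sub(" ", text)
    return text


if __name__ == "__main__":
    source = "Paplitę daugiausiai Afrikos žemyne"
    # The same sentence with different amounts of stray whitespace,
    # as might happen when pasting into the input field.
    variants = [source, source + "   ", "  " + source, source + " \t\n"]
    # All variants map to one model input, so the translation request
    # would be identical regardless of the extra whitespace.
    assert len({normalize_source(v) for v in variants}) == 1
    print(repr(normalize_source("  two  spaces  ", collapse_inner=True)))
```

Keeping newlines out of the collapsing pattern is a design choice under the assumption that line breaks may mark segment boundaries; whether that matters depends on how the translator splits text into sentences.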

Event Timeline

Pginer-WMF triaged this task as Medium priority. (May 9 2024, 9:34 AM)
Pginer-WMF moved this task from Backlog to General translation functionality on the MinT board.

Change #1051092 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/machinetranslation@master] plaintext translator: strip whitespace from the text to translate

https://gerrit.wikimedia.org/r/1051092

Change #1051092 merged by jenkins-bot:

[mediawiki/services/machinetranslation@master] plaintext translator: strip whitespace from the text to translate

https://gerrit.wikimedia.org/r/1051092

Change #1051290 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update MinT to 2024-07-02-060114-production

https://gerrit.wikimedia.org/r/1051290

Change #1051290 merged by jenkins-bot:

[operations/deployment-charts@master] Update MinT to 2024-07-02-060114-production

https://gerrit.wikimedia.org/r/1051290

Mentioned in SAL (#wikimedia-operations) [2024-07-03T07:36:57Z] <kart_> Updated MinT to 2024-07-02-060114-production (T364525)