Explore possible approaches to support wikitext in MinT
Open, High, Public

Description

MinT integrates machine learning models that operate on plain text. Content with markup, in particular wikitext, gets in the way of translation, affecting both the quality of the translations and the integrity of the markup.

This ticket proposes exploring possible approaches to reapply the markup to translated content.

Event Timeline

Pginer-WMF triaged this task as Medium priority. Dec 4 2023, 4:42 PM
Pginer-WMF created this task.

A round-trip technique such as wikitext->html->wikitext is one way to achieve this. However, it has limitations. For example, if the wikitext contains a template and one of the template parameters is itself nested wikitext, that content will be missing from the HTML rendering (for example, i18n sentences with plural syntax). The translation will therefore be incomplete.
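To make the limitation concrete, here is a minimal sketch in Python. It is not MinT code and does not use a real renderer: `render_plural` and `plural_branches` are hypothetical helpers, and the regex only handles a flat `{{PLURAL:n|singular|plural}}` form with a literal number. It shows that once the plural is resolved for rendering, one of the translatable branches is no longer visible.

```python
import re

# Toy pattern (assumption): a flat {{PLURAL:n|singular|plural}} with a literal count.
PLURAL = re.compile(r"\{\{PLURAL:(\d+)\|([^|{}]*)\|([^|{}]*)\}\}")


def render_plural(wikitext):
    """Stand-in for HTML rendering: keep only the branch that matches the count."""
    def pick(match):
        count = int(match.group(1))
        return match.group(2) if count == 1 else match.group(3)
    return PLURAL.sub(pick, wikitext)


def plural_branches(wikitext):
    """Collect every translatable branch present in the source wikitext."""
    branches = []
    for match in PLURAL.finditer(wikitext):
        branches.extend([match.group(2), match.group(3)])
    return branches


source = "You have {{PLURAL:3|one new message|several new messages}}."
rendered = render_plural(source)

# Branches that exist in the source but never reach the rendered text,
# so a translation of the rendering cannot cover them.
missing = [b for b in plural_branches(source) if b not in rendered]

print(rendered)  # You have several new messages.
print(missing)   # ['one new message']
```

Translating only `rendered` and converting back to wikitext would leave the singular branch untranslated, which is the incompleteness described above.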

A better solution would be to parse the wikitext and build an AST, then transform that AST into another AST by applying translation. This is a very interesting problem that I would personally like to work on, as ASTs are one of my favorite CS topics. However, such a parsing project for wikitext is not trivial. It could take months and require many people to address the wild nature of wikitext. Even if we agree to support only basic wikitext features, it is still a big project in Python land. So for now, I am not committing to any further exploration on this front. I will try to resist the temptation to solve this problem at the expense of the other things in hand.
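For illustration only, the AST idea could be sketched like this for a tiny subset of wikitext (bold and links). All names here (`Text`, `Bold`, `Link`, `parse`, `transform`, `serialize`) are hypothetical, the regex-based parser is an assumption that ignores templates, tables, references and most real-world syntax, and `str.upper` stands in for a call to a translation model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Text:
    value: str


@dataclass
class Bold:
    children: List["Node"]


@dataclass
class Link:
    target: str          # page title, kept untranslated in this sketch
    label: List["Node"]  # visible text, translated


Node = Union[Text, Bold, Link]

# Toy grammar (assumption): '''bold''' and [[target|label]] or [[target]].
TOKEN = re.compile(r"'''(.+?)'''|\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def parse(wikitext: str) -> List[Node]:
    """Build a list of AST nodes from the toy wikitext subset."""
    nodes: List[Node] = []
    pos = 0
    for m in TOKEN.finditer(wikitext):
        if m.start() > pos:
            nodes.append(Text(wikitext[pos:m.start()]))
        if m.group(1) is not None:
            nodes.append(Bold(parse(m.group(1))))
        else:
            label = m.group(3) if m.group(3) is not None else m.group(2)
            nodes.append(Link(m.group(2), parse(label)))
        pos = m.end()
    if pos < len(wikitext):
        nodes.append(Text(wikitext[pos:]))
    return nodes


def transform(nodes: List[Node], translate: Callable[[str], str]) -> List[Node]:
    """Return a new AST with text nodes translated and markup nodes preserved."""
    out: List[Node] = []
    for node in nodes:
        if isinstance(node, Text):
            out.append(Text(translate(node.value)))
        elif isinstance(node, Bold):
            out.append(Bold(transform(node.children, translate)))
        else:
            out.append(Link(node.target, transform(node.label, translate)))
    return out


def serialize(nodes: List[Node]) -> str:
    """Turn the AST back into wikitext; links are always written in piped form."""
    parts = []
    for node in nodes:
        if isinstance(node, Text):
            parts.append(node.value)
        elif isinstance(node, Bold):
            parts.append("'''" + serialize(node.children) + "'''")
        else:
            parts.append("[[" + node.target + "|" + serialize(node.label) + "]]")
    return "".join(parts)


source = "The '''capital''' is [[Paris|the city]]."
translated = serialize(transform(parse(source), str.upper))
print(translated)  # THE '''CAPITAL''' IS [[Paris|THE CITY]].
```

Even in this toy form, each text node is translated in isolation, so the model loses sentence context across markup boundaries. That is one more reason a production-quality version would be a substantial project.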