Once T294147 is deployed (probably in MediaWiki_1.40/wmf.5), we can reindex the relevant wikis to activate ICU tokenization, ICU normalization, ICU folding, and homoglyph normalization.
The relevant wikis are currently 7 Arabic (ar) wikis and 6 Thai (th) wikis.
Acceptance Criteria
- All wikis in the relevant languages are reindexed
- A before-and-after analysis for each language's Wikipedia is provided
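
As a rough illustration of what a before-and-after comparison might compute, here is a minimal Python sketch. It assumes the token streams from the old and new analysis chains have already been exported as plain lists of strings. The function name `compare_token_counts` and the sample tokens are hypothetical and are not part of any existing tooling.

```python
from collections import Counter


def compare_token_counts(before, after):
    """Summarize how a token sample changed between two analyzer runs.

    `before` and `after` are lists of token strings produced by the old
    and new analysis chains for the same input text (hypothetical inputs).
    """
    before_counts = Counter(before)
    after_counts = Counter(after)
    return {
        # Total number of tokens emitted by each chain.
        "tokens_before": sum(before_counts.values()),
        "tokens_after": sum(after_counts.values()),
        # Number of distinct token strings (types) in each chain.
        "types_before": len(before_counts),
        "types_after": len(after_counts),
        # Types that only appear on one side of the comparison.
        "lost_types": sorted(set(before_counts) - set(after_counts)),
        "new_types": sorted(set(after_counts) - set(before_counts)),
    }


# Made-up sample: the new chain folds case variants into one type.
before_tokens = ["Wiki", "wiki", "WIKI", "page"]
after_tokens = ["wiki", "wiki", "wiki", "page"]
summary = compare_token_counts(before_tokens, after_tokens)
print(summary)
```

A drop in the number of distinct types with an unchanged token total, as in this made-up sample, is the kind of signal that would suggest folding or normalization is merging variants rather than changing how text is split.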