Standardize ASCII-folding/ICU-folding across analyzers
Open, High, Public, 8 Estimated Story Points

Description

User Story: As a multi-lingual searcher, I would like more consistency and predictability in how character folding works across wikis.

Some languages have ASCII folding disabled, some have it enabled, some have it enabled with the option to preserve the unfolded original; some upgrade ASCII folding (with or without preserving the original) to ICU folding.
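To make the variants above concrete, here is a minimal Python sketch of how the same token stream comes out with folding off, folding on, and folding with the original preserved. It is only an illustration. The `fold` helper approximates folding by stripping combining marks after NFKD decomposition, which is narrower than what either the ASCII-folding or ICU-folding filters actually do. The function names and mode strings are hypothetical and do not come from AnalysisConfigBuilder.

```python
import unicodedata


def fold(token: str) -> str:
    """Rough stand-in for folding: decompose, then drop combining marks.

    Assumption: this is NOT the real ASCII-folding or ICU-folding logic,
    only a simple approximation for illustration.
    """
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_tokens(tokens, mode):
    """Apply one of three hypothetical folding modes to a list of tokens.

    mode == "off":           tokens pass through unchanged
    mode == "fold":          each token is replaced by its folded form
    mode == "fold_preserve": the original is kept alongside the folded
                             form whenever folding changed the token
    """
    out = []
    for token in tokens:
        if mode == "off":
            out.append(token)
            continue
        folded = fold(token)
        if mode == "fold_preserve" and folded != token:
            out.append(token)  # keep the unfolded original as well
        out.append(folded)
    return out


if __name__ == "__main__":
    sample = ["caf\u00e9", "na\u00efve", "zoo"]
    for mode in ("off", "fold", "fold_preserve"):
        print(mode, fold_tokens(sample, mode))
```

With preservation on, a query for the accented form can still match the original token exactly, while the folded form keeps unaccented queries working. That trade-off is one reason the setting could reasonably differ between languages.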

Acceptance Criteria:

  • Either an update to AnalysisConfigBuilder so that ASCII folding and ASCII folding with preserve-original are applied more consistently, or a better understanding of why they should differ across languages.
  • Bonus: An easy mechanism to enable custom ICU folding for a given language code without having to create a full analysis config for that language. (This may already exist.)
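As a rough sketch of what the bonus mechanism could look like, the Python below builds an `icu_folding` filter definition from a small per-language table of characters to exempt from folding. The `unicode_set_filter` parameter is a documented option of Elasticsearch's `icu_folding` token filter. The table, the function name, and the Swedish entry are hypothetical illustrations, not the existing AnalysisConfigBuilder behavior.

```python
# Hypothetical per-language table: characters that should NOT be folded.
# The Swedish entry mirrors the example commonly used for icu_folding.
ICU_FOLDING_EXCEPTIONS = {
    "sv": "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6",  # å ä ö Å Ä Ö
}


def icu_folding_filter(lang: str) -> dict:
    """Return an icu_folding filter definition for a language code.

    Languages without an entry get plain ICU folding. Languages with an
    entry get a unicode_set_filter that excludes the listed characters.
    """
    definition = {"type": "icu_folding"}
    exempt = ICU_FOLDING_EXCEPTIONS.get(lang)
    if exempt:
        # "[^...]" is a UnicodeSet that matches everything except the
        # listed characters, so those characters are left unfolded.
        definition["unicode_set_filter"] = f"[^{exempt}]"
    return definition


if __name__ == "__main__":
    print(icu_folding_filter("sv"))
    print(icu_folding_filter("en"))
```

Under this design, enabling custom ICU folding for a new language code would mean adding one table entry rather than writing a full analysis config.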

Event Timeline

TJones changed the point value for this task from 5 to 8. Jul 17 2023, 3:50 PM

Moving this back to the backlog in favor of a smaller next harmonization project.

TJones triaged this task as High priority. Feb 26 2024, 3:03 PM
TJones moved this task from needs triage to Language Stuff on the Discovery-Search board.