
Community Wishlist Survey 2019/Wiktionary/Multiple collations per site: Difference between revisions

From Meta, a Wikimedia project coordination wiki
*: @[[user:Anomie|Anomie]]: not sure I understand. What does "that one" refer to? — [[User:Automatik|Automatik]] ([[User talk:Automatik|talk]]) 19:44, 11 November 2018 (UTC)
*:: "That one" refers to [[Community Wishlist Survey 2019/Miscellaneous/Improvements of Categories in Chinese Wikipedia|Improvements of Categories in Chinese Wikipedia]]. [[User:Anomie|Anomie]] ([[User talk:Anomie|talk]]) 20:08, 11 November 2018 (UTC)
*::: OK, but then I must specify that this one is also to allow multiple collations on a site—but is more general. — [[User:Automatik|Automatik]] ([[User talk:Automatik|talk]]) 20:25, 11 November 2018 (UTC)

Revision as of 20:25, 11 November 2018

Multiple collations per site

  • Problem: Wiktionary projects routinely list entries from many languages on the same site, but a Wikimedia project can use only one collation. If a site uses a collation tailored to particular languages, e.g. uca-default, which suits English and Portuguese, then every category of, say, Swedish words sorts words starting with Å under A. English treats Å as an A with a diacritic, whereas in Swedish it is a separate letter sorted near the end of the alphabet. Category headers are therefore incorrect for many languages under the current setup on Wiktionary projects.
    The current workaround is to use the default MediaWiki collation (namely uppercase). Under it, however, every letter with a diacritic (Å, É, etc.) gets its own first-letter header in categories, so sort keys must be added to all English, French, etc. entries with a diacritic in the title in order to sort Å under A for, e.g., English. This puts a huge number of sort keys in pages and makes Wiktionary projects less readable and editable for newcomers.
  • Who would benefit: users of Wiktionary categories
  • Proposed solution: allow multiple collations per site, so that a collation can be specified per category: uca-sv for Swedish-related categories, uca-es for Spanish categories, uca-default for English (and similar), etc.
  • More comments: Liangent and Bawolff have worked on this in the past, but feasibility also seems to depend on sysadmins (because of increased system load).
  • Phabricator tickets: phab:T30397
  • Proposer: Automatik (talk) 12:18, 11 November 2018 (UTC)
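
To make the problem and the proposal concrete, here is a minimal Python sketch. It is not MediaWiki code: the real uca-* collations are built on ICU, and the three key functions below are only rough stand-ins for uppercase, uca-default and uca-sv. All names (`COLLATIONS`, `CATEGORY_COLLATION`, `list_category`, the category titles) are hypothetical and chosen for illustration.

```python
import unicodedata

# Swedish alphabet: å, ä, ö are separate letters placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def strip_diacritics(text: str) -> str:
    """Drop combining marks, so 'Å' becomes 'A' and 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def uppercase_key(title: str) -> str:
    # Stand-in for "uppercase": plain code point order after uppercasing.
    return title.upper()


def uppercase_header(title: str) -> str:
    return title[0].upper()


def default_like_key(title: str) -> str:
    # Stand-in for uca-default: accented letters sort with their base letter.
    return strip_diacritics(title).upper()


def default_like_header(title: str) -> str:
    return strip_diacritics(title[0]).upper()


def _swedish_letter(ch: str) -> str:
    # Keep å, ä, ö as letters; fold any other accented letter to its base.
    return ch if ch in SWEDISH_ALPHABET else strip_diacritics(ch)


def swedish_like_key(title: str) -> tuple:
    # Stand-in for uca-sv: rank each letter by its position in the alphabet.
    folded = unicodedata.normalize("NFC", title).lower()
    letters = [_swedish_letter(ch) for ch in folded]
    return tuple(SWEDISH_ALPHABET.find(c) if len(c) == 1 else -1 for c in letters)


def swedish_like_header(title: str) -> str:
    first = unicodedata.normalize("NFC", title)[0].lower()
    return _swedish_letter(first).upper()


COLLATIONS = {
    "uppercase": (uppercase_key, uppercase_header),
    "default-like": (default_like_key, default_like_header),
    "swedish-like": (swedish_like_key, swedish_like_header),
}

# The proposal in miniature: the collation is chosen per category, not per site.
CATEGORY_COLLATION = {
    "Swedish nouns": "swedish-like",
    "English nouns": "default-like",
}


def list_category(category: str, titles: list) -> list:
    """Return (header, title) pairs in the order the category would show them."""
    key, header = COLLATIONS[CATEGORY_COLLATION[category]]
    return [(header(t), t) for t in sorted(titles, key=key)]


swedish_titles = ["öl", "apa", "år", "zebra", "ärta"]
for header, title in list_category("Swedish nouns", swedish_titles):
    print(header, title)
```

With a single site-wide collation, one of the two categories is always wrong: the default-like key files "år" under A, while the uppercase key gives an English entry such as "Ångström" its own Å header unless a sort key is added by hand. Choosing the collation per category removes that trade-off in this toy model.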

Discussion