Unicode: Difference between revisions

→‎Character unification: Brief for now, might expand later.
Line 717:
 
==== Localised case pairs ====
For the [[Turkish alphabet|Turkish]] and [[Azeri alphabet|Azeri]] alphabets, Unicode includes a separate [[dotless I|dotless lowercase {{serif|I}}]] (ı) and a [[İ|dotted uppercase {{serif|I}}]] ({{serif|İ}}). The lowercase dotted {{serif|I}} and the uppercase dotless {{serif|I}}, however, are represented by the ordinary ASCII letters. As a result, case-insensitive comparisons in those languages must use different rules from case-insensitive comparisons in other languages written in the Latin script.<ref>{{cite web |url=https://unicode.org/Public/UNIDATA/CaseFolding.txt |work=Unicode Character Database |title=Case Folding Properties |institution=[[Unicode Consortium]] |date=2023-05-12}}</ref>
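
As an illustration only, the difference between the two sets of rules can be sketched in Python. The helper names <code>turkic_fold</code> and <code>equal_ignoring_case</code> are hypothetical, and the mapping table is an assumption covering just the two special letters, on top of Python's default, language-independent <code>str.casefold()</code>; it is not a complete implementation of language-sensitive case folding.

```python
# Assumed mapping for Turkish and Azeri: uppercase I folds to dotless ı,
# and dotted İ folds to plain i. Applied before the default folding.
TURKIC_SPECIAL = {ord("I"): "\u0131", ord("\u0130"): "i"}


def turkic_fold(text: str) -> str:
    """Case-fold text, treating the two I letters by Turkish/Azeri rules."""
    return text.translate(TURKIC_SPECIAL).casefold()


def equal_ignoring_case(a: str, b: str, turkic: bool = False) -> bool:
    """Compare two strings case-insensitively under the chosen rules."""
    if turkic:
        return turkic_fold(a) == turkic_fold(b)
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Same pair of strings, different verdict depending on the rules used.
    print(equal_ignoring_case("ISPARTA", "\u0131sparta"))
    print(equal_ignoring_case("ISPARTA", "\u0131sparta", turkic=True))
```

Under the default rules the pair "ISPARTA" and "ısparta" compares as different, while under the Turkic rules it compares as equal; the reverse holds for "ISPARTA" and "isparta".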
 
By contrast, the [[ð|Icelandic eth (ð)]], the [[đ|barred D (đ)]] and the [[ɖ|retroflex D (ɖ)]], whose uppercase forms usually look identical (Đ), are given the opposite treatment: each is encoded separately in both letter cases. This approach has a drawback of its own, since visually indistinguishable characters with different code points call for security measures against [[homoglyph]] attacks.<ref>{{cite web |url=https://unicode.org/Public/security/latest/confusablesSummary.txt |title=confusablesSummary.txt |work=Unicode Security Mechanisms for UTS #39 |date=2023-08-11 |institution=[[Unicode Consortium]]}}</ref>
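
The separate encoding can be inspected with a short Python sketch using the standard <code>unicodedata</code> module. The function name <code>case_pairs</code> is hypothetical, and the code only lists the three capital letters together with the lowercase letter each one maps to; it makes no attempt at confusable detection.

```python
import unicodedata

# The three look-alike capitals: eth, D with stroke, and African D.
LOOKALIKE_CAPITALS = ["\u00D0", "\u0110", "\u0189"]


def case_pairs(capitals):
    """Return (uppercase code point, lowercase code point) for each letter."""
    return [(ord(ch), ord(ch.lower())) for ch in capitals]


if __name__ == "__main__":
    for upper, lower in case_pairs(LOOKALIKE_CAPITALS):
        # Print code points and character names for both members of the pair.
        print(
            f"U+{upper:04X} {unicodedata.name(chr(upper))} -> "
            f"U+{lower:04X} {unicodedata.name(chr(lower))}"
        )
```

Each capital has its own code point and its own lowercase counterpart, so a plain string comparison treats the three as different letters even where a font renders them identically.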
 
==== Diacritics on lowercase {{serif|I}} ====