Talk:List of Unicode characters

From Wikipedia, the free encyclopedia

Why is U+00A0 not in the control character section?

Its function is that of a control character, no? -- Preceding unsigned comment added by 76.81.249.42 (talk) 01:52, 9 October 2019 (UTC)

U+00A0 has a general category of Zs (Separator, space), not Cc (Other, control) per UnicodeData.txt. BTW: I've removed U+0020 from the control character section's table because it too has a Unicode general category of Zs and the text before the table correctly states there are "65 characters, including DEL but not SP". DRMcCreedy (talk) 04:13, 9 October 2019 (UTC)
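For anyone who wants to check the categories without opening UnicodeData.txt, here is a minimal sketch using Python's standard `unicodedata` module. It reports whatever Unicode version is bundled with the interpreter; the helper name `general_category` is only illustrative.

```python
import sys
import unicodedata


def general_category(code_point: int) -> str:
    """Return the two-letter Unicode general category of a code point."""
    return unicodedata.category(chr(code_point))


# Every code point whose general category is Cc (Other, control).
control_chars = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Cc"
]

print(general_category(0x00A0))  # NO-BREAK SPACE
print(general_category(0x0020))  # SPACE
print(general_category(0x007F))  # DELETE
print(len(control_chars))        # size of the Cc set
```

The Cc set is U+0000 to U+001F, U+007F, and U+0080 to U+009F, which is where the figure of 65 comes from.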

Octal Entity Reference Code

Octal codes are very useful and are still needed in some programs, for example in bash/shell programming, escape sequences, JavaScript, Perl, PostScript, etc. Various low-level OS libraries and programs still use octal, and it especially needs to be shown for the Control Characters, Basic Latin, and similar Unicode character ranges.
For more octal charts and codes, see: https://utf8-chartable.de/unicode-utf8-table.pl?utf8=oct
More info: https://en.wikipedia.org/wiki/UTF-8#Examples
The Wikipedia page on Octal needs to be updated with more detail on how octal numbers are actually used in different types of computer programs. A literal conversion from hex/dec to oct is not enough in all cases. One sentence there, the one containing "\3nn", does mention UTF-8 based octal usage, but it needs elaboration. In a shell terminal, 3-digit octal codes can be used. For example, to show the ÷ (U+00F7) and € (U+20AC) signs, use this code:
printf "Not-Bold. \303\267 . \342\202\254 (1) \xE2\x82\xAC (2) \x20AC (3) \u20AC (4) \U000020AC (5). \033[1mBold\033[0m.\n";
Or this code:
echo $'Not-Bold. \303\267 . \342\202\254 (1) \xE2\x82\xAC (2) \x20AC (3) \u20AC (4) \U000020AC (5). \033[1mBold\033[0m.';
Both will be displayed as: "Not-Bold. ÷ . € (1) € (2) \x20AC (3) \u20AC (4) \U000020AC (5). Bold." (the old bash v3.2.57 shell in macOS Catalina (10.15.x) did not support formats (3), (4) and (5)). € = U+20AC = decimal code point 8364 = octal code point 20254 = UTF-8 octal \342\202\254 = UTF-8 hex \xE2\x82\xAC.
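The relationship between the code point and its UTF-8 octal/hex escapes can be checked independently of any shell version. A minimal sketch in Python follows; the function names `utf8_octal_escapes` and `utf8_hex_escapes` are made up for this illustration.

```python
def utf8_octal_escapes(text: str) -> str:
    """Return the UTF-8 bytes of text as 3-digit backslash-octal escapes."""
    return "".join(f"\\{byte:03o}" for byte in text.encode("utf-8"))


def utf8_hex_escapes(text: str) -> str:
    """Return the UTF-8 bytes of text as \\xHH escapes."""
    return "".join(f"\\x{byte:02X}" for byte in text.encode("utf-8"))


for ch in "\u00f7\u20ac":  # the division sign and the euro sign
    cp = ord(ch)
    print(
        f"U+{cp:04X} decimal={cp} octal={cp:o} "
        f"utf8-octal={utf8_octal_escapes(ch)} utf8-hex={utf8_hex_escapes(ch)}"
    )
```

Note that the octal form of the code point (20254 for the euro sign) and the octal form of its UTF-8 bytes (\342\202\254) are different things, which is why a plain hex-to-octal conversion of the code point is not enough for shell escapes.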
To convert a symbol/character into octal, you may do this:
printf 👍 | od -t o1
0000000 360 237 221 215 <-- Octal Unicode code-point 372115 (U+1F44D)
          ^  ^^  ^^  ^^
The marked digits, read left to right, give the octal code point 372115. --atErik1 (talk) 13:43, 5 September 2020 (UTC)
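The caret trick works because each UTF-8 continuation byte carries 6 payload bits (two octal digits) and the lead byte of a 4-byte sequence carries 3 (one octal digit). Below is a rough Python sketch of the same idea. `od_octal` and `code_point_from_utf8` are hypothetical helper names, and the decoder deliberately skips length and overlong-form validation.

```python
def od_octal(text: str) -> str:
    """Mimic the byte columns of `od -t o1`: one 3-digit octal number per UTF-8 byte."""
    return " ".join(f"{byte:03o}" for byte in text.encode("utf-8"))


def code_point_from_utf8(data: bytes) -> int:
    """Rebuild one code point from its UTF-8 bytes by keeping only the payload bits."""
    lead = data[0]
    if lead < 0x80:                # 0xxxxxxx: 7 payload bits (ASCII)
        value = lead
    elif lead >> 5 == 0b110:       # 110xxxxx: 5 payload bits
        value = lead & 0x1F
    elif lead >> 4 == 0b1110:      # 1110xxxx: 4 payload bits
        value = lead & 0x0F
    elif lead >> 3 == 0b11110:     # 11110xxx: 3 payload bits
        value = lead & 0x07
    else:
        raise ValueError("not a valid UTF-8 lead byte")
    for byte in data[1:]:
        if byte >> 6 != 0b10:      # continuation bytes look like 10xxxxxx
            raise ValueError("not a UTF-8 continuation byte")
        value = (value << 6) | (byte & 0x3F)
    return value


thumbs_up = "\U0001F44D"
print(od_octal(thumbs_up))
cp = code_point_from_utf8(thumbs_up.encode("utf-8"))
print(f"U+{cp:04X} octal={cp:o}")
```

For 2-byte and 3-byte sequences the lead byte carries 5 or 4 payload bits, so the simple digit-picking shortcut on the first byte no longer lines up with octal digit boundaries, although the continuation bytes still contribute their last two digits.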

The mysterious # column

Hi, most of the tables from Basic_Latin through Cyrillic have a rightmost column headed #. What is the significance? Without an explanation the naive reader is left to guess. =8~/ Thx, ... PeterEasthope (talk) 02:59, 18 November 2022 (UTC)

It's the decimal value for the hexadecimal Unicode code point. I agree it should definitely be labeled better. DRMcCreedy (talk) 03:26, 18 November 2022 (UTC)
No, it isn't. The numbers start with "001" at the space, and increment through Latin Extended-A. Then select characters in Latin Extended-B and Additional, IPA Extensions, Spacing Modifier Letters, then take up again in Greek and Coptic and Cyrillic. I have shepherded a script through the Unicode / ISO 10646 process, and I am confident I've never seen those values before. VanIsaac, GHTV contWpWS 04:47, 18 November 2022 (UTC)
Sorry, I was looking at the wrong column. My best guess is it's some enumeration of the characters in WGL-4, MES-1 and MES-2. Maybe just MES-2 since the article says MES-2 contains all the characters in WGL-4 and MES-1. The WGL-4, MES-1 and MES-2 table splits the Unicode code point up by "row" and "cells" but you can see it going from U+0020–7E, 00A0–FF, 0100–017F, 018F, 0192, 01B7, etc, which matches the # column. No idea why this was added to the List of Unicode characters article. Although the lede says "This article includes the 1062 characters in the Multilingual European Character Set 2 (MES-2) subset, and some additional related characters." DRMcCreedy (talk) 08:24, 18 November 2022 (UTC)
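To make that guess concrete, here is a small Python sketch of what a running number over those ranges would look like. It assumes, and this is only the hypothesis from the comment above, that the # column simply counts characters in order starting at 1 for the space. Only the ranges quoted in the comment are included, so the list stops where the comment says "etc"; `QUOTED_RANGES` and `running_numbers` are illustrative names.

```python
# Inclusive code point ranges quoted in the comment above, in order.
# This is NOT the full MES-2 repertoire; it stops at the comment's "etc".
QUOTED_RANGES = [
    (0x0020, 0x007E),
    (0x00A0, 0x00FF),
    (0x0100, 0x017F),
    (0x018F, 0x018F),
    (0x0192, 0x0192),
    (0x01B7, 0x01B7),
]


def running_numbers(ranges):
    """Map each code point to its 1-based position when the ranges are walked in order."""
    numbering = {}
    position = 0
    for first, last in ranges:
        for code_point in range(first, last + 1):
            position += 1
            numbering[code_point] = position
    return numbering


numbers = running_numbers(QUOTED_RANGES)
for cp in (0x0020, 0x007E, 0x00A0, 0x00FF, 0x017F, 0x018F, 0x0192, 0x01B7):
    print(f"U+{cp:04X} -> {numbers[cp]:03d}")
```

If the article's # column shows the same values at these code points, that would support the enumeration guess; if not, the column is counting something else.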