When writing for technical audiences, there are various ways to write Unicode code points, but they all seem to be hexadecimal:
- \uFFFF – from C# / Java strings
- \U0000FFFF – from C# strings
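For comparison, C and C++ use the same hexadecimal convention in their universal character names, so the pattern is not limited to C# and Java. A minimal C sketch (U+00E9, é, picked as an arbitrary example):

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
    setlocale(LC_ALL, "");
    /* C99 universal character names: same hex convention as \uFFFF above */
    wchar_t four[]  = L"\u00E9";      /* four hex digits  */
    wchar_t eight[] = L"\U000000E9";  /* eight hex digits */
    wprintf(L"%ls %ls\n", four, eight);
    return 0;
}
```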
However, Unicode can also be specified in decimal, and XML entities accept both forms:
- Hexadecimal: &#xFFFF;
- Decimal: &#65535;
While I could just (kind of) use \u65535, that takes a notation that is defined as specifically hexadecimal and abuses it, and it could cause problems – is \u1111 decimal or hex?
So – are there any programming languages that allow similar ways to denote Unicode characters in decimal, or common shorthand conventions for specifying Unicode in decimal notation for technical audiences?
There is no commonly accepted decimal notation for Unicode codepoints.
Unicode codepoints are almost universally represented in hexadecimal. The sole exception I’m aware of is the use of Numeric Character References (NCRs) in languages derived from SGML (e.g., HTML and XML), which can take one of two forms: &#nnn; in decimal or &#xnnn; in hexadecimal.
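To make the two NCR spellings concrete, here is a minimal C sketch that prints both forms for a given codepoint (print_entities is a hypothetical helper, not a standard function):

```c
#include <stdio.h>

/* Hypothetical helper: print both NCR spellings of one codepoint.
   Assumes cp is a valid Unicode scalar value. */
static void print_entities(unsigned long cp) {
    printf("&#%lu;\t(decimal)\n", cp);      /* e.g. &#65535; */
    printf("&#x%lX;\t(hexadecimal)\n", cp); /* e.g. &#xFFFF; */
}

int main(void) {
    print_entities(0xFFFFUL);
    return 0;
}
```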
In other contexts, some languages have attempted to intermingle differing numeric bases – the best-known example being C’s use of nnn for decimal, 0nnn for octal, and 0xnnn for hexadecimal. Even with this well-known usage, it trips up beginning C programmers all the time that 012 and 12 are different numbers.
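A short C sketch of that gotcha (the printed values are what the C standard guarantees):

```c
#include <stdio.h>

int main(void) {
    int dec = 12;    /* decimal: twelve            */
    int oct = 012;   /* leading 0 = octal: ten     */
    int hex = 0x12;  /* leading 0x = hex: eighteen */
    printf("%d %d %d\n", dec, oct, hex);  /* prints: 12 10 18 */
    return 0;
}
```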
So far, here are some ways decimal Unicode can be specified clearly enough to convey the meaning to anyone who searches for it:
- XML Entity Notation: &#nnn;
- C / C++ Wide Character: wchar_t(nnn) (see the sketch after this list)
- SQL National Character: NCHAR(nnn) (available in MSSQL, MySQL, Oracle)
- Windows ALT Notation: ALT+nnn (although it can easily be confused with the hexadecimal version, which uses the plus key: ALT++01bd)
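For the C / C++ entry above, a minimal C sketch of assigning a decimal codepoint to a wide character (assuming wchar_t on the target platform is wide enough to hold it, and a UTF-capable locale):

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
    setlocale(LC_ALL, "");
    wchar_t c = 233;       /* decimal 233 = U+00E9 (é) */
    wprintf(L"%lc\n", c);
    return 0;
}
```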
I’m interested in any others to add to this list.