I tried to enter my name “哎满” and it didn’t work. I asked on Libera Chat and was told that you can’t use non-ASCII characters in IRC nicknames; only a few IRC instances support them, since they could easily be abused.
Erroneous Nickname: 哎满
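For context, that rejection is what the classic nickname grammar would predict. Here is a rough Python sketch, assuming the RFC 2812 rules (an ASCII letter or one of a few special characters first, then letters, digits, specials, or a hyphen). The function name and the `max_len` default are illustrative only; real networks such as Libera Chat set their own limits and may differ in detail.

```python
import re

# Characters RFC 2812 calls "special": [ ] \ ` _ ^ { | }
_SPECIAL = r"\[\]\\`_^{|}"
# First character: letter or special. Rest: letter, digit, special, or hyphen.
_NICK_RE = re.compile(rf"[A-Za-z{_SPECIAL}][A-Za-z0-9{_SPECIAL}-]*")


def is_valid_nick(nick, max_len=9):
    """Hypothetical helper: True if `nick` fits the RFC 2812-style grammar.

    max_len=9 is the RFC figure; it is an assumption here, since servers
    commonly allow longer nicknames.
    """
    return len(nick) <= max_len and _NICK_RE.fullmatch(nick) is not None


print(is_valid_nick("aiman"))  # ASCII letters only
print(is_valid_nick("哎满"))    # non-ASCII characters are outside the grammar
```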
Unicode has a lot of “lookalike” characters, so if you’re allowed to select characters as a unique identifier to other users, permitting selection of arbitrary Unicode characters opens the possibility to impersonate users.
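To make the lookalike problem concrete, here is a small Python sketch. It assumes a crude heuristic of my own: treating the first word of each character's Unicode name as its "script". That is not a real Script property lookup (the standard library doesn't expose one), so take it as an illustration rather than a proper confusables check.

```python
import unicodedata


def rough_scripts(text):
    """Approximate the scripts used in `text`.

    Heuristic (an assumption, not an official API): use the first word of
    each alphabetic character's Unicode name, e.g. "LATIN" or "CYRILLIC".
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


real = "apple"
spoof = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A

print(real == spoof)          # the two strings are different to a computer
print(rough_scripts(real))    # a single script
print(rough_scripts(spoof))   # mixed scripts, which is a warning sign
print(rough_scripts("哎满"))   # a legitimate single-script name
```

Note that Unicode normalization does not help here, since the Cyrillic letter is a distinct character and not a compatibility variant of the Latin one. A mixed-script check like this also misses names written entirely in one lookalike script.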
I believe that there is some system for dealing with this for domain names, as they permit Unicode, and being able to uniquely identify domains is important. I don’t know if this could be generalized to other Unicode-using applications.
The encoding used for domain names is called Punycode: https://en.wikipedia.org/wiki/Punycode
But it’s still combined with domain registrars rejecting names like “αpple.com”, which ultimately needs a human to approve names.
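As a rough illustration of what Punycode does, here is a Python sketch using the standard library's `punycode` codec. The helper names are made up for this example, and it skips the normalization and validation steps that real IDN handling performs, so it only shows the encoding itself.

```python
def to_ascii_label(label):
    """Hypothetical helper: encode one domain label as 'xn--' + Punycode."""
    if label.isascii():
        return label  # plain ASCII labels are left untouched
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ascii_label(label):
    """Hypothetical helper: reverse of to_ascii_label."""
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


spoof = "\u03b1pple"  # starts with GREEK SMALL LETTER ALPHA
encoded = to_ascii_label(spoof)
print(encoded)                    # an ASCII-only label starting with "xn--"
print(from_ascii_label(encoded))  # round-trips back to the Unicode label
```

The encoded form makes the spoofed name look obviously different from "apple", which is why browsers sometimes show the `xn--` form instead of the Unicode one. The encoding alone doesn't decide which names are acceptable, though.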
There could also be a system like here on Lemmy, where there’s a separate display name, but it still doesn’t really solve the impersonation problem…
Some TLDs don’t allow full Unicode either. Country TLDs usually just add their own special chars, for example .se (Sweden) allows åäö.
The whole thing has a name as well: https://en.wikipedia.org/wiki/IDN_homograph_attack
I’d also add that ASCII has had some similar issues in the past, but those tend to have been ironed out by now via changes to onscreen typefaces.
For example, some old typewriters don’t have a “0” key or a “1” key because capital-o and lowercase-l looked similar enough and context was sufficient to let them be used in place of the corresponding number. This trained some people to do that, to the point that various software adapted to permit misuse of one in the place of the other. To this day, I can open up Firefox, and the following webpage will render green text:
<html><font color="#OOFFOO">green text </font></html>
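Why that works: browsers parse old-style color attributes leniently and treat non-hex characters as zero. Below is a simplified Python sketch of that idea. It is my own approximation, not the full legacy color parsing rules from the HTML standard (it ignores named colors, the `#abc` shorthand, and the handling of very long values), and the function name is made up.

```python
import string

_HEX = set(string.hexdigits)


def parse_legacy_color(value):
    """Simplified, assumed approximation of lenient color parsing.

    Returns an (r, g, b) tuple of ints in 0..255.
    """
    s = value.strip()
    if s.startswith("#"):
        s = s[1:]
    # Any character that is not a hex digit is replaced with "0".
    s = "".join(ch if ch in _HEX else "0" for ch in s)
    # Pad with zeros until the length is a non-zero multiple of three.
    while len(s) == 0 or len(s) % 3 != 0:
        s += "0"
    n = len(s) // 3
    parts = (s[:n], s[n:2 * n], s[2 * n:])
    # Keep at most two hex digits per component.
    return tuple(int(p[:2], 16) for p in parts)


print(parse_legacy_color("#OOFFOO"))  # letter O treated as zero
print(parse_legacy_color("#00FF00"))  # the color that was actually meant
```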
Some other fixes were made over time, like making capital-i, lowercase-l, and the pipe (“I”, “l”, and “|”) more visually distinct in typefaces where this matters.
In the monospaced font world, “programming” or “coding” fonts, where not confusing the character in question is particularly important, place a premium on keeping characters like this particularly distinctive, even at the cost of trading off some aesthetic appeal or conforming to traditional typography or handwriting-like conventions for letters. You’ll get more-distinctive “.” and “,”, “O” and “0”, “l”, “I”, and “|”, “j” and “i”, etc.
A slight correction on IRC terminology:
There are no IRC instances. If you are talking about Quakenet or EFnet, then you are talking about IRC networks, each of which consists of several servers.