Punycode

Punycode, defined in RFC 3492, is an instance of the general "Bootstring" algorithm that encodes Unicode strings into the limited character set supported by the Domain Name System. It is used as part of IDNA, a system that enables internationalized domain names in all languages supported by Unicode while placing the burden of translation entirely on the user application (e.g., a web browser).

The encoding is applied separately to each component of a domain name that cannot be represented in the ASCII character set alone, and the reserved prefix 'xn--' is added to the resulting Punycode string. For example, bücher becomes bcher-kva in Punycode, so the domain name bücher.ch is represented as xn--bcher-kva.ch in IDNA.

Compare the ASCII 'punycoded' URL http://xn--tdali-d8a8w.lv/ (working) with its full Unicode counterpart, which includes Latvian characters with the appropriate diacritics: http://tūdaliņ.lv (not working because this page is not in Unicode; instead, its character set is ISO-8859-1, which cannot correctly render URLs containing internationalized domain names). Google is able to search within 'punycoded' sites; the query string to enter is, e.g., site:tūdaliņ.lv.

Punycode is designed to work across all script systems and to be self-optimising, adapting to the character ranges found within the string as it operates. It is optimised for strings composed of zero or more ASCII characters plus characters from only one other script system, but it will cope with any arbitrary Unicode string.

For DNS use, the domain name string is assumed to have been normalised using Nameprep and (for top-level domains) filtered against an officially registered language table before being Punycoded. The DNS protocol also sets limits on the acceptable length of the output Punycode string.
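The per-component encoding described above can be sketched with Python's standard "punycode" codec. This is a minimal illustration, not a full IDNA implementation: the helper name to_ace is hypothetical, and the Nameprep normalisation, language-table filtering and length checks mentioned above are deliberately skipped.

```python
def to_ace(domain: str) -> str:
    """Encode each non-ASCII component of a domain name with Punycode.

    Hypothetical helper for illustration only: it does NOT run Nameprep
    and does NOT enforce the DNS length limits.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # Pure-ASCII components are left untouched.
            labels.append(label)
        else:
            # Punycode the component, then add the reserved 'xn--' prefix.
            encoded = label.encode("punycode").decode("ascii")
            labels.append("xn--" + encoded)
    return ".".join(labels)


print(to_ace("bücher.ch"))                  # xn--bcher-kva.ch
print(b"bcher-kva".decode("punycode"))      # bücher
```

Python also ships an "idna" codec that performs the Nameprep step as well; for this example, "bücher.ch".encode("idna") yields the same ASCII form.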

Spoofing concerns

Because Punycode allows websites to use full Unicode names, IDNA could leave their users open to phishing attacks. IDNA makes it possible to create a spoofed web site that looks exactly like another, including domain name and security certificate, but is in fact controlled by someone attempting to steal private information. See Internationalizing Domain Names in Applications for more.

Firefox, a popular web browser, has recently changed its handling of Punycode for websites that use full Unicode names. Rather than disabling IDN, the Mozilla Foundation has settled on a temporary workaround in which internationalized domain names are displayed by Firefox 1.0.1 as Punycode by default, so that spoofed websites are easier to spot. Mozilla does not see this as a permanent fix, and it is unlikely to placate critics who are urging browser manufacturers to stick by IDN.

Apple Computer's Safari, as of Security Update 2005-003, does the same for a configurable list of scripts (defaulting to the three most likely to mislead: Greek, Cyrillic, and Cherokee).

Rather than using a workaround, Opera, another popular web browser, has implemented a whitelist for domain registrars that guard against possible exploits: only TLDs on the whitelist are allowed to use full IDN URLs. Websites under a whitelisted TLD display the Unicode name, whereas other websites, while still working, display the Punycode name with its xn-- prefix instead. Characters from Latin-1 are allowed for all TLDs, even those not on the whitelist, because within Latin-1 there is little chance of an exploit using misleading characters.
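The whitelist policy attributed to Opera above can be sketched roughly as follows. This is an assumption-laden illustration, not Opera's actual code: the function name display_form, the idn_tlds parameter and the whitelist contents are all hypothetical, and "Latin-1" is approximated as code points up to U+00FF.

```python
def display_form(domain: str, idn_tlds: frozenset) -> str:
    """Choose the form of a domain name to show the user.

    Hypothetical sketch of a TLD-whitelist display policy:
    show Unicode for whitelisted TLDs or Latin-1-only names,
    otherwise fall back to the 'xn--' Punycode form.
    """
    tld = domain.rsplit(".", 1)[-1].lower()
    # Approximation: treat code points <= U+00FF as Latin-1.
    latin1_only = all(ord(ch) <= 0xFF for ch in domain)
    if tld in idn_tlds or latin1_only:
        return domain
    # Not trusted: show each non-ASCII component in Punycode.
    return ".".join(
        label if label.isascii()
        else "xn--" + label.encode("punycode").decode("ascii")
        for label in domain.split(".")
    )


# Illustrative whitelist only; real browsers maintain their own lists.
print(display_form("tūdaliņ.lv", frozenset({"de"})))   # xn--tdali-d8a8w.lv
print(display_form("bücher.ch", frozenset({"de"})))    # bücher.ch
```

The second call keeps the Unicode form even though .ch is not in the illustrative whitelist, because ü lies within Latin-1.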
