ISO/IEC 8859-2
MIME / IANA | ISO-8859-2 |
---|---|
Alias(es) | iso-ir-101, csISOLatin2, latin2, l2, IBM1111 |
Language(s) | (see below) |
Standard | ECMA-94:1986, ISO/IEC 8859 |
Classification | Extended ASCII, ISO/IEC 8859 |
Extends | US-ASCII |
Based on | ISO-8859-1 |
Other related encoding(s) | Windows-1250, MacCroatian |
ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; the first edition was published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central[1] or "Eastern European" languages that are written in the Latin script. Note that ISO/IEC 8859-2 is very different from code page 852 (MS-DOS Latin 2, PC Latin 2), which is also referred to as "Latin-2" in Czech and Slovak regions.[2] Almost half of the encoding's use is for Polish, for which it is the main legacy encoding, although on the web virtually all use of it has been replaced by UTF-8.
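To illustrate how far apart the two "Latin-2" encodings are, the following minimal sketch decodes the same bytes with Python's standard-library codecs `iso8859_2` and `cp852`. The sample word and the variable names are assumptions chosen only for illustration.

```python
# Encode a Polish word as ISO-8859-2, then decode the same bytes two ways.
sample = "żółć".encode("iso8859_2")  # illustrative sample text

as_latin2 = sample.decode("iso8859_2")  # round-trips to the original word
as_cp852 = sample.decode("cp852")       # same bytes read as code page 852

print(sample.hex(" "))
print(as_latin2)
print(as_cp852)  # differs from the original: the byte values mean other characters
```

Because both encodings are single-byte and share the ASCII range, the mismatch only shows up for bytes above 0x7F, which is why mislabelled text often looks mostly readable with a few wrong letters or box-drawing characters.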
ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Less than 0.04% of all web pages use ISO-8859-2 as of October 2022.[3][4] Microsoft has assigned code page 28592 a.k.a. Windows-28592 to ISO-8859-2 in Windows. IBM assigned code page 912 to ISO 8859-2,[5] until that code page was extended in 1999.[6] Code page 1111 is similar, but replaces byte B0 ° (degree sign) with U+02DA ˚ (ring above).
Windows-1250 is similar to ISO-8859-2 and contains all of its printable characters, plus others. However, a few of them are at different byte positions (unlike Windows-1252, which keeps all printable characters from ISO-8859-1 in the same place).
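The relationship can be checked with a short sketch that uses Python's standard-library codecs `iso8859_2` and `cp1250`. The variable names are illustrative assumptions; the snippet confirms that every character in the upper half of ISO-8859-2 is encodable in Windows-1250 and lists the ones that sit at a different byte.

```python
# Characters in the upper half (0xA0-0xFF) of ISO-8859-2.
latin2_upper = bytes(range(0xA0, 0x100))
chars = latin2_upper.decode("iso8859_2")

# Encoding succeeds only if Windows-1250 contains every one of these
# characters; otherwise a UnicodeEncodeError would be raised here.
as_cp1250 = chars.encode("cp1250")

# Collect the characters whose byte value differs between the two encodings.
moved = {
    ch: (src, dst)
    for ch, src, dst in zip(chars, latin2_upper, as_cp1250)
    if src != dst
}

for ch, (src, dst) in moved.items():
    print(f"{ch}  ISO-8859-2 0x{src:02X} -> Windows-1250 0x{dst:02X}")
```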
Language coverage
These code values can be used for the following languages:
- The missing letter Å is officially part of the Finnish alphabet; however, it has no native use and appears only in foreign names.
- In 2017, the Council for German Orthography officially added a capital ẞ, but it is not required, as SS can be used instead.
- This character set unifies Ș and Ț (S and T with commas below) with Ş and Ţ (S and T with cedillas), as did virtually all other character sets, including Microsoft's Windows-1250 and the first version of Unicode. Unicode subsequently disunified them; however, this complicated the processing of Romanian data, because pre-existing data and input methods still contain the older cedilla code points, which complicates text searching.[citation needed]
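The Romanian case can be sketched in Python as follows. The comma-below letters have no byte in ISO-8859-2, so a caller has to fold them to the cedilla code points before encoding. The helper name `encode_romanian_latin2` and the mapping table are hypothetical, written only for this illustration; the codec name `iso8859_2` is Python's standard-library codec.

```python
# Map the comma-below letters to the cedilla code points that ISO-8859-2
# actually contains (assumed folding table for this sketch).
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015E",  # Ș -> Ş
    "\u0219": "\u015F",  # ș -> ş
    "\u021A": "\u0162",  # Ț -> Ţ
    "\u021B": "\u0163",  # ț -> ţ
})


def encode_romanian_latin2(text: str) -> bytes:
    """Hypothetical helper: fold comma-below letters, then encode."""
    return text.translate(COMMA_TO_CEDILLA).encode("iso8859_2")


word = "\u0218tiin\u021B\u0103"  # "Știință" written with comma-below letters

try:
    word.encode("iso8859_2")
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False  # comma-below letters are not in the character set

encoded = encode_romanian_latin2(word)
print(direct_ok, encoded.hex(" "))
```

Applying the same folding to search queries and stored text before comparison is one way to make older cedilla data and newer comma-below data match, at the cost of losing the distinction.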
Code page layout
Cells that differ from ISO-8859-1 also show the character's Unicode code point (in hexadecimal).
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x | ||||||||||||||||
1x | ||||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
8x | ||||||||||||||||
9x | ||||||||||||||||
Ax | NBSP | Ą 0104 | ˘ 02D8 | Ł 0141 | ¤ | Ľ 013D | Ś 015A | § | ¨ | Š 0160 | Ş 015E | Ť 0164 | Ź 0179 | SHY | Ž 017D | Ż 017B |
Bx | ° | ą 0105 | ˛ 02DB | ł 0142 | ´ | ľ 013E | ś 015B | ˇ 02C7 | ¸ | š 0161 | ş 015F | ť 0165 | ź 017A | ˝ 02DD | ž 017E | ż 017C |
Cx | Ŕ 0154 | Á | Â | Ă 0102 | Ä | Ĺ 0139 | Ć 0106 | Ç | Č 010C | É | Ę 0118 | Ë | Ě 011A | Í | Î | Ď 010E |
Dx | Đ 0110 | Ń 0143 | Ň 0147 | Ó | Ô | Ő 0150 | Ö | × | Ř 0158 | Ů 016E | Ú | Ű 0170 | Ü | Ý | Ţ 0162 | ß |
Ex | ŕ 0155 | á | â | ă 0103 | ä | ĺ 013A | ć 0107 | ç | č 010D | é | ę 0119 | ë | ě 011B | í | î | ď 010F |
Fx | đ 0111 | ń 0144 | ň 0148 | ó | ô | ő 0151 | ö | ÷ | ř 0159 | ů 016F | ú | ű 0171 | ü | ý | ţ 0163 | ˙ 02D9 |
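The set of positions that differ from ISO-8859-1 can be recomputed with a short sketch that uses Python's standard-library codecs `iso8859_1` and `iso8859_2`. The variable names are illustrative assumptions.

```python
# Compare the upper halves of ISO-8859-1 and ISO-8859-2 byte by byte.
differences = {}
for byte in range(0xA0, 0x100):
    raw = bytes([byte])
    latin1_char = raw.decode("iso8859_1")
    latin2_char = raw.decode("iso8859_2")
    if latin1_char != latin2_char:
        differences[byte] = latin2_char

# Print each differing byte with its ISO-8859-2 character and code point.
for byte, ch in differences.items():
    print(f"0x{byte:02X}  {ch}  U+{ord(ch):04X}")
```

Each printed line corresponds to one table cell above that carries a code point number.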
References
- "Microsoft Outlook Message Encodings". 10 January 2017.
- "The Czech and Slovak Character Encoding Mess Explained". luki.sdf-eu.org. Retrieved 2022-02-27.
- "Usage Statistics and Market Share of ISO-8859-2 for Websites, October 2022". w3techs.com. Retrieved 2022-10-23.
- "Historical trends in the usage statistics of character encodings for websites, February 2022".
- "Icu-data/Charset/Data/XML/Ibm-912_P100-1995.XML at main · unicode-org/Icu-data". GitHub.
- "Icu-data/Charset/Data/Ucm/Ibm-912_P100-1999.ucm at main · unicode-org/Icu-data". GitHub.
External links
- ISO/IEC 8859-2:1999
- Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)
- ISO-IR 101 Right-Hand Part of Latin Alphabet No.2 (February 1, 1986)
- ISO 8859-2 (Latin 2) Resources