Internet Character Sets and Encodings
Listed here are commonly used character sets (charset) and their encoding standards.
| Set (rows) / Encoding standard (columns) | Original (RFC 1345) | UTF: UCS Transformation Format | Unicode, UCS, ISO-10646: Universal Character Set |
|---|---|---|---|
| ASCII (7 bit) | US-ASCII | UTF-7 (RFC 2152) | ISO-10646-UCS-Basic |
| Latin 1 (8 bit) | ISO-8859-1 | UTF-8 (RFC 2279) | ISO-10646-Unicode-Latin1 |
| Most languages (16 bit) | | UTF-16 (RFC 2781) | ISO-10646-UCS-2 |
| All languages (32 bit) | | | ISO-10646-UCS-4 |
- Each row describes a character set that is a superset of the one before.
- Each column describes a character encoding standard.
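- UTF encodes UCS to be compatible with the original encoding methods: UTF-7 fits 7-bit transports and UTF-8 fits 8-bit ones, and UTF-8 encodes every US-ASCII character as the same single byte. Characters beyond US-ASCII are encoded differently in UTF-8 than in ISO-8859-1.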
- Each cell gives the official name of the character set and encoding. Names are case-insensitive.
- HTML 4, XHTML, and XML standards recommend that browsers internally use UCS.
- Standards before HTML 4 recommended that browsers internally use ISO-8859-1, which is the HTTP/1.1 default.
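
The notes above can be checked with a short sketch. It uses Python's built-in codec registry as a stand-in for the official charset names, which is an assumption: Python accepts these labels as aliases, but it is not the registry itself. The helper names `same_charset` and `is_superset` are hypothetical and exist only for illustration.

```python
import codecs


def same_charset(name_a: str, name_b: str) -> bool:
    """Return True if two charset labels resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


def is_superset(wider: str, narrower: str, byte_values: range) -> bool:
    """Check that the given bytes decode to the same characters in both charsets."""
    data = bytes(byte_values)
    return data.decode(narrower) == data.decode(wider)


# Charset names are matched without regard to case.
print(same_charset("ISO-8859-1", "iso-8859-1"))

# Latin 1 extends ASCII: the 128 ASCII byte values mean the same in both.
print(is_superset("iso-8859-1", "us-ascii", range(128)))

# UCS extends Latin 1: byte value N in ISO-8859-1 is code point N in UCS.
print(bytes(range(256)).decode("iso-8859-1") == "".join(map(chr, range(256))))

# UTF-8 leaves US-ASCII text byte-for-byte unchanged; UTF-7 does the same
# for ASCII letters and digits such as the ones in this string.
text = "charset"
print(text.encode("utf-7") == text.encode("utf-8") == text.encode("us-ascii"))

# Beyond ASCII the encodings differ: U+00E9 is one byte in ISO-8859-1
# but two bytes in UTF-8.
print("\u00e9".encode("iso-8859-1"), "\u00e9".encode("utf-8"))
```

Every comparison prints `True`, and the final line prints `b'\xe9' b'\xc3\xa9'`, which shows that UTF-8 is byte-compatible with US-ASCII but not with the upper half of ISO-8859-1.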
This page changed 2003 May 10.