Pinyin Joe's

Chinese Computing Help Desk



More Chinese Fonts & Apps


3. Chinese Character Encoding Standards (to help you understand the specs in third-party font and application packages)

Big 5

Big 5 was the character encoding standard of Taiwan for many years. It was developed in 1984 by Taiwan's Institute for Information Industry, and takes its name from the five computer companies that backed it. The original standard included roughly 13,000 traditional characters, and has since been extended to 18,300 in Microsoft's implementation of Big 5, their Code Page 950.

Microsoft JhengHei, first released with Vista, is a recent example of a Microsoft Code Page 950 font.
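As a small illustration of what "a Big 5 font or file" means in practice, here is a sketch using Python's built-in `big5` and `cp950` codecs. The helper name `can_encode` is made up for this example, and the sample strings are my own choices, not anything from a font vendor's spec.

```python
# Hypothetical helper (not from any font package): report whether a
# string can be stored in a given legacy encoding.
def can_encode(text: str, encoding: str) -> bool:
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

traditional = "繁體中文"

# Every Chinese character in Big 5 takes two bytes.
big5_bytes = traditional.encode("big5")
cp950_bytes = traditional.encode("cp950")  # Microsoft's Code Page 950

print(len(big5_bytes))              # 8 bytes for 4 characters
print("中".encode("big5").hex())    # a4a4
print(big5_bytes == cp950_bytes)    # common characters match in both

# A simplified-only character such as 们 has no Big 5 code point.
print(can_encode(traditional, "big5"))  # True
print(can_encode("们", "big5"))         # False
```

Code Page 950 and plain Big 5 differ only at the edges, so for everyday traditional text the two codecs usually produce identical bytes.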


GB2312

The "GB" stands for "Guojia Biaozhun", or "National Standard". GB2312-1980, the encoding standard adopted in mainland China in 1981, includes 6,763 simplified characters. The standard also includes 682 non-Han characters for a total of 7,445 characters, and many font vendors incorporate extensions bringing the total up to about 7,600.

There are many GB2312 fonts out there from various vendors, and often they contain all the characters you're likely to need. In fact, most websites in the PRC use GB2312 encoding. Even Microsoft's web pages hosted there are in GB2312. If you click around and land on a Microsoft page targeted at a global Chinese-speaking audience, though, a peek at the HTML will show that those pages are in Unicode (UTF-8).
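To make the GB2312 versus UTF-8 difference concrete, the sketch below encodes the same two characters both ways with Python's standard codecs. The function `decode_page` is a hypothetical stand-in for whatever a browser does with the charset a page declares; it is not a real library call.

```python
text = "中文"

# The same text produces different bytes in each encoding.
gb_bytes = text.encode("gb2312")
utf8_bytes = text.encode("utf-8")

print(gb_bytes.hex(" "))    # d6 d0 ce c4           (2 bytes per character)
print(utf8_bytes.hex(" "))  # e4 b8 ad e6 96 87     (3 bytes per character)

# Hypothetical helper: decode raw page bytes with the declared charset.
def decode_page(raw: bytes, declared_charset: str) -> str:
    return raw.decode(declared_charset)

print(decode_page(gb_bytes, "gb2312"))   # 中文
print(decode_page(utf8_bytes, "utf-8"))  # 中文

# Reading GB2312 bytes as UTF-8 fails, which is why the charset
# declared in the HTML has to match the bytes actually served.
try:
    decode_page(gb_bytes, "utf-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True
print(mismatch_detected)  # True
```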



GBK

The "K" in GBK stands for "Kuozhan", meaning "extension". Adopted in 1993, GBK retained the code positions of the original GB set while packing in the rest of the 21,886 characters required for compatibility with Unicode 2.1 (ISO 10646-1). This has since been extended even further in various implementations. Microsoft's implementation of GBK, Code Page 936, includes about 22,000 characters.
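The "retained the code positions" point can be checked directly. This is a minimal sketch using Python's built-in `gb2312` and `gbk` codecs; the helper name `fits` and the sample characters are my own assumptions for illustration.

```python
# Hypothetical helper: does the text fit in the given encoding?
def fits(text: str, encoding: str) -> bool:
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

simplified = "汉字"

# Characters already in GB2312 keep exactly the same bytes in GBK.
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")
print(same_bytes)  # True

# GBK adds characters GB2312 lacks, such as the traditional form 國.
print(fits("國", "gb2312"))  # False
print(fits("國", "gbk"))     # True

# The added characters are still two bytes each.
guo_gbk = "國".encode("gbk")
print(len(guo_gbk))  # 2
```

Because of this, software that reads GBK can open old GB2312 files unchanged, while the reverse only works when the text stays inside the original GB2312 set.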

The open Unicode standard was developed by several global software and computer platform vendors, and harmonized with a parallel effort by the ISO. The final Unicode/ISO specification is a true global standard, and the Chinese authorities clearly agree. But more work was needed, as there was not enough room in the GBK format to accommodate the characters added to Unicode between 1993 and 2000.

Microsoft YaHei, first released with Vista, is a recent example of a Code Page 936 font.



GB18030

The standard required by the PRC government since 2001, GB18030-2000 includes over 27,000 traditional and simplified characters, with room for many more, and even covers the scripts of minority languages like Mongolian, Tibetan, and Yi.

GB18030 is (generally) compatible with Unicode standards, and backwards compatible with GB2312 and GBK. Mapping between all of these is now built into many conversion utilities. When converting back and forth between the old and new standards you may hit occasional incompatibilities between GBK and Unicode, but most vendors have thought about this for you in advance and will keep you out of trouble.
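Here is a rough sketch of that backwards compatibility, again with Python's standard codecs (`gbk` and `gb18030`). The sample characters are my own picks: one Han character that GBK already covers, plus one letter each from Tibetan, Yi, and Mongolian. Other conversion tools may differ in edge cases.

```python
# Characters already in GBK keep the same two bytes in GB18030.
han = "汉"
han_compatible = han.encode("gbk") == han.encode("gb18030")
print(han_compatible)  # True

# Minority-script letters are outside GBK; GB18030 stores them
# as four-byte sequences instead.
minority_samples = {
    "Tibetan": "\u0f40",    # ཀ
    "Yi": "\ua000",         # ꀀ
    "Mongolian": "\u1820",  # ᠠ
}

byte_lengths = {}
round_trips = {}
for script, char in minority_samples.items():
    encoded = char.encode("gb18030")
    byte_lengths[script] = len(encoded)
    # Converting back to Unicode should return the original character.
    round_trips[script] = encoded.decode("gb18030") == char

print(byte_lengths)  # {'Tibetan': 4, 'Yi': 4, 'Mongolian': 4}
print(round_trips)   # every value True
```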

Not all 27,000 characters will be in every font. Most vendors certainly don't include minority-script characters in their fonts; they just support the "code points". But every font and every application sold in the PRC must now map to this standard.

The Microsoft SimSun font as released with Windows XP outside China supports Code Page 936, their GBK set. Microsoft released SimSun18030 to meet PRC requirements, and made it available for worldwide download in 2001. I believe that is the version of SimSun included worldwide since the release of Windows Vista.






Copyright © 2005  All Rights Reserved.
"Microsoft", "Windows", "Linux", "Ubuntu", "Apple", "Macintosh" and any other trademarks on this site are the sole property of their respective owners.