FAQ: Fonts and input methods for ancient, classical, rare/obscure Chinese characters
Q: I'm going to be working on a project involving uncommon
Chinese characters. What font should I use? How can I input,
share and print these characters for publication? Help!
A: Only a few fonts and input methods can even come close to providing everything some of you will need, and you may also need to know about creating your own Chinese characters. I've provided the following info to scholars working on Qing Dynasty legal papers and other 19th century documents, 12th century "vulgar" characters, and even ancient religious texts, all with good results so far.
Chinese fonts with the most characters
You may be surprised to find that the fonts included with your operating system contain all the characters you need! The most recent versions of the PMingLiU and SimSun fonts included with Windows, for example, contain an impressive number of characters in their "ExtB" additions.
However, those character sets are turned off by default in your input methods, so see my FAQ on enabling Unicode CJK Extensions and the Hong Kong Cantonese character set.
Try the fonts that came with your system first. But the older or more literary the source material, the less likely that will be enough, and I can suggest one or two additional fonts. I maintain a list of free and commercial third-party Chinese fonts on this website, including Adobe Source Han / Google Noto CJK, BabelStone Han, and HanaMin.
To install additional fonts in Windows or most other operating systems, all you need to do is copy the file into your Fonts folder (un-zipping or otherwise uncompressing first if necessary).
Some fonts come with installers that will do all that for you, usually a file ending in ".exe", in which case all you need to do is double click.
You may need Windows Administrator / Linux sudo / etc. privileges to install fonts.
Any new font should immediately show up in your font menus, but if you don't see it then try restarting your application or your entire system.
All of these fonts will work on any desktop OS. Everyone involved in a project must have the same fonts installed to exchange documents for transcription, editing, proofing, and printing. I write mostly about Windows, Ubuntu and Android, so if anyone on the team uses a Mac and needs more help, I recommend visiting the resources listed on my Mac OS Chinese support page.
How to input rare Chinese characters
IMEs, IME Pads, and TIPs
For this kind of work, you'll probably find yourself switching between more than one Chinese input method within one or more IMEs (input method editors), including an "IME Pad" or two. In the Microsoft Chinese Traditional IME, you'll most likely be using Unicode, and will need to enable the extended character sets, as I'll explain in the Unicode section below.
For general keyboard entry I suggest you use whatever you are comfortable with, which for most people is a Microsoft Chinese IME for traditional characters set to Hanyu Pinyin or Zhuyin Fuhao (Bopomofo) phonetic input, or one of the character-based input methods like Cangjie (which as of this writing was updated for Windows 7 and earlier as "New Changjie 2010").
After you have the necessary Unicode extensions enabled, if you know the pronunciation of a character, characters in those extensions will display in the candidate list with special colors after you press your <down arrow> and <right arrow> keys:
You should also know about the IME Pad. There is an IME Pad available in the Traditional IME's Tools menu, pictured below, which allows you to look characters up dictionary-style, by radical and stroke order, or with handwriting input.
There is also an IME Pad in the mainland/simplified MSPY IME, and that's worth looking at if you're collaborating with people in the mainland (or sometimes Singapore). In some versions of the MSPY 2010 update distributed in China this IME Pad also includes handwriting input. More on that below. This one opens via an icon on your task bar:
I used to also recommend the Japanese IME Pad, believe it or not, because it is even better for finding all the Chinese characters contained in the latest fonts, but there is a free alternative called BabelMap (see below) that has undergone many years of development since then, and I feel it is now a much better choice. Using the Japanese IME Pad can expose you to problems that may arise due to mixing languages and national encoding standards. More on both of those below.
Then there's handwriting. In addition to the handwriting features in the IME Pads (see the icon at the top left of the traditional character IME Pad screen shot above, and my comment about versions of the simplified character IME Pad), Microsoft also provides the following new handwriting options in Windows Vista, 7, and 8.
The Tablet PC Input Panel is available in Windows Vista and Windows 7 Ultimate or Enterprise versions, and in Windows 7 requires the installation of a Language Pack.
You'll need to enable rare Chinese characters in this keyboard's Tools menu: select "Options", then on the "Ink to text conversion" tab, tell it to "Recognize rarely used Chinese, kanji, or hanja characters when converting handwriting to typed text".
The Touch Keyboard shown below is available in all versions of Windows 8 and 10. I show how to enable "rarely used" characters and other setup details on my Windows 8 Chinese handwriting input and display language packs pages, where I also show how to dig out the old-style IME Pad. Check the menu above for Windows 10 instructions.
There are third-party handwriting software packages out there as well, but I can't provide any guidance on those.
Select Your Standard
Everyone involved in your project, from transcription to publishing, should be consulted before selecting an encoding standard. Mixing standards can cause serious problems when sharing files or sending your work out for printing.
Unicode is the international standard for how computers store and communicate letters, characters and other symbols in "ones and zeros", as opposed to national standards like ASCII in the US or Shift-JIS in Japan. But Microsoft and most other Chinese IMEs default to the national character encoding standard for one side of the Straits or the other: mainland "GB Code" or Taiwan "Big5". Unless everyone involved in your project is using the same standard, that could cause problems when exchanging files with others, and again at final publication. (It's actually a little more complicated than this deep inside the structure of Windows, as Microsoft uses "Code Pages" containing these character sets, but let's not get distracted.)
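To make the difference between these standards concrete, here is a minimal Python sketch of my own (not something any of the IMEs above expose). It encodes one common character and one CJK Extension B code point under several encodings, using codec names from Python's standard library. The helper name encodings_for is hypothetical, and I'm assuming Python's "gb2312" and "big5" codecs are reasonable stand-ins for mainland "GB Code" and Taiwan "Big5".

```python
# Codec names from Python's standard library; "gb2312" and "big5" stand in
# for the national standards discussed above (an assumption of this sketch).
CODECS = ["utf-8", "utf-16-le", "big5", "gb2312", "gb18030"]


def encodings_for(text):
    """Map each codec name to the encoded bytes, or None if it cannot encode text."""
    result = {}
    for codec in CODECS:
        try:
            result[codec] = text.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None
    return result


common = "\u4e2d"      # a very common character ("middle")
rare = "\U00020000"    # first code point in CJK Extension B

for label, ch in (("common", common), ("rare", rare)):
    print(label, f"U+{ord(ch):04X}")
    for codec, data in encodings_for(ch).items():
        print(f"  {codec:10} {data.hex() if data else 'not representable'}")

# Bytes written as Big5 but read back as GBK decode without any error
# message, yet the result is a different character.
misread = common.encode("big5").decode("gbk", errors="replace")
print("Big5 bytes read as GBK match the original:", misread == common)
```

The same character becomes different bytes under each standard, and the rare one has no Big5 or GB2312 representation at all. That is the practical reason to agree on Unicode before anyone starts transcribing.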
In my humble opinion this is the 21st Century, and Unicode is the present and the future. The BabelStone Han font and the excellent BabelMap tool I'm about to introduce below require Unicode, and you can get at most of those characters with common keyboard input methods after you set the Microsoft Taiwan IME to Unicode by going to Tools > Properties, and then in the General tab click the "Character set" button:
Unicode Lookup and Input
Some characters are unlikely to appear in the candidate lists of normal keyboard input methods, and even one or both of the IME Pads I've recommended may also fail to find them for you.
The really rare ones may be hiding in the Unicode CJK (Chinese/Japanese/Korean) Extensions, but there are ways to get most of those into your documents too.
You'll find PDF files of the entire CJK Unicode range on the Unicode Consortium's website, in the rightmost column (shown at left here): http://unicode.org/charts/
Those PDFs may be very useful if you deal with rare characters, and the Unicode Consortium also provides a Unihan Database reverse look-up tool that will allow you to enter a Unicode number or paste in a character to find a wide range of technical and linguistic reference information.
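If you want to know which characters in a transcription actually sit in those extensions (and so which chart to open, or who on the team needs an extra font), a short script can tell you. This is only a sketch: the function names are my own, and the block ranges are the ones I understand the Unicode code charts to list for Extensions A through G. Newer extensions are left out, so check the charts before relying on it.

```python
# (start, end, name) for the main CJK ideograph blocks. Ranges are taken
# from the Unicode code charts as I understand them; the list is not
# exhaustive and omits extensions added after Extension G.
CJK_BLOCKS = [
    (0x3400, 0x4DBF, "Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x20000, 0x2A6DF, "Extension B"),
    (0x2A700, 0x2B73F, "Extension C"),
    (0x2B740, 0x2B81F, "Extension D"),
    (0x2B820, 0x2CEAF, "Extension E"),
    (0x2CEB0, 0x2EBEF, "Extension F"),
    (0x30000, 0x3134F, "Extension G"),
]


def cjk_block(ch):
    """Return the name of the CJK block containing ch, or None."""
    cp = ord(ch)
    for start, end, name in CJK_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def extension_characters(text):
    """List (code point label, block) for each distinct extension character."""
    found = {}
    for ch in text:
        block = cjk_block(ch)
        if block is not None and block != "CJK Unified Ideographs":
            found[ord(ch)] = block
    return [(f"U+{cp:04X}", block) for cp, block in sorted(found.items())]


# One ordinary character, one from Extension A, one from Extension B.
sample = "\u4e00\u3400\U00020000"
print(extension_characters(sample))
```

The "U+" numbers it prints are the same ones used in the chart PDFs and accepted by the Unihan lookup tool.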
There are also software solutions for looking up and copying or inputting rare characters on your computer. Andrew West, creator of the BabelStone Han font introduced above, has also created BabelMap, a free app for browsing, searching and copying any glyph anywhere in Unicode.
BabelMap offers lookup by number or phonetics, tells you which of your fonts include a particular character, and offers many other great features. You'll find phonetic input in the Tools menu. You can copy and paste what you find there directly into your document. Note that you'll need a font like BabelStone Han installed to display the rare ones, whether in this app or anywhere else.
There is also a web version, BabelMap Online, but last time I checked it did not offer phonetic lookup: http://www.babelstone.co.uk/unicode/babelmap.html
The Japanese IME Pad also provides lookup via Unicode range, and although I recommend BabelMap over this one, I do want to drop a screen shot in here for completeness:
Using a Japanese IME may cause problems in sharing documents by adding yet another language and national encoding standard to the mix.
For example, regardless of font you'll need to remind MS Word that you're working in Chinese, not Japanese, by clicking in the status bar at the bottom of your document. See the screen shot on the left.
All this switching around may leave invisible problems underlying the text that may return to plague you later. So, I'd rather you use BabelMap instead.
And then there's input by Unicode number. Last time I checked, the IMEs offering this did not cover many of the Unicode CJK Extension characters some scholars may require, but you should know about this option.
As you can see from the above examples, each Unicode character has a number, written in hexadecimal (for example "4E00" from my screen shot of the Japanese IME Pad). Unicode input is included by default in Microsoft's PRC-developed MSPY IME, and can also be manually installed in the Taiwan-developed New Phonetic IME.
In MSPY, you'll find Unicode input in the Option menu:
For Microsoft Taiwan's IME, you or an IT support tech can follow these instructions posted by a Microsoft Program Manager to add Unicode number input:
When you get into the really rare characters, however, the BabelMap application described above may be your only option (other than the Japanese IME, but please keep the issue of mixed languages and encodings in mind).
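For anyone comfortable with a little scripting, the idea behind Unicode number input is just a conversion between a hexadecimal number and a character. Here is a minimal Python sketch of that conversion. The function names are mine, and this is not how any of the IMEs above are implemented.

```python
import unicodedata


def char_from_hex(hex_number):
    """Turn a Unicode number such as '4E00' or 'U+4E00' into the character."""
    cleaned = hex_number.strip().upper()
    if cleaned.startswith("U+"):
        cleaned = cleaned[2:]
    return chr(int(cleaned, 16))


def hex_from_char(ch):
    """Turn a single character back into its Unicode number."""
    return f"{ord(ch):04X}"


ch = char_from_hex("4E00")          # the number from the IME Pad example
print(hex_from_char(ch))            # the round trip gives the number back
print(unicodedata.name(ch))         # the formal Unicode character name

ext_b = char_from_hex("U+20000")    # extension code points convert the same way
print(hex_from_char(ext_b))
```

The resulting character can be written to a file or pasted into a document like any other text, but you will still need a font that contains it before it displays as anything other than an empty box.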
What if a character is not in any font?
When all else fails, you can create your own Chinese characters for use on your own system. I will probably write this up myself someday, but the information is available on many other websites. Here's a random example that does a good job explaining how to do this in Windows using Eudcedit:
Those instructions are missing a couple of points very important to you, however. One is that you can paste in existing characters to help you get started, via Edit > Copy Character. Another is that you can save your created character from Eudcedit to a Chinese input method, via Edit > TextService Link, as explained in this Chinese-language post:
It is also possible to scan a character from a printed source, and then place the image wherever necessary in your publication, but I hope you don't have to do that either.
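One caution before sharing files that contain characters you created yourself: they will only display on systems that have your custom character file. Assuming your custom characters are stored in Unicode's Private Use Area, which is where I understand Eudcedit puts them, a short Python check can list where they occur in a text so you can warn collaborators or your printer. The function name here is hypothetical, and this is a sketch rather than a complete pre-press check.

```python
import unicodedata


def private_use_characters(text):
    """Return (position, code point label) for each Private Use character."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"   # "Co" is the Private Use category
    ]


# Two ordinary characters with one Private Use code point between them.
sample = "\u4e00\ue000\u4e8c"
print(private_use_characters(sample))
```

An empty list means the text contains no Private Use characters, at least by this test.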
Need to know more? So do I. :-)
I get questions about these topics often enough to add the above info to my FAQ section, but your situation may be unique enough to require more thought. Maybe we'll both learn something from that, so please feel free to contact me if I can help with anything else.