About Pinyin Input (And Other Methods)


How Do You Type Chinese?

I am often asked, usually by people who don't know an East Asian language, how it is possible to type Chinese characters on a standard Western keyboard even though there are thousands and thousands of characters to choose from.

They are astounded to learn that it is not only possible to type Chinese on a normal keyboard without adding keys, but also that the way we do this has for a very long time been much more advanced than our methods for typing alphabet-based languages. This is more than just "typing". We call it "input".

Input Methods and Candidate Lists

Chinese is entered on a keyboard using what we call "input methods". These are based on commonly known phonetics like Hanyu Pinyin, or on specialized keyboard overlays of character components that take a little more time to learn. Many systems are capable of handwriting recognition, but that is much slower, and most Chinese people will use that only as a last resort for just a few characters. Speech recognition also has its limits. Keyboard input methods are much faster.

Windows 7 Chinese Pinyin input method example

Phonetic input methods turn what at first glance looks like a problem into an elegant solution. Many Chinese characters sound the same. Mandarin Chinese probably has the fewest sounds, only about 400 distinct syllables, but there are thousands of characters. You'll need to know 3,000 to 5,000 to read newspapers, and at a scholarly level there are over 70,000! With so many homophones, Chinese "words", spoken or written, are usually two characters or more. Phonetic input operates on this principle too, predicting what you are trying to enter, and requiring only a knowledge of phonetics like Hanyu Pinyin, Zhuyin Fuhao (Bopomofo), or Cantonese Jyutping. This works surprisingly fast!
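To make the homophone point concrete, here is a minimal Python sketch. The two tiny dictionaries are hand-picked for illustration and do not come from any real IME, and the names CHARS_BY_SYLLABLE, WORDS_BY_PINYIN, and candidates are my own inventions. It only shows how a second syllable narrows the choice.

```python
# Toy data, chosen by hand for illustration only. A real input method
# ships dictionaries with many thousands of entries.
CHARS_BY_SYLLABLE = {
    "shi": ["是", "十", "时", "事", "市", "师"],
    "jie": ["街", "节", "姐", "接", "界"],
}

WORDS_BY_PINYIN = {
    "shijie": ["世界"],
    "shijian": ["时间", "事件"],
}


def candidates(pinyin: str) -> list[str]:
    """Return whole-word matches first, then single-character matches."""
    return WORDS_BY_PINYIN.get(pinyin, []) + CHARS_BY_SYLLABLE.get(pinyin, [])


print(candidates("shi"))     # one syllable: many homophones to choose from
print(candidates("shijie"))  # two syllables: the choice narrows sharply
```

Even in this toy dictionary, "shi" alone offers six characters, while "shijie" leaves a single word.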

Character-based input methods require knowledge of character structure, and memorization of a system that assigns character components to the keys of a standard keyboard. These range from five basic strokes on only five keys, to more complex arrangements of radicals and other components on the entire keyboard. Like phonetic input, they are designed to get out ahead of you and anticipate what you are typing. These input methods require more skill, but there is a payoff: faster and more accurate input, with less time spent looking at candidate lists. Wubihua, Wubi Zixing, and Cangjie are the best known.
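Here is a rough Python sketch of the five-key, five-stroke idea. I am assuming the usual numbering of the stroke classes (1 horizontal, 2 vertical, 3 left-falling, 4 dot or right-falling, 5 turning), and the handful of characters in STROKE_CODES is a hand-made sample, not a real IME table. The function name candidates_for is hypothetical.

```python
# Assumed stroke classes: 1 horizontal, 2 vertical, 3 left-falling,
# 4 dot or right-falling, 5 turning. Each character is coded by its
# strokes in writing order. Hand-made sample for illustration only.
STROKE_CODES = {
    "十": "12",
    "人": "34",
    "大": "134",
    "天": "1134",
    "夫": "1134",
    "木": "1234",
}


def candidates_for(keys: str) -> list[str]:
    """Characters whose stroke code starts with the keys typed so far."""
    return [ch for ch, code in STROKE_CODES.items() if code.startswith(keys)]


print(candidates_for("1"))     # after one key: everything starting with a horizontal stroke
print(candidates_for("11"))    # after two keys: only 天 and 夫 remain
print(candidates_for("1134"))  # both share one code, so a candidate list is still needed
```

Each keypress prunes the list, which is how these methods "get out ahead of you". Characters with identical stroke sequences, like 天 and 夫 here, still have to be picked by hand.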

An "Input Method Editor" (IME) enables you to use input methods in any application. Above you see IMEs for four different regions of the world available in my Windows taskbar. An IME runs on your system in the background as you type. Some will start guessing what character you want after your very first keypress, while others will wait until you press the space bar. An IME will also attempt to predict what the next character will be. If it can, it will even suggest entire phrases. The more context you give it, the better it can perform, and over time most IMEs will sort predictions according to your frequency of usage.
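The frequency sorting just described can be sketched in a few lines of Python. This is only a toy under my own assumptions: the class name CandidateList, its methods, and the four-character dictionary are hypothetical, and real IMEs use far richer language models than a simple usage counter.

```python
from collections import Counter


class CandidateList:
    """Toy ranked suggestion list that learns from the user's selections."""

    def __init__(self, dictionary: dict[str, list[str]]) -> None:
        self.dictionary = dictionary
        self.usage = Counter()  # how often each candidate has been chosen

    def lookup(self, typed: str) -> list[str]:
        options = self.dictionary.get(typed, [])
        # sorted() is stable, so candidates with equal usage keep
        # their original dictionary order.
        return sorted(options, key=lambda c: -self.usage[c])

    def select(self, typed: str, index: int) -> str:
        choice = self.lookup(typed)[index]
        self.usage[choice] += 1
        return choice


ime = CandidateList({"shi": ["是", "十", "时", "事"]})
print(ime.lookup("shi"))  # dictionary order at first
ime.select("shi", 2)      # the user picks the third suggestion, 时
print(ime.lookup("shi"))  # 时 has moved to the front
```

After a single selection the chosen character is offered first the next time, which is the "sort according to your frequency of usage" behavior in miniature.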

These predictions appear in a "candidate list" that usually displays near where you are typing. Some IMEs will present the list and update it in real time as you type, while others require you to hit a key to make the list display. When you see the character(s) you want, just select them and they will pop into your document. Want to learn more? Users of all operating systems may benefit from these FAQ pages:

Later in this article I'll offer a summary of the Chinese input methods available today. But first, more fun facts and some amazing (and totally geeky) history!


The Fastest Keyboards in the East (West, South, and North)

Although most English speakers would be very happy to achieve the average touch-typing speed of about 40 words per minute in their own language, it is possible to easily achieve speeds of 50 to 100 Chinese characters per minute just by knowing the pronunciation. With input methods based on character components and strokes, especially expert-level methods like Cangjie and Wubi Zixing, speeds as high as 200 characters per minute are possible!

Wubi 86 keyboard layout (via Wikimedia Commons)

Basically, this is an Autocorrect that works, and works so well that I have never found the need to learn anything other than Pinyin input. And much of the foundation for these methods has been around for much longer than you may expect. How this amazing idea came to be is an interesting and surprising story, one that began over 100 years ago.


The First Candidate Lists: Chinese Typewriter Trays

In the 19th century, Western inventors introduced practical keyboard-based printing solutions for their own languages. These replaced movable type — individual metal blocks for each letter or character, arranged by hand for each page — with desktop typewriters and large hot metal typesetting machines. In response, many inventors threw themselves into the dream of developing a solution for Chinese characters as well. The question was how to do this without creating a machine with over 50,000 keys!

Double Pigeon brand Chinese typewriter (via Wikimedia Commons)

As Stanford history professor Tom Mullaney explains in a recent Google tech talk and in his forthcoming book, The Chinese Typewriter: A Global History of the Information Age (MIT Press, 2017), beginning in the 1890s many Chinese, Japanese, and Western inventors worked on Hanzi typewriter designs. Most resulted in what look to me like the hybrid children of a Western typewriter and a typesetter's font of movable type.

By the mid-20th century, the most common Chinese typewriter looked like the Double Pigeon (双鸽) model shown here. It has a desktop-sized tray containing up to 2,450 characters on movable type blocks. Other characters are swapped in as needed. The tray moves right-and-left and the carriage moves freely forward, back, and side-to-side, until a desired character is under the mechanism where it can be stamped onto a paper page. (Want to see one up close? A recent successful Kickstarter campaign will send an exhibition around the world.)

Mullaney says that in those trays we find a predecessor to the "predictive text" we now take for granted in our Chinese candidate lists and alphabetic Autocorrect. Originally arranged according to a traditional dictionary (by 214 radicals, and the number of strokes after each radical), the trays were regrouped by typists over time in more and more efficient ways. Looking at the later arrangements you can almost see today's candidate lists. To me they look like strangely completed crossword puzzles.

This had a direct effect on typing speed, with at least one fellow recognized in 1956 by the PRC government for achieving 80 words per minute thanks to his tray configuration. Machines like this were in use well into the late 20th century, with each tray organized to meet the unique needs of its owner, just as a digital candidate list will now sort itself according to your frequency of use.


The First Keyboard Input Methods: From Typewriters to Mainframes

Lin Yutang Ming Kwai Chinese typewriter patent drawing

Meanwhile others were at work on the riddle of keyboard input. One of them was the famous author and all-around Renaissance man Lin Yutang.

In 1946 Lin invented the Ming Kwai (明快) Chinese typewriter. It was similar in size and shape to a Western machine, but with keys for a character component/stroke input method based on the structure of a Chinese dictionary he had written. On my office wall I have a framed copy of this drawing from Lin Yutang's 1953 US patent. Alas, only one prototype of Lin's typewriter was ever produced.

Other ideas included an IBM Chinese typewriter with a specially designed keyboard of numbers only. This machine required memorization of unique numbers for over 5,000 characters. The numbers were based on radical/stroke dictionary order, as in Chinese Morse Code. (I once knew a Chinese World War II veteran who had mastered Chinese telegraphy at a very high speed. Apparently the prospect of otherwise being sent to the front line focused his attention.)

Unfortunately for all these ideas, but fortunately for us, we were now entering the postwar computer era. As Mullaney points out, just a few years later in the 1950s an MIT scientist independently convened a research team to reinvent stroke-based input for mainframe computers. Chinese and other East Asian universities and companies were of course on the same track.

Mullaney argues that input method software developed since then has not replaced the original Chinese typewriters with something entirely new, but rather consciously digitized the innovations of the Chinese typewriter era and then built upon that foundation. More insights are available with his extensive collection, which will soon be touring the world thanks to a "Save the Chinese Typewriter" Kickstarter campaign.


A Trip Down (640K) Memory Lane

If this is getting "tl;dr" feel free to skip down to the last section for information on the state of the art today. But the rest of the 20th century was an exciting time in the history of Chinese input methods and applications, and I feel the need to sit here on the porch and jaw on for a spell.

TianMa poster, circa 1985 (personal collection)

In the 1980s I was privileged to help launch and support one of the first world-class Chinese input method and word processing software packages, TianMa for MS-DOS. (Also for Wang minicomputers. Remember them?)

TianMa Pinyin input worked so well it seemed like magic, especially at a time when personal computers were still a new thing. But because phonetic input methods — and Chinese computing in general — were so new, I often had to explain why it was better than a Chinese typewriter or typesetting. And because even most native Chinese speakers were familiar only with handwriting, I often had to start off with an explanation of how a Chinese typewriter worked.

That often involved much pantomime and sound effects. Sometimes I think I missed my calling.

This being the DOS days, TianMa offered only amber or green bitmapped text on a black background. The Pinyin to character conversion was done inline, with no candidate list unless you went back to change an incorrect character. You typed Pinyin — the more the better, so it could look for words and phrases — and then hit a function button to convert to characters.
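For the curious, here is a minimal Python sketch of why typing more Pinyin helps an inline converter. To be clear, this is not TianMa's actual algorithm, which I am not reproducing here. It is a generic greedy longest-match over a tiny hypothetical phrase table (PHRASES), with the input already split into syllables to keep things simple.

```python
# Hypothetical phrase table keyed by syllable sequences (illustration only).
PHRASES = {
    ("zhong", "guo"): "中国",
    ("zhong", "wen"): "中文",
    ("shu", "ru"): "输入",
    ("wo",): "我",
    ("yong",): "用",
    ("zhong",): "中",
}


def convert(syllables: list[str]) -> str:
    """Greedy longest-match conversion of Pinyin syllables to characters."""
    out = []
    i = 0
    while i < len(syllables):
        # Try the longest remaining run first, then shorter ones.
        for length in range(len(syllables) - i, 0, -1):
            key = tuple(syllables[i:i + length])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += length
                break
        else:
            # Nothing matched: leave the Pinyin as typed.
            out.append(syllables[i])
            i += 1
    return "".join(out)


print(convert("wo yong zhong wen shu ru".split()))  # prints 我用中文输入
```

Because "zhong wen" is matched as a phrase before the lone "zhong" is considered, the converter picks 中文 without asking, which is the "look for words and phrases" effect on a very small scale.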

It was very fast, and though it often resulted in many typos lost in that sea of converted text, the whole experience was amazing for most of us. It even converted between Simplified and Traditional, which of course required manual correction (and still does). We loved it.
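Why does Simplified to Traditional conversion still need manual correction? A short Python sketch shows the root of it: some Simplified characters stand for more than one Traditional character. The table S2T below is a small hand-picked sample and the function to_traditional is a hypothetical, deliberately naive converter, not how any particular product works.

```python
# Simplified characters mapped to their possible Traditional forms.
# A small hand-picked sample, not a complete table.
S2T = {
    "发": ["發", "髮"],  # 發 "to emit, to develop" vs 髮 "hair"
    "后": ["後", "后"],  # 後 "after, behind" vs 后 "empress"
    "面": ["面", "麵"],  # 面 "face, surface" vs 麵 "noodles"
    "头": ["頭"],
    "国": ["國"],
}


def to_traditional(text: str) -> tuple[str, list[int]]:
    """Naive conversion: take the first option, flag ambiguous positions."""
    converted = []
    needs_review = []
    for pos, ch in enumerate(text):
        options = S2T.get(ch, [ch])
        converted.append(options[0])
        if len(options) > 1:
            needs_review.append(pos)
    return "".join(converted), needs_review


result, review = to_traditional("头发")  # "hair"
print(result, review)  # prints 頭發 [1]
```

The naive result 頭發 is wrong for "hair", which should be 頭髮, and the flagged position is exactly the character a human has to check. Modern converters use word-level dictionaries to reduce these cases, but they cannot remove them all.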


Multilingual Computing for the Masses (the Office, Mostly)

Daisy-chained printer port dongles (via Wikimedia Commons)

Before programs like TianMa, Chinese language software usually required a separate Chinese operating system, or at least a Chinese "environment" that would take over your system and enable Chinese language. But TianMa and its competitors were normal applications. Multilingual computing for the masses! Or was it?

TianMa sold for several hundred US dollars, and could not be shared. Because of the demands this system placed on our 8-bit processors, floppy disk drives, and maximum 640K RAM, and because software piracy threatened the viability of the entire venture from the start, each installation required a circuit board plugged into a slot inside your desktop computer. This was even worse than the anti-piracy printer port "dongles" required by many competitors.

TianMa flew off into obscurity long ago, along with other commercial software packages from FeiMa and Brushwriter to TwinBridge and Chinese Star, not to mention the standalone Stone (四通) word processor / electronic typewriter. There is at least one Chinese word processing and input method software product still available for sale, NJ Star, and it offers some excellent features. But today there are free Chinese and international office suites from Kingsoft and OpenOffice, and when looking for an IME users can assume those are now given away too. Like much of the music business was for a while, for better or for worse Chinese apps and IMEs want to be free.


Today's Keyboard Input Methods

Innovations in Chinese typewriter trays and keyboards, followed by many years of development on mainframes, minicomputers, and personal computers, resulted in the input methods available for our computing and communication devices today. Software has made phonetic input possible, and also enabled highly sophisticated keyboard overlays based on character components and strokes.

Sogou Pinyin

For Simplified characters, Microsoft Pinyin (MSPY) is of course the default IME in Windows 10, Windows 7, and other releases, and basic Pinyin input is a staple of Linux, Android, Macs, and iOS as well. But the most popular input method in mainland China is Sogou Pinyin, a free add-on for Windows, Linux, and mobiles. Competitors Baidu, QQ, and others are hot on Sogou's heels with their own IMEs, especially in the booming mobile market. Those free add-ons come with offers of e-commerce and cloud services that provide the developers some payback. Google has also released the free Google Pinyin IME which, at least in its initial release, owed a great deal of inspiration to Sogou.

Character-based input methods popular in the mainland range from the basic five-key five-stroke Wubihua, to the sophisticated full-keyboard Wubi Zixing. Most of the providers mentioned above offer one or more variants of these IMEs as well, and of course some are included with Windows, Linux, Android, and other operating systems. You'll find more information on many of the above-mentioned optional IMEs on the pages here for Windows third-party add-ins, Wubi FAQ, Ubuntu, and major Android input methods.

For Traditional characters, some of the earliest and best examples of character-based and phonetic input are available. This market is more dependent upon IMEs developed by and for Microsoft, Google, Apple, and Linux projects, and there are far fewer third-party choices available. This may be because the entire Traditional character market is similar to Scandinavia in user population and economic clout. Certainly not to be ignored, but not as massive as the Simplified character region, which ranks alongside the global English and Spanish markets.

Phonetic IMEs for Traditional Chinese include Microsoft Bopomofo (based on the earlier Phonetic and New Phonetic), which supports Hanyu Pinyin, Tongyong Pinyin, and multiple Zhuyin Fuhao keyboard layouts. There is also the Cantonese Phonetic IME (CPIME), included with Windows 10 and available for separate download in several phonetic variants for several versions of Windows. The Chewing IME for Linux offers features similar to the MS Bopomofo (New Phonetic) in the IBus and Fcitx frameworks for Ubuntu and other distros.

Character-based IMEs for Traditional characters include Cangjie (often spelled "Changjie"), Quick (Jianyi, a simplified form of Cangjie popular in Hong Kong), and Array, each with a strong userbase because they were some of the best and earliest computer input methods for personal computers. You can set these up by following my instructions for a phonetic IME but then selecting one of these keyboards instead.


And that's the short version of the Chinese input story so far! I hope you've enjoyed the tale. Want more details? As I mentioned above, users of all operating systems may benefit from the following pages:

