Thursday, June 29, 2006
Say it in Chinese
Expressing binary information as text characters has always been wasteful. The simplest schemes use octal or hexadecimal digits. With 8-bit characters, a hexadecimal digit carries 4 data bits and an octal digit only 3, so half or more of the storage space or transmission bandwidth is wasted. In a 16-bit Unicode environment like the one provided by ASP.NET 2.0, the bloat is much worse: each hexadecimal digit occupies 16 bits to carry 4.
A Unicode environment, however, permits a safe and relatively efficient alternative that uses the segment of the 16-bit code points occupied by the "Unihan" characters. Set the high-order two bits of each 16-bit character to 01, and use the low-order fourteen bits for binary data. Every character then falls in the range U+4000 through U+7FFF, so data expressed in this way populates 16K of the roughly 27K code points that 16-bit Unicode assigns to Chinese characters. Data bit efficiency is 14 of 16 bits, or 87.5 percent.
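The packing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation. The names `encode` and `decode` are hypothetical, and the trailer character that records the number of padding bits is an addition of this sketch: the scheme as described does not say how to handle data whose bit length is not a multiple of fourteen.

```python
BASE = 0x4000                      # high-order two bits set to 01
PAYLOAD_BITS = 14                  # data bits carried by each character
MASK = (1 << PAYLOAD_BITS) - 1     # 0x3FFF, selects the low-order 14 bits


def encode(data: bytes) -> str:
    """Pack bytes into characters in U+4000..U+7FFF, 14 data bits each."""
    nbits = len(data) * 8
    pad = (-nbits) % PAYLOAD_BITS  # zero bits appended to fill the last chunk
    value = int.from_bytes(data, "big") << pad
    nchars = (nbits + pad) // PAYLOAD_BITS
    chars = [
        chr(BASE | ((value >> (PAYLOAD_BITS * i)) & MASK))
        for i in reversed(range(nchars))
    ]
    # Assumption of this sketch: one trailer character stores the pad count
    # so that the decoder can discard the padding bits.
    chars.append(chr(BASE | pad))
    return "".join(chars)


def decode(text: str) -> bytes:
    """Reverse encode(); raises ValueError on malformed input."""
    codes = [ord(c) for c in text]
    if not codes or any(c >> PAYLOAD_BITS != 1 for c in codes):
        raise ValueError("expected characters in U+4000..U+7FFF")
    *body, trailer = codes
    pad = trailer & MASK
    nbits = len(body) * PAYLOAD_BITS - pad
    if pad >= PAYLOAD_BITS or nbits < 0 or nbits % 8:
        raise ValueError("invalid padding trailer")
    value = 0
    for c in body:
        value = (value << PAYLOAD_BITS) | (c & MASK)
    return (value >> pad).to_bytes(nbits // 8, "big")
```

Seven bytes (56 bits) fill exactly four data characters, which is where the 87.5 percent figure comes from. The trailer adds one character of fixed overhead per message. The arbitrary-precision integer keeps the sketch short, but a streaming bit buffer would be a better fit for large inputs.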