Unicode Characters
The Story of Chess
Once Upon a Time
Once upon a time, Sessa invented chess and showed it to the king.
Name Your Price
The king was impressed and offered to pay Sessa.
Sessa asked for wheat: 1 grain on the first square, 2 on the next, doubling each time…
How Much?
The king’s treasurer told the king that would be too much.
18,446,744,073,709,551,615 grains
About 18 quintillion (18 million trillion)
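The treasurer's total can be checked with a short sketch. It assumes the usual telling of the story: 64 squares, with the grain count doubling on every square. The variable names are only illustrative.

```python
# One grain on the first square, doubling on each of the 64 squares.
grains_per_square = [2 ** square for square in range(64)]
total_grains = sum(grains_per_square)

# Print with thousands separators so it can be compared with the slide.
print(f"{total_grains:,}")

# The sum of a doubling series is one less than the next power of two.
print(total_grains == 2 ** 64 - 1)
```

The same number turns up later in computing: it is the largest value that fits in 64 bits.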
Once Upon a Time
My first computer could only display 128 characters.
- A byte is 8 bits
- 128 characters need only 7 bits
- Extra bit was a parity check
Parity Check?
7 bits of data | Count of 1s | 8 bits with even parity |
---|---|---|
0000000 | 0 | 0000000 0 |
1010001 | 3 | 1010001 1 |
1101001 | 4 | 1101001 0 |
1111111 | 7 | 1111111 1 |
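The table can be reproduced with a small sketch. It assumes even parity, which is what the rows above show: the extra bit is chosen so the total number of 1s comes out even. The function names `add_even_parity` and `parity_ok` are made up for this example.

```python
def add_even_parity(data_bits: str) -> str:
    """Append a parity bit so the total number of 1s is even."""
    parity_bit = data_bits.count("1") % 2
    return f"{data_bits} {parity_bit}"


def parity_ok(byte_bits: str) -> bool:
    """Return True when the 8 bits contain an even number of 1s."""
    return byte_bits.replace(" ", "").count("1") % 2 == 0


for bits in ["0000000", "1010001", "1101001", "1111111"]:
    print(bits, bits.count("1"), add_even_parity(bits))

# One flipped bit is caught, but two flipped bits cancel out.
print(parity_ok("1010001 1"))  # as sent
print(parity_ok("1010000 1"))  # one bit flipped
print(parity_ok("1010010 1"))  # two bits flipped
```

The last three lines hint at the weakness of the scheme: a single error is detected, but a double error looks like valid data.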
It got better…
- Parity was not very effective.
- Switched to 8 bits
- Now we get 256 characters.
- 128 MORE characters…
What should we display?
- European characters
- ¿Cómo golpear la piñata?
- Góðan daginn⁈
- Or Greek? καλημέρα
Or Graphic symbols and line drawings:
┏━━━┱───┐
┃ ☻ ┃ ☺ │
┣━━━╉───┤
┃ ⚑ ┃ ⚐ │
┗━━━┹───┘
What about:
- Chinese: 早安
- Japanese: おはよう
- Thai: อรุณสวัสดิ์
- Korean: 안녕하세요
- Bengali: সুপ্রভাত
- Nordic Runes: ᚠᚢᚦᚨᚱᚲ
- Math symbols: ∞ ÷ √
Gets Worse
Some languages are written from right to left.
- Hebrew: בֹּקֶר טוֹב
- Arabic: صباح الخير
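As a rough illustration, Python's standard `unicodedata` module reports the direction class that Unicode assigns to each character. This sketch only looks at single letters; laying out a whole line of mixed-direction text takes far more work and is not shown here.

```python
import unicodedata

# One letter each from Latin, Hebrew, and Arabic.
samples = {"Latin": "A", "Hebrew": "ב", "Arabic": "ص"}

direction = {}
for script, letter in samples.items():
    # "L" is left-to-right, "R" is right-to-left,
    # "AL" is right-to-left Arabic letter.
    direction[script] = unicodedata.bidirectional(letter)
    print(script, letter, direction[script])
```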
But…
- 8 bits was a byte … a “character”
- Everything would have to change:
- Computer displays
- Programming languages
- All programs
- All files and databases
- English email would suddenly be larger!
Could we do both?
A book report is just a series of ones and zeros, so how should we interpret this?
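A small sketch of the problem: the same five bytes read differently depending on which character set we assume. The three encodings here were picked only for illustration. `latin-1` and `cp437` are typical 8-bit character sets, and `utf-8` is one modern answer.

```python
# Five bytes, nothing more. The file does not say what they mean.
raw = bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])

decoded = {}
for encoding in ("utf-8", "latin-1", "cp437"):
    decoded[encoding] = raw.decode(encoding)
    print(f"{encoding:8} {decoded[encoding]}")
```

Only one of the three readings is the word the writer intended; the other two are perfectly legal characters that happen to be wrong.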
How Big is Big?
- Use two bytes: 65,536 characters
- Would almost work
- Japanese: 3,000+
- Traditional Chinese: 10,000+
- Really needs more
But what about Ancient Egyptian?
Need a bit more… 4 bytes would give us 4,294,967,296 possible characters
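The sizes quoted on these slides all come from the same arithmetic: n bits can hold 2 to the power n different values. A quick sketch, with a helper name, `values_for_bits`, invented for the example:

```python
def values_for_bits(bits: int) -> int:
    """Number of distinct values that fit in the given number of bits."""
    return 2 ** bits


# 7 bits, then 1, 2, and 4 whole bytes.
for bits in (7, 8, 16, 32):
    print(f"{bits:2} bits: {values_for_bits(bits):,}")
```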
We Solved It…Kinda
Unicode, usually stored as UTF-8, is pretty good.
- Each character takes from 1 to 4 bytes
- Room for 70,000+ Chinese characters
- The first 128 characters (7 bits) are the same as on old computers
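To see the variable length in practice, this sketch encodes four sample characters with Python's built-in `str.encode`. The samples are my own picks: a plain letter, an accented letter, a Chinese character, and an Egyptian hieroglyph (written as an escape in case the font lacks it).

```python
# Four sample characters, from old ASCII up to Ancient Egyptian.
samples = ["A", "é", "早", "\U00013000"]

byte_counts = []
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_counts.append(len(encoded))
    # Show the code point, the byte count, and the bytes in hex.
    print(f"U+{ord(ch):04X}  {len(encoded)} byte(s)  {encoded.hex(' ')}")

# Plain English text is byte-for-byte the same in ASCII and UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))
```

The last line is why English email did not suddenly get larger: text that uses only the original 128 characters is stored exactly as before.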