Characters can also be represented in binary. Characters are usually grouped together in what is known as a character set. A character set is more than just the letters that form a language.
Characters include alphanumeric data (letters and numbers), symbols (*, &, : etc.) and control characters (Shift, Escape etc.).
Each character is represented by a unique binary code.
ASCII Code
Standard ASCII uses 7 bits to represent each character
from 0000000 to 1111111 which can be used to store 128 characters.
The codes 0 – 31 represent control characters which are special non-printing characters
e.g TAB and Return.
Extended ASCII, which was created when IBM designed its PCs, uses 8 bits to represent each character.
from 0000000 to 1111111
which can be used to store 256 characters.
Unicode
The main issue with using ASCII or Extended ASCII is that 128 or 256 characters limit the amount of character sets that can be held. Representing the character sets for several different language structures is not possible in ASCII, there are just not enough available characters.
Unicode was created to overcome this issue. Unicode uses 16 bits to represent each character. This means that Unicode is capable of representing 65,536 different characters and a much wider range of character sets.
- Unicode can represent 65,536 charaters
- Unicode uses 16 bits to represent each character
- Unicode can represent a greater range of character sets than ASCII
- There are adapted forms of the original Unicode standard capable of representing millions of characters
Advantages and Disadvantages
- Due to a much larger character set, additional language characters could be identified
- The first 128 code were identical to the original ASCII codes so compatibility with ASCII was maintained
- Because it was a 16-bit code, each character took double amount of memory to store as ASCII so file sixes increased as did transmission times.