Skip to main content

Data Representation

Computer can only store bits. In order to display text or other forms of data, we need a way to use bits to represent them aka "representing data as bits".

Text as Bits

Character data is comprised of symbols and numerals that are not used in calculations. And it is commonly referred to as "text".

Examples:

  • Symbols (e.g., Latin letters, chinese characters...)
  • Numerals (0,1,2,3,...9)
  • Unprintable characters (space, newline, tab)

Encoding

To represent characters as bit strings we need an encoding.

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers.

Requirements:

  • A set of characters to be represented.
  • A length n for bit strings. (e.g., 7-bit ASCII char)
  • A mapping from characters to bit strings (one-to-one relationship).

encoding table

ASCII

ASCII abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. It was designed for teleprinters.

  • Letters are ordered lexicographically.
  • Numbers 0-9 are ordered lexicographically.
  • Related characters are organized in blocks (upper case, lower case, and numbers).
  • Upper case vs. lower case is just 1 bit.
  • Requires 7 bits for each character. The leftmost bit is set to 0.
  • Provides 128 characters (272^7).

ASCII Chart:

ascii chart
ASCII Chart

Encode "the" in ASCII:

characterASCII Code
t0 111 01000\ 111\ 0100
h0 110 10000\ 110\ 1000
e0 110 01010\ 110\ 0101
info

bit 1 is at the rightmost column.

Unicode

👍 Unicode is a modern system of character encoding that supports most writing systems.

  • UTF-32 uses 32 bits for each character.
  • UTF-16 uses one or two 16-bit strings per code point.
  • UTF-8 ues between one and four 8-bit strings per code point. It is backwards compatible with 7-bit ASCII.
info

Use "🪟+ ." to type emoji.

Numbers as Bits

How to store numbers as bits? We can use binary numbers. Consider the following example.

note

Example: 10101010

Starting from position 0 (rightmost)

  • 0 has a multiplier of 20=12^0 = 1
  • 1 has a multiplier of 21=22^1 = 2
  • 0 has a multiplier of 22=42^2 = 4
  • 1 has a multiplier of 23=82^3 = 8
  • Total is 10 (in base-10).

Adding Binary Numbers

We can use bit operations to build half adders and full adders to perform additions.

binary math operations - half/full adder

Half Adder

A half adder can be used to add two 1-bit numbers.

half adder

Suppose we have 1-bit binary numbers xˉ\bar{x} and yˉ\bar{y}. We want to compute the result xˉ+yˉ=zˉ\bar{x} + \bar{y} = \bar{z}.

zˉ0=xˉ0yˉ0zˉ1=xˉ1&yˉ1\begin{aligned} \bar{z}_0 & = \bar{x}_0 \oplus \bar{y}_0 \\ \bar{z}_1 & = \bar{x}_1 \& \bar{y}_1 \end{aligned}

Full Adder

The half adder is used to add only two numbers. To overcome this problem, the full adder was developed. The full adder is used to add three 1-bit binary numbers A, B, and carry C. The full adder has three input states and two output states i.e., sum and carry.

-- javatpoint.com

full adder truth table

full adder

In general, for n-bit binary numbers xˉ\bar{x} and yˉ\bar{y}. The sum zˉ\bar{z} is a n+1-bit binary numbers where

zˉj=xˉjyˉjcˉjcˉj+1=(xˉj&yˉj)(xˉj&cˉj)(yˉj&cˉj)\begin{aligned} \bar{z}_j & = \bar{x}_j \oplus \bar{y}_j \oplus \bar{c}_j \\ \bar{c}_{j+1} & = (\bar{x}_j \& \bar{y}_j) | (\bar{x}_j \& \bar{c}_j) | (\bar{y}_j \& \bar{c}_j) \end{aligned}

Modular Arithmetic

What happen when we have a number that is too big? (e.g., exceeds 8-bit limit)

Answer: CPU will perform integer modulo 2n2^n where nn is the number of bits.

EXAMPLE

Suppose we are adding two 8-bit numbers.

10000000(128) + 10000001(129) = 100000001(257, 9-bits)

Drop the leftmost bit to get 8-bits 00000001(1).

This "dorp" action can be achieved by performing a mod 2n2^n on the result. nn is the number of bits.

257 mod 256=1257 \ \textrm{mod} \ 256 = 1

2's Complement

2's complement is a method of signed number representation.

Properties of 2's complement:

  • For nn bits, we can represent from 2n1-2^{n-1} to 2n112^{n-1} -1
    • e.g., 8-bit strings, the range is 128-128~127127
  • If the leftmost bit is 0, then the number is positive. Otherwise, it is negative.
  • Always perform mod 2n\textrm{mod}\ 2^n after each addition.

3-bit 2's complement table:

BitsUnsigned valueTwo's complement value
00000
00111
01022
01133
1004-4
1015-3
1106-2
1117-1

Hexadecimal

Hexadecimal is a base-16 number system. So the numeral in position ii gets multiplier of 16i16^i.

Base-10, Base-2, and Base-16 table:

Decimal (Base 10)Binary (Base 2)Hexadecimal (Base 16)
000110x0
100010x1
200100x2
300110x3
401000x4
501010x5
601100x6
701110x7
810000x8
910010x9
1010100xA
1110110xB
1211000xC
1311010xD
1411100xE
1511110xF

Interpret Hex

Hex to base-2 string, just need to concatenate the converted strings together.

BASE-2

Write 4F as an 8-bit string:

4 -> 0100, F -> 1111. So 4F -> 0100 1111

Hex to base-10 number, the steps are similar to "convert base-2 to base-10"; the multiplier.

BASE-10

Write 4F as a base-10 number:

4 gets a multiplier of 16116^1 because it is in position 1. F(15) gets a multiplier of 16016^0 because it is in position 0.

4×161+15×160=794 \times 16^1 + 15 \times 16^0 = 79

Python

How to format integer in hex?

>>> f"{79:X}"
'4F'
>>> f"{255:X}"
'FF'

Floating Point Numbers

A floating-point number is represented approximately with a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base; the base for the scaling is normally two.

-- wikipedia

floating point number

References