Day 2 of 5
⏱ ~60 minutes
How CPUs Work in 5 Days — Day 2

The Arithmetic Logic Unit

The ALU is the heart of the CPU — it performs every arithmetic and logical operation. Today you'll build one from the logic gates you learned yesterday.

What the ALU Does

The ALU takes two N-bit inputs (A and B) and a 3–4 bit operation code (opcode), and produces an N-bit result plus status flags. Operations: ADD, SUB, AND, OR, XOR, NOT, shift left, shift right, compare. Status flags — Zero (result=0), Negative (result<0), Carry (unsigned overflow), Overflow (signed overflow) — feed into branch instructions, enabling conditional logic.

Subtraction via Two's Complement

Subtraction is addition of the negative. In two's complement, negate a number by flipping all bits and adding 1. So A−B = A + (¬B + 1). The ALU uses a single adder circuit for both addition and subtraction: pass B through inverters controlled by a SUB signal, and set carry-in to 1. No separate subtractor needed — elegant hardware reuse.

Status Flags and Conditional Branching

The ALU sets status flags after every operation. The Zero flag (ZF) enables if(a == b). The Carry flag (CF) enables multi-word arithmetic. The Overflow flag (OF) detects signed integer overflow. The Sign flag (SF) enables if(a < 0). Conditional jump instructions (JE, JNE, JL, JG) read these flags — this is how every branch in your code works at the hardware level.

python
# Simulate a simple 8-bit ALU

def to_bits(n, width=8):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(b << i for i,b in enumerate(bits))

def add_bits(a_bits, b_bits, cin=0):
    result, carry = [], cin
    for a,b in zip(a_bits, b_bits):
        s = a ^ b ^ carry
        carry = (a&b) | (a&carry) | (b&carry)
        result.append(s)
    return result, carry

def alu(a, b, op):
    """8-bit ALU. op: 'ADD','SUB','AND','OR','XOR','SHL','SHR'"""
    a8, b8 = to_bits(a & 0xFF), to_bits(b & 0xFF)
    
    if op == 'ADD':
        r, carry = add_bits(a8, b8)
    elif op == 'SUB':
        b_neg = [1-x for x in b8]  # flip bits
        r, carry = add_bits(a8, b_neg, cin=1)  # add 1
    elif op == 'AND': r, carry = [x&y for x,y in zip(a8,b8)], 0
    elif op == 'OR':  r, carry = [x|y for x,y in zip(a8,b8)], 0
    elif op == 'XOR': r, carry = [x^y for x,y in zip(a8,b8)], 0
    elif op == 'SHL': r, carry = [0]+a8[:7], a8[7]
    elif op == 'SHR': r, carry = a8[1:]+[0], a8[0]
    else: raise ValueError(f'Unknown op: {op}')
    
    result = from_bits(r)
    zero_flag = (result == 0)
    sign_flag = r[-1]  # MSB
    overflow  = carry
    print(f"{op}({a}, {b}) = {result}  Z={int(zero_flag)} S={sign_flag} C={carry}")
    return result

alu(42,  58,  'ADD')  # 100
alu(100, 58,  'SUB')  # 42
alu(0b11001100, 0b10101010, 'AND')  # bit masking
alu(5, 1, 'SHL')  # multiply by 2 = 10
💡
Right-shifting by N is integer division by 2^N. Left-shifting by N is multiplication by 2^N. CPUs use shifts instead of multiplies/divides for powers of two — they're single-cycle operations vs multi-cycle multiply.
📝 Day 2 Exercise
Extend the ALU with Comparison
  1. Add a 'CMP' operation that subtracts B from A but discards the result, only returning flags.
  2. Use CMP + flags to implement: def greater_than(a, b): ... without using Python's > operator.
  3. Add 'NEG' (negate: two's complement of A). Verify: NEG(NEG(5)) == 5 for all 8-bit values.
  4. Implement 8-bit multiplication using repeated addition and shifting. Count how many additions 7×13 requires.
  5. Research why hardware multipliers (Wallace trees) are faster. What's the cycle count for a hardware 64-bit multiply on a modern Intel CPU?

Day 2 Summary

  • The ALU performs ADD, SUB, AND, OR, XOR, shift, and compare — all in hardware
  • Subtraction is addition in two's complement: flip bits, add 1 — no separate subtractor needed
  • Status flags (Zero, Carry, Sign, Overflow) drive all conditional branch instructions
  • Shifts are single-cycle multiply/divide by powers of two — compilers use them automatically
Challenge

Implement a 16-bit ALU that handles signed integer overflow detection correctly. Test with 32767+1 (should set overflow flag). Then research how floating-point arithmetic differs: the CPU uses a separate FPU (Floating Point Unit) with IEEE 754 representation. Why can't the integer ALU handle 0.1 + 0.2?

Finished this lesson?