Linear Algebra for ML: Vectors, Matrices, and AI Explained

Key Takeaways

  • Every neural network layer is a matrix multiplication — linear algebra is ML's operating language
  • Vectors represent data points; matrices represent datasets and transformations
  • Dot product measures similarity — the foundation of attention mechanisms and cosine similarity in embeddings
  • Eigenvectors of the covariance matrix are the principal components in PCA
  • SVD decomposes any matrix — the basis for recommender systems, NLP embeddings, and compression

Machine Learning Is Linear Algebra at Scale

You don't need to prove theorems to do machine learning. But you do need to understand what's happening when you write model.fit(X, y). The answer is linear algebra, applied millions of times per second on GPU hardware designed specifically for it.

When you train a neural network, the forward pass multiplies weight matrices by input vectors. Backpropagation computes gradients using matrix operations (Jacobians). Attention in transformer models is a scaled dot-product of query and key matrices. Embeddings are vectors in high-dimensional space. All of this is linear algebra.
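The scaled dot-product attention mentioned above can be sketched in a few lines of NumPy. This is a toy single-head version with made-up shapes; real transformers add learned projections, multiple heads, and masking:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: softmax(Q K^T / sqrt(d_k)) @ V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (seq_q, seq_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Note that the heavy lifting is two matrix multiplications and a softmax — which is why attention maps so well onto GPU hardware.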

Vectors: Data Points in Space

A vector is an ordered list of numbers: v = [3, -1, 4, 1, 5]. In ML, vectors represent data points. An image with 784 pixels is a 784-dimensional vector. A word embedding is a vector too — typically 300-dimensional for Word2Vec, 1536-dimensional for OpenAI's text-embedding-ada-002. A user in a recommendation system is a vector in some latent feature space.

Key vector operations:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Addition: [5, 7, 9]
print(a + b)

# Scalar multiplication: [2, 4, 6]
print(2 * a)

# Dot product: 1*4 + 2*5 + 3*6 = 32
print(np.dot(a, b))

# Magnitude (L2 norm): sqrt(1² + 2² + 3²)
print(np.linalg.norm(a))  # 3.742

# Cosine similarity (measures angle between vectors)
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # 0.974 — very similar direction

The dot product and cosine similarity are why embedding-based search works. When you search for "king - man + woman" in word embeddings, you're doing vector arithmetic. When you find similar documents, you compute cosine similarity between their embedding vectors.
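A toy version of that "king - man + woman" arithmetic — the 3-dimensional vectors here are invented for illustration and are not real Word2Vec embeddings:

```python
import numpy as np

# Hypothetical tiny "embeddings", chosen so the analogy works
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Vector arithmetic, then nearest neighbor by cosine similarity
target = vocab["king"] - vocab["man"] + vocab["woman"]
best = max((w for w in vocab if w != "king"), key=lambda w: cosine(vocab[w], target))
print(best)  # queen
```

Real embedding search works the same way, just in hundreds or thousands of dimensions and over millions of vectors (usually with an approximate nearest-neighbor index rather than a brute-force max).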

Matrices: Datasets and Transformations

A matrix is a 2D array of numbers. In ML: a dataset with n samples and d features is an n×d matrix X. The weight matrix of a neural network layer is an h×d matrix W (h=hidden units, d=input dimensions).

# A dataset: 4 samples, 3 features each
X = np.array([
    [1.2, 0.5, 3.1],
    [2.3, 1.1, 0.8],
    [0.9, 2.2, 4.5],
    [3.1, 0.3, 1.2]
])  # shape: (4, 3)

# Weight matrix for a layer with 2 neurons
W = np.array([
    [0.5, -0.3, 0.8],
    [1.2,  0.6, -0.4]
])  # shape: (2, 3)

# Forward pass: output = X @ W.T
output = X @ W.T   # shape: (4, 2) — 4 samples, 2 neuron outputs
print(output.shape)  # (4, 2)

Matrix Multiplication Is the Core of Neural Networks

A neural network layer with weights W, input x, and bias b computes: output = activation(W @ x + b). This is just matrix multiplication plus a pointwise non-linearity.

For a batch of inputs (processing multiple samples simultaneously), stack the samples as rows: the batch X is (batch_size × input_size), W is (output_size × input_size), and X @ W.T is (batch_size × output_size) — exactly the shapes in the code above. GPUs are optimized specifically for this operation — thousands of multiply-accumulate operations in parallel.

# Two-layer neural network forward pass
def forward(X, W1, b1, W2, b2):
    # Layer 1
    z1 = X @ W1.T + b1        # Linear transformation
    a1 = np.maximum(0, z1)     # ReLU activation

    # Layer 2
    z2 = a1 @ W2.T + b2       # Linear transformation
    output = 1 / (1 + np.exp(-z2))  # Sigmoid for classification

    return output
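Called on a batch, the function above produces one probability per sample. A quick shape check — forward is repeated here so the snippet runs on its own, and the layer sizes are arbitrary:

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    z1 = X @ W1.T + b1              # layer 1 linear step
    a1 = np.maximum(0, z1)          # ReLU
    z2 = a1 @ W2.T + b2             # layer 2 linear step
    return 1 / (1 + np.exp(-z2))    # sigmoid

rng = np.random.default_rng(42)
X  = rng.normal(size=(4, 3))                    # 4 samples, 3 features
W1 = rng.normal(size=(2, 3)); b1 = np.zeros(2)  # layer 1: 3 -> 2
W2 = rng.normal(size=(1, 2)); b2 = np.zeros(1)  # layer 2: 2 -> 1
print(forward(X, W1, b1, W2, b2).shape)  # (4, 1)
```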

Linear Transformations: What Matrices Do to Space

A matrix represents a linear transformation — it maps vectors from one space to another. Understanding what transformations matrices represent makes neural networks less mysterious.

The idea that "neural networks learn useful representations" means: each layer applies a learned transformation that makes the data easier to classify. Early layers learn simple features (edges in images); later layers combine these into complex concepts (faces, objects).
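A concrete example of a linear transformation: a 2×2 rotation matrix, one of the easiest transformations to visualize. Multiplying by it rotates every vector in the plane by the same angle:

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])          # unit vector along the x-axis
print(np.round(R @ v, 6))         # [0. 1.] — rotated onto the y-axis
```

Learned weight matrices are doing the same kind of thing — rotating, scaling, and shearing the input space — just in many more dimensions and with the entries set by gradient descent instead of by hand.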

Eigenvalues and Eigenvectors: Principal Directions

For a square matrix A, an eigenvector v satisfies: Av = λv. Matrix A times v just scales v — it doesn't change direction. λ is the eigenvalue.

A = np.array([[3, 1], [0, 2]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)       # [3. 2.]
print("Eigenvectors:\n", eigenvectors)   # columns are eigenvectors

PCA (Principal Component Analysis) uses eigenvalues and eigenvectors to reduce dimensionality:

  1. Center the data (subtract mean)
  2. Compute the covariance matrix (d×d matrix capturing how features vary together)
  3. Find eigenvectors of the covariance matrix — these are the principal components (directions of maximum variance)
  4. Project data onto the top k eigenvectors to reduce from d dimensions to k dimensions

The eigenvalue tells you how much variance each component captures. Sort by eigenvalue descending — keep the top k to retain the most information.
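The four steps above, sketched directly in NumPy on random data (in practice you would likely reach for sklearn.decomposition.PCA, which also handles numerical details for you):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # 100 samples, 5 features

# 1. Center the data
Xc = X - X.mean(axis=0)

# 2. Covariance matrix (5x5)
C = np.cov(Xc, rowvar=False)

# 3. Eigendecomposition (eigh: for symmetric matrices, returns ascending eigenvalues)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]  # re-sort descending by variance captured
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project onto the top k principal components
k = 2
X_reduced = Xc @ eigvecs[:, :k]
print(X_reduced.shape)  # (100, 2)
```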

SVD: The Swiss Army Knife of Linear Algebra

Singular Value Decomposition (SVD) decomposes any matrix M into three matrices: M = U Σ Vᵀ where U and V are orthogonal matrices, and Σ is a diagonal matrix of singular values (non-negative, in descending order).
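A minimal sketch with np.linalg.svd: decompose a random matrix, then rebuild a rank-2 approximation by keeping only the two largest singular values (truncated SVD gives the best rank-k approximation in the least-squares sense):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (6, 4) (4,) (4, 4)

# Rank-2 approximation: keep the two largest singular values
k = 2
M2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(M - M2))      # reconstruction error in Frobenius norm
```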

SVD applications in ML:

  • Recommender systems — factor the user-item rating matrix into latent user and item vectors
  • NLP embeddings — latent semantic analysis builds word and document vectors from the SVD of a term-document matrix
  • Compression and denoising — keep only the top k singular values for a low-rank approximation of the data

Learn the Math Behind AI at Precision AI Academy

Our bootcamp bridges linear algebra, calculus, and statistics directly to hands-on ML projects — so the math makes sense in context. Five cities, October 2026.

$1,490 · October 2026 · Denver, LA, NYC, Chicago, Dallas
Reserve Your Seat

Frequently Asked Questions

Why does machine learning need linear algebra?

Every neural network layer is a matrix multiplication. Datasets are matrices. Training is matrix operations at scale. Attention, embeddings, PCA, SVD — all linear algebra. You can't deeply understand ML without it.

What is matrix multiplication and why is it central to deep learning?

Matrix multiplication combines two matrices: each output entry is the dot product of a row of the first with a column of the second. In neural nets, each layer computes output = W @ x + b. With batched inputs, you process all samples in parallel — this is why GPU hardware (optimized for matmul) is essential for training.

What are eigenvectors and eigenvalues, and where are they used in ML?

Eigenvectors are directions a matrix doesn't rotate — only scales. Used in PCA (eigenvectors of covariance matrix = principal components), PageRank (dominant eigenvector of web graph), and spectral clustering.

Bo Peng

Founder of Precision AI Academy. Software engineer and AI practitioner who builds and deploys ML systems. Teaches AI mathematics to working professionals without the jargon.