What is the difference between AI, machine learning, and deep learning?

AI is the broadest term — any system that mimics human intelligence. Machine learning is a subset of AI where systems learn from data instead of being explicitly programmed. Deep learning is a subset of machine learning that uses neural networks with many layers to learn complex patterns.

What does RAG mean in AI?

RAG stands for Retrieval Augmented Generation. It is a technique that gives an AI model access to external documents or databases so it can answer questions with up-to-date, source-backed information rather than relying solely on its training data.

What is an AI hallucination?

An AI hallucination is when a model confidently generates information that is factually wrong or completely fabricated. This happens because language models predict plausible text, not verified facts.

AI Glossary — 85+ AI Terms Explained in Plain English

Q: What is a Large Language Model (LLM)?

An LLM is a neural network trained on massive amounts of text that can generate, summarize, translate, and reason about human language. Examples include GPT-4, Claude, Gemini, and Llama.

A5 terms

AI Agent

An AI system that can take actions autonomously — browsing the web, writing code, calling APIs, managing files — to accomplish a goal with minimal human guidance. Unlike a chatbot that only responds to prompts, an agent plans multi-step workflows and executes them on its own.

Agents are reshaping how knowledge work gets done — from automated research to code generation to customer support triage.

Alignment

The challenge of making AI systems behave in ways that match human values and intentions. Alignment research focuses on ensuring AI does what we actually want, not just what we technically asked for. It spans safety, ethics, and the gap between instruction and intent.

As AI models grow more capable, alignment is the single biggest factor in whether that capability helps or harms society.

API (Application Programming Interface)

A set of rules that lets one software application talk to another. In AI, APIs are how you send prompts to models like GPT or Claude and get responses back — without needing to run the model yourself. Think of it as a waiter taking your order to the kitchen.

APIs are how every AI-powered app — from chatbots to code copilots to image generators — actually connects to AI models.

Attention Mechanism

A technique that lets a neural network focus on the most relevant parts of its input when generating output. Instead of treating every word equally, attention assigns weights so the model can "pay attention" to what matters most for each prediction. It is the core innovation behind transformers.

Attention is why modern AI can understand context across long documents, conversations, and codebases.

AutoML (Automated Machine Learning)

Tools and techniques that automate the process of building machine learning models — from data preprocessing to feature selection to model tuning. AutoML lets non-experts create production-quality models without writing algorithms from scratch.

AutoML democratizes AI by letting business analysts and domain experts build models that used to require a PhD in data science.

B4 terms

Backpropagation

The algorithm that lets neural networks learn from their mistakes. After making a prediction, the network calculates how wrong it was (the loss), then works backward through each layer to adjust its internal weights so the next prediction is better. It is the fundamental training algorithm for all deep learning.

Every AI model you use was trained with backpropagation — understanding it helps you understand why models sometimes fail and how fine-tuning works.

BERT (Bidirectional Encoder Representations from Transformers)

A language model from Google that reads text in both directions (left-to-right and right-to-left) simultaneously, giving it a deeper understanding of context. BERT revolutionized search, sentiment analysis, and question-answering when it launched in 2018.

BERT is still the backbone of most enterprise search systems and NLP pipelines — and the architectural ancestor of today's LLMs.

Bias (in AI)

Systematic errors in AI output caused by skewed training data, flawed assumptions, or unrepresentative samples. If a hiring model is trained mostly on resumes from one demographic, it may unfairly penalize others. Bias can be statistical (a technical modeling issue) or societal (reflecting real-world inequities baked into data).

Understanding AI bias is essential for anyone deploying AI in hiring, lending, healthcare, or criminal justice — it is both a moral and legal responsibility.

Batch Size

The number of training examples a model processes before updating its internal weights. A batch size of 32 means the model looks at 32 examples, calculates the average error, and then adjusts. Larger batches are more stable but use more memory; smaller batches learn noisier but can generalize better.

Batch size is one of the first knobs you turn when training or fine-tuning a model — it directly affects speed, memory usage, and model quality.

C7 terms

Chain-of-Thought (CoT)

A prompting technique where you ask the AI to show its reasoning step-by-step before giving a final answer. Instead of jumping to a conclusion, the model "thinks out loud," which dramatically improves accuracy on math, logic, and complex reasoning tasks.

Adding "think step by step" to your prompts is one of the simplest ways to get better results from any AI model.

ChatGPT

OpenAI's conversational AI product, built on the GPT family of large language models. Launched in November 2022, ChatGPT brought AI into the mainstream by giving anyone a simple chat interface to interact with a powerful language model. It can write, code, analyze, summarize, and brainstorm.

ChatGPT is the product that started the generative AI revolution — understanding it is table stakes for any professional in 2026.

Classification

A type of machine learning task where the model assigns input data to predefined categories. Is this email spam or not spam? Is this X-ray showing a tumor or healthy tissue? Classification models output a label (or probability of each label) rather than a continuous number.

Classification powers spam filters, medical diagnosis, fraud detection, sentiment analysis, and most real-world AI applications.

Claude

Anthropic's family of AI assistants, designed with a focus on safety, helpfulness, and honesty. Claude models (Haiku, Sonnet, Opus) offer strong reasoning, long context windows, and careful handling of sensitive topics. Claude is a leading alternative to ChatGPT in both consumer and enterprise use.

Claude is a top-tier choice for enterprise AI, coding assistance, and professional work — knowing its strengths helps you pick the right tool for each task.

Computer Vision

The field of AI that teaches machines to interpret and understand visual information — images, videos, and live camera feeds. Computer vision models can detect objects, recognize faces, read text in photos, and understand scenes with near-human accuracy.

Computer vision drives self-driving cars, manufacturing quality control, medical imaging, and the visual AI features in your phone.

Context Window

The maximum amount of text (measured in tokens) that an AI model can consider at once — both your input and its output combined. A model with a 200K context window can process roughly 150,000 words in a single conversation, while a 4K window limits you to about 3,000 words.

Context window size determines whether you can feed an entire codebase, legal contract, or research paper into the model at once.

Cosine Similarity

A mathematical measure of how similar two vectors are, based on the angle between them rather than their magnitude. In AI, it is the standard way to compare embeddings — two documents about "machine learning" will have vectors pointing in nearly the same direction, yielding a cosine similarity close to 1.

Cosine similarity is the math behind semantic search, recommendation engines, and RAG systems — it is how AI finds "similar" content.

D4 terms

Deep Learning

A subset of machine learning that uses neural networks with many layers (hence "deep") to learn complex patterns from data. Deep learning powers image recognition, speech-to-text, language translation, and generative AI. It thrives on large datasets and GPU computing power.

Deep learning is the engine behind virtually every AI breakthrough since 2012 — from AlexNet to GPT-4 to Stable Diffusion.

Diffusion Model

A type of generative model that creates images (or other data) by learning to reverse a process of gradually adding noise. During training, the model learns how to "denoise" — and at generation time, it starts from pure random noise and iteratively refines it into a coherent image.

Diffusion models power Stable Diffusion, DALL-E, and Midjourney — the technology behind the AI image generation revolution.

Discriminator

One half of a GAN (Generative Adversarial Network). The discriminator's job is to distinguish between real data and fake data produced by the generator. As it gets better at spotting fakes, the generator is forced to produce more realistic output — creating a competitive training loop.

Understanding discriminators helps you understand GANs, deepfake detection, and adversarial AI training.

Distillation

A technique for compressing a large, powerful AI model (the "teacher") into a smaller, faster model (the "student") that retains most of the teacher's performance. The student learns to mimic the teacher's outputs rather than training from scratch on raw data.

Distillation is how companies deploy AI on phones, edge devices, and cost-sensitive applications — making powerful AI portable and affordable.

E5 terms

Embedding

A way of representing text, images, or other data as a list of numbers (a vector) that captures its meaning. Words with similar meanings end up close together in this numerical space — "king" and "queen" are neighbors, while "king" and "toaster" are far apart.

Embeddings are the foundation of semantic search, RAG, recommendation systems, and how AI understands meaning — not just keywords.

Encoder-Decoder

A neural network architecture with two parts: an encoder that reads and compresses input into a representation, and a decoder that generates output from that representation. Originally designed for translation (English in, French out), this architecture influenced the design of transformers.

The encoder-decoder pattern is foundational — it appears in translation, summarization, image captioning, and speech recognition systems.

Epoch

One complete pass through the entire training dataset. If you have 10,000 examples and train for 3 epochs, the model sees every example three times. More epochs can improve learning but risk overfitting — memorizing the training data instead of learning general patterns.

Choosing the right number of epochs is a key decision in model training — too few means underfitting, too many means overfitting.

Evaluation Metrics

Standardized measurements used to assess how well an AI model performs. Common metrics include accuracy (% correct), precision (how many positives were truly positive), recall (how many actual positives were found), and F1 score (the balance between precision and recall).

You cannot improve what you cannot measure — evaluation metrics are how teams decide if a model is ready for production or needs more work.

Explainability (XAI)

The ability to understand and explain why an AI model made a specific decision. Explainability techniques let you peek inside the "black box" to see which features or inputs drove the output. This is critical in regulated industries where decisions must be justified.

In healthcare, finance, and government, regulators increasingly require that AI decisions be explainable — not just accurate.

F3 terms

Few-Shot Learning

A technique where you give an AI model a handful of examples in the prompt to teach it what you want — without any retraining. Show the model 3 examples of how to format a response, and it generalizes the pattern to new inputs. Zero-shot means no examples; few-shot means a few.

Few-shot prompting is one of the most practical skills in prompt engineering — it lets you customize AI output without writing any code.

Fine-Tuning

Taking a pre-trained AI model and training it further on a smaller, specialized dataset to improve performance on a specific task or domain. Instead of training from scratch (which costs millions), you start with what the model already knows and teach it your specific needs.

Fine-tuning is how companies create custom AI models for their industry — turning a general model into a medical, legal, or financial specialist.

Foundation Model

A large AI model trained on broad, diverse data that can be adapted to many downstream tasks. GPT-4, Claude, Llama, and Gemini are all foundation models. They serve as the "foundation" that gets fine-tuned, prompted, or extended for specific applications.

Foundation models have become the platform layer of AI — every AI application in 2026 is built on top of one.

G5 terms

GANs (Generative Adversarial Networks)

A pair of neural networks — a generator and a discriminator — that compete against each other. The generator creates fake data, the discriminator tries to spot the fakes, and both improve through competition. GANs were the dominant image generation technique before diffusion models.

GANs pioneered AI image generation and remain important for deepfake detection, data augmentation, and super-resolution tasks.

GPT (Generative Pre-trained Transformer)

A family of large language models from OpenAI. GPT models are "generative" (they produce text), "pre-trained" (trained on massive data before specialization), and built on the "transformer" architecture. GPT-4 and GPT-4o are the latest versions powering ChatGPT.

GPT models set the benchmark that launched the generative AI industry — understanding GPT is understanding modern AI.

Gradient Descent

The optimization algorithm that neural networks use to learn. It works by calculating the slope (gradient) of the error function and taking small steps "downhill" to reduce error. Imagine being blindfolded on a hill and feeling the slope under your feet to find the valley — that is gradient descent.

Gradient descent is the engine of all neural network training — it is the process by which raw data becomes intelligence.

Grounding

Connecting an AI model's outputs to verifiable, real-world sources of truth — documents, databases, APIs, or live data. Grounding reduces hallucinations by forcing the model to base its answers on actual evidence rather than just its training data.

Grounding is essential for enterprise AI — no company can afford an AI that makes up facts about their products, policies, or customers.

Guardrails

Rules, filters, and safety mechanisms that constrain what an AI model can say or do. Guardrails prevent harmful outputs, keep responses on-topic, enforce formatting rules, and block sensitive information from leaking. They are the "bumper lanes" that keep AI behavior within acceptable bounds.

Every production AI deployment needs guardrails — they are the difference between a demo and a product your legal team will approve.

H3 terms

Hallucination

When an AI confidently generates information that is factually wrong or completely fabricated. The model is not "lying" — it is predicting the most plausible next tokens, and sometimes plausible text is completely false. Hallucinations can include fake citations, invented statistics, or fictional events.

Hallucination is the #1 risk in deploying generative AI — every professional using AI must know how to detect and mitigate it.

HNSW (Hierarchical Navigable Small World)

A graph-based algorithm for extremely fast approximate nearest-neighbor search in high-dimensional vector spaces. When you search a vector database with millions of embeddings, HNSW is the algorithm that finds the closest matches in milliseconds instead of scanning every single vector.

HNSW powers the fast retrieval behind RAG systems, recommendation engines, and semantic search at scale.

Hyperparameter

A setting you choose before training begins — like learning rate, batch size, number of layers, or dropout rate. Unlike regular parameters (which the model learns during training), hyperparameters are the knobs the engineer turns. Getting them right can mean the difference between a mediocre model and a great one.

Hyperparameter tuning is a core skill in ML engineering — small changes can yield big performance improvements.

I2 terms

Inference

The process of using a trained AI model to make predictions on new data. Training is when the model learns; inference is when it applies what it learned. Every time you send a prompt to ChatGPT or Claude and get a response, that is inference happening on a server somewhere.

Inference cost and speed are what determine whether an AI product is economically viable — fast, cheap inference is the holy grail.

In-Context Learning

The ability of large language models to learn new tasks from examples provided directly in the prompt — without any weight updates or retraining. You show the model a few examples, and it picks up the pattern instantly. This is what makes LLMs so flexible compared to traditional ML models.

In-context learning is why you can get an LLM to do almost anything with the right prompt — no coding, no training, just examples.

J1 term

JSON Mode

A feature in AI APIs that forces the model to output valid JSON (JavaScript Object Notation) — structured, machine-readable data instead of free-form text. This is essential when the AI's output needs to be parsed by code, stored in a database, or passed to another system.

JSON mode is how developers integrate AI into real applications — without it, you are parsing free text and hoping for the best.

K2 terms

Knowledge Distillation

A model compression technique where a smaller "student" model is trained to replicate the behavior of a larger "teacher" model. The student learns not just the correct answers but the teacher's confidence levels across all possible answers — capturing nuanced knowledge in a fraction of the size.

Knowledge distillation is how companies get GPT-4-level quality into models small enough to run on a phone or at 1/100th the cost.

k-NN (k-Nearest Neighbors)

One of the simplest machine learning algorithms: to classify a new data point, look at the k closest data points in your training set and go with the majority vote. If 4 out of 5 nearest neighbors are "spam," classify the new email as spam. No training required — just comparison.

k-NN is foundational to understanding how similarity-based AI works, and its principles underpin modern vector search.

L4 terms

LangChain

A popular open-source framework for building applications with large language models. LangChain provides pre-built components for common AI patterns — chaining prompts together, connecting to databases, adding memory to conversations, and building agents. It is the "React of AI apps."

LangChain is the most widely adopted AI app framework — knowing it opens doors to building RAG systems, chatbots, and AI agents.

Large Language Model (LLM)

A neural network trained on massive text datasets that can generate, summarize, translate, and reason about human language. LLMs like GPT-4, Claude, Gemini, and Llama contain billions of parameters and can perform tasks they were never explicitly trained for, simply by following instructions in natural language.

LLMs are the technology behind the AI revolution — understanding how they work is the most valuable technical literacy skill of 2026.

LoRA (Low-Rank Adaptation)

A technique for fine-tuning large models efficiently by only updating a small number of additional parameters instead of the entire model. LoRA "freezes" the original model weights and trains tiny adapter layers, making fine-tuning 10-100x cheaper in compute and memory.

LoRA made fine-tuning accessible to startups and individuals — you can customize a 70B-parameter model on a single GPU.

Loss Function

A mathematical function that measures how wrong a model's predictions are. The model's entire goal during training is to minimize its loss function. Different tasks use different loss functions — cross-entropy for classification, mean squared error for regression, etc.

The loss function is the compass of model training — choosing the right one determines whether your model learns what you actually care about.

M4 terms

Machine Learning (ML)

A subset of AI where systems learn patterns from data instead of being explicitly programmed with rules. Instead of writing "if temperature > 90, then hot," you show the model thousands of examples and it learns the pattern on its own. ML includes supervised learning, unsupervised learning, and reinforcement learning.

Machine learning is the foundational discipline behind all modern AI — every AI product you use is machine learning under the hood.

MCP (Model Context Protocol)

An open protocol developed by Anthropic that standardizes how AI models connect to external tools, data sources, and services. MCP provides a universal interface — like USB for AI — so models can read files, query databases, call APIs, and use tools through a single, consistent protocol.

MCP is becoming the industry standard for connecting AI to the real world — it is how agents interact with your tools and data.

Mixture of Experts (MoE)

A model architecture that routes each input to a subset of specialized "expert" networks rather than running the entire model. A MoE model might have 8 experts but only activate 2 for any given token, giving it the knowledge of a huge model with the compute cost of a small one.

MoE is the architecture behind Mixtral and likely GPT-4 — it is how the industry builds smarter models without proportionally more compute.

Multimodal

An AI model that can process and generate multiple types of data — text, images, audio, video, code — within a single system. GPT-4o, Claude, and Gemini are multimodal: you can show them a photo, ask a question about it, and get a text answer.

Multimodal AI is the future — the most capable models in 2026 can see, read, hear, and reason across all data types simultaneously.

N3 terms

Natural Language Processing (NLP)

The field of AI focused on enabling computers to understand, interpret, and generate human language. NLP covers everything from spell-check and autocomplete to machine translation, sentiment analysis, and the conversational abilities of ChatGPT and Claude.

NLP is the broadest and most commercially impactful subfield of AI — it powers every chatbot, search engine, and voice assistant.

Neural Network

A computing system inspired by the human brain, made up of layers of interconnected nodes (neurons) that process information. Each connection has a weight that gets adjusted during training. Stacking many layers creates a "deep" neural network — the foundation of deep learning and modern AI.

Neural networks are the building block of all modern AI — understanding them helps you understand everything from image recognition to LLMs.

Normalization

A technique for scaling input data or internal network activations to a standard range, which helps models train faster and more stably. Batch normalization and layer normalization are common variants. Without normalization, neural networks can struggle with "exploding" or "vanishing" values during training.

Normalization is a standard practice in every modern neural network — it is one of the quiet techniques that makes deep learning actually work.

O3 terms

OpenAI

The AI research company behind ChatGPT, GPT-4, DALL-E, and Whisper. Founded in 2015 as a nonprofit, OpenAI transitioned to a "capped profit" structure and has become the most commercially prominent AI company. Their products brought generative AI to the mainstream.

OpenAI's products and APIs are the starting point for most AI projects — understanding their ecosystem is essential for any AI practitioner.

Overfitting

When a model memorizes the training data so well that it performs poorly on new, unseen data. An overfitted model is like a student who memorizes the answer key but cannot solve a new problem. It scores perfectly on practice tests but fails the real exam.

Overfitting is the most common failure mode in machine learning — recognizing and preventing it is a core ML skill.

ONNX (Open Neural Network Exchange)

An open format for representing machine learning models that allows you to move models between different frameworks (PyTorch, TensorFlow, etc.) and deploy them anywhere. Think of ONNX as a universal file format — like PDF for AI models.

ONNX prevents framework lock-in and enables deployment to edge devices, mobile, and specialized hardware.

P5 terms

Parameter

A single learnable value inside a neural network — essentially one number that gets adjusted during training. When people say a model has "70 billion parameters," they mean it has 70 billion tunable numbers. More parameters generally means more capacity to learn patterns, but also more compute to train and run.

Parameter count is the most common measure of model size — it directly correlates with cost, capability, and hardware requirements.

Perplexity

A metric that measures how "surprised" a language model is by new text. Lower perplexity means the model predicts the text more confidently and accurately. A model with perplexity of 10 means it is, on average, choosing between 10 equally likely next words.

Perplexity is the standard benchmark for comparing language models — lower is better.

Pre-training

The initial, large-scale training phase where a model learns general knowledge from massive datasets — often trillions of tokens from the internet, books, and code. Pre-training is the expensive part (millions of dollars in compute) that creates a foundation model, which can then be fine-tuned cheaply for specific tasks.

Pre-training is what makes foundation models possible — it is the multi-million-dollar investment that the rest of the industry builds on top of.

Prompt Engineering

The practice of crafting inputs (prompts) to an AI model to get the best possible output. This includes writing clear instructions, providing examples, setting context, specifying output format, and using techniques like chain-of-thought. It is part science, part craft, and part communication skill.

Prompt engineering is the most immediately valuable AI skill for any professional — better prompts mean dramatically better results with zero code.

PyTorch

The most popular open-source framework for building and training deep learning models, created by Meta. PyTorch uses a "define-by-run" approach that feels natural to Python developers, making it the default choice for AI research and increasingly for production deployments.

PyTorch is the lingua franca of AI development — most cutting-edge models and research papers use it.

Q2 terms

Quantization

A technique for reducing model size and speeding up inference by using lower-precision numbers. Instead of storing each parameter as a 32-bit floating point number, you can use 8-bit or even 4-bit integers. This can shrink a model by 4-8x with minimal quality loss.

Quantization is how people run 70B-parameter models on consumer hardware — it makes powerful AI accessible without expensive GPUs.

Query (in Attention)

In the attention mechanism, each token generates three vectors: a Query, a Key, and a Value. The Query represents "what am I looking for?" It gets matched against Keys from other tokens to determine which tokens to pay attention to, and the corresponding Values are combined into the output.

Understanding Q/K/V is the key to understanding how transformers actually process language — it is the core mechanism behind all modern LLMs.

R5 terms

RAG (Retrieval Augmented Generation)

A technique that gives an AI model access to external documents so it can answer questions with up-to-date, source-backed information. RAG works in two steps: first, it searches a knowledge base for relevant documents; then, it feeds those documents to the LLM along with the user's question.

RAG is the #1 technique for building AI that answers questions about YOUR data — it is the most commercially deployed AI pattern in 2026.

Reasoning

An AI model's ability to think through problems logically, break down complex questions, weigh evidence, and arrive at conclusions through multi-step inference. Reasoning capabilities separate simple pattern matching from genuine problem-solving — and have improved dramatically in recent models like o1, Claude, and Gemini.

Reasoning ability is the frontier of AI progress — it is what makes AI useful for complex work like analysis, strategy, and engineering.

Reinforcement Learning (RL)

A type of machine learning where an agent learns by interacting with an environment and receiving rewards or penalties for its actions. Instead of learning from labeled examples, it learns from trial and error — like a dog learning tricks through treats. RLHF (RL from Human Feedback) is how ChatGPT was trained to be helpful.

Reinforcement learning is how AI learns to play games, drive cars, and — through RLHF — behave helpfully in conversations.

ReLU (Rectified Linear Unit)

The most commonly used activation function in neural networks. It simply outputs the input if it is positive, and zero if it is negative: f(x) = max(0, x). Despite its simplicity, ReLU solved a critical training problem (vanishing gradients) and remains the default choice in most architectures.

ReLU is one of those small innovations that made deep learning actually work — it is inside virtually every neural network you use.

Retrieval

The process of finding and fetching relevant information from a large collection of documents or data. In AI, retrieval has evolved from keyword matching to semantic search using embeddings — finding documents that mean similar things, not just contain the same words.

Retrieval is the "R" in RAG — the quality of your retrieval directly determines the quality of your AI's answers.

S5 terms

Semantic Search

Search that understands meaning rather than just matching keywords. If you search for "how to fix a slow computer," semantic search also returns results about "optimizing PC performance" and "speeding up your laptop" — because it understands these mean the same thing, even though the words are different.

Semantic search is replacing keyword search everywhere — from Google to enterprise knowledge bases to e-commerce product discovery.

Self-Attention

A specific type of attention mechanism where each element in a sequence attends to all other elements in the same sequence. In a sentence, each word looks at every other word to understand context. "The bank was steep" uses self-attention to figure out that "bank" means riverbank, not a financial institution.

Self-attention is the defining mechanism of the transformer architecture — it is the secret sauce behind every modern LLM.

Stable Diffusion

An open-source AI image generation model that creates images from text descriptions. Unlike DALL-E (closed-source), Stable Diffusion can be downloaded, modified, and run locally. It sparked an explosion of AI art tools, workflows, and fine-tuned variants for specific styles and use cases.

Stable Diffusion proved that powerful generative AI can be open-source — it created an entire ecosystem of image generation tools and businesses.

Supervised Learning

A type of machine learning where the model learns from labeled examples — input-output pairs where the correct answer is provided. Show the model 10,000 photos labeled "cat" or "dog," and it learns to classify new photos. Most production ML systems use supervised learning.

Supervised learning is the most widely deployed type of ML — if you are building an AI product, you are probably using it.

System Prompt

A hidden instruction given to an AI model before the user's conversation begins, defining its behavior, personality, constraints, and role. The system prompt is like a job description for the AI — it sets the rules the model follows throughout the conversation, even if the user tries to override them.

System prompts are how every AI chatbot, copilot, and agent gets its personality and rules — writing them well is a critical AI skill.

T6 terms

Temperature

A setting that controls how random or creative an AI model's output is. Temperature 0 makes the model always pick the most likely next word (deterministic, factual), while temperature 1+ makes it more willing to pick less likely words (creative, varied, but potentially less accurate).

Temperature is the single easiest setting to adjust for better results — low for facts and code, higher for creative writing and brainstorming.

Token

The basic unit of text that AI models process. A token is roughly 3/4 of a word in English — "hamburger" becomes ["ham", "burger"]. Models read, think in, and generate tokens, not words. When a model has a "128K context window," that means 128,000 tokens, or roughly 96,000 words.

Tokens determine pricing, context limits, and response length — understanding tokens helps you manage AI costs and work within model limits.

Tokenizer

The component that converts raw text into tokens (numbers) that a model can process, and back again. Different models use different tokenizers with different vocabularies — this is why the same sentence might be 15 tokens in GPT-4 but 18 tokens in Claude. The tokenizer determines a model's "alphabet."

Tokenizer differences explain why models handle code, math, and multilingual text differently — and why token counts vary across providers.

Transformer

The neural network architecture behind GPT, Claude, Gemini, and virtually every modern AI model. Introduced in the 2017 paper "Attention Is All You Need," the transformer uses self-attention to process all words in parallel rather than one at a time, enabling massive scaling and superior language understanding.

The transformer is the single most important invention in modern AI — every LLM, image model, and AI product you use is built on it.

Transfer Learning

Using knowledge learned from one task to improve performance on a different but related task. A model trained to recognize cats can transfer that knowledge to recognizing dogs much faster than starting from scratch. In LLMs, the entire pre-train/fine-tune paradigm is transfer learning at massive scale.

Transfer learning is why modern AI is practical — it lets you build specialized models without billions of examples or dollars of compute.

Training Data

The dataset used to teach a machine learning model. Training data quality directly determines model quality — garbage in, garbage out. For LLMs, training data includes trillions of tokens from the internet, books, code, and curated sources. For custom models, it is your labeled, domain-specific examples.

Training data is the most underrated factor in AI quality — a mediocre model with great data often beats a great model with mediocre data.

U2 terms

Underfitting

When a model is too simple to capture the patterns in the data — it performs poorly on both training data and new data. If overfitting is memorizing the answer key, underfitting is not studying at all. Common causes: too few parameters, too little training, or overly aggressive regularization.

Recognizing underfitting tells you your model needs more capacity, more data, or more training time — it is the opposite problem from overfitting.

Unsupervised Learning

A type of machine learning where the model finds patterns in data without labeled examples. Instead of being told "this is a cat," the model discovers clusters and structures on its own. Unsupervised learning powers customer segmentation, anomaly detection, and dimensionality reduction.

Unsupervised learning is essential when you have lots of data but no labels — which describes most real-world business data.

V3 terms

Vector Database

A specialized database that stores data as numerical vectors (embeddings) and retrieves results by similarity rather than exact matching. When you search "documents about company growth," a vector database finds semantically similar content even if those exact words never appear. Popular options include Pinecone, Weaviate, Chroma, and Qdrant.

Vector databases are the backbone of RAG systems — every enterprise AI search and retrieval system runs on one.

Vision Model

An AI model specifically designed to understand and process visual information — images, diagrams, screenshots, documents, and video frames. Vision models can describe images, extract text, identify objects, and answer questions about visual content. Most modern LLMs now include vision capabilities.

Vision models enable AI to work with the visual world — from reading receipts to analyzing medical scans to understanding UI screenshots.

Variational Autoencoder (VAE)

A type of generative model that learns to compress data into a compact "latent space" and then reconstruct it. VAEs can generate new data by sampling from this latent space. They are used as components in diffusion models (like Stable Diffusion) and for data compression, generation, and anomaly detection.

VAEs are a building block inside larger generative systems — understanding them helps you understand how image generation models work under the hood.

W2 terms

Weight

A numerical value in a neural network that determines the strength of the connection between two neurons. During training, weights are adjusted via backpropagation to minimize error. The "knowledge" of a trained model is entirely encoded in its weights — they are what make a model smart.

When people talk about "downloading model weights" or "open-weight models," this is what they mean — the actual learned knowledge of the AI.

Word2Vec

A pioneering algorithm from Google (2013) that represents words as numerical vectors, capturing semantic relationships. Word2Vec famously demonstrated that "king - man + woman = queen" in vector space. While superseded by transformer-based embeddings, Word2Vec proved that meaning could be represented as math.

Word2Vec is where the embedding revolution started — it laid the conceptual groundwork for all modern semantic AI systems.

Z1 term

Zero-Shot Learning

The ability of an AI model to perform a task it has never been explicitly shown examples of. You simply describe the task in natural language — "classify this review as positive or negative" — and the model does it, without a single training example. This is a hallmark capability of large language models.

Zero-shot capability is what makes LLMs feel magical — it is why you can ask Claude or GPT to do almost anything and get a useful answer.

No terms found

Try a different search term or browse by letter above.

The AI Glossary for Real People

No terms found

Learn AI Hands-On, Not Just in a Glossary