Regular Expressions Guide [2026]: Regex Made Simple

Learn regular expressions from the ground up. Understand regex syntax, common patterns, and how to use them in Python, JavaScript, and more.

Key Takeaways

  • Regex is a pattern language for matching, extracting, and transforming text
  • Most regex tasks use about 10 metacharacters: . * + ? [] {} ^ $ | \
  • Use raw strings in Python (r'pattern') to avoid backslash conflicts
  • Named capture groups make complex patterns readable and maintainable
  • Test regex interactively at regex101.com before adding it to code

Regular expressions look terrifying at first glance. A pattern like ^[\w.+-]+@[\w-]+\.[\w.-]+$ seems like someone fell asleep on a keyboard. But regex has simple rules, and once you know about 10 metacharacters, you can write patterns that would otherwise require 50 lines of string manipulation code. This guide takes you from zero to productive in one read.

Regex Basics: The Core Metacharacters

Literal characters match themselves: cat matches the string "cat" anywhere in the input. Metacharacters have special meaning:

  • . matches any single character (except newline)
  • * matches 0 or more of the preceding element
  • + matches 1 or more
  • ? matches 0 or 1 (makes the preceding element optional)
  • {n} matches exactly n times; {n,m} matches between n and m times
  • ^ anchors to the start of the string; $ anchors to the end
  • | means OR
  • \ escapes metacharacters (so \. matches a literal dot, not any character)
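As a quick check of the rules above, here is a minimal Python sketch. The helper name first_match and the sample strings are made up for illustration; only re.search comes from the standard library.

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: return the first matched substring, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

# Each entry: (pattern, sample text, expected first match).
examples = [
    (r"c.t", "cut the cat", "cut"),         # . stands for any single character
    (r"ab*c", "ac abc abbc", "ac"),         # * allows zero b's
    (r"ab+c", "ac abc abbc", "abc"),        # + needs at least one b
    (r"colou?r", "color colour", "color"),  # ? makes the u optional
    (r"o{2}", "no zoo", "oo"),              # exactly two o's
    (r"o{2,3}", "zoooo", "ooo"),            # two to three o's, as many as possible
    (r"^cat", "concat", None),              # "cat" is not at the start
    (r"cat$", "concat", "cat"),             # "cat" is at the end
    (r"cat|dog", "hot dog", "dog"),         # either alternative
    (r"3\.14", "3714 3.14", "3.14"),        # escaped dot is a literal dot
    (r"3.14", "3714 3.14", "3714"),         # unescaped dot matches the 7
]

for pattern, text, expected in examples:
    print(f"{pattern!r:12} on {text!r:16} -> {first_match(pattern, text)!r}")
```

The last two rows show why escaping matters: without the backslash, the dot happily matches the 7 in "3714".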

Character Classes and Shorthand

Square brackets define a set of characters to match. [aeiou] matches any vowel. [a-z] matches any lowercase ASCII letter. [0-9] matches any digit. [^abc] matches any character NOT in the set. Shorthand classes: \d matches any digit (equivalent to [0-9] for ASCII text; in Python 3 it also matches other Unicode digits unless you pass re.ASCII). \w matches word characters (letters, digits, underscore). \s matches whitespace (space, tab, newline). Uppercase versions invert: \D matches non-digits, \W matches non-word characters, \S matches non-whitespace. These shorthand classes are the workhorses of most regex patterns.
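The classes above can be tried out with a few one-liners. This is a small sketch with invented sample strings. The last two lines illustrate a Python 3 detail: for str patterns, \d also matches non-ASCII Unicode digits unless re.ASCII is passed.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")                 # ['e', 'e']
non_digits = re.findall(r"[^0-9]+", "a1b22c")            # ['a', 'b', 'c']
words = re.findall(r"\w+", "snake_case, 2 words!")       # ['snake_case', '2', 'words']
tokens = re.split(r"\s+", "one  two\tthree\nfour")       # ['one', 'two', 'three', 'four']
digits_only = re.sub(r"\D", "", "(555) 123-4567")        # '5551234567'

# "\u0663" is ARABIC-INDIC DIGIT THREE, a digit outside the ASCII range.
unicode_digit = re.fullmatch(r"\d", "\u0663")            # a match object
ascii_digit = re.fullmatch(r"\d", "\u0663", re.ASCII)    # None
```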

Groups and Capture: Extract What You Need

Parentheses create groups. (\d{4})-(\d{2})-(\d{2}) matches a date like 2026-04-10 and captures year, month, and day separately. In Python: match.group(1) returns the first captured group. Named groups are cleaner for complex patterns:

import re
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, '2026-04-10')
if match:
    print(match.group('year'))   # 2026
    print(match.group('month'))  # 04

Non-capturing groups (?:pattern) group without capturing — useful for applying quantifiers without storing the match.
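One place where the difference shows up is re.findall, which returns the captured group instead of the whole match when the pattern contains one. A short sketch with made-up sample strings:

```python
import re

# Capturing group: findall returns only what the group captured
# (the last repetition), not the whole match.
capturing = re.findall(r"(ab)+", "abab ab")        # ['ab', 'ab']

# Non-capturing group: the quantifier still applies to "ab" as a unit,
# but findall now returns the full matches.
non_capturing = re.findall(r"(?:ab)+", "abab ab")  # ['abab', 'ab']

# Named groups can be read all at once as a dictionary.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2026-04-10")
parts = date.groupdict()  # {'year': '2026', 'month': '04', 'day': '10'}
```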

Regex in Python: re Module Essentials

The Python re module has five functions you'll use constantly:

  • re.search(pattern, string): find the first match anywhere in the string
  • re.match(pattern, string): match only at the start of the string
  • re.findall(pattern, string): return a list of all non-overlapping matches (or of the captured groups, if the pattern has any)
  • re.sub(pattern, replacement, string): replace matches
  • re.compile(pattern): compile a pattern once for reuse (tidier, and slightly faster in hot loops, since re already caches recently used patterns)

Always use raw strings: r'\d+' not '\\d+'. The re.IGNORECASE flag makes matching case-insensitive. re.MULTILINE makes ^ and $ match at line starts and ends, not just at the start and end of the string.
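Here is a minimal sketch that exercises each of these functions and both flags on a small, invented log string. The variable names are illustrative only.

```python
import re

log = "ERROR disk full\nwarning low memory\nerror timeout"

# search: first match anywhere; IGNORECASE lets "error" match "ERROR".
first = re.search(r"error", log, re.IGNORECASE)

# match: only succeeds at the very start of the string, so this is None.
at_start = re.match(r"warning", log)

# findall with MULTILINE: ^ and $ apply to every line, not just the whole string.
error_lines = re.findall(r"^error.*$", log, re.IGNORECASE | re.MULTILINE)

# Without MULTILINE the same pattern finds nothing, because $ cannot
# match at the end of the first line.
no_multiline = re.findall(r"^error.*$", log, re.IGNORECASE)

# sub: replace every run of digits.
masked = re.sub(r"\d+", "#", "room 101, floor 3")

# compile: build the pattern once, then call methods on it.
word = re.compile(r"\w+")
words = word.findall("low memory")

print(first.group(0), at_start, error_lines, no_multiline, masked, words)
```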

Common Regex Patterns You Can Use Right Now

  • Email (simplified): [\w.+-]+@[\w-]+\.[\w.]+
  • US phone number: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
  • URL: https?://[\w./-]+
  • IP address: \d{1,3}(\.\d{1,3}){3} (checks the shape only, so it also accepts out-of-range values such as 999.999.999.999)
  • Hashtag: #\w+
  • HTML tag: <[^>]+>
  • Credit card (16 digits): \d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}

Don't use these blindly in production: edge cases exist. But they're solid starting points for data cleaning and text extraction tasks.
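To see a few of these patterns at work, and one of the edge cases, here is a hedged sketch. The sample text and the helper is_valid_ipv4 are invented for illustration. The range check is one possible way to tighten the loose IP pattern, not the only one.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")
URL = re.compile(r"https?://[\w./-]+")
HASHTAG = re.compile(r"#\w+")
IP_LOOSE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

text = ("Contact ana.lee+news@example.org or (555) 123-4567. "
        "Docs: https://example.org/docs/regex #regex #python3")

emails = EMAIL.findall(text)      # ['ana.lee+news@example.org']
phones = PHONE.findall(text)      # ['(555) 123-4567']
urls = URL.findall(text)          # ['https://example.org/docs/regex']
hashtags = HASHTAG.findall(text)  # ['#regex', '#python3']

def is_valid_ipv4(candidate: str) -> bool:
    """Hypothetical helper: the regex checks the shape, int() checks the range."""
    if IP_LOOSE.fullmatch(candidate) is None:
        return False
    return all(int(octet) <= 255 for octet in candidate.split("."))

# The loose pattern alone accepts an impossible address.
loose_accepts_bad = IP_LOOSE.fullmatch("999.999.999.999") is not None  # True
```

Pairing a permissive regex with a plain-code check like this often reads more clearly than packing the full validation into one pattern.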

Lookahead and Lookbehind: Advanced Matching

Lookaheads and lookbehinds let you match based on context without including that context in the match. Positive lookahead (?=pattern): match if followed by pattern. Negative lookahead (?!pattern): match if NOT followed by pattern. Positive lookbehind (?<=pattern): match if preceded by pattern. Negative lookbehind (?<!pattern): match if NOT preceded by pattern. Example: \d+(?= dollars) matches numbers followed by ' dollars', which is useful for extracting prices. (?<=\$)\d+ matches digits preceded by a dollar sign. In Python's re module, a lookbehind pattern must match a fixed number of characters. These are powerful for data extraction from semi-structured text.
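The two examples above, plus their negative counterparts, can be sketched as follows. The sample sentence is invented. The negative forms include an extra \d in the lookaround, which is an addition of this sketch: without it, the engine backtracks and matches part of a number (for example the "4" in "40 dollars").

```python
import re

text = "Paid 40 dollars, then $25, then 10 euros."

# Positive lookahead: numbers followed by " dollars".
before_dollars = re.findall(r"\d+(?= dollars)", text)      # ['40']

# Positive lookbehind: digits preceded by a dollar sign.
after_sign = re.findall(r"(?<=\$)\d+", text)               # ['25']

# Negative lookahead: whole numbers NOT followed by " dollars".
# The \d alternative stops the match from ending in the middle of a number.
not_dollars = re.findall(r"\d+(?!\d| dollars)", text)      # ['25', '10']

# Negative lookbehind: whole numbers NOT preceded by "$".
# The \d in the class stops the match from starting mid-number.
no_sign = re.findall(r"(?<![$\d])\d+", text)               # ['40', '10']
```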

Frequently Asked Questions

Is regex the same in Python and JavaScript?
Mostly yes: the core syntax is the same. Key differences: Python uses re module functions while JavaScript uses string methods like .match() and .replace(). Python raw strings (r'') handle backslashes cleanly. JavaScript regex is written as /pattern/flags literals. Named groups work in both, with slightly different syntax: (?P<name>...) in Python and (?<name>...) in JavaScript.
When should I use regex vs string methods?
Use string methods (split, replace, startswith, etc.) for simple, fixed patterns. Use regex when patterns are variable, complex, or need to match multiple possible formats. If your string logic needs more than 3-4 chained method calls, regex is probably cleaner.
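A rough illustration of that dividing line, using invented sample data: the first half has fixed delimiters, so plain string methods are enough. The second half has to accept several date separators, which is where a pattern earns its keep.

```python
import re

# Fixed, known delimiters: string methods are simple and readable.
filename = "report_2026.csv"
is_csv = filename.endswith(".csv")                                # True

line = "name=Ada; role=engineer"
pairs = dict(part.strip().split("=") for part in line.split(";"))
# {'name': 'Ada', 'role': 'engineer'}

# Variable format: the separator may be "-", "/" or ".".
notes = "Due 2026-04-10, moved to 2026/04/11, then 2026.04.12"
dates = re.findall(r"\d{4}[-/.]\d{2}[-/.]\d{2}", notes)
# ['2026-04-10', '2026/04/11', '2026.04.12']
```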
How do I test my regex?
Use regex101.com: it shows matches in real time, explains what each part of your pattern does, and lets you test against multiple inputs. Select the Python flavor so the behavior matches the re module. Its code generator can also produce a Python snippet for your pattern.
What does greedy vs lazy matching mean?
By default, quantifiers are greedy — they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible). For example, .* matches the longest possible string, while .*? matches the shortest. This matters when extracting content between HTML tags.
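The HTML case mentioned above can be sketched in a few lines. The sample markup is made up. For real HTML a parser is usually the safer tool, so treat this only as a demonstration of greedy versus lazy behavior.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string, so there is one big match.
greedy = re.findall(r"<.*>", html)   # ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the first ">" it can, so each tag matches separately.
lazy = re.findall(r"<.*?>", html)    # ['<b>', '</b>', '<i>', '</i>']

# The same difference when extracting the text between tags.
sample = "<b>one</b> <b>two</b>"
inner_greedy = re.findall(r"<b>(.*)</b>", sample)   # ['one</b> <b>two']
inner_lazy = re.findall(r"<b>(.*?)</b>", sample)    # ['one', 'two']
```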

About the Author

Bo Peng is an AI Instructor and Founder of Precision AI Academy. He has trained 400+ professionals in AI, machine learning, and cloud technologies. His bootcamps run in Denver, NYC, Dallas, LA, and Chicago.