When Regex Is the Wrong Tool
Regex is powerful and dangerous. Here is when it is the right tool, when it hangs your server, and what ECMAScript 2018+ changed that most advice still ignores.
Here's a composite of a scenario that plays out about four times a year in real production systems. A team ships a route-matching regex that takes URL path parameters and normalizes them before the handler runs. The pattern is short. It reads cleanly. It passes code review. It passes unit tests on the usual paths. It runs in production for months without anyone thinking about it twice. Then someone submits a URL shaped like nothing the authors ever tested — a string of fifty repeating separator characters, or a parameter chain with the wrong delimiter — and the regex doesn't throw, doesn't crash. It just starts thinking. Three seconds of CPU become thirty, then three hundred. One request pegs a core. The load balancer routes the next few to other servers. Those CPUs go next. Eighteen minutes later the on-call engineer has pieced together enough to kill traffic and revert the last deploy. The regex hasn't malfunctioned. It's done exactly what it was written to do. The developer just never considered that backtracking can go exponential when the pattern and the input collaborate on it.
That exact shape is how path-to-regexp took down Node services in 2024 (CVE-2024-45296, CVE-2024-52798, and CVE-2026-4867 in the follow-on). Koa had one on its X-Forwarded-Proto header parser (CVE-2025-25200). Huggingface Transformers had two in the same module — one in the weight-decay filter, one in the docstring preprocessor (CVE-2025-6921, CVE-2025-2099). CPython's tarfile module had a ReDoS in its PAX header parser (CVE-2024-6232). The ssh2 library had one in its key parser (CVE-2025-70034). Each is a pattern that looked perfectly reasonable until an attacker found the adversarial input. None of the authors were incompetent.
The argument of this post is that regex is one of the few tools in programming where knowing when not to use it is more valuable than knowing how. That's not because regex is bad — for the problems it fits, nothing beats it on terseness or speed. It's because the distance between "regex that works" and "regex that hangs your server on adversarial input" is smaller than most of us were taught. This post covers the real shape of catastrophic backtracking, the 2024-2026 ReDoS CVEs worth learning from, seven cases where regex is the wrong tool and what to use instead, the cases where it's still the best, and the ECMAScript 2018+ features that most advice on the web hasn't caught up to. If you just need to test a pattern right now, paste it into our regex tester. If you want to know what you're actually doing, read on.
What regex actually is, in sixty seconds
Skip this section if you've used regex for longer than a year. If you haven't, here's the grounding: a regular expression is a pattern that describes a set of strings. The regex engine takes your pattern and an input string, and either tells you the input matches (and optionally where and what groups were captured) or tells you it doesn't.
The part most tutorials skip is how the engine does the matching, which matters because two engines with the same pattern can produce different runtime behavior on the same input. There are two main families.
Backtracking engines turn the pattern into a nondeterministic finite automaton (NFA) and walk it depth-first, keeping a stack of states they can back up to if the current path fails. PCRE, the ECMAScript engines inside JavaScript runtimes, Python's re, Java's java.util.regex, and Ruby's default engine are all backtracking. They're flexible — backreferences, lookaround, possessive quantifiers, anything you can describe structurally — but their worst-case runtime is exponential in the input length when the pattern has ambiguity the engine has to explore.
Linear-time engines compile the pattern to a deterministic finite automaton (DFA) or equivalent structure and walk it once per input character. Google's RE2, RE2J (Java port), and Rust's regex crate are in this family. They guarantee linear time in the input length at the cost of a smaller feature set. No backreferences. No general backtracking. Patterns the engine can't compile efficiently are simply rejected at compile time.
The tradeoff is the whole story of regex security. Backtracking engines let you write anything; linear-time engines protect you from your own pattern. When you're chewing through untrusted input, you want the second family. When you're writing a one-off expression against known-shape data, the first is usually fine.
Why ^(a+)+$ hangs your browser
The canonical evil-regex pattern is simple enough to fit in a tweet and evil enough to destroy a server. Let's walk through it.
Pattern: ^(a+)+$
Input that matches: a string of a's with nothing else. Say aaaaa. Works instantly.
Input that breaks it: a string of a's with a non-a at the end that makes the anchor fail. Say aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaX.
Here's why. The inner a+ is greedy and matches as many a's as possible. The outer (...)+ then tries to match more of what's inside — more a's — but the first group already ate them all. Fine, says the engine: the inner group gives one back. Now the outer group can match a second iteration. But wait, the input ends in X, so $ fails. The engine backtracks. The inner group gives another a back. The outer group tries different ways to split the remaining a's. Every possible partition of the input into one-or-more groups of one-or-more a's has to be tried before the engine can conclude the input doesn't match.
The number of such partitions is exponential in the number of a's — on the order of 2^n. For aaaaX it's 16. For aaaaaaaaX, 256. For sixteen a's and a trailing X, 65,536. For thirty, over a billion. Your browser will either warn you about an unresponsive script or genuinely lock up that tab.
Paste it into our regex tester and start small. Pattern: ^(a+)+$. Test input: aaaaX. Instant. Double the a's: aaaaaaaaX. Still instant. Add another eight: aaaaaaaaaaaaaaaaX. Starting to notice a pause. At aaaaaaaaaaaaaaaaaaaaaaaaaaX, you're waiting seconds. At thirty a's with a trailing X, the tab is effectively dead until the browser steps in. This isn't a bug in our tool — it's the ECMAScript engine doing what you asked. Our tester is client-side JavaScript with no timeout logic, so the damage stays in your tab; a production server, without the browser's script-timeout guardrails, would be worse.
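You can watch the same growth curve from Node without risking a tab — a minimal timing sketch (the sizes are kept deliberately small; every extra a roughly doubles the work, so a few more and each step stops being measurable in milliseconds):

```javascript
// Time /^(a+)+$/ against non-matching inputs of increasing length.
const evil = /^(a+)+$/;
for (const n of [10, 14, 18, 22]) {
  const input = "a".repeat(n) + "X";
  const start = process.hrtime.bigint();
  evil.test(input); // always false — the engine just proves it the hard way
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${n} a's: ${ms.toFixed(2)} ms`);
}
```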
The specific pattern shape that causes this is nested quantifiers with overlap — an outer + or * wrapping a group that contains its own + or *, where the two can both match the same characters. (a+)+ is the textbook example. (a|a)+ is another. (a|aa)+ is another. (.*)* is one that writes itself by accident. The moment you see nested quantifiers, check whether an attacker could construct an input that makes the engine try an exponential number of partitions.
The fix is usually one of three things:
- Rewrite the pattern so the inner and outer quantifiers can't overlap. a+ by itself is equivalent to (a+)+ for matching purposes, minus the ambiguity.
- Use possessive quantifiers or atomic groups in languages that support them (PCRE, Java). (a+)+ becomes (?>a+)+ and the engine can't back up into the group. ECMAScript doesn't support these, which is part of why JavaScript-land has outsized ReDoS exposure.
- Run the regex in a linear-time engine. In Node, the re2 package wraps Google's RE2. Patterns that would backtrack are either compiled into a linear-time DFA or rejected at compile time.
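The flattening fix is easy to sanity-check: the rewritten pattern accepts and rejects exactly the same strings as the nested one, it just reaches "no" without the combinatorics. A quick sketch:

```javascript
const nested = /^(a+)+$/; // ambiguous: exponential paths on failure
const flat = /^a+$/;      // same set of strings, no ambiguity
for (const input of ["aaaa", "aaaaX", "", "baaa"]) {
  console.log(JSON.stringify(input), nested.test(input) === flat.test(input)); // always true
}
```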
Most real-world ReDoS bugs are subtler than ^(a+)+$. They involve alternation, lookaround, or character classes that happen to overlap. But the underlying cause is always the same: the engine exploring too many paths through an ambiguous pattern.
The ReDoS hall of fame, 2024–2026
None of these were theoretical. Each one took down production code running in thousands of deployments. All of them had a common shape: a pattern that looked fine, shipped without pathological-input testing, and turned into a denial-of-service vector.
path-to-regexp — CVE-2024-45296, CVE-2024-52798, CVE-2026-4867. Three separate ReDoS vulnerabilities in the route-matching library used by Express, Next.js, and a good chunk of the Node ecosystem. The library generated dynamic regexes from path patterns like /users/:id — and under certain parameter configurations, the generated regex exhibited catastrophic backtracking on maliciously-crafted URLs. The 2026 follow-on (CVE-2026-4867) specifically affected patterns with three or more parameters in a single segment, like /:a-:b-:c; the earlier fix for CVE-2024-45296 had only protected the two-parameter case. If you use Express and haven't updated recently, check your lockfile.
ssh2 — CVE-2025-70034. An inefficient regex in the ssh2 library's key parser. Unauthenticated attackers could send a crafted key string and hang the library during the initial handshake.
Koa — CVE-2025-25200. The X-Forwarded-Proto and X-Forwarded-Host header parser in Koa used a backtracking-prone regex. An attacker with the ability to set these headers — anyone in a proxy chain, anyone behind a misconfigured reverse proxy, any client if the app trusted the headers directly — could cause excessive CPU usage. The irony of a URL-parsing regex being the vulnerability makes the case for new URL() on its own.
CPython tarfile — CVE-2024-6232. CPython's tarfile module used a regex to parse PAX extended headers. A crafted tar file could contain headers that triggered catastrophic backtracking, pegging the interpreter during extraction. This affected anything that processed untrusted tar files — CI/CD pipelines, unpacking services, file upload handlers.
Huggingface Transformers — CVE-2025-6921, CVE-2025-2099. Two separate ReDoS bugs in the Transformers library. The first was in the weight-decay regex filters, where user-controlled patterns in include_in_weight_decay / exclude_from_weight_decay lists could be malicious. The second was in a docstring preprocessor where inputs with a high newline count triggered exponential backtracking. ML training pipelines that accepted user-specified configs were vulnerable.
The common thread across all five projects: none of the authors were careless. Huggingface's ML team isn't a bunch of regex beginners. CPython's maintainers are careful. Koa's developers are among Node's most experienced. These were patterns written by thoughtful engineers that turned out to have adversarial corner cases nobody anticipated, because nobody had modeled the input as adversarial. If Express's path matcher can ship three ReDoS CVEs in two years, the mental model has to shift from "competent engineers write safe regex" to "every regex applied to untrusted input is potentially a vulnerability until tested otherwise."
Seven times regex is the wrong tool
This is the part of the post that'll make a few people mad. Some of these are near-religious arguments in programmer culture, and I mean them all seriously.
1. Don't parse HTML with regex
This is the most famous "don't" in software. The canonical Stack Overflow answer opens with "You can't parse [X]HTML with regex" and descends into increasingly unhinged typography to make the point visually — a one-time read, worth the click.
HTML isn't regular. It has nested structure. Tags can be self-closing or not depending on the tag. Attributes can be quoted with single quotes, double quotes, or unquoted entirely. Comments can span multiple lines and can contain almost anything. Script and style tag contents have different escaping rules than the rest of the document. Unicode, HTML entities, CDATA sections, DOCTYPEs, processing instructions — the list of edge cases that break a "simple" HTML-parsing regex is most of HTML.
What people try:
// Pulls all anchor tags with an href, or so it seems
const links = html.match(/<a[^>]+href="([^"]+)"[^>]*>/g);
// note: with the g flag, match() returns whole tags — the capture group is discarded
This breaks on <a href='…'> (single quotes), on <a class="x" href="…"> with attributes in different orders, on <a href="…" data-x="y > z"> where > appears inside an attribute value, on commented-out anchors, and on anchors split across lines. Each fix adds cases and spawns new ones.
What to do instead:
const parser = new DOMParser();
const doc = parser.parseFromString(html, "text/html");
const links = [...doc.querySelectorAll("a[href]")].map((a) => a.href);
In Node, cheerio or jsdom give you the same DOM API against a string. These handle every edge case correctly because they implement the HTML5 parsing algorithm, which is about a hundred pages of spec that describes exactly what a browser does with malformed input. You cannot replicate that with regex. People have tried. The graveyard is large.
There are narrow exceptions. Extracting a single, known-shape attribute from a known-source document you produced yourself? Regex is fine. Extracting something like this from arbitrary untrusted HTML on the open web? Use a parser.
2. Don't parse JSON with regex
Every language that ships with a JSON parser ships with it for a reason. Use it.
const parsed = JSON.parse(jsonString);
That's the answer. JSON has nested structure, string escapes, Unicode code points in escapes (\uXXXX), surrogate pairs for code points above U+FFFF, and numbers in a specific format that regex can't fully validate (negative zero, exponent notation, leading-zero rules). Trying to regex-match anything real in a JSON string — even something as simple as "find all the values of a specific key" — breaks the moment someone nests that key inside a string value.
If you want to format or inspect JSON, paste it into our JSON formatter. If you need a specific path inside a JSON document programmatically, JSON.parse it and walk the object, or use a JSON query language like JSONPath or jq. Regex against JSON is always the wrong choice.
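"Parse it and walk the object" looks like this in practice — a hypothetical helper (valuesOfKey is ours, not a standard API) that collects every value of a key at any depth, immune to the key appearing inside a string value:

```javascript
// Collect every value of a given key, at any nesting depth.
function valuesOfKey(obj, key, out = []) {
  if (Array.isArray(obj)) {
    for (const v of obj) valuesOfKey(v, key, out);
  } else if (obj && typeof obj === "object") {
    for (const [k, v] of Object.entries(obj)) {
      if (k === key) out.push(v);
      valuesOfKey(v, key, out); // recurse into the value too
    }
  }
  return out;
}
const tree = JSON.parse('{"id":1,"child":{"id":2,"note":"id: 3 inside a string"}}');
console.log(valuesOfKey(tree, "id")); // [1, 2] — the "id" inside the string is untouched
```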
3. Don't parse URLs with regex
Koa learned this the hard way with CVE-2025-25200 — a regex that was supposed to normalize the X-Forwarded-Host header accepted an attacker's crafted input and hung. The broader point is that URL parsing is exactly the kind of problem that looks simple and isn't. Schemes, authorities, userinfo, IPv6 literals in brackets, percent-encoding, query strings, fragments, internationalized domain names — URLs have more syntactic variety than most developers remember.
What people try:
// Extracts the hostname
const host = url.match(/^https?:\/\/([^/]+)/)?.[1];
This breaks on http://user:pass@host/ (auth leak into "host"), on http://[::1]:8080/ (IPv6 literal), on http://xn--e1afmkfd.xn--p1ai/ (IDN punycode), and on any URL with a port you didn't account for separately. Each "fix" invites another case.
What to do instead:
const url = new URL(userInput, "https://example.com");
console.log(url.hostname, url.pathname, url.searchParams.get("q"));
new URL() handles all of it. It throws on invalid URLs instead of silently accepting garbage. It's available in every modern browser and every Node version you'd care about. If you need just encoding or decoding — the URL-encoded bits — our URL encoder does that without making you think about it. If you need to pull apart a URL's structure, call new URL() and access the properties.
4. Don't parse CSV with regex
CSV looks like "split on commas" for about ten minutes. Then you hit your first quoted string containing a comma. Then you hit an escaped quote inside a quoted string. Then you hit a newline inside a quoted string. Then you hit files with \r\n versus \n versus \r line endings and realize you need to split on lines before you split on commas, except you can't split on lines inside quoted fields.
The full grammar for "compliant" CSV is in RFC 4180, and it's hostile to the split-on-delimiters approach — a quoted field can contain anything, including commas and the record separator. A naive regex can match the 80% of CSVs that don't use any of these features. It will silently corrupt the other 20%.
What people try:
// Splits a line into fields
const fields = line.split(",");
This breaks on "Smith, John",Engineer — two fields, not three, because "Smith, John" is a single quoted field containing a comma. It also breaks on cells containing newlines, on doubled quotes inside quoted fields ("She said ""hi"""), and on escape conventions that vary between Excel and other producers.
What to do instead:
import Papa from "papaparse";
const rows = Papa.parse(csvText, { header: true }).data;
In Python, the stdlib csv module. In most modern languages, there's a stdlib or widely-used CSV parser that handles the edge cases. The parser knows that a quote inside a quoted field should be doubled. Regex doesn't.
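To see what a parser tracks that a regex can't, here's a minimal sketch of RFC 4180-style field splitting for a single record — state, not pattern. (A real parser also handles newlines inside fields, multiple records, and \r\n endings; use a library.)

```javascript
// Minimal single-record CSV field splitter: one boolean of state does
// what no amount of comma-splitting can.
function splitCsvRecord(line) {
  const fields = [];
  let field = "", inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    if (inQuotes) {
      if (c === '"') {
        if (line[i + 1] === '"') { field += '"'; i++; } // doubled quote = literal quote
        else inQuotes = false;                           // closing quote
      } else field += c;
    } else if (c === '"') inQuotes = true;
    else if (c === ",") { fields.push(field); field = ""; }
    else field += c;
  }
  fields.push(field);
  return fields;
}
console.log(splitCsvRecord('"Smith, John",Engineer')); // ["Smith, John", "Engineer"]
```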
5. Don't validate email addresses with regex
Every developer has, at some point, copied a "validate email with regex" one-liner off Stack Overflow. Every developer has also, at some point, learned that the one-liner rejects a legitimate email address a user has actually been using for twenty years.
The RFC 5322 email grammar is permissive to the point of absurdity. Valid email addresses can contain spaces (quoted), comments, domain literals with IP addresses, and characters that no modern email provider actually accepts. Any regex that tries to match RFC 5322 exactly either ends up being a 6,000-character monster or silently permits things like admin(real comment)@example.com. Any regex that tries to match "what email addresses actually look like" in practice rejects plenty of legal ones.
The HTML5 spec punted on this and defined a willful violation of RFC 5322 for the <input type="email"> validation — a simpler pattern that accepts the common case and rejects obvious garbage. If you need a pragmatic regex, use theirs:
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
For real validation of a real address — the kind that matters for account signup — send the email. Have the user click a link. The bounce (or non-arrival) is the only test that actually confirms the address works. A regex can check the string has an @ and a dot somewhere after it. That's basically all it's good for. Everything else is a trap.
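Dropped into JavaScript as a literal, the HTML5 pattern behaves as advertised — it accepts the common shape and rejects the RFC 5322 exotica:

```javascript
// The WHATWG input[type=email] pattern from above, as a regex literal.
const html5Email = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
console.log(html5Email.test("user@example.com"));           // true
console.log(html5Email.test("admin(comment)@example.com")); // false — RFC 5322 comment syntax rejected
```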
6. Don't parse dates or timestamps with regex
Regex can extract the digits from 2026-05-09T14:32:00Z — four digits, dash, two digits, dash, two digits, and so on. What it can't do is validate the date semantically. Is 2026-02-30 a valid date? No — February doesn't have 30 days. Regex can't tell. Is 2026-02-29 a valid date? Depends on leap-year rules. Regex doesn't know the Gregorian calendar.
Worse, time zones are their own rabbit hole. Is 14:32:00+05:30 India Standard Time? Yes. Can the offset be written +0530? Also yes, in ISO 8601 basic format. Is +05:30:00 valid? Technically, in ISO 8601 extended. Is Z the same as +00:00? For all practical purposes — though RFC 3339 reserves -00:00 to mean "offset unknown," a nuance no format regex will ever surface.
What people try:
// Validates a date
const valid = /^\d{4}-\d{2}-\d{2}$/.test(input);
This accepts 2026-02-30, 2026-13-45, and 9999-99-99. It's a format check dressed up as a validity check.
What to do instead:
// Temporal is still rolling out across engines; use the @js-temporal/polyfill where it hasn't shipped
const parsed = Temporal.PlainDate.from(input); // throws on invalid
// Or with date-fns:
import { parseISO, isValid } from "date-fns";
const valid = isValid(parseISO(input));
In Python, datetime.fromisoformat() plus the stdlib zoneinfo. These handle the semantics correctly. Regex gets you the characters; the library tells you if those characters mean a real moment in time.
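If neither Temporal nor a date library is available, the same semantic check can be sketched with nothing but Date — a hypothetical helper (isRealIsoDate is ours) where the regex checks the shape and Date checks the semantics:

```javascript
// JS's Date silently rolls 2026-02-30 over to March 2nd; comparing the
// parsed parts back against the input catches impossible dates.
function isRealIsoDate(s) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
  if (!m) return false;
  const [y, mo, d] = m.slice(1).map(Number);
  const date = new Date(Date.UTC(y, mo - 1, d));
  return date.getUTCFullYear() === y && date.getUTCMonth() === mo - 1 && date.getUTCDate() === d;
}
console.log(isRealIsoDate("2026-02-30")); // false — February has no 30th
console.log(isRealIsoDate("2024-02-29")); // true — 2024 is a leap year
```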
7. Don't parse numbers with regex
\d+\.\d+ looks like a floating-point regex. It's wrong the moment the input comes from a locale that uses a comma as the decimal separator, is thousands-grouped like 1,234,567.89, uses scientific notation like 1.5e10, carries a sign, sits inside a currency format, or is written with non-Latin digit characters (Arabic-Indic digits, say).
What people try:
// Validates a number
const valid = /^\d+\.\d+$/.test(input);
This rejects 1.5e10, -42.7, 1,234.56, €42,00 (German/French decimal), and 42 (no decimal). It accepts 00.00. It's format-pattern-matching, not number validation.
What to do instead:
// Parse, then check
const n = Number(input);
const valid = Number.isFinite(n);
For locale-aware parsing, the Intl.NumberFormat constructor plus the formatter's .formatToParts() method lets you reverse-engineer the separators for your target locale. For strict validation of a specific format you control, document the format to your users and use your language's native number parser — which already handles all the edge cases regex doesn't.
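Sketched concretely, the formatToParts() trick looks like this — a hypothetical parseLocaleNumber (ours, not a standard API) that discovers a locale's separators and normalizes before handing off to Number():

```javascript
// Format a known number, read back which characters the locale uses for
// grouping and decimals, then strip/replace them in the input.
function parseLocaleNumber(str, locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(12345.6);
  const group = parts.find((p) => p.type === "group")?.value ?? "";
  const decimal = parts.find((p) => p.type === "decimal")?.value ?? ".";
  const normalized = (group ? str.split(group).join("") : str).replace(decimal, ".");
  return Number(normalized);
}
console.log(parseLocaleNumber("1.234.567,89", "de-DE")); // 1234567.89
```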
Regex pulling out the four digits of a year from a known-shape log line? Fine. Regex being the authoritative number-validator? Wrong tool.
When regex is actually the best tool
The list above is specific. Most programming tasks aren't on it. Here's where regex really is the right answer.
Log line extraction. \[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\] (ERROR|WARN|INFO) (.*) pulled from your own structured log lines is cleaner than any parser. You control the shape of the input. The pattern is explicit. Every alternative is heavier.
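The same pattern with named capture groups, as it might look in a log-processing script:

```javascript
// The log pattern from above, with named groups for self-documenting output.
const logRe = /^\[(?<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\] (?<level>ERROR|WARN|INFO) (?<msg>.*)$/;
const line = "[2026-05-09T14:32:00Z] ERROR db connection refused";
const { ts, level, msg } = line.match(logRe).groups;
console.log(level, "at", ts, "—", msg);
```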
Find-and-replace in a code editor. s/foo_\w+/bar_\1/g is how you rename thirty similar symbols in one action. No parser is coming for this use case.
Tokenizing language keywords in a lexer. When you're writing a compiler or interpreter, the first pass is usually regex-driven. if|else|while|for|return matched against a source buffer is fast, correct, and obvious. More sophisticated tokenization (like handling string literals with escape sequences) needs more than regex — but the keyword pass is classic regex territory.
Validating a format you control end-to-end. If your product's invoice numbers are always INV-YYYY-NNNN, a regex is the right tool. You control both the generator and the validator. There are no adversarial inputs because no outside system generates your invoice numbers.
Pattern extraction from known-shape data. Pulling all GitHub issue references (#1234) from a commit message. Extracting every hex color code from a CSS file. Finding every TODO comment in a source tree. Any "I want all the substrings that look like X" query against your own data is exactly what regex was designed for.
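Both extraction queries fit in a line or two of matchAll / match:

```javascript
// Issue references from a commit message:
const commit = "Fix header parsing (#1234), follow-up to #987";
const refs = [...commit.matchAll(/#(\d+)/g)].map((m) => m[1]);
console.log(refs); // ["1234", "987"]

// Hex color codes from a CSS string:
const css = "color: #ff0033; background: #00aa44;";
const colors = css.match(/#[0-9a-fA-F]{6}\b/g);
console.log(colors); // ["#ff0033", "#00aa44"]
```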
The common thread across these cases: the input is either controlled by you or is regular by nature. That's the sweet spot.
ECMAScript 2018+ features most advice ignores
A lot of the regex advice circulating on the web dates from an era when JavaScript regex was genuinely crippled compared to PCRE. That era ended in 2018. Here's what modern engines actually support, in case you've been writing around missing features you already have.
Lookbehind, including variable-length. (?<=USD )[\d,]+ matches a dollar amount preceded by USD. Variable-length works too: (?<=(foo|foobar) )baz matches baz preceded by either word. PCRE requires fixed-length lookbehind in most configurations; ECMAScript allows arbitrary-length. JavaScript is genuinely more flexible than PCRE here, though Safari only landed lookbehind in 16.4 (March 2023) — so if your user base still includes iOS Safari below 16.4, you need a fallback. V8 (Chrome 62, 2017) and SpiderMonkey (Firefox 78, 2020) have had it longer; Node has supported it since version 10.
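Both forms in action — a fixed-length and a variable-length lookbehind:

```javascript
// Fixed-length: a dollar amount preceded by "USD ".
const price = "total: USD 1,250".match(/(?<=USD )[\d,]+/)[0];
console.log(price); // "1,250"

// Variable-length: legal in ECMAScript, rejected by most PCRE builds.
const suffix = "foobar baz".match(/(?<=(foo|foobar) )baz/);
console.log(suffix && suffix[0]); // "baz"
```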
Unicode property escapes. \p{L} matches any letter in any script. \p{N} matches any number. \p{Script=Hangul} matches Korean characters specifically. \p{Emoji} matches emoji. This is way more powerful than [a-zA-Z] for anything involving non-English text. Requires the u flag:
const hasNonLatin = /\p{L}/u.test("привет"); // true
The s dotall flag. Historically . didn't match newlines in JavaScript. With the s flag, it does. /<script>.*<\/script>/s now actually matches a multi-line script tag — not that you should be parsing HTML with regex, but the capability matters.
Named capture groups. (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) captures into match.groups.year, match.groups.month, match.groups.day. Way more readable than positional $1, $2, $3. Supported since Node 10 and all modern browsers.
Match indices with the d flag. ES2022 added a d flag that returns not just the matched text but the start and end indices of each match and each capture group. Useful for syntax highlighting, editor extensions, or any case where you need to know exactly where in the input a match occurred.
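A quick sketch of what the d flag returns:

```javascript
// The d flag adds an .indices property with [start, end] for every group.
const m = /(?<word>\w+)/d.exec("hello world");
console.log(m[0]);                  // "hello"
console.log(m.indices[0]);          // [ 0, 5 ] — span of the full match
console.log(m.indices.groups.word); // [ 0, 5 ] — span of the named group
```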
If you've been writing JavaScript regex in a 2015 mindset, ten minutes with the MDN regex guide will refresh what's available. Our regex tester exposes the g/m/i/s flags directly. Named groups appear in the match output automatically.
Flavor differences that matter when you copy-paste
regex101 is probably the most-used regex tester on the internet. It's excellent. It's also a flavor switcher — you pick PCRE, Python re, ECMAScript, Golang, or Java, and the engine behaves accordingly. The trap is that patterns built in one flavor and pasted into another can silently behave differently.
A partial table of differences that bite in practice:
| Feature | ECMAScript | PCRE | Python re | Java | Go RE2 |
|---|---|---|---|---|---|
| Named group syntax | (?<name>...) | (?<name>...) or (?P<name>...) | (?P<name>...) | (?<name>...) | (?P<name>...) |
| Variable-length lookbehind | yes | fixed-length only in most builds | no (fixed-width only) | bounded only | no lookbehind at all |
| Backreferences | yes | yes | yes | yes | no (by design) |
| Possessive quantifiers a++ | no | yes | 3.11+ | yes | no |
| Atomic groups (?>...) | no | yes | 3.11+ | yes | no |
| Unicode categories \p{L} | yes (u flag) | yes | no (third-party regex module: yes) | yes | limited |
| Recursion (?R) | no | yes | no | no | no |
Three of these matter in practice. The named-group syntax difference means a Python pattern with (?P<name>...) throws a SyntaxError in JavaScript — the engine doesn't recognize the (?P syntax at all and reports an invalid group. Paste-from-regex101-with-PCRE, test in browser, scratch your head until you read the error message closely.
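What an ECMAScript engine does with the Python syntax is checkable in a few lines — built via the RegExp constructor so the file itself stays parseable (observed V8/SpiderMonkey behavior is a SyntaxError):

```javascript
// Feed Python's (?P<name>...) group syntax to JavaScript's regex engine.
let error = null;
try {
  new RegExp("(?P<year>\\d{4})-(?P<month>\\d{2})");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true — V8 reports "Invalid group"
```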
The backreferences difference is subtler. ECMAScript lets you use \1 to match whatever the first capture group matched. Go RE2 doesn't. If you wrote a pattern that relies on backreferences — matching paired HTML tags (don't), detecting duplicate words, verifying checksums — and you move from a Node service to a Go service, your pattern has to be rewritten.
The possessive-quantifier difference is why JavaScript-land has outsized ReDoS exposure compared to Java or PCRE. In Java or PCRE you can write (a++)+ to tell the engine not to back into the inner group. In ECMAScript you can't. You either rewrite the pattern structurally or you run it through RE2.
The practical takeaway: when you copy a pattern from regex101, check that the flavor in regex101 matches your production flavor. Our regex tester is ECMAScript-only, which is limiting if you need to test PCRE or Python but a feature if your production target is JavaScript — you won't get bitten by flavor drift. For flavor-switching, regex101 remains the best tool on the internet. We're the fast, no-signup alternative when you already know you're writing ECMAScript.
Writing safe regex
A compact set of habits that prevent most ReDoS bugs.
Avoid nested quantifiers with overlap. (a+)+, (a*)*, (a|a)+, (.*)* — any pattern where an outer + or * wraps a group that can match the same characters as itself — is the evil-regex shape. Flatten to a+ or a* if the nesting is redundant. Restructure to eliminate the ambiguity if it isn't. This one habit catches 80% of catastrophic-backtracking bugs.
Anchor patterns against the shape of the input. ^[a-z]+$ is safer than [a-z]+ because the engine doesn't have to try the pattern at every offset. Anchoring tells the engine where the match must start and end, which eliminates entire classes of backtracking.
Use character classes instead of alternation when possible. [abc] is equivalent to (a|b|c) but much faster in every backtracking engine — the engine tests a single character class rather than three alternation branches. For alternation of longer strings, factor out common prefixes: (foo|foobar) becomes foo(bar)? which avoids redundant path exploration.
Set a timeout or input-size limit on untrusted input. Python's stdlib re has no timeout (the third-party regex module accepts one, and .NET's Regex takes a match timeout). Java's Matcher can be made interruptible by wrapping the input CharSequence. In Node, if you expect adversarial input, use the re2 package — Google's RE2 engine — which is linear-time by construction. And limit the input size before you apply the regex: a 10KB URL is already suspicious; truncate to 2KB before you try to parse it, and a ReDoS becomes a bounded annoyance instead of an exponential disaster.
Test against adversarial input. The development habit that catches this best is maintaining a set of "weird inputs" for every regex you write — unusually long, unusually repeated, unusually nested, unusually unicode, unusually empty. Run them through a CI check with a per-case timeout. If any input takes more than, say, 100ms, fail the build. safe-regex for Node is a linter that detects common evil-regex shapes statically. It won't catch everything but it'll catch the textbook cases.
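The CI check described above can be sketched in a few lines — the corpus, the candidate pattern, and the 100ms budget are all illustrative:

```javascript
// Per-pattern timing check of the kind a CI job might run.
function timeMatch(pattern, input) {
  const start = process.hrtime.bigint();
  pattern.test(input);
  return Number(process.hrtime.bigint() - start) / 1e6; // milliseconds
}
const weirdInputs = ["", "a".repeat(10000), "𝕒".repeat(500), "a-".repeat(5000)];
const candidate = /^[a-z-]+$/; // anchored, no nested quantifiers — should stay fast
for (const input of weirdInputs) {
  const ms = timeMatch(candidate, input);
  if (ms > 100) throw new Error(`too slow on ${input.length}-char input: ${ms}ms`);
}
console.log("all inputs under budget");
```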
Use RE2 or similar when input is untrusted. This is the nuclear option and it's almost always the right one for user-controlled input. RE2 is linear-time, immune to catastrophic backtracking by design, and its feature set is 90% of what you need. The remaining 10% (backreferences, complex lookaround) is often best avoided on untrusted input anyway because the patterns that need them are usually the fragile ones.
What we built with our regex tester
Our regex tester is deliberately narrow. It supports the ECMAScript engine — the one that runs in your Node service or your browser JavaScript — with the four flags people actually use: g for global, m for multiline, i for case-insensitive, and s for dotall. It has a cheatsheet panel with the common metacharacters, quantifiers, anchors, and groups. Six presets cover the queries people paste in repeatedly: email, URL, IP address, US phone number, hex color, and ISO date. Named capture groups work; they show up in the match output next to positional groups. There's a replace mode for testing substitutions.
What it doesn't do is maybe more important than what it does. It doesn't switch flavors — regex101 already does that well and we're not competing with it. It doesn't detect catastrophic-backtracking patterns or time out long-running matches. That means the evil-regex demo in the last section will genuinely eat CPU in your browser tab until the browser warns you or you close it. That's a feature gap, honestly — we'd consider building a pattern-shape linter that warns on nested quantifiers, or a wrapped engine that kills matches exceeding a threshold. Neither is shipped today. If you need a timeout-aware production regex, run your match inside a worker with a kill timer, or use RE2 directly.
The tool does one job and does it fast. No signup. No server round-trip. No pattern data leaving your browser. If your pattern is confidential — a security rule, an internal URL schema — it's safer in our tool than anywhere that calls out to a backend to evaluate.
FAQ
What is a regex?
A regular expression is a pattern that describes a set of strings. You pass the pattern and an input to a regex engine; the engine tells you whether the input matches and where. Every mainstream programming language ships one. They're used for searching, validation, and light text processing.
How do I test a regex without running my code?
Use a regex tester. Ours is ECMAScript-only — good for JavaScript or Node testing — and shows matches and captures live as you type. regex101 is the standard for multi-flavor testing. For a fast local CLI, grep -E or rg (ripgrep) against a sample file works well too.
How do I match an email address with regex?
You don't, really. The simplified HTML5 input-validation pattern is the best pragmatic compromise, but for actual validation of actual user email addresses, send the email and verify the reply. A regex can check that the string looks like an email; it can't check that the email works.
Can you parse HTML with regex?
Technically yes for trivial cases. Practically no for anything real. Use DOMParser in the browser, cheerio or jsdom in Node. The famous Stack Overflow answer on this subject remains accurate and funny.
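A two-line demonstration of why: a non-greedy group stops at the first closing tag it sees, so nesting silently truncates the match.

```javascript
// Regex on nested markup: .*? stops at the FIRST </div>, so the outer
// element's content is silently truncated. No error is raised.
const html = '<div>outer <div>inner</div> tail</div>';
const m = html.match(/<div>(.*?)<\/div>/);
m[1]; // 'outer <div>inner', the nested structure is mangled
```

A greedy `.*` fails the other way, swallowing past the inner element to the last `</div>` in the document. Neither quantifier can count nesting depth; that's exactly what a parser's stack is for.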
What's catastrophic backtracking?
When a regex engine explores an exponential number of possible match paths because the pattern is ambiguous and the input forces the engine to try every ambiguity before giving up. It manifests as a regex that runs instantly on short inputs and hangs on slightly longer ones. The classic example is ^(a+)+$ against a long string of a's followed by a non-a character.
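You can watch the blow-up happen without hanging anything by timing the pattern on short non-matching inputs; each added character roughly doubles the work. Input lengths here are kept small on purpose, since past roughly 30 characters this stops returning in reasonable time.

```javascript
// The classic evil pattern, timed on inputs that fail to match. The
// trailing '!' forces the engine to exhaust every way of splitting the
// a's between the inner and outer quantifier before reporting failure.
const evil = /^(a+)+$/;
for (const n of [16, 18, 20, 22]) {
  const input = 'a'.repeat(n) + '!';
  const t0 = process.hrtime.bigint();
  evil.test(input); // always false; how long it takes is the point
  const ms = Number(process.hrtime.bigint() - t0) / 1e6;
  console.log(`n=${n}: ${ms.toFixed(1)} ms`);
}
```

On matching input the same pattern returns instantly, which is why it survives code review and unit tests: the cost only appears on the failure path, and only on inputs shaped to exploit the ambiguity.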
What's ReDoS?
Regular expression Denial of Service. An attacker submits input that triggers catastrophic backtracking in a server's regex, pegging the CPU and potentially bringing the service down. Real CVEs in 2024–2026 affected path-to-regexp (3x), Koa, ssh2, CPython's tarfile module, and Huggingface Transformers. It's not theoretical.
Does JavaScript support lookbehind?
Yes, though not uniformly. V8 (Chrome 62, Node 10) has had lookbehind since 2017. SpiderMonkey (Firefox 78) since 2020. Safari held out until 16.4 in March 2023, making it the last major engine to support it. Old advice claiming "JavaScript doesn't support lookbehind" is outdated, but if your user base still runs iOS Safari below 16.4, you still need a fallback.
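A quick example for the engines that do support it:

```javascript
// Lookbehind: match digits only when immediately preceded by '$',
// without consuming the '$' itself.
const text = '$42 or 17 units at $3';
const prices = text.match(/(?<=\$)\d+/g);
prices; // ['42', '3'], the bare 17 is skipped
```

The pre-lookbehind fallback is to capture the anchor too and pull the digits out of a group, e.g. `/\$(\d+)/g` with `matchAll`; it works everywhere but makes the pattern and the extraction code both a little noisier.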
What's the difference between PCRE and ECMAScript regex?
PCRE supports features ECMAScript doesn't: possessive quantifiers, atomic groups, recursion, certain escape sequences. ECMAScript, in turn, allows fully variable-length lookbehind, which PCRE historically restricted to fixed-length patterns. The biggest practical difference for security is that JavaScript lacks atomic groups and possessive quantifiers, which means you can't easily prevent backtracking structurally: you either rewrite the pattern or run it through a linear-time engine like RE2.
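One workaround is worth knowing: ECMAScript lookaheads are atomic (once they match, the engine never backtracks into them), so a capturing lookahead plus a backreference emulates PCRE's atomic group:

```javascript
// Emulating PCRE's atomic group (?>a+) in ECMAScript: the lookahead
// captures greedily and is never re-entered, and \1 then consumes
// exactly what was captured, so no alternative splits are retried.
const atomicish = /^(?=(a+))\1b$/;
atomicish.test('aaab'); // true
atomicish.test('aaa');  // false, and it fails fast: no exponential retry
```

Compare the plain `/^(a+)+$/` shape from the backtracking discussion: the lookahead trick removes the ambiguity the backtracker would otherwise enumerate. It's verbose, which is part of why structural prevention in JavaScript tends to lose out to "just use RE2".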
Is regex the right tool for URL parsing?
No. Use new URL() in JavaScript, urllib.parse in Python, or the standard URL-parsing function in your language. URL structure has too many edge cases (IPv6 literals, userinfo, fragment encoding, internationalized domain names) to regex reliably. Koa's CVE-2025-25200 is a direct example of why.
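The parser handles in one line what a regex never quite gets right; hostnames, ports, and userinfo come back as separate fields, IPv6 literals included:

```javascript
// new URL() (the WHATWG URL parser, built into Node and browsers)
// splits out the components a hand-rolled regex routinely fumbles.
const u = new URL('https://user:pw@[2001:db8::1]:8443/path?q=1#frag');
u.hostname; // '[2001:db8::1]', an IPv6 literal, brackets preserved
u.port;     // '8443'
u.username; // 'user'
u.pathname; // '/path'
```

It also throws on malformed input instead of silently mis-splitting it, which is the failure mode regex-based URL handling tends to hide.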
When is regex not the right tool?
When the input has nested structure (HTML, JSON, XML, code), when it needs semantic validation (dates, emails, numbers with locales), when it comes from untrusted sources and you haven't thought about ReDoS, or when a real parser for the format already exists and is only a library install away. Most "should I use regex for X" answers are no.
What are named capture groups?
Capture groups with names instead of numbers. (?<year>\d{4}) captures four digits under the name "year"; in the match, you access it as match.groups.year. Makes patterns self-documenting, especially when you have multiple groups. Supported by most modern engines: ECMAScript, PCRE, Java, and Python (which spells it (?P<year>\d{4})).
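In JavaScript it looks like this:

```javascript
// Named groups make a date pattern self-documenting: no counting
// parentheses to work out which index holds the month.
const m = '2026-01-15'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
m.groups.year;  // '2026'
m.groups.month; // '01'
m.groups.day;   // '15'
```

The positional indices m[1] through m[3] still exist alongside m.groups, so named groups cost nothing to adopt in existing code.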
Should I use regex or a parser?
If the format you're matching has nested structure, semantic rules, or a standard parser already exists, use the parser. If the input is flat, regular, and you control or know its shape, regex is usually the right answer. When in doubt, try the parser first: migrating from a regex to a parser later is always harder than starting with the parser and discovering it was overkill.
One more thing
We built our regex tester because the problem with regex isn't the syntax — that's learnable in an afternoon. The problem is the silence when it goes wrong. A pattern that matches on your test input and fails on an attacker's input doesn't tell you. A pattern that runs in microseconds on strings your QA thought of and in hours on strings they didn't doesn't tell you. A pattern that subtly drifted between the PCRE flavor you built it in and the ECMAScript flavor your code runs in doesn't tell you. The tooling that exists — regex101, our tester, grep, rg — all give you the same silence. They test the case you happen to paste in. What they can't do is imagine the case you didn't think of.
That's the gap this post is trying to address in prose, since no tester today fills it automatically. Our regex tester is fast and private and supports the four flags that matter, and it's deliberately not a PCRE switcher (regex101 is better at that) or a ReDoS detector (neither tool catches that today). Those are feature gaps we'd take seriously if the community-interest case were strong enough; for now, the honest answer is that the pattern-shape discipline in the "safe regex" section does more than any tool can.
If you want to see the same "wrong tool for the job" shape in a different domain, our hash algorithm guide covers developers using SHA-256 where they should be using Argon2, or MD5 where they should be using SHA-256. The cross-domain lesson is the same: a striking share of production bugs come from tools used outside their lane. For authentication tooling specifically, our JWT decoder guide is the sibling developer-security post. For your next regex, paste it into the tester and try the adversarial input before you ship.