How to detect typosquatting domains (2026 playbook)

Samira Haddad

May 20, 2026 · 13 min read

Introduction

Most brand-targeting phishing campaigns still start with a domain. Attackers do not need to defeat your email gateway, your endpoint stack, or your zero-trust controls if they can convince a customer or employee that arnazon-secure.com belongs to you. Typosquatting — registering hostnames that look or read like a trusted name — remains one of the cheapest, fastest, and most durable phishing primitives in 2026.

Detection alone is not the goal. A typosquat program has to move from signal to case to takedown with evidence a registrar, host, or marketplace will act on. This playbook covers the detection sources that actually fire early, how to score risk so analysts triage the right cases first, what the evidence package should contain, and the KPIs that prove the program is reducing customer harm — not just producing dashboards. For the mechanics underneath the strategy, pair this article with our guide on how typosquat detection works.

Anatomy of a typosquat

Calling everything a "typo" obscures what is actually happening. A modern brand-abuse program catalogs at least these variants and treats each as a separate detection class:

Character typos and transpositions: amazno.com, gogle.com, microsfot.com. The classic case — single keystroke errors a tired user makes on mobile.
Homoglyph attacks and IDN homographs: replacing Latin characters with visually similar Cyrillic or Greek glyphs, often delivered through Punycode (xn--) labels. The result looks identical in many fonts and can render as the brand name in the address bar when the browser's IDN display policy allows it.
Combosquatting: the brand name plus an action word — brand-secure.com, brand-login.support, verify-brand-account.com. Increasingly the dominant class for phishing kits because the brand string is intact for SEO and victim recognition.
TLD and ccTLD swaps: brand.co, brand.io, brand.cm (a notorious typosquat sink for .com), or new gTLDs aimed at credibility (brand.support, brand.bank where unrestricted, brand.app).
Sound-alike and translation squats: phonetic variants, plurals, and localized translations that bypass naive string matchers.
Cybersquatting and parking: not all squats are live phishing — many sit on parked-ad pages waiting to be sold, sometimes for years, and convert into weaponized infrastructure within hours when a campaign launches.
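
The homoglyph class above can be screened mechanically. The following is a minimal Python sketch, not a complete confusables check: it decodes a Punycode label with the standard-library codec and flags labels that mix scripts. The function names (decode_label, scripts_in, is_mixed_script) are illustrative, and treating the first word of a character's Unicode name as its script is a simplifying assumption; a production check would rely on proper Unicode script data and a confusables table.

```python
import unicodedata


def decode_label(label: str) -> str:
    """Decode a Punycode (xn--) label to Unicode; return other labels unchanged."""
    label = label.lower()
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def scripts_in(text: str) -> set[str]:
    """Rough script detection: first word of each letter's Unicode name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():  # skip digits and hyphens
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when the decoded label contains letters from more than one script."""
    return len(scripts_in(decode_label(label))) > 1


# Build a sample label: "apple" with a Cyrillic "a" (U+0430) in first position.
spoof = "xn--" + "\u0430pple".encode("punycode").decode("ascii")
print(spoof, is_mixed_script(spoof))      # Latin + Cyrillic letters
print("apple", is_mixed_script("apple"))  # single script
```

A mixed-script test will not catch a label written entirely in one non-Latin script, so it complements a rendered-glyph comparison rather than replacing it.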

A useful working definition for triage: a typosquat is any registered hostname whose resemblance plus context creates plausible customer confusion. Resemblance alone produces noise. Context — MX records, nameserver, hosting fingerprint, content, paid distribution — is what turns it into a case worth working.

Where typosquats are first visible

No single source catches everything in time. A 2026 typosquat program treats detection as a layered pipeline, not a single feed. Each layer trades off latency, recall, and false-positive rate.

New domain registration feeds (zone files and registrar TLD lists): the earliest possible signal — often hours after a domain is created. Best recall for .com, .net, and most legacy gTLDs; coverage thins on ccTLDs without published zones.
Certificate Transparency logs: when a domain operator requests a TLS certificate, the issuance is logged publicly. CT is the single best signal that a domain has moved from parked to operationalized — the kit is being prepped to look legitimate. See ICANN's primer on certificate transparency.
Passive DNS: shows when a domain begins resolving, where it points, and how the infrastructure shifts over time. Particularly valuable for catching reactivation of an aged squat.
WHOIS / RDAP: registrant fingerprints (where not redacted), creation/expiry dates, registrar choice, and nameserver patterns. Cluster across many domains to spot the same actor running a campaign.
Email infrastructure signals: MX records, SPF/DKIM/DMARC posture, and mail-receiving configuration. A typosquat with valid MX can receive replies and misdirected mail, which suggests it is being prepared for credential phishing or BEC; one without is more likely a web-only lure.
Content and screenshot capture: once a candidate is identified, render the live page and compare layout, logos, and form actions against your real properties. This is the signal that converts "suspicious" into "confirmed brand abuse" for a registrar abuse desk.
Search and paid surfaces: abusive domains often surface in branded search or paid ads before they show up in your inbox. Cross-reference with SEO poisoning protection and ad-fraud feeds so a typosquat that buys traffic gets escalated immediately.

Layered detection means recall stays high without flooding analysts. Threat intel enrichment joins the layers — a CT hit on a combosquat with valid MX and a recently registered Cloudflare-fronted host is not the same case as a parked TLD swap with no certificate.

Permutation engines and dictionaries

The seed for any typosquat program is the list of marks you defend: brand names, product names, executive surnames, primary login subdomains, and high-value campaigns. From those marks, a permutation engine generates the candidate hostnames to watch for. Modern engines combine:

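Edit-distance variants: single-character substitutions (weighted toward adjacent keys), insertions, deletions, and transpositions. Levenshtein distance ≤ 2 catches most human typos; note that a transposition counts as two edits under plain Levenshtein and one under Damerau-Levenshtein. Tune the threshold by mark length.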
Homoglyph maps: per-script substitution tables (Cyrillic, Greek, Armenian, full-width Latin) plus Unicode confusables. Render candidates through a real font to filter visually-distinct hits.
Combosquat dictionaries: action and trust words (secure, login, verify, support, help, billing, account, portal), localized variants, and current-campaign keywords.
TLD set: at minimum all gTLDs you do not own, including credibility-seeking new gTLDs (.app, .support, .bank where unrestricted), plus high-risk ccTLDs (.cm, .co, .io) and any ccTLDs aligned to markets you operate in.
Phonetic and translation expansions: soundex/metaphone for English marks; localized translations for brand terms in markets where attackers run native-language lures.
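
As a rough illustration of how these components combine, here is a minimal Python sketch of a permutation engine covering edit-distance variants, combosquats, and TLD swaps. It is an assumption-laden toy rather than a reference implementation: the function names, the alphabet, and the four combosquat patterns are all hypothetical choices, and homoglyph and phonetic expansion are left out.

```python
from itertools import product

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def edit_variants(mark: str) -> set[str]:
    """Strings one insertion, deletion, substitution, or adjacent swap away."""
    out = set()
    for i in range(len(mark) + 1):
        for c in ALPHABET:
            out.add(mark[:i] + c + mark[i:])              # insertion
    for i in range(len(mark)):
        out.add(mark[:i] + mark[i + 1:])                  # deletion
        for c in ALPHABET:
            out.add(mark[:i] + c + mark[i + 1:])          # substitution
    for i in range(len(mark) - 1):
        out.add(mark[:i] + mark[i + 1] + mark[i] + mark[i + 2:])  # transposition
    out.discard(mark)
    out.discard("")
    return out


def combosquats(mark: str, words: list[str]) -> set[str]:
    """Brand plus trust/action word, hyphenated and concatenated, both orders."""
    out = set()
    for w in words:
        out.update({f"{mark}-{w}", f"{w}-{mark}", f"{mark}{w}", f"{w}{mark}"})
    return out


def build_watchlist(mark, words, tlds, owned=frozenset()):
    """Cross every candidate label with every TLD, minus domains you own."""
    labels = edit_variants(mark) | combosquats(mark, words) | {mark}
    return {f"{label}.{tld}" for label, tld in product(labels, tlds)} - set(owned)


watch = build_watchlist("brand", ["secure", "login"], ["com", "cm", "co"],
                        owned={"brand.com"})
print(len(watch), "brand-secure.com" in watch, "brand.cm" in watch)
```

Keeping the bare mark in the label set is what produces the TLD-swap candidates; the owned set removes the domains you actually control.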

The mistake to avoid is treating permutation output as detections. It is a watchlist. The detections happen when an entry on that watchlist appears in zone files, CT, passive DNS, or your inbound intel feeds. Domain monitoring and takedowns ties the watchlist to live feeds so the analyst only sees something when it actually becomes real.
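
The watchlist-versus-detection distinction can be sketched in a few lines of Python. This is a hedged illustration only: the feed is modeled as (source, hostname) pairs, which is an assumed shape rather than any real feed's format, and the function names are hypothetical. Checking every parent suffix of the observed hostname avoids needing a public-suffix list for this simple case.

```python
def normalize(host: str) -> str:
    """Lowercase, drop a trailing dot, and strip a leading wildcard label."""
    host = host.strip().lower().rstrip(".")
    if host.startswith("*."):
        host = host[2:]
    return host


def match_watchlist(host: str, watchlist: set[str]):
    """Return the watchlist entry that the host equals or is a subdomain of."""
    labels = normalize(host).split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in watchlist:
            return candidate
    return None


def detections(feed, watchlist):
    """Yield one record per feed observation that lands on the watchlist."""
    for source, host in feed:
        hit = match_watchlist(host, watchlist)
        if hit is not None:
            yield {"source": source, "observed": normalize(host),
                   "watchlist_entry": hit}


watchlist = {"brand-secure.com", "brand.cm"}
feed = [
    ("ct", "*.Brand-Secure.com."),   # wildcard certificate name
    ("pdns", "login.brand.cm"),      # subdomain of a watched domain
    ("zone", "example.org"),         # unrelated registration
]
for d in detections(feed, watchlist):
    print(d)
```

Nothing reaches the analyst until an observation from a live source matches an entry; the permutation output by itself never opens a case.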

Risk scoring beyond name similarity

String similarity is necessary but not sufficient. A 2026 risk score blends similarity with operationalization signals so the queue surfaces the cases most likely to harm customers this week:

Name and brand confusion (recall): edit distance, homoglyph density, combosquat term weighting, brand-confusable substring placement (prefix vs middle vs suffix).
Activation signals: CT certificate present, valid MX, A/AAAA resolving, NS pointing to a host class that ships phishing kits, port 443 returning a non-default response.
Content fidelity: page renders a brand-similar layout, hosts a logo asset, points form actions at credential-capture endpoints, or includes telltale phishing-kit strings.
Distribution evidence: appears in paid ads, branded SERPs, spam telemetry from your email infrastructure, or referrer traffic at your real properties.
Actor clustering: shares WHOIS fingerprint, registrar, NS, TLS issuer, or kit fingerprint with other confirmed bad domains.
Customer surface alignment: matches a high-value login or payment subdomain, an active campaign name, or an executive surname covered by executive impersonation protection.

Publish the score definition with the team and revisit quarterly. Scoring drift is real: a feature like "Cloudflare-fronted" that was a strong tell three years ago is now table stakes for most legitimate sites and contributes little signal. Treat the score model the way you would treat a fraud model — with labeled outcomes, regular reviews, and a way to override on judgment.
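
One way to make the score definition publishable is to keep it as a small, explicit table of weights. The Python sketch below assumes a simple additive model; the field names and every weight are placeholders invented for illustration, not values recommended by this playbook, and a real model would be fitted and reviewed against labeled outcomes as described above.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    similarity: float                     # 0.0 to 1.0, from the name matcher
    has_certificate: bool = False         # CT entry observed
    has_mx: bool = False                  # valid MX record
    resolves: bool = False                # A/AAAA record present
    content_match: bool = False           # brand-similar page or logo
    seen_in_distribution: bool = False    # ads, SERPs, spam telemetry
    clustered_with_known_bad: bool = False
    hits_high_value_surface: bool = False


# Hypothetical weights summing to 100; revisit them on a schedule.
WEIGHTS = {
    "similarity": 20,
    "has_certificate": 15,
    "has_mx": 10,
    "resolves": 5,
    "content_match": 20,
    "seen_in_distribution": 15,
    "clustered_with_known_bad": 10,
    "hits_high_value_surface": 5,
}


def risk_score(s: Signals) -> float:
    """Additive score in the range 0 to 100."""
    score = WEIGHTS["similarity"] * max(0.0, min(1.0, s.similarity))
    for name, weight in WEIGHTS.items():
        if name != "similarity" and getattr(s, name):
            score += weight
    return round(score, 1)


parked = Signals(similarity=0.9)
live = Signals(similarity=0.6, has_certificate=True, has_mx=True,
               resolves=True, content_match=True)
print(risk_score(parked), risk_score(live))
```

With these placeholder weights, a close-looking but parked name scores well below a looser match that is live, which is the ordering the triage queue needs.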

Evidence that holds up at the registrar

A detection only matters if it can become a takedown. Registrars, hosts, marketplaces, and search platforms will not act on screenshots alone; they need a reproducible evidence package that ties the abuse to a brand right. The same pack should satisfy internal counsel and, where escalation is warranted, law enforcement. Build the package from these blocks:

Identifiers: the abusive domain, hostnames, IPs, NS records, MX records, and TLS certificate chain at observation time — with timestamps in UTC.
Registration and ownership: WHOIS or RDAP excerpts, registrar identity, creation date, and any history of prior brand-abuse activity at the same registrant fingerprint.
Content captures: full-page screenshots, HTML snapshots, HAR/network captures, and a hash of every captured artifact. Include both the lure landing and the credential-submission flow.
Brand right and confusion claim: the trademark registration, the protected mark in use on the live abusive site, and a concise statement of likely consumer confusion. Keep this language reusable.
Harm narrative: evidence of distribution (paid ad placements, smishing samples, social shares), and where applicable, victim reports already received.
Provider routing: the abuse contact for the registrar, host, CDN, and TLS issuer, with the policy clauses the case violates. Citing the platform's own terms accelerates action.

Templates pay back fast. See our walkthrough on documenting evidence for abuse reports for a working template you can adopt as-is.

Takedown channels for typosquat domains

Not every typosquat is removed the same way. Match the channel to the abuse so dwell time stays low:

Registrar abuse: the fastest path when the registrar has a published abuse policy and the evidence cleanly maps to phishing or trademark infringement. Often resolves in hours for tier-1 registrars.
Host / CDN abuse: when the registrar is unresponsive or the operator is shielded behind a reverse proxy, removing the content host can still neutralize victim risk even if the registration persists.
TLS issuer revocation: Certificate Authorities may revoke certificates issued to clearly fraudulent domains, though policies vary by CA. Useful as a secondary lever and as a signal to browsers.
UDRP / URS dispute: when ownership is the real fight, ICANN's UDRP (administered by providers such as WIPO) offers a structured path to transfer or cancel the domain, and ICANN's URS offers a faster path to suspend it. Both are slower than abuse channels but durable.
Search and ads: simultaneously remove the abusive listings from paid and organic surfaces. A removed domain that still ranks for branded queries continues to harm customers for days.
Marketplace, social, and app store: when the typosquat is fronted by impersonation on another platform — a fake support handle linking to the squat — file there in parallel. See social media monitoring and takedowns.

ICANN's policy work is the substrate every registrar operates inside; the ICANN policy hub is worth reading at least once so your evidence language aligns with the system you are pulling on. For enterprise framing, the NIST Cybersecurity Framework covers how detection and response responsibilities split across functions — useful when justifying typosquat program investment to the steering committee.

Typosquat program KPIs for 2026

Volume is a diagnostic, not an outcome. A credible typosquat program reports on a small, stable set of KPIs that map to customer harm reduction:

Time-to-confirm: from first detection to analyst classification with risk score and owner.
Time-to-suspend (p50 / p90): from confirmation to the abusive domain or content no longer serving typical victims. Segment by channel and registrar class.
Evidence completeness: share of escalations that ship with the full package on the first send. Aim above 90%; rework is where dwell time accumulates.
Recycle rate: share of closed typosquat cases where the same kit, registrant pattern, or infrastructure fingerprint reappears within 30 / 60 / 90 days. High recycle rate signals that follow-on investment is needed — registrar relationship work, law-enforcement packages, or persistent blocklists.
Customer-visible exposure window: for high-severity cases, the elapsed time abusive content was reachable to a typical customer (resolution + cached copies + branded SERP visibility).
Catch-rate against external sightings: of the domains surfaced by customer reports or external feeds, the share your program had already flagged. Every domain it had not yet flagged is a recall gap, so this is your recall canary.
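
Several of these KPIs reduce to simple arithmetic over closed cases. The Python sketch below is a hedged illustration: the Case fields are an assumed record shape, and the percentile uses the nearest-rank method, which is one of several reasonable conventions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    hours_to_suspend: float
    complete_on_first_send: bool
    days_to_reappearance: Optional[int] = None  # None means not seen again


def percentile(values, p):
    """Nearest-rank percentile (p from 1 to 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def kpis(cases, recycle_windows=(30, 60, 90)):
    """Summarize time-to-suspend, evidence completeness, and recycle rate."""
    n = len(cases)
    hours = [c.hours_to_suspend for c in cases]
    report = {
        "sample_size": n,
        "time_to_suspend_p50_h": percentile(hours, 50),
        "time_to_suspend_p90_h": percentile(hours, 90),
        "evidence_completeness": sum(c.complete_on_first_send for c in cases) / n,
    }
    for days in recycle_windows:
        recycled = sum(
            1 for c in cases
            if c.days_to_reappearance is not None and c.days_to_reappearance <= days
        )
        report[f"recycle_rate_{days}d"] = recycled / n
    return report


closed = [
    Case(2, True),
    Case(6, True, days_to_reappearance=20),
    Case(12, False),
    Case(48, True, days_to_reappearance=75),
]
print(kpis(closed))
```

Reporting sample_size alongside each figure keeps small segments from being over-read, and running the same function per channel or registrar class gives the segmentation described above.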

Pair these with the broader framework in takedown metrics that matter (2026) so the typosquat numbers compose into the executive view, not compete with it.

Operational checklist

Maintain a versioned mark list: brand names, login subdomains, executive surnames, active campaigns. Review quarterly with marketing and security.
Run layered detection: zone files + CT + passive DNS + WHOIS/RDAP + content capture, fed into one case object.
Score risk, do not just match strings: activation signals beat similarity for triage ordering.
Templatize the evidence pack: identifiers, registration, captures, brand-right claim, harm narrative, provider routing.
Route to the right channel: registrar, host, CA, UDRP, search/ads, or platform abuse — in parallel where the case justifies it.
Track recycle rate: the first removal is rarely the last. Persistent actors require persistent enforcement.
Report KPIs honestly: segmented, with sample sizes, and tied to customer harm not ticket counts.

Public reporting fits inside the same picture: phishing fraud is documented and escalated through channels such as the FBI's Internet Crime Complaint Center (IC3), and the FBI's phishing reference is useful for educating non-security stakeholders about why typosquats matter. CISA frames cyber defense as a shared responsibility — see CISA's cyber threats and response hub.

When you want this workflow without rebuilding it from spreadsheets and scripts, PhishEye runs the layered detection, scoring, evidence assembly, and multi-channel takedown in one workspace. Book a demo or contact sales and we will map the program to your current marks and registrar stack.

Authoritative references

On PhishEye: explore the typosquatting protection product, the domain monitoring & takedowns capability, and the guides library — including how typosquat detection works and documenting evidence for abuse reports.
