Brand impersonation, population scale

Who's pretending to be your wallet.

We swept 242 wallet, fintech, bank, and SaaS brands across 1,160 TLDs (1,081 gTLDs + 79 ccTLDs) through five detection layers: TLD-squat, DL-1, homoglyph, IDN-punycode, and combosquat. gTLDs come from ICANN CZDS zone files; ccTLDs are probed using DoH as an existence oracle, which catches .ru / .cn / .ir / .de without zone access. Every entry below is a real domain that resolved at the most recent run. The list isn't speculation; it's an operational threat surface.

Blocklist blind-spot

Crowdsourced phishing intel doesn't see this.


feeds compared: PhishTank online-valid · URLhaus recent · CryptoScamDB blacklist · OpenPhish public feed

Search

Find a specific domain in the corpus.

Substring match across all 40K+ matches. Useful for checking if a specific apex you saw elsewhere is in our set.

Detection layers

Each match is tagged with the most specific layer that produced it. Combosquat is the most realistic phishing phenotype; TLD-squat is the most brazen.

TLD heatmap (all)

Where impostors concentrate.

All TLDs ranked. gTLDs dominate by absolute count.

ccTLD coverage · new

Country-code impostors.

DoH-probed across 79 ccTLDs, including .ru / .cn / .ir / .de / .fr / .ph, via existence test rather than zone files.
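
A minimal sketch of the existence test, assuming a public DoH resolver that serves the JSON format. Cloudflare's endpoint is used here as a stand-in; the resolver, the NS record type, and the function names are illustrative assumptions, not the pipeline's actual cctld_probe.py. The idea is that the response code alone answers "is this name registered?": NXDOMAIN means no, while NOERROR means the name exists even if no records of the requested type come back.

```python
import json
import urllib.parse
import urllib.request

# Assumption: any resolver that serves DNS-over-HTTPS JSON would do here.
DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"

NOERROR = 0   # name exists (possibly with no records of the queried type)
NXDOMAIN = 3  # name does not exist


def exists_from_answer(answer: dict) -> bool:
    """Interpret a DoH JSON response as an existence proof.

    Anything other than NOERROR (NXDOMAIN, SERVFAIL, a missing Status field)
    is treated as "not proven to exist".
    """
    return answer.get("Status") == NOERROR


def probe(domain: str, rtype: str = "NS", timeout: float = 5.0) -> bool:
    """Ask the resolver about `domain`; True if the name exists."""
    query = urllib.parse.urlencode({"name": domain, "type": rtype})
    request = urllib.request.Request(
        f"{DOH_ENDPOINT}?{query}",
        headers={"Accept": "application/dns-json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return exists_from_answer(json.load(response))
```

Calling probe("metamask.ru") performs one HTTPS request and needs no zone file. A TLD that wildcards every label would answer NOERROR for anything, so a real sweep would probably also need a random-label control query per TLD, and a retry policy for SERVFAIL and timeouts.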

Brand × ccTLD geography

Top brand × ccTLD pairs by impostor count.

ccTLD impersonation density per (brand, ccTLD) pair, with brand-portfolio rows excluded. The defensive_likely flag separates portfolio from impersonation: a row is flagged when it shares a TXT-cluster verification token or NS records with the legitimate apex. flag_defensive.py currently tags ~11.5% of ccTLD-squat rows as portfolio, leaving the rest as impersonation candidates. The initial rule, "SLD literally equals the brand", was retired as too aggressive: it would mark metamask.ru as portfolio when it's an impostor. Cell counts are live-fetched.
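
The two signals can be sketched as follows. This is an illustration under stated assumptions, not the contents of flag_defensive.py: the DnsFacts type, the function names, and the list of verification-token prefixes are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: a few common SaaS domain-verification TXT prefixes, lowercased.
VERIFICATION_PREFIXES = (
    "google-site-verification=",
    "facebook-domain-verification=",
    "ms=",
)


@dataclass(frozen=True)
class DnsFacts:
    ns: frozenset   # nameserver hostnames
    txt: frozenset  # raw TXT record strings


def _norm_ns(names) -> set:
    """Lowercase nameserver names and drop the trailing root dot."""
    return {n.lower().rstrip(".") for n in names}


def _tokens(txt_records) -> set:
    """Keep only TXT records that look like SaaS verification tokens."""
    return {t.strip() for t in txt_records
            if t.strip().lower().startswith(VERIFICATION_PREFIXES)}


def defensive_likely(candidate: DnsFacts, legit: DnsFacts) -> bool:
    """True if the candidate shares a verification token or NS with the legit apex."""
    shared_ns = bool(_norm_ns(candidate.ns) & _norm_ns(legit.ns))
    shared_token = bool(_tokens(candidate.txt) & _tokens(legit.txt))
    return shared_ns or shared_token
```

Unlike the retired "SLD equals the brand" rule, nothing here looks at the domain name itself, so an impostor such as metamask.ru is only flagged if its DNS actually overlaps with the real apex. Nameservers of large managed-DNS providers are shared by unrelated customers, so a production version would likely need to discount those.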

Full grid

cell shade = log-scaled count · scroll horizontally on narrow screens · click any cell to drill into the brand

Brand grid.

click any card to drill into matches
Methodology

5 detection layers · multi-feed cross-reference (see /feeds/) · DoH liveness · linked-bad graph · precision-filtered (live counts above).

Candidate generation (5 layers)

  • TLD-squat: exact <brand>.<X> across 1,160 TLDs (1,081 gTLDs from CZDS + 79 ccTLDs via DoH-as-existence-oracle).
  • DL-1: every Damerau-Levenshtein-1 neighbour.
  • Homoglyph: confusables — m/rn, l/1, o/0, i/1, e/3, a/4, s/5, b/6.
  • IDN-punycode: Cyrillic / Greek / Latin lookalikes via xn-- encoding.
  • Combosquat: <brand>+<keyword> with phish/wallet keywords.
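
The five generators can be sketched roughly as below. This is a simplified illustration, not the pipeline's match.py: the function names are hypothetical, the homoglyph step substitutes one position at a time, and the combosquat joiners (bare and hyphenated) and keyword list are assumptions.

```python
import string

# Characters allowed in an ASCII hostname label.
ALPHABET = string.ascii_lowercase + string.digits + "-"

# Confusable pairs from the homoglyph bullet, applied brand -> lookalike.
HOMOGLYPHS = {"m": "rn", "l": "1", "o": "0", "i": "1",
              "e": "3", "a": "4", "s": "5", "b": "6"}


def tld_squats(brand: str, tlds: list) -> set:
    """Exact brand label under every TLD."""
    return {f"{brand}.{tld}" for tld in tlds}


def dl1_neighbours(label: str) -> set:
    """All labels at Damerau-Levenshtein distance exactly 1."""
    out = set()
    for i in range(len(label) + 1):
        head, tail = label[:i], label[i:]
        for ch in ALPHABET:
            out.add(head + ch + tail)              # insertion
        if tail:
            out.add(head + tail[1:])               # deletion
            for ch in ALPHABET:
                out.add(head + ch + tail[1:])      # substitution
        if len(tail) > 1:
            out.add(head + tail[1] + tail[0] + tail[2:])  # transposition
    out.discard(label)
    # Labels may not be empty or start/end with a hyphen.
    return {c for c in out if c and c[0] != "-" and c[-1] != "-"}


def homoglyph_variants(label: str) -> set:
    """Swap one confusable character at a time."""
    return {label[:i] + HOMOGLYPHS[ch] + label[i + 1:]
            for i, ch in enumerate(label) if ch in HOMOGLYPHS}


def to_punycode(label: str) -> str:
    """Encode a non-ASCII lookalike label into its xn-- form."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def combosquats(brand: str, keywords: list) -> set:
    """Brand plus keyword, with and without a hyphen (assumed joiners)."""
    return {brand + kw for kw in keywords} | {f"{brand}-{kw}" for kw in keywords}
```

For example, dl1_neighbours("metamask") contains the deletion "metamsk" and the transposition "metamaks", and homoglyph_variants("metamask") contains "rnetamask". The DL-1 set grows with label length and alphabet size, which is one reason short brand keys produce so many accidental collisions.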

Verification + enrichment

  • gTLDs: joined against zone_file (~265M apex). ccTLDs: DoH-probe presence proof.
  • Re-resolved over DoH for the live flag; live IPs get ASN/country/ISP.
  • Joined against RDAP (registrar / dates / country) and Tranco L7KW4 (2026-04-23).
  • Cross-referenced against every abuse feed we ingest (PhishTank, URLhaus, CryptoScamDB, OpenPhish, ThreatFox, Phishing.Database, MetaMask eth-phishing-detect, Phishing Army, abuse.ch MalwareBazaar / Feodo / SSLBL — live status on /feeds/) for known-bad.
  • Non-CDN IP/ASN adjacency to a known-bad row = linked-bad. Trichotomy = known / linked / blind.
  • SaaS-verification TXT or shared-NS with legitimate apex would tag a row defensive_likely (brand-owned portfolio) and exclude it from impostor counts. Heuristic is wired; the post-match flag_defensive.py run is queued — current flagged count is small.
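
The known / linked / blind trichotomy can be sketched as below. This is an assumption-laden illustration rather than the pipeline's linked_bad.py: the Row type and classify function are hypothetical, and CDN_ASNS is a short illustrative stand-in for whatever CDN/cloud list the pipeline actually filters on.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative CDN/cloud ASNs (Cloudflare, Amazon, Akamai, Fastly); not the real list.
CDN_ASNS = {13335, 16509, 20940, 54113}


@dataclass(frozen=True)
class Row:
    domain: str
    ips: frozenset
    asn: Optional[int]  # None when the domain did not resolve


def _linkable(row: Row) -> bool:
    """Shared hosting on a CDN says nothing, so CDN rows never create links."""
    return row.asn is not None and row.asn not in CDN_ASNS


def classify(rows: list, known_bad: set) -> dict:
    """Label each domain 'known', 'linked', or 'blind'."""
    bad_ips, bad_asns = set(), set()
    for row in rows:
        if row.domain in known_bad and _linkable(row):
            bad_ips |= row.ips
            bad_asns.add(row.asn)

    labels = {}
    for row in rows:
        if row.domain in known_bad:
            labels[row.domain] = "known"    # present in an abuse feed
        elif _linkable(row) and (row.ips & bad_ips or row.asn in bad_asns):
            labels[row.domain] = "linked"   # infrastructure-adjacent to a known-bad row
        else:
            labels[row.domain] = "blind"    # no feed sees it, no graph link
    return labels
```

A "linked" label is only a statement about shared infrastructure. ASN-level adjacency in particular is a loose signal, so a stricter variant might require a shared IP.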
Tranco snapshot: ... — Dowdall (CrUX + Farsight + Majestic + Cloudflare Radar + Cisco Umbrella, 30-day).
Brands: 242 raw = 42 hand-curated + 200 auto-Tranco. Code: web/lookalike_mining/{match,cctld_probe,enrich,liveness,cluster,linked_bad,cluster_txt,flag_defensive}.py.
Precision filter (default-on): drops infrastructure brands (gtld-servers, akamaized, msftncsi, …), DL-1 of brand keys < 5 chars, Tranco-popular collisions (rank ≤ 100K), and any defensive_likely rows (TXT-cluster or shared-NS with legitimate apex). The headline KPIs above are live-fetched and reflect the filtered corpus. Pass ?precision_mode=raw on any endpoint to see the unfiltered corpus.
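
As a rough sketch, the four drop rules amount to a single predicate over match rows. The row field names (brand, layer, tranco_rank, defensive_likely), the "dl1" layer tag, and the "filtered" default mode name are assumptions for illustration; only the thresholds and the raw mode come from the description above, and INFRA_BRANDS lists just the three examples named there.

```python
# Partial list: only the infrastructure brands named in the text.
INFRA_BRANDS = {"gtld-servers", "akamaized", "msftncsi"}
MIN_DL1_KEY_LEN = 5        # DL-1 hits on shorter brand keys are dropped
TRANCO_CUTOFF = 100_000    # rank <= cutoff counts as a popular-site collision


def keep(row: dict) -> bool:
    """Return True if a match row survives the precision filter."""
    if row["brand"] in INFRA_BRANDS:
        return False
    if row["layer"] == "dl1" and len(row["brand"]) < MIN_DL1_KEY_LEN:
        return False
    rank = row.get("tranco_rank")
    if rank is not None and rank <= TRANCO_CUTOFF:
        return False
    if row.get("defensive_likely"):
        return False
    return True


def apply_precision(rows: list, precision_mode: str = "filtered") -> list:
    """precision_mode='raw' returns the unfiltered corpus."""
    if precision_mode == "raw":
        return list(rows)
    return [row for row in rows if keep(row)]
```

Under these rules a row like olx.org matched as DL-1 of okx is dropped on key length alone, because "okx" has only three characters.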
Honest limits

RDAP sparse · Tranco-rank ≠ proof · ccTLD via DoH-probe only · DL-1 noisy · linked-bad is a graph claim.
  • RDAP coverage on matches is sparse — long tail has null RDAP. Post-match RDAP run is queued.
  • A non-null Tranco rank isn't proof of impersonation — popular site collisions exist (e.g. olx.org is DL-1 of okx). The page surfaces the rank for human judgment.
  • ccTLD coverage is DoH-probe only: presence + liveness, not enumeration. OpenINTEL access would close that gap; we have not yet applied for it.
  • DL-1 over-expands on short brand keys; the precision filter drops DL-1 hits when the brand key is < 5 chars.
  • Linked-bad is a graph claim, not a confirmed-bad claim. CDN/cloud ASNs are filtered before linking.
  • "Live" means it answered A/AAAA at the last DoH sweep — some attackers cloak by IP, so live ≠ visibly malicious.
  • Zone history is a single in-progress snapshot today. Pre-victimization-window claims (zone-presence time vs feed-flag time) require multiple snapshots and aren't yet measurable on this corpus; we frame them only as future work.
  • defensive_likely tags ~11.5% of ccTLD-squat rows currently (TXT-cluster + shared-NS heuristic). The complementary "SLD literally equals the brand" rule was retired because it over-flagged real impersonation cases like metamask.ru.