Who's pretending to be your wallet.
We swept 242 wallet / fintech / bank / SaaS brands across 1,160 TLDs
(1,081 gTLDs + 79 ccTLDs) through five detection layers: TLD-squat,
DL-1, homoglyph, IDN-punycode, and combosquat. gTLDs come from ICANN CZDS
zone files; ccTLDs are probed via DoH-as-existence-oracle (catches
.ru / .cn / .ir / .de without zone access).
Every entry below is a real, currently-resolving domain. The list isn't
speculation; it's an operational threat surface as of the most recent run.
Crowdsourced phishing intel doesn't see this.
Feeds compared: PhishTank online-valid · URLhaus recent · CryptoScamDB blacklist · OpenPhish public feed
Find a specific domain in the corpus.
Substring match across all 40K+ matches. Useful for checking if a specific apex you saw elsewhere is in our set.
Each match is tagged with the most-specific layer that produced it. Combosquat is the most realistic phishing phenotype; tld-squat is the most brazen.
Where impostors concentrate.
All TLDs ranked. gTLDs dominate by absolute count.
Country-code impostors.
DoH-probed across 85 ccTLDs — covers .ru / .cn / .ir / .de / .fr / .ph via existence test, not zone files.
Brand × ccTLD geography
Top brand × ccTLD pairs by impostor count.
ccTLD impersonation density per (brand, ccTLD), with brand-portfolio
rows excluded. We use the defensive_likely flag
(TXT-cluster verification token shared with the legitimate apex,
or NS records shared with the legitimate apex) to separate
portfolio from impersonation; flag_defensive.py currently
tags ~11.5% of ccTLD-squat rows as portfolio, leaving the rest as
real impersonation candidates. The "SLD literally equals the brand"
rule we used initially was retired because it was too aggressive:
it would have marked metamask.ru as portfolio when it is
an impostor. Cell counts are live-fetched.
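The defensive_likely heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the contents of flag_defensive.py: the function names, the list of verification-token prefixes, and the "any overlap counts" rule for NS records are all assumptions made for the example.

```python
# Assumed prefixes of SaaS domain-verification TXT records. The token list
# actually used by flag_defensive.py is not shown in the source.
VERIFICATION_PREFIXES = (
    "google-site-verification=",
    "facebook-domain-verification=",
    "MS=",
)


def verification_tokens(txt_records):
    """Keep only TXT values that look like SaaS verification tokens."""
    cleaned = (t.strip().strip('"') for t in txt_records)
    return {t for t in cleaned if t.startswith(VERIFICATION_PREFIXES)}


def normalize_ns(ns_records):
    """Lower-case NS hostnames and drop the trailing root dot."""
    return {n.strip().rstrip(".").lower() for n in ns_records}


def defensive_likely(cand_txt, cand_ns, legit_txt, legit_ns):
    """True if the candidate shares a verification token or an NS record
    with the legitimate apex (assumption: any overlap is enough)."""
    shared_txt = verification_tokens(cand_txt) & verification_tokens(legit_txt)
    shared_ns = normalize_ns(cand_ns) & normalize_ns(legit_ns)
    return bool(shared_txt or shared_ns)
```

Generic TXT values such as SPF strings are ignored on purpose, since two unrelated owners can publish the same one. Shared NS on a large managed-DNS provider is weaker evidence than a shared verification token, so a stricter variant might require the full NS set to match.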
cell shade = log-scaled count · scroll horizontally on narrow screens · click any cell to drill into the brand
Brand grid.
click any card to drill into matches
Methodology
5 detection layers · multi-feed cross-reference (see /feeds/) · DoH liveness · linked-bad graph · precision-filtered (live counts above).
Candidate generation (5 layers)
- TLD-squat: exact <brand>.<X> across 1,160 TLDs (1,081 gTLDs from CZDS + 79 ccTLDs via DoH-as-existence-oracle).
- DL-1: every Damerau-Levenshtein-1 neighbour.
- Homoglyph: confusables: m/rn, l/1, o/0, i/1, e/3, a/4, s/5, b/6.
- IDN-punycode: Cyrillic / Greek / Latin lookalikes via xn-- encoding.
- Combosquat: <brand>+<keyword> with phish/wallet keywords.
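The five layers above can be sketched as small generators. This is an illustrative sketch under stated assumptions, not the code in match.py: the function names, the keyword list, the Cyrillic lookalike table, and the choice to swap only one position at a time are all hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits + "-"
# Confusable pairs from the list above; "rn" -> "m" is the reverse of m/rn.
HOMOGLYPHS = {"m": "rn", "rn": "m", "l": "1", "o": "0", "i": "1",
              "e": "3", "a": "4", "s": "5", "b": "6"}
# Assumed Latin -> Cyrillic lookalikes (small illustrative subset).
CYRILLIC = {"a": "\u0430", "e": "\u0435", "o": "\u043e",
            "p": "\u0440", "c": "\u0441"}
# Hypothetical keyword list; the source does not publish the real one.
KEYWORDS = ("wallet", "login", "support", "airdrop")


def valid_label(label):
    """Non-empty DNS label of at most 63 chars, no leading/trailing hyphen."""
    return 0 < len(label) <= 63 and label[0] != "-" and label[-1] != "-"


def tld_squat(brand, tlds):
    """Exact brand label under every TLD in the list."""
    return {f"{brand}.{tld}" for tld in tlds}


def dl1(brand):
    """All valid labels at Damerau-Levenshtein distance exactly 1."""
    splits = [(brand[:i], brand[i:]) for i in range(len(brand) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    out = deletes | transposes | substitutes | inserts
    out.discard(brand)  # distance 0 is the brand itself
    return {s for s in out if valid_label(s)}


def swap_each(brand, table):
    """Replace one occurrence at a time of each key in the table."""
    out = set()
    for src, dst in table.items():
        pos = brand.find(src)
        while pos != -1:
            out.add(brand[:pos] + dst + brand[pos + len(src):])
            pos = brand.find(src, pos + 1)
    return out


def idn_punycode(brand):
    """Cyrillic single-swap lookalikes, encoded as xn-- labels."""
    return {v.encode("idna").decode("ascii") for v in swap_each(brand, CYRILLIC)}


def combosquat(brand, keywords=KEYWORDS):
    """Brand followed by a keyword, with and without a hyphen."""
    return {brand + sep + kw for kw in keywords for sep in ("", "-")}
```

For example, `dl1("okx")` contains `olx`, and `swap_each("metamask", HOMOGLYPHS)` contains `rnetamask` and `m3tamask`. The DL-1 set grows with label length times alphabet size, which is one reason it is noisy for short brand keys.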
Verification + enrichment
- gTLDs: joined against zone_file (~265M apex). ccTLDs: DoH-probe presence proof.
- Re-resolved over DoH for the live flag; live IPs get ASN/country/ISP.
- Joined against RDAP (registrar / dates / country) and Tranco L7KW4 (2026-04-23).
- Cross-referenced against every abuse feed we ingest (PhishTank, URLhaus, CryptoScamDB, OpenPhish, ThreatFox, Phishing.Database, MetaMask eth-phishing-detect, Phishing Army, abuse.ch MalwareBazaar / Feodo / SSLBL — live status on /feeds/) for known-bad.
- Non-CDN IP/ASN adjacency to a known-bad row = linked-bad. Trichotomy = known / linked / blind.
- SaaS-verification TXT or shared-NS with legitimate apex would tag a row defensive_likely (brand-owned portfolio) and exclude it from impostor counts. Heuristic is wired; the post-match flag_defensive.py run is queued; current flagged count is small.
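The DoH presence and liveness checks in the list above can be sketched as follows. This is a rough illustration, not the code in cctld_probe.py or liveness.py: the function names and the choice of Google's public JSON DoH endpoint are assumptions. The idea is that NXDOMAIN means the name does not exist, while NOERROR means it exists even when no address records come back.

```python
import json
import urllib.parse
import urllib.request

# Assumed resolver: Google Public DNS JSON API. Any DoH resolver that
# returns the same JSON shape would do.
DOH_URL = "https://dns.google/resolve"

NOERROR, NXDOMAIN = 0, 3   # DNS RCODE values
A, AAAA = 1, 28            # DNS record type numbers


def classify(response):
    """Map a DoH JSON response to (exists, live).

    exists is None when the resolver could not decide (e.g. SERVFAIL).
    """
    status = response.get("Status")
    if status == NXDOMAIN:
        return False, False
    if status != NOERROR:
        return None, False
    answers = response.get("Answer", [])
    live = any(rr.get("type") in (A, AAAA) for rr in answers)
    return True, live


def probe(name, rtype="A", timeout=5.0):
    """Fetch the raw DoH JSON answer for one name (performs network I/O)."""
    query = urllib.parse.urlencode({"name": name, "type": rtype})
    with urllib.request.urlopen(f"{DOH_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `classify(probe("metamask.ru"))` would then yield the presence and live flags for one candidate. This only covers A lookups; an AAAA-only host would need a second query with `rtype="AAAA"`.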
web/lookalike_mining/{match,cctld_probe,enrich,liveness,cluster,linked_bad,cluster_txt,flag_defensive}.py.
Excluded from counts: defensive_likely rows (TXT-cluster or shared-NS with legitimate apex). The headline KPIs above are live-fetched and reflect the filtered corpus. Pass ?precision_mode=raw on any endpoint to see the unfiltered corpus.
Honest limits
RDAP sparse · Tranco-rank ≠ proof · ccTLD via DoH-probe only · DL-1 noisy · linked-bad is a graph claim.
- RDAP coverage on matches is sparse — long tail has null RDAP. Post-match RDAP run is queued.
- A non-null Tranco rank isn't proof of impersonation: popular site collisions exist (e.g. olx.org is DL-1 of okx). The page surfaces the rank for human judgment.
- ccTLD coverage is DoH-probe only: presence + liveness, not enumeration. OpenINTEL access would close that gap; we have not yet applied.
- DL-1 over-expands on short brand keys; the precision filter drops dl1 hits when the brand key is < 5 chars.
- Linked-bad is a graph claim, not a confirmed-bad claim. CDN/cloud ASNs are filtered before linking.
- "Live" means it answered A/AAAA at the last DoH sweep — some attackers cloak by IP, so live ≠ visibly malicious.
- Zone-history is a single in-progress snapshot today — pre-victimization-window claims (zone-presence-time vs feed-flag-time) require multiple snapshots and aren't yet measurable on this corpus; framed only as future work.
- defensive_likely tags ~11.5% of ccTLD-squat rows currently (TXT-cluster + shared-NS heuristic). The complementary "SLD literally equals the brand" rule was retired because it over-flagged real impersonation cases like metamask.ru.
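Two of the limits above, the short-key DL-1 filter and the linked-bad graph claim, can be sketched as follows. This is a hedged illustration, not the code in linked_bad.py: the Row shape, the CDN ASN list, and the reading of "adjacency" as "same IP or same ASN as a known-bad row" are assumptions.

```python
from dataclasses import dataclass

# Assumed CDN/cloud ASNs to ignore (Cloudflare AS13335, Amazon AS16509);
# the real filter list is not given in the source.
CDN_ASNS = frozenset({13335, 16509})


@dataclass(frozen=True)
class Row:
    domain: str
    ip: str
    asn: int


def keep_match(layer, brand_key):
    """Precision filter: drop dl1 hits when the brand key is < 5 chars."""
    return not (layer == "dl1" and len(brand_key) < 5)


def trichotomy(rows, known_bad_domains, cdn_asns=CDN_ASNS):
    """Label each row known / linked / blind.

    known  = the domain appears in an abuse feed
    linked = shares a non-CDN IP or ASN with a known row (graph claim only)
    blind  = neither
    """
    known = [r for r in rows if r.domain in known_bad_domains]
    bad_ips = {r.ip for r in known if r.asn not in cdn_asns}
    bad_asns = {r.asn for r in known if r.asn not in cdn_asns}
    labels = {}
    for r in rows:
        if r.domain in known_bad_domains:
            labels[r.domain] = "known"
        elif r.asn not in cdn_asns and (r.ip in bad_ips or r.asn in bad_asns):
            labels[r.domain] = "linked"
        else:
            labels[r.domain] = "blind"
    return labels
```

Under this filter a 3-character key such as okx keeps its TLD-squat and combosquat hits but loses its DL-1 hits, which is how a collision like olx.org would drop out of the filtered corpus. Same-ASN linking is deliberately broad, so a linked label is a lead for review rather than a verdict.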