Changelog

How the project has evolved.

A dated record of the platform, the dataset, and the research. Entries are filed by date and tagged by the kind of change: research, milestones, data-shape changes, and infrastructure moves each have their own tone.

2026-04-26 research

ccTLD coverage + trichotomy + operator profiles

Brand-impersonation pipeline expanded: 5 detection layers (added idn_punycode + cctld_squat) × 1,160 TLDs (1,081 gTLDs from CZDS + 79 ccTLDs via DoH-as-existence-oracle). Linked-bad graph layer added: rows sharing a non-CDN IP or non-cloud ASN with a known-bad row are tagged adjacent (CDN/cloud anchor ASNs filtered before linking). Trichotomy is the headline result: known-bad / linked-bad / true-blind-spot. Defensive-likely heuristic wired (TXT-cluster or shared-NS with legitimate apex).
Abuse-feed cross-reference extended to 11 sources (PhishTank, URLhaus, CryptoScamDB, OpenPhish, ThreatFox, Phishing.Database, MetaMask eth-phishing-detect, Phishing Army, abuse.ch MalwareBazaar / Feodo / SSLBL), adding ~600K rows of crypto + general-phishing intel; live coverage on /feeds/. Server-side precision filter added to the API: drops infrastructure brands, dl1-of-short-brand noise, ccTLD-squat portfolio rows, and Tranco-popular collisions; pass ?precision_mode=raw on any endpoint to see the unfiltered corpus.
New pages: /feeds/ (live abuse-feed status), /operators/ (IP + TXT cluster profiles), /lookalikes-page/ + /d-page/ + /registrar-page/ (catch-alls so non-curated paths stop 404'ing). New /eval/ artifacts: per-claim citation chain, trichotomy Sankey diagram, multi-feed coverage bars. New /lookalikes/ artifact: brand × ccTLD heatmap collapsed to top-pairs view (the full grid is sparse pre-flag-defensive). New /hosting/ artifact: top-ASN-by-impostor table (joins resolves_a × ip_intel.asn). Site-wide pipeline-status strip above the footer.
New API surface: /v1/lookalikes/{brand-tld-matrix, brand-summary, feeds, operators, daemons, top-asn, dns/{d}}, /v1/public/{domain, registrar} (no-auth alias for the catch-all pages). /certs/ + CertStream daemon + crt.sh poller decommissioned. New external dataset mirror: 16 feeds in /data/domaindefender/external_feeds/<DATE>/ on cb1, with manifest.json + daily refresh.
All headline counts on the site are now live-fetched from the API rather than baked at build time.
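
The known-bad / linked-bad / true-blind-spot split described in this entry can be sketched roughly as follows. This is a minimal illustration, not the pipeline's code: the `Row` shape, the `classify` function, and the two-entry `ANCHOR_ASNS` set are assumptions made for the example, and the sketch filters anchors by ASN only.

```python
from dataclasses import dataclass

# Hypothetical anchor list for illustration only (AS13335 Cloudflare, AS16509 Amazon).
# The real CDN/cloud filter used by the pipeline is not given in this entry.
ANCHOR_ASNS = {13335, 16509}


@dataclass(frozen=True)
class Row:
    domain: str
    ip: str
    asn: int
    known_bad: bool = False  # True if the domain appears in an abuse feed


def classify(rows):
    """Label every row as known-bad, linked-bad, or true-blind-spot."""
    # Only known-bad rows outside anchor ASNs can seed a link.
    seeds = [r for r in rows if r.known_bad and r.asn not in ANCHOR_ASNS]
    bad_ips = {r.ip for r in seeds}
    bad_asns = {r.asn for r in seeds}

    labels = {}
    for r in rows:
        if r.known_bad:
            labels[r.domain] = "known-bad"
        elif r.asn not in ANCHOR_ASNS and (r.ip in bad_ips or r.asn in bad_asns):
            labels[r.domain] = "linked-bad"
        else:
            labels[r.domain] = "true-blind-spot"
    return labels


rows = [
    Row("a.example", "192.0.2.1", 64500, known_bad=True),
    Row("b.example", "192.0.2.1", 64500),        # shares IP and ASN with a.example
    Row("c.example", "198.51.100.7", 13335),     # behind an anchor ASN, never linked
    Row("d.example", "203.0.113.9", 64501),      # no shared infrastructure
]
labels = classify(rows)
```

Filtering anchors before building the seed sets matters: a single known-bad site on a shared CDN would otherwise tag every other customer of that CDN as adjacent.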

2026-04-25 research

Brand-impersonation surface + blind-spot measurement live

Population-scale lookalikes pipeline shipped: 42 brands × 4 detection layers (TLD-squat, DL-1, homoglyph, combosquat) joined against 1,081 gTLDs ⇒ 40,350 real apex domains in zone files we track. Re-resolved every match over DoH (31,339 live, 77.7%). Cross-referenced against PhishTank, URLhaus, CryptoScamDB, OpenPhish: only 50 of 40,350 (0.12%) appear in any abuse feed; the remaining 99.88% blind spot is the headline finding. New pages: /lookalikes/, per-brand SSG /lookalikes/<brand>/ ×42, /eval/, /infra/, /hosting/, /pipeline/, /datasets/, /recent/, /keywords/. New API surface: /v1/lookalikes/{summary, brand, blindspot, infra, recent, search, hosting, by-tld, methodology, campaigns, export.csv}. Daily refresh timer at 06:00. CertStream daemon + crt.sh fallback poller installed (both currently waiting on upstream).
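
The percentages in this entry follow directly from the three counts it reports. A short worked check, where the `rate` helper is just a name chosen for this example:

```python
def rate(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


matched = 40_350   # apex domains matched by the detection layers
live = 31_339      # still resolving over DoH
in_feeds = 50      # present in at least one abuse feed

live_rate = rate(live, matched)
feed_rate = rate(in_feeds, matched)
blind_spot = 1 - feed_rate

print(f"live:       {live_rate:.1%}")    # live:       77.7%
print(f"in feeds:   {feed_rate:.2%}")    # in feeds:   0.12%
print(f"blind spot: {blind_spot:.2%}")   # blind spot: 99.88%
```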

2026-04-24 milestone

DomainDefender Intelligence API (v1) — internal alpha

FastAPI service live with 12+ endpoints under /v1/: domain lookup, TLD and registrar aggregates, country stats, multi-facet search with cursor pagination, and the lifecycle endpoints (fresh / expiring / pending-delete / stale) that are DomainDefender's core differentiator against infrastructure-graph vendors. X-API-Key auth with SHA-256 hashed keys, per-tier rate limits, OpenAPI docs at /v1/docs, systemd-managed. First key issued to internal owner. Commercial tiers, Stripe, and public signup deferred pending IP / commercialization review.
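
Storing only SHA-256 digests of API keys means a leaked key table does not expose usable keys. The sketch below shows the general pattern with the Python standard library; the function names and the in-memory digest set are assumptions for illustration, not the service's actual code.

```python
import hashlib
import hmac
import secrets


def hash_key(key: str) -> str:
    """Hex SHA-256 digest of an API key; only this value is stored."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_api_key() -> tuple[str, str]:
    """Return (plaintext key to hand to the caller once, digest to store)."""
    key = secrets.token_urlsafe(32)
    return key, hash_key(key)


def verify(presented: str, stored_digests) -> bool:
    """Check an X-API-Key header value against the stored digests."""
    digest = hash_key(presented)
    # compare_digest avoids leaking match position through timing.
    return any(hmac.compare_digest(digest, s) for s in stored_digests)


key, digest = new_api_key()
store = {digest}  # hypothetical stand-in for the real key table
```

A plain unsalted hash is reasonable here only because the keys are long random tokens; it would not be appropriate for user-chosen passwords.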

2026-04-24 infra

Cloudflare Access gate

Public URL now sits behind Cloudflare Access: visitors authenticate via a one-time PIN sent to their email, then the Access policy checks that address against an internal allow-list. Access setup is automated end-to-end via the Cloudflare API; session duration is 24h.

2026-04-24 data

Three.js live wave-grid background

A full-viewport canvas behind every page renders ~3,000 cyan-purple particles animated with double sine waves, additive blending, fog, and gentle mouse parallax. Respects prefers-reduced-motion and pauses on hidden tabs. Inspired by mtd-playground-demo.vercel.app's Three.js plane.

2026-04-24 data

Platform page redesign

Replaces the text-heavy layer list with numbered layer cards, per-layer icons, live DB counts per layer, and an animated SVG data-flow strip showing records moving through the five pipeline stations. Headline honesty banner reflects the crawlbox2 → crawlbox1 migration plan.

2026-04-24 infra

Deploy pipeline + systemd service + atomic-swap builds

Preview now runs as a systemd user service with linger enabled, so it survives reboots. Build script uses an atomic dist/ swap so rebuilds never blank the live site. wrangler CLI installed and deploy script ready to push dist/ to Cloudflare Pages once credentials land.
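
The entry does not say how the atomic swap is implemented. One common way to get the same effect, shown here as an assumption rather than a description of the actual build script, is to build into a fresh directory and then repoint a `dist` symlink with a single rename, which is atomic on POSIX filesystems. The `publish` name and the directory layout are hypothetical.

```python
import os
from pathlib import Path


def publish(build_dir: Path, live_link: Path) -> None:
    """Atomically repoint `live_link` (a symlink) at a finished build."""
    tmp_link = live_link.with_name(live_link.name + ".tmp")
    # Clear a leftover temp link from an interrupted earlier run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(build_dir.resolve(), tmp_link, target_is_directory=True)
    # rename(2) replaces the old symlink in one step, so readers see
    # either the previous build or the new one, never an empty dist/.
    os.replace(tmp_link, live_link)
```

This assumes `dist` is itself a symlink (or does not exist yet); a rename onto a real, non-empty directory would fail.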

2026-04-24 infra

Storage plan: everything targets crawlbox1

Decided all data (2a zone_file, 2b domain_metadata, future 2c+) will live on crawlbox1:27018. crawlbox2 stays the current source until Run 3 finishes, then a mongodump + scp + mongorestore migration runs via scripts/migrate_mongo_to_crawlbox1.sh. Team env_setup.sh files updated to a single unified MONGO_* block.

2026-04-24 data

Interactive world map with real country polygons

Masthead gets a compact rotating globe with drag-to-rotate, a lens switcher (All / Fresh 30d / Expiring / Pending), and click-for-popover showing top registrars, top TLDs, DNSSEC %, and lens counts per country. Full /worldmap section upgraded to real country outlines from Natural Earth 110m in both flat and globe views.

2026-04-24 data

Real-schema honesty pass

Schema introspection revealed several components assumed fields that don't exist in domain_metadata (asn, hosting_country, rdap_server, ip_addresses). CloudProviders, RegistrarASNFlow, and PerRegistryThrottle components were removed; WorldMap, HostingGeography, and the country × TLD heatmap switched to the real registrant_country field with an ISO-3166 filter; schema and per-domain deep-link pages corrected.

2026-04 data

Naming-pattern and EPP-status views land

Public aggregations for SLD length, hyphen/numeric/IDN patterns, and EPP status distribution are derived directly from the dataset at build time.
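
As a rough illustration of what these naming-pattern aggregations involve, the sketch below derives per-domain features and tallies them. The feature names, the `sld_features` helper, and the choice to exclude punycode labels from the hyphen count are assumptions for this example; the site's build may define the patterns differently.

```python
from collections import Counter


def sld_features(domain: str) -> dict:
    """Naming features of an apex domain's registrable label."""
    # For an apex domain the first label is the SLD, e.g. "example" in example.com.
    sld = domain.lower().rstrip(".").split(".")[0]
    idn = sld.startswith("xn--")  # ACE prefix marks a punycode-encoded IDN label
    return {
        "length": len(sld),
        # Punycode labels always contain hyphens, so they are excluded here.
        "has_hyphen": (not idn) and "-" in sld,
        "has_digit": any(c.isdigit() for c in sld),
        "idn": idn,
    }


def summarize(domains):
    """Count how many domains show each pattern, plus a length histogram."""
    patterns, lengths = Counter(), Counter()
    for d in domains:
        f = sld_features(d)
        lengths[f["length"]] += 1
        for name in ("has_hyphen", "has_digit", "idn"):
            patterns[name] += f[name]
    return patterns, lengths


patterns, lengths = summarize(["my-shop24.com", "xn--mnchen-3ya.de", "example.org"])
```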

2026-04 milestone

Open research site goes public

Project platform, methodology, and live dataset slices are published openly. Dataset access is available to researchers on request.

2026-03 infra

Registry-aware RDAP orchestration

Lookup workers moved to per-registry daily quota tracking with adaptive backoff. Identity-Digital-gated TLDs now complete without triggering WAF bans.
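
A minimal sketch of per-registry quota tracking with adaptive backoff, under stated assumptions: the `RegistryBudget` class, the doubling/halving policy, and the set of status codes treated as throttling signals are all illustrative choices, not the workers' actual logic.

```python
import datetime as dt
from dataclasses import dataclass, field

# Statuses treated as "slow down" signals in this sketch (an assumption).
THROTTLE_STATUSES = {403, 429, 503}


@dataclass
class RegistryBudget:
    daily_quota: int
    base_delay: float = 1.0     # seconds between requests when healthy
    max_delay: float = 300.0    # ceiling for the backoff delay
    used: int = 0
    day: dt.date = field(default_factory=dt.date.today)
    delay: float = field(init=False)

    def __post_init__(self):
        self.delay = self.base_delay

    def try_acquire(self, today=None) -> bool:
        """Reserve one request from today's quota; False when exhausted."""
        if today is None:
            today = dt.date.today()
        if today != self.day:          # new day: reset the counter
            self.day, self.used = today, 0
        if self.used >= self.daily_quota:
            return False
        self.used += 1
        return True

    def record(self, status: int) -> float:
        """Adapt the delay to the last HTTP status; return seconds to wait."""
        if status in THROTTLE_STATUSES:
            self.delay = min(self.delay * 2, self.max_delay)
        else:
            self.delay = max(self.delay / 2, self.base_delay)
        return self.delay
```

A worker would keep one `RegistryBudget` per registry, call `try_acquire()` before each lookup, and sleep for the value returned by `record()` afterwards.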

2026-03 data

Nested registrar-entity parsing

Registrar names that only appear in sub-entity vCards (e.g. .berlin) are now extracted correctly. Backfill reached 4,925 previously-unlabelled records.
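
In RDAP responses, entities can nest other entities, each with its own jCard under `vcardArray`. The sketch below shows one way to fall back to a sub-entity when the registrar entity has no name of its own. The function names, the depth-first "first name found" rule, and the sample response are assumptions for illustration.

```python
def vcard_name(entity: dict):
    """Return the fn (or plain-text org) value from an entity's jCard, if any."""
    vcard = entity.get("vcardArray")
    if not (isinstance(vcard, list) and len(vcard) == 2):
        return None
    for prop in vcard[1]:
        # jCard properties look like [name, params, type, value].
        if len(prop) >= 4 and prop[0] in ("fn", "org"):
            value = prop[3]
            if isinstance(value, str) and value.strip():
                return value.strip()
    return None


def first_name(entity: dict):
    """Depth-first search: the entity's own name, else the first sub-entity name."""
    name = vcard_name(entity)
    if name:
        return name
    for sub in entity.get("entities", []):
        name = first_name(sub)
        if name:
            return name
    return None


def registrar_name(rdap: dict):
    """Name of the entity carrying the 'registrar' role, searching nested vCards."""
    for ent in rdap.get("entities", []):
        if "registrar" in ent.get("roles", []):
            return first_name(ent)
    return None


# Made-up response shaped like the nested case: the registrar entity has no
# vCard of its own, and the name only appears on a sub-entity.
sample = {
    "entities": [
        {
            "roles": ["registrar"],
            "entities": [
                {
                    "roles": ["abuse"],
                    "vcardArray": ["vcard", [
                        ["version", {}, "text", "4.0"],
                        ["fn", {}, "text", "Example Registrar GmbH"],
                    ]],
                }
            ],
        }
    ]
}
```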

2026-02 infra

Primary Mongo moves to crawlbox1

New MongoDB 7.0 node provisioned as the primary data store. Migration from the bootstrap crawlbox scheduled post-run-3.

2026-02 research

Zone-wide lookup pass

First dataset-wide data refresh targeting the full CZDS footprint. Covers RDAP registration facts for every reachable TLD.

2026-01 data

DNSSEC, EPP status, and expiry indexed

Record shape extended to carry DNSSEC flag, full EPP status vector, and expiry dates. Opens the door to lifecycle and abuse-lock analyses.

2025-12 infra

Batch IP-to-ASN lookup

Hosting columns populated via ip-api batch endpoint for every zone-published A record, with daily quota budgeting.

2025-11 milestone

First end-to-end data pass

Zone collection → RDAP lookup → IP/ASN lookup → Mongo persistence completes on a single TLD sample end-to-end.

2025-10 milestone

Project inception

DomainDefender chartered as an open research platform for DNS lifecycle measurement.
