Spec — Comp Analysis page (NEW)

"Five comparables + ARV + 70%-rule offer, in ~3 seconds." The PropStream replacement. Prototype: panel-comps in reference/closer/public/v2/index.html.

UX flow

User enters a subject address and clicks Run comps.
Status line streams: Geocoding address… → Fetching candidates… → AI selecting comps… → done.
Results render:
- ARV bar: low/high band (e.g. $385K–$442K), 5 comp dots plotted by sale price, a 70%-rule max-offer marker ($268K), confidence 4/5 bars, footnote n=5 · 0.4mi · 97d avg age.
- AI "Deal read" prose summary (streamed, --accent styling).
- Selected comparables grid (5 of N candidates): each card shows address, beds/baths, sqft, distance (mi), sold price, sold age (e.g. 23d ago), and an adjustment delta (e.g. +$3.4K / −$11K, colored bull/bear).
Right sidebar: subject property stats; this month's usage (comps run, avg time 2.8s, avg API cost $0.11, saved-to-leads); recent searches.
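The adjustment deltas on each card are compact signed dollar labels. A minimal formatter sketch for them, assuming amounts arrive as integer cents (as the schema stores them), one decimal below $10K and whole thousands at or above it; `formatDeltaCents` is a hypothetical helper, not existing code.

```typescript
// Hypothetical UI helper: signed integer cents -> compact label like "+$3.4K".
// Assumptions: plain dollars below $1K, one decimal below $10K, whole
// thousands at or above $10K. Negatives use U+2212 (−) to match the mockup.
function formatDeltaCents(cents: number): string {
  if (!Number.isInteger(cents)) {
    throw new Error("expected integer cents");
  }
  const sign = cents > 0 ? "+" : cents < 0 ? "\u2212" : "";
  const dollars = Math.abs(cents) / 100;
  if (dollars < 1000) {
    return `${sign}$${Math.round(dollars)}`;
  }
  const thousands = dollars / 1000;
  const text =
    thousands < 10
      ? String(Math.round(thousands * 10) / 10) // e.g. "3.4"
      : String(Math.round(thousands)); // e.g. "11"
  return `${sign}$${text}K`;
}
```

For example, `formatDeltaCents(340000)` gives `"+$3.4K"` and `formatDeltaCents(-1100000)` gives `"−$11K"`. It is meant for deltas only; absolute prices would not carry a `+`.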

Backend pipeline (Cloudflare Worker)

geocode → fetch sold comps (ATTOM / external) → AI selects best 5 + writes deal read (Anthropic) → compute ARV band + 70%/80% offers → persist + cache. Degrade gracefully to mock data if an external call fails or times out. Track api_cost_cents and generation_time_ms per query.

ARV math (server): arv_median = median(adjusted comp $/sqft) × subject sqft; band = [arv_median × ~0.93, arv_median × ~1.04]; offer_70pct = arv_median × 0.70 (and offer_80pct = arv_median × 0.80).
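The ARV formula can be sketched as a pure function over integer cents. This assumes a comp's adjusted price is `sold_price_cents + adjustment_cents`, that an even-length median is the mean of the two middle values, and that every output is rounded to whole cents; `computeArv`, `AdjustedComp`, and the constant names are illustrative, not the existing comps.ts exports.

```typescript
// Illustrative input shape; the real Comparable type lives in the worker.
interface AdjustedComp {
  soldPriceCents: number;
  adjustmentCents: number; // signed delta toward the subject
  sqft: number;
}

// Factors from the spec: ~0.93 / ~1.04 band, 70% / 80% offers.
const ARV_LOW_FACTOR = 0.93;
const ARV_HIGH_FACTOR = 1.04;
const OFFER_70_FACTOR = 0.7;
const OFFER_80_FACTOR = 0.8;

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function computeArv(comps: AdjustedComp[], subjectSqft: number) {
  const usable = comps.filter((c) => c.sqft > 0); // comps without sqft cannot contribute
  if (usable.length === 0 || subjectSqft <= 0) {
    throw new Error("need at least one comp with sqft and a positive subject sqft");
  }
  // Adjusted price per sqft for each comp, still in cents.
  const centsPerSqft = usable.map((c) => (c.soldPriceCents + c.adjustmentCents) / c.sqft);
  const medianCents = Math.round(median(centsPerSqft) * subjectSqft);
  return {
    lowCents: Math.round(medianCents * ARV_LOW_FACTOR),
    highCents: Math.round(medianCents * ARV_HIGH_FACTOR),
    medianCents,
    offer70Cents: Math.round(medianCents * OFFER_70_FACTOR),
    offer80Cents: Math.round(medianCents * OFFER_80_FACTOR),
  };
}
```

Worked example: three comps at $190, $200, and $210 adjusted per sqft with a 1,500 sqft subject give an ARV median of $300,000, a band of $279,000 to $312,000, and a 70% offer of $210,000.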

D1 schema (new migration)

comp_queries(id, user_id, input_address, geocoded_lat, geocoded_lng, status[pending|geocoded|candidates_fetched|ai_selected|complete|failed], error_message, total_candidates, selected_count, api_cost_cents, generation_time_ms, created_at) — idx (user_id, created_at DESC).
comparables(id, subject_lead_id?, comp_query_id, comp_address, comp_city, comp_state, comp_zip, comp_property_type, beds, baths, sqft, year_built, lot_size_sqft, sold_price_cents, sold_date, days_since_sold, distance_miles, adjustment_cents, data_source, external_mls_id, created_at) — idx (comp_query_id), (subject_lead_id, sold_date DESC).
arv_estimates(id, subject_lead_id?, comp_query_id, arv_low_cents, arv_high_cents, arv_median_cents, confidence_score[0-100], confidence_breakdown(JSON), offer_70pct_cents, offer_80pct_cents, rehab_estimate_cents, market_summary, comps_used_count, radius_miles, created_at, expires_at) — TTL ~30d.
leads augment: comps_run_count INT DEFAULT 0, last_arv_estimate_id TEXT, last_comp_query_date TEXT.
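The `comp_queries.status` enum implies a forward-only lifecycle. A sketch of a transition guard, assuming each run advances exactly one stage at a time, any non-terminal state may move to `failed`, and `complete`/`failed` never change; `canTransition` is a hypothetical helper.

```typescript
type CompQueryStatus =
  | "pending"
  | "geocoded"
  | "candidates_fetched"
  | "ai_selected"
  | "complete"
  | "failed";

// Happy-path order taken from the comp_queries.status enum.
const STATUS_ORDER: CompQueryStatus[] = [
  "pending",
  "geocoded",
  "candidates_fetched",
  "ai_selected",
  "complete",
];

function isTerminal(s: CompQueryStatus): boolean {
  return s === "complete" || s === "failed";
}

// Assumption: no skipped stages; "failed" is reachable from any non-terminal state.
function canTransition(from: CompQueryStatus, to: CompQueryStatus): boolean {
  if (isTerminal(from)) return false;
  if (to === "failed") return true;
  return STATUS_ORDER.indexOf(to) === STATUS_ORDER.indexOf(from) + 1;
}
```

Enforcing this in the UPDATE path (for example `... WHERE id = ? AND status = ?`) would keep concurrent writers from moving a row backwards.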

API (`/api/v1`)

POST /comps/run — {address, radius_miles?=0.5, comp_count?=5, min_quality_score?} → {query_id, subject{…}, comps[…], arv{low,high,median}, offer_70pct_cents, confidence, market_summary, api_cost_cents, generation_time_ms}. Auth required.
GET /comps/:queryId — re-fetch a completed analysis (cache hit, no re-query).
GET /leads/:leadId/comps — history for a lead (paginated).
GET /user/comps/history?limit=10 — recent searches (sidebar).
GET /user/comps/stats — month/YTD usage (sidebar), 1h cache.
POST /comps/:queryId/save — {lead_id?, notes?} → links to lead, bumps comps_run_count.
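A sketch of validating the POST /comps/run body and applying its documented defaults (radius 0.5 mi, 5 comps). The upper bounds are assumptions, not part of the spec, and `parseRunRequest` is a hypothetical name; on an error result the route would presumably answer 400.

```typescript
interface RunCompsRequest {
  address: string;
  radiusMiles: number;
  compCount: number;
  minQualityScore?: number;
}

type ParseResult =
  | { ok: true; value: RunCompsRequest }
  | { ok: false; error: string }; // route would return 400 with `error`

// Hypothetical caps; the spec only fixes the defaults.
const MAX_RADIUS_MILES = 5;
const MAX_COMP_COUNT = 10;

function parseRunRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const b = body as Record<string, unknown>;

  const rawAddress = b.address;
  const address = typeof rawAddress === "string" ? rawAddress.trim() : "";
  if (address === "") return { ok: false, error: "address is required" };

  const radiusMiles = b.radius_miles ?? 0.5; // spec default
  if (typeof radiusMiles !== "number" || !(radiusMiles > 0) || radiusMiles > MAX_RADIUS_MILES) {
    return { ok: false, error: `radius_miles must be > 0 and <= ${MAX_RADIUS_MILES}` };
  }

  const compCount = b.comp_count ?? 5; // spec default
  if (
    typeof compCount !== "number" ||
    !Number.isInteger(compCount) ||
    compCount < 1 ||
    compCount > MAX_COMP_COUNT
  ) {
    return { ok: false, error: `comp_count must be an integer from 1 to ${MAX_COMP_COUNT}` };
  }

  const mq = b.min_quality_score;
  let minQualityScore: number | undefined;
  if (mq !== undefined) {
    if (typeof mq !== "number" || !Number.isFinite(mq)) {
      return { ok: false, error: "min_quality_score must be a finite number" };
    }
    minQualityScore = mq;
  }

  return { ok: true, value: { address, radiusMiles, compCount, minQualityScore } };
}
```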

Secrets / env

ATTOM_API_KEY (or the key for whichever comps source is chosen) and ANTHROPIC_API_KEY go into .dev.vars locally. The prototype called an external https://biglead.velli.cc/api/comps; the real implementation should run inside the soldi worker so cost and latency are tracked.

Build notes

Slice 1 can ship against a deterministic mock provider (seeded comps) so the full UI + persistence is verifiable offline; swap in the live ATTOM+AI provider behind the same POST /comps/run contract in a follow-up slice.
Deliver the AI deal read either as a chunked (streamed) response or by having the client poll GET /comps/:queryId.
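If the client polls, the loop can re-read GET /comps/:queryId until the status is terminal, reporting each new status so the UI status line advances. The fetch is injected as a function so the sketch stays testable; `pollQuery`, the 500 ms interval, and the 40-attempt cap are assumptions.

```typescript
type Status =
  | "pending"
  | "geocoded"
  | "candidates_fetched"
  | "ai_selected"
  | "complete"
  | "failed";

interface QuerySnapshot {
  status: Status;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until "complete" or "failed". In the real client, `getQuery` would
// wrap a fetch of /api/v1/comps/:queryId (hypothetical wiring).
async function pollQuery<T extends QuerySnapshot>(
  getQuery: () => Promise<T>,
  onStatus: (s: Status) => void,
  intervalMs = 500,
  maxAttempts = 40,
): Promise<T> {
  let last: Status | undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snap = await getQuery();
    if (snap.status !== last) {
      last = snap.status; // only report changes, so the status line does not flicker
      onStatus(snap.status);
    }
    if (snap.status === "complete" || snap.status === "failed") return snap;
    await sleep(intervalMs);
  }
  throw new Error(`query not finished after ${maxAttempts} polls`);
}
```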

Implementation plan — live provider (ROADMAP slice 5)

Added 2026-07-06. Status: NOT BUILT — production runs the deterministic mock. This section is the implementation-ready plan; a cold-start agent should need nothing beyond this file + the repo.

Ground truth today (verified 2026-07-06)

The mock provider is app/worker/comps.ts: pure, deterministic (FNV-1a seed from the normalized address → mulberry32 PRNG), no D1/external calls, exports the ARV/offer factor constants above. The Comps page + persistence already work end to end against it, live.
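The seeding scheme is small enough to show in full: FNV-1a (32-bit) hashes the normalized address and the hash seeds mulberry32. This sketch hashes UTF-16 code units, which matches byte-wise FNV-1a for ASCII addresses; `normalizeAddress` (trim, lowercase, collapse whitespace) is a hypothetical stand-in for whatever normalization comps.ts actually applies.

```typescript
// Hypothetical normalization; the real one in comps.ts may differ.
function normalizeAddress(address: string): string {
  return address.trim().toLowerCase().replace(/\s+/g, " ");
}

// FNV-1a, 32-bit: xor each unit into the hash, then multiply by the FNV prime.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash >>> 0;
}

// mulberry32: tiny 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0; // stays a 32-bit integer
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same normalized address -> same seed -> same sequence of mock comps.
const seededRandom = (address: string) => mulberry32(fnv1a32(normalizeAddress(address)));
```

Because the seed depends only on the normalized address, `"123 Main St"` and `"  123  MAIN st "` yield identical mock results, which is what keeps the mock path's tests stable.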
Neither ATTOM_API_KEY nor ANTHROPIC_API_KEY exists anywhere — not in the repo-root .env, not in app/wrangler.jsonc, and (per prior secret-list proofs) not on the soldi worker. Procuring both keys is the human prerequisite for this slice — an ATTOM plan must be purchased/trialed and an Anthropic key issued before live work can be proven.
The old prototype's external biglead.velli.cc/api/comps endpoint is explicitly NOT the plan; the pipeline runs inside the soldi worker so api_cost_cents/generation_time_ms are first-party data.

Provider contract (the seam)

Define one interface both providers implement, so the route never knows which ran:

```typescript
interface CompsProvider {
  run(input: { address: string; radiusMiles: number; compCount: number }):
    Promise<{
      subject: SubjectFacts;            // geocoded + property facts
      comps: Comparable[];              // exactly compCount, adjusted
      arv: { lowCents: number; highCents: number; medianCents: number };
      offer70Cents: number; offer80Cents: number;
      confidence: number;               // 0–100
      marketSummary: string;            // the AI deal read (may be templated in mock)
      meta: { provider: 'mock' | 'attom_ai'; apiCostCents: number;
              generationTimeMs: number; totalCandidates: number };
    }>;
}
```

Extract the current mock behind this interface first (pure refactor, tests stay green), then add attom-provider.ts. The ARV/offer math (median adjusted $/sqft × subject sqft; ×0.93/×1.04 band; ×0.70/×0.80 offers) stays server-side shared code used by both providers — the AI selects comps and writes prose; it never does the arithmetic.

Live pipeline (inside the worker, per request)

Geocode + subject facts — ATTOM property detail endpoint by address (ATTOM returns lat/lng + beds/baths/sqft in one call; no separate geocoder needed in v1).
Candidate fetch — ATTOM sales-comparables/recent-sales within radius_miles, sold within ~12 months, cap ~25 candidates.
AI selection + deal read — one Anthropic Messages call (small/fast model tier; temperature 0; JSON tool-output for the selected 5 comp ids + per-comp adjustment cents + confidence + prose summary). Prompt includes the candidate table + subject facts only — no PII beyond addresses.
Math + persist — shared ARV/offer math over the AI-selected comps; write comp_queries/comparables/arv_estimates rows exactly as the mock path does, with data_source='attom' and real api_cost_cents.
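The AI step's output has to be validated before the shared math runs. The Messages API delivers a `tool_use` block's `input` already parsed, so the check is structural. This sketch assumes a tool schema of `{ selected: [{ id, adjustment_cents }], confidence, summary }`, clamps each adjustment to a hypothetical ±15% of that comp's sold price, and returns `null` so the caller can fall back to mock; the shape, the clamp ratio, and `parseSelection` are all illustrative.

```typescript
interface Candidate {
  id: string;
  soldPriceCents: number;
}

interface Selection {
  picks: { id: string; adjustmentCents: number }[];
  confidence: number; // 0 to 100
  summary: string;
}

// Hypothetical guardrail: no adjustment beyond ±15% of the comp's sold price.
const MAX_ADJUSTMENT_RATIO = 0.15;

const clamp = (x: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, x));

// Returns null on any structural problem (caller falls back to the mock).
function parseSelection(input: unknown, candidates: Candidate[], compCount: number): Selection | null {
  if (typeof input !== "object" || input === null) return null;
  const o = input as Record<string, unknown>;
  const selected = o.selected;
  const summary = o.summary;
  const confidence = o.confidence;
  if (!Array.isArray(selected) || typeof summary !== "string") return null;
  if (typeof confidence !== "number" || !Number.isFinite(confidence)) return null;

  const byId = new Map<string, Candidate>(candidates.map((c) => [c.id, c]));
  const seen = new Set<string>();
  const picks: Selection["picks"] = [];
  for (const raw of selected) {
    if (typeof raw !== "object" || raw === null) return null;
    const r = raw as Record<string, unknown>;
    const id = r.id;
    const adj = r.adjustment_cents;
    const cand = typeof id === "string" ? byId.get(id) : undefined;
    if (!cand || seen.has(cand.id)) return null; // unknown or duplicate comp id
    if (typeof adj !== "number" || !Number.isFinite(adj)) return null;
    seen.add(cand.id);
    const limit = Math.round(cand.soldPriceCents * MAX_ADJUSTMENT_RATIO);
    picks.push({ id: cand.id, adjustmentCents: Math.round(clamp(adj, -limit, limit)) });
  }
  if (picks.length !== compCount) return null; // provider contract: exactly compCount comps
  return { picks, confidence: Math.round(clamp(confidence, 0, 100)), summary: summary.trim() };
}
```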

Cost & latency tracking (first-class, not logging)

api_cost_cents = ATTOM per-call cost (config: ATTOM_COST_CENTS_PER_CALL var, since ATTOM bills by plan) + Anthropic cost computed from the response usage token counts × model rates (keep the rate table in one const).
generation_time_ms = wall-clock around the full pipeline; also record per-stage timings inside comp_queries.status transitions (geocoded → candidates_fetched → ai_selected → complete).
The existing sidebar stats (GET /user/comps/stats) already surface these — no UI work needed beyond truthful numbers.
Budget guard: per-user daily cap (config COMPS_DAILY_CAP, default 25 live runs/day) → 429 with a clear error; cache hits (GET /comps/:queryId) are free and uncapped.
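Once rates are pinned, the cost formula is plain arithmetic over the Messages API `usage.input_tokens` / `usage.output_tokens` counts. A sketch assuming two ATTOM calls per live run (property detail + comparables), rates kept in cents per million tokens, and rounding the Anthropic part up so small calls never record as free. The rate values and the `"fast-tier-model"` key are placeholders, not Anthropic's published prices, and `computeApiCostCents` / `dailyCapStatus` are hypothetical names.

```typescript
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
}

// The single rate table. PLACEHOLDER values in cents per million tokens;
// replace with the current published rates for the model actually used.
const MODEL_RATES_CENTS_PER_MTOK: Record<string, { input: number; output: number }> = {
  "fast-tier-model": { input: 100, output: 500 },
};

function computeApiCostCents(
  attomCalls: number,
  attomCostCentsPerCall: number, // from ATTOM_COST_CENTS_PER_CALL
  model: string,
  usage: AnthropicUsage,
): number {
  const rate = MODEL_RATES_CENTS_PER_MTOK[model];
  if (!rate) throw new Error(`no rate configured for model ${model}`);
  const anthropicCents = (usage.input_tokens * rate.input + usage.output_tokens * rate.output) / 1e6;
  return attomCalls * attomCostCentsPerCall + Math.ceil(anthropicCents);
}

// Daily budget guard for live runs only; cache hits never reach this check.
function dailyCapStatus(liveRunsToday: number, cap: number): 200 | 429 {
  return liveRunsToday >= cap ? 429 : 200;
}
```

With 2 ATTOM calls at 3 cents, 4,000 input tokens, and 600 output tokens at the placeholder rates, the Anthropic share is 0.7 cents, rounded up to 1, for a total of 7 cents.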

Fallback-to-mock behavior (hard requirement)

Wrap each external stage in a timeout (geocode/candidates 5s, AI 20s).
On any external failure/timeout or when either key is unset, fall back to the deterministic mock for the whole run and persist honestly: meta.provider='mock', data_source='mock', api_cost_cents=0, and surface a "sample data" badge in the UI (the response already carries provider meta). Never blend live + mock data in one result; never fail the request because an external dependency hiccuped.
Config gate mirrors Stripe's pattern: compsLiveConfigured(env) = both keys present. Local dev without keys behaves exactly as today.
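The fallback rule can be sketched as a wrapper: each live stage races a timer, and on any error, timeout, or missing key the whole request reruns through the mock with honest metadata. `withTimeout`, `runWithFallback`, and the trimmed `RunResult` shape are illustrative; the real worker would return the full CompsProvider result. Note that `Promise.race` abandons but does not cancel a slow request; passing an `AbortController` signal to `fetch` would also cancel it.

```typescript
interface RunResult {
  provider: "mock" | "attom_ai";
  dataSource: "mock" | "attom";
  apiCostCents: number;
}

type Runner = (address: string) => Promise<RunResult>;

class TimeoutError extends Error {}

// Stage budgets from the spec.
const STAGE_TIMEOUT_MS = { geocode: 5000, candidates: 5000, ai: 20000 } as const;

// Rejects with TimeoutError if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, stage: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`${stage} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// `live` is null when either key is unset, so no outbound call is attempted.
async function runWithFallback(live: Runner | null, mock: Runner, address: string): Promise<RunResult> {
  if (live) {
    try {
      return await live(address);
    } catch {
      // Any stage failure or timeout: discard all live data, never blend.
    }
  }
  const result = await mock(address);
  return { ...result, provider: "mock", dataSource: "mock", apiCostCents: 0 };
}
```

Inside the live runner, each external call would be wrapped as `withTimeout(call, STAGE_TIMEOUT_MS.geocode, "geocode")` and so on, so one slow stage cannot consume the whole request budget.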

Env / secrets

ATTOM_API_KEY, ANTHROPIC_API_KEY → .dev.vars locally, wrangler secret put for production (names-only proof in receipts, as with Stripe).
Vars (non-secret, in wrangler.jsonc): ATTOM_COST_CENTS_PER_CALL, COMPS_DAILY_CAP, optional COMPS_LIVE_ENABLED kill switch so deploy ≠ spend.
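Worker vars may arrive as strings, so the gate and numeric vars need parsing in one place. A sketch assuming the kill switch is opt-in (live only when `COMPS_LIVE_ENABLED` is `"true"` or `true`, so an unset var means no spend), `ATTOM_COST_CENTS_PER_CALL` defaults to 0, and `COMPS_DAILY_CAP` defaults to the spec's 25; `CompsEnv`, `readCompsConfig`, and `toNonNegativeInt` are hypothetical names.

```typescript
interface CompsEnv {
  ATTOM_API_KEY?: string;
  ANTHROPIC_API_KEY?: string;
  ATTOM_COST_CENTS_PER_CALL?: string | number;
  COMPS_DAILY_CAP?: string | number;
  COMPS_LIVE_ENABLED?: string | boolean;
}

interface CompsConfig {
  live: boolean; // both keys present AND kill switch on
  attomCostCentsPerCall: number;
  dailyCap: number;
}

// Stripe-style gate: "configured" means both secrets are non-empty.
function compsLiveConfigured(env: CompsEnv): boolean {
  return Boolean(env.ATTOM_API_KEY?.trim()) && Boolean(env.ANTHROPIC_API_KEY?.trim());
}

// Parses a var into a non-negative integer, falling back when unset or invalid.
function toNonNegativeInt(value: string | number | undefined, fallback: number): number {
  if (value === undefined || value === "") return fallback;
  const n = typeof value === "number" ? value : Number(value);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

function readCompsConfig(env: CompsEnv): CompsConfig {
  // Assumption: opt-in switch, so deploying code and keys alone does not spend.
  const switchOn = env.COMPS_LIVE_ENABLED === true || env.COMPS_LIVE_ENABLED === "true";
  return {
    live: compsLiveConfigured(env) && switchOn,
    attomCostCentsPerCall: toNonNegativeInt(env.ATTOM_COST_CENTS_PER_CALL, 0),
    dailyCap: toNonNegativeInt(env.COMPS_DAILY_CAP, 25), // spec default
  };
}
```

Local dev with no keys yields `live: false`, so it keeps running the mock exactly as today.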

Test plan

Unit: ATTOM response mapper (their JSON → Comparable, integer cents, null-safe sqft/price); Anthropic tool-output parser (rejects malformed JSON, clamps adjustments); cost computation from usage counts; shared ARV math unchanged (existing tests keep passing).
Integration (worker tests, fetch mocked): live path happy case persists the same row shapes as mock; each failure stage (geocode 5xx, zero candidates, AI timeout, AI garbage output) falls back to mock with provider='mock' and 200; missing keys → mock without any outbound call; daily cap → 429; cache re-fetch does not re-query.
Programmatic: bun run verify; wrangler secret list shows both key names; one live smoke against real ATTOM/Anthropic keys with a known address recording actual cost + latency into the receipt (this is the only stage that spends money — cap it at ≤5 calls).
Browser: run comps on the live app for one address — status line streams stages, ARV band + deal read render, sidebar cost/latency show real values, and a forced-failure address (or key temporarily withheld in a preview env) shows the sample-data badge. Desktop + 375px, zero console errors.
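The mapper unit tests hinge on turning loosely typed provider values into integer cents and nullable sqft. A sketch of those two helpers, assuming prices arrive in dollars as numbers, numeric strings (possibly with `$` or commas), null, or missing, and treating a non-positive price as unusable; `dollarsToCents` and `sqftOrNull` are hypothetical and make no claim about ATTOM's actual field names or formats.

```typescript
// Dollars (number or numeric string) -> integer cents; null if unusable.
function dollarsToCents(value: unknown): number | null {
  const n =
    typeof value === "number"
      ? value
      : typeof value === "string"
        ? Number(value.replace(/[$,\s]/g, "")) // "412,000.50" -> 412000.5
        : NaN;
  if (!Number.isFinite(n) || n <= 0) return null; // assumption: $0 sales are not usable comps
  return Math.round(n * 100);
}

// Square footage: positive whole number or null (never 0 or NaN).
function sqftOrNull(value: unknown): number | null {
  const n = typeof value === "number" ? value : typeof value === "string" ? Number(value) : NaN;
  return Number.isFinite(n) && n > 0 ? Math.round(n) : null;
}
```

Returning `null` rather than `0` matters downstream: the ARV math can skip a comp with unknown sqft instead of dividing by zero.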

Out of scope for this slice

Streaming the deal-read token-by-token (poll GET /comps/:queryId is fine in v1); rehab estimation; non-ATTOM sources; comp photos.