--- name: claw-email-parser description: Wraps the email-extractor MCP tool; orchestrates fetch → extract → write canonical Email JSON. The thin LLM layer that decides what to do with low-confidence extractions. --- # claw-email-parser ## Purpose Atlas's intake skill. Given a date range, pull and extract emails into canonical JSON, surface anything the rule-based extractor was uncertain about, and let the LLM make a judgment call (or punt to the boss). ## Inputs - `since`: ISO date — e.g. `2025-05-09` (V0 default = 12 months ago) - `until`: ISO date — e.g. today - `mode`: `full_backfill` | `incremental` (incremental reads `state/.last_sync`) ## Outputs - N × `state/extracted/YYYY-MM//.json` - Updated `state/.last_sync` - Run summary in `state/runs/YYYY-MM-DD.extract.json` with counts: fetched, extracted, failed, low_confidence_intents, new_customer_domains, new_alias_collisions ## Judgment Rules (LLM layer) The MCP tool handles 95% mechanically. The LLM layer handles: 1. **Intent classification ambiguity** (confidence < 0.6) — read the message and call it 2. **Customer domain disambiguation** — `support@notify.clientco.com` vs `wang@clientco.com` — same customer? boss-confirm or auto-merge based on org-name presence 3. **Alias merging proposals** — when the same human appears under 2+ identities, propose a merge with evidence (signature line match, project overlap) 4. **Dequote escape hatch** — if regex strategies leave a clearly garbage `body_text_clean` (e.g., 90% punctuation), retry with LLM-based dequoting ## Failure Modes | Failure | Behavior | |---------|---------| | Email server unreachable | Retry with backoff; if 3 retries fail, write failure to run summary, exit gracefully | | Single message extraction fails | Skip + log, do not abort run | | Quota / rate limit | Exponential backoff; checkpoint progress to resume next run | | LLM call fails | Mark intent = `unknown`, low_confidence = true, continue | ## Sample I/O **Input:** Run incremental from `state/.last_sync` = 2026-05-08T00:00:00Z **Output (run summary):** ```json { "run_id": "2026-05-09T07:30:00Z-extract", "fetched": 47, "extracted_ok": 45, "extraction_failed": 2, "low_confidence_intents": 4, "new_customer_domains": ["@newprospect.io"], "new_alias_collisions": 1, "duration_ms": 31200 } ``` ## Dependencies - MCP tool: `email-extractor` (see `mcp-tools/email-extractor.md`) - State: read `state/people/aliases.json`, `state/customers/domain_map.json`