assistant-claw/atlas/skills/claw-email-parser/SKILL.md

---
name: claw-email-parser
description: Wraps the email-extractor MCP tool; orchestrates fetch → extract → write canonical Email JSON. The thin LLM layer that decides what to do with low-confidence extractions.
---

# claw-email-parser

## Purpose

Atlas's intake skill. Given a date range, pull and extract emails into canonical JSON, surface anything the rule-based extractor was uncertain about, and let the LLM make a judgment call (or punt to the boss).

## Inputs

- `since`: ISO date — e.g. `2025-05-09` (V0 default = 12 months ago)
- `until`: ISO date — e.g. today
- `mode`: `full_backfill` | `incremental` (incremental reads `state/.last_sync`)

## Outputs

- N × `state/extracted/YYYY-MM/<thread_id>/<msg_id>.json`
- Updated `state/.last_sync`
- Run summary in `state/runs/YYYY-MM-DD.extract.json` with counts: fetched, extracted, failed, low_confidence_intents, new_customer_domains, new_alias_collisions

## Judgment Rules (LLM layer)

The MCP tool handles 95% mechanically. The LLM layer handles:

1. **Intent classification ambiguity** (confidence < 0.6) — read the message and call it
2. **Customer domain disambiguation** — `support@notify.clientco.com` vs `wang@clientco.com` — same customer? boss-confirm or auto-merge based on org-name presence
3. **Alias merging proposals** — when the same human appears under 2+ identities, propose a merge with evidence (signature line match, project overlap)
4. **Dequote escape hatch** — if regex strategies leave a clearly garbage `body_text_clean` (e.g., 90% punctuation), retry with LLM-based dequoting

## Failure Modes

| Failure | Behavior |
|---------|---------|
| Email server unreachable | Retry with backoff; if 3 retries fail, write failure to run summary, exit gracefully |
| Single message extraction fails | Skip + log, do not abort run |
| Quota / rate limit | Exponential backoff; checkpoint progress to resume next run |
| LLM call fails | Mark intent = `unknown`, low_confidence = true, continue |

## Sample I/O

**Input:** Run incremental from `state/.last_sync` = 2026-05-08T00:00:00Z

**Output (run summary):**
```json
{
  "run_id": "2026-05-09T07:30:00Z-extract",
  "fetched": 47,
  "extracted_ok": 45,
  "extraction_failed": 2,
  "low_confidence_intents": 4,
  "new_customer_domains": ["@newprospect.io"],
  "new_alias_collisions": 1,
  "duration_ms": 31200
}
```

## Dependencies

- MCP tool: `email-extractor` (see `mcp-tools/email-extractor.md`)
- State: read `state/people/aliases.json`, `state/customers/domain_map.json`