assistant-claw/skills/claw-email-parser/SKILL.md
Atlas refactor bd0be97630 Refactor: drop Vega framing, promote Atlas to repo root
This repo IS Atlas (Executive-Assistant Claw / a boss's-eye-view project execution radar). The earlier
two-profile framing (Atlas + Vega placeholder) was a misread — Vega is
the agent persona answering Multica issues, not the product. Vega has
no relationship to assistant-claw the product.

Changes:
- Move atlas/* to top-level (git mv preserves history)
- Remove empty Vega placeholders prompts/.gitkeep, tools/.gitkeep
- Delete atlas/ wrapper directory (now empty)
- Update path references in INTEGRATION-hermes.md, scripts/mirror-...sh,
  docs/decisions/0001-mirror-nuwa-skill.md
- Rewrite README.md as Atlas-only, remove dual-profile language

After this commit:
- Top-level OpenClaw 8 files (IDENTITY/SOUL/USER/AGENTS/TOOLS/MEMORY/
  BOOTSTRAP/HEARTBEAT + CLAUDE symlink + zh-CN mirrors)
- skills/{6 sub-skills + DESCRIPTION + README}
- mcp-tools/{spec + Python implementation}
- state-schemas/{project, person, customer + README}
- autopilots/{5 atlas-*.yaml}
- client-deck/, docs/decisions/, scripts/

The ~/.hermes/skills/atlas/ destination convention preserved (atlas as
a skill namespace on the operator's machine, distinct from source path).
2026-05-09 17:54:18 +08:00

---
name: claw-email-parser
description: Wraps the email-extractor MCP tool; orchestrates fetch → extract → write canonical Email JSON. The thin LLM layer that decides what to do with low-confidence extractions.
version: 0.1.0
author: Moments / Atlas team
license: MIT
metadata:
  hermes:
    category: atlas
    tags: [email, imap, extraction, atlas, intake]
    related_skills: [claw-project-tracker, claw-people-observer, claw-customer-radar]
---
# claw-email-parser
## Purpose
Atlas's intake skill. Given a date range, pull and extract emails into canonical JSON, surface anything the rule-based extractor was uncertain about, and let the LLM make a judgment call (or punt to the boss).
## Inputs
- `since`: ISO date — e.g. `2025-05-09` (V0 default = 12 months ago)
- `until`: ISO date — e.g. today
- `mode`: `full_backfill` | `incremental` (incremental reads `state/.last_sync`)
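As a rough illustration of how these inputs might resolve into a concrete time window, here is a minimal sketch. The function names, the UTC treatment of naive dates, and the fallback when `state/.last_sync` is missing are all assumptions made for illustration; only the `since` / `until` / `mode` semantics and the timestamp shape shown in the sample below come from this document.

```python
from datetime import date, datetime, timezone
from pathlib import Path
from typing import Optional, Tuple


def parse_utc(text: str) -> datetime:
    """Parse an ISO date or timestamp; naive values are treated as UTC (assumption)."""
    text = text.strip()
    if text.endswith("Z"):  # fromisoformat() only accepts a trailing "Z" from Python 3.11
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def twelve_months_before(day: date) -> date:
    """V0 default for `since`: the same calendar day one year earlier."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:  # Feb 29 has no counterpart in the previous year
        return day.replace(year=day.year - 1, day=28)


def resolve_window(
    mode: str,
    since: Optional[str],
    until: Optional[str],
    today: date,
    last_sync_path: Path,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: turn the skill inputs into a (start, end) pair."""
    if mode not in ("incremental", "full_backfill"):
        raise ValueError(f"unknown mode: {mode!r}")
    end = parse_utc(until or today.isoformat())
    # Incremental runs start from the recorded sync point when one exists;
    # falling back to `since` / the default otherwise is an assumption.
    if mode == "incremental" and last_sync_path.exists():
        return parse_utc(last_sync_path.read_text()), end
    start = parse_utc(since or twelve_months_before(today).isoformat())
    return start, end
```

Passing `today` explicitly rather than calling `date.today()` inside the helper keeps the window reproducible when a run is replayed.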
## Outputs
- N × `state/extracted/YYYY-MM/<thread_id>/<msg_id>.json`
- Updated `state/.last_sync`
- Run summary in `state/runs/YYYY-MM-DD.extract.json` with the fields `fetched`, `extracted_ok`, `extraction_failed`, `low_confidence_intents`, `new_customer_domains`, `new_alias_collisions` (see Sample I/O)
## Judgment Rules (LLM layer)
The MCP tool handles 95% mechanically. The LLM layer handles:
1. **Intent classification ambiguity** (confidence < 0.6): read the message and make the call.
2. **Customer domain disambiguation**: are `support@notify.clientco.com` and `wang@clientco.com` the same customer? Auto-merge or ask the boss to confirm, based on org-name presence.
3. **Alias merging proposals**: when the same human appears under 2+ identities, propose a merge with evidence (signature line match, project overlap).
4. **Dequote escape hatch**: if the regex strategies leave a clearly garbage `body_text_clean` (e.g., 90% punctuation), retry with LLM-based dequoting.
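Rules 1 and 4 are mechanical enough to gate in code before any LLM call is made. The sketch below shows one possible triage step. The `intent_confidence` field name and the task labels are hypothetical; the 0.6 floor and the 90% punctuation figure are taken from the rules above, where the latter is given only as an example.

```python
import string
from typing import List

INTENT_CONFIDENCE_FLOOR = 0.6    # rule 1
GARBAGE_PUNCTUATION_RATIO = 0.9  # rule 4, treating the "90% punctuation" example as a threshold


def punctuation_ratio(text: str) -> float:
    """Share of non-whitespace characters that are ASCII punctuation."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0  # an empty body is not counted as garbage here
    return sum(ch in string.punctuation for ch in visible) / len(visible)


def llm_tasks(record: dict) -> List[str]:
    """Hypothetical triage: which LLM follow-ups does one extracted record need?"""
    tasks = []
    # A missing confidence is treated as 0.0, so it always gets a second look.
    if record.get("intent_confidence", 0.0) < INTENT_CONFIDENCE_FLOOR:
        tasks.append("classify_intent")
    if punctuation_ratio(record.get("body_text_clean", "")) >= GARBAGE_PUNCTUATION_RATIO:
        tasks.append("llm_dequote")
    return tasks
```

Note that `string.punctuation` covers ASCII only, so a body dominated by full-width or typographic punctuation would slip past this check; a real implementation might use Unicode categories instead.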
## Failure Modes
| Failure | Behavior |
|---------|---------|
| Email server unreachable | Retry with backoff; if 3 retries fail, write failure to run summary, exit gracefully |
| Single message extraction fails | Skip + log, do not abort run |
| Quota / rate limit | Exponential backoff; checkpoint progress to resume next run |
| LLM call fails | Mark intent = `unknown`, low_confidence = true, continue |
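The first row of the table can be sketched as a small retry wrapper. Everything here is illustrative: `fetch_with_backoff`, the `failure` key in the summary, the one-second base delay, and the use of `ConnectionError` to signal an unreachable server are assumptions, not the actual tool's interface.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    summary: dict,
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch`; on ConnectionError retry up to `retries` times with doubling delays.

    If every retry fails, record the failure in the run summary and return None
    so the caller can exit gracefully instead of raising.
    """
    for attempt in range(retries + 1):  # one initial attempt plus `retries` retries
        try:
            return fetch()
        except ConnectionError as exc:
            if attempt == retries:
                summary["failure"] = f"email server unreachable: {exc}"
                return None
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    return None
```

Injecting `sleep` makes the wrapper testable without real waiting. Per-message extraction failures are deliberately not routed through it, since the table says those are skipped and logged rather than retried.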
## Sample I/O
**Input:** Run incremental from `state/.last_sync` = 2026-05-08T00:00:00Z
**Output (run summary):**
```json
{
  "run_id": "2026-05-09T07:30:00Z-extract",
  "fetched": 47,
  "extracted_ok": 45,
  "extraction_failed": 2,
  "low_confidence_intents": 4,
  "new_customer_domains": ["@newprospect.io"],
  "new_alias_collisions": 1,
  "duration_ms": 31200
}
```
## Dependencies
- MCP tool: `email-extractor` (see `mcp-tools/email-extractor.md`)
- State: read `state/people/aliases.json`, `state/customers/domain_map.json`