assistant-claw/atlas/skills/claw-email-parser/SKILL.md
Vega (Atlas scaffolding) ce9f27320a Add Atlas profile under atlas/ — boss-perspective project execution radar
This adds the full Atlas (总助 Claw / 老板视角项目执行雷达) scaffolding as a
sibling profile to the existing Vega general-purpose assistant. All Atlas content
lives under atlas/ to keep the existing top-level skeleton intact.

What's included:

- atlas/IDENTITY.md, SOUL.md, USER.md, AGENTS.md, MEMORY.md, BOOTSTRAP.md,
  HEARTBEAT.md, TOOLS.md (+ zh-CN mirrors) — full OpenClaw 8-piece set
  matching the zero-cca convention
- atlas/skills/ — 6 sub-skills with frontmatter:
  claw-email-parser / claw-project-tracker / claw-people-observer /
  claw-customer-radar / claw-boss-distiller / claw-report-writer
- atlas/skills/claw-boss-distiller/ — adapter notes for nuwa-skill, 5-layer
  boss_skill seed template (23 rules across Expression DNA / Mental Models /
  Decision Heuristics / Anti-Patterns / Honest Boundaries), and a complete
  synthetic distillation demo (10 input emails -> validated 5-layer output)
- atlas/mcp-tools/email-extractor/ — Python implementation of stages 1-3
  (fetch + decode + dequote), 7 pytest tests passing, CLI: atlas-extract
- atlas/state-schemas/ — formal JSON schemas for project / person / customer
  cards with the no-employee-rating hard constraint baked in
- atlas/client-deck/ — 2-page client-facing pitch document
- autopilots/atlas-*.yaml — 5 autopilot configs (daily / weekly / monthly /
  quarterly + andon event-triggered) for a future Multica-side scheduler

Notes:

- nuwa-skill (MIT, https://github.com/alchaincyf/nuwa-skill) NOT vendored;
  fetch at deploy time via instructions in
  atlas/skills/claw-boss-distiller/upstream/README.md
- Vega-side prompts/skills/tools/autopilots/docs scaffold left untouched
- Top-level README.md updated with a brief Atlas pointer; rest preserved
2026-05-09 17:00:29 +08:00

64 lines
2.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
name: claw-email-parser
description: Wraps the email-extractor MCP tool; orchestrates fetch → extract → write canonical Email JSON. The thin LLM layer that decides what to do with low-confidence extractions.
---
# claw-email-parser
## Purpose
Atlas's intake skill. Given a date range, pull and extract emails into canonical JSON, surface anything the rule-based extractor was uncertain about, and let the LLM make a judgment call (or punt to the boss).
## Inputs
- `since`: ISO date — e.g. `2025-05-09` (V0 default = 12 months ago)
- `until`: ISO date — e.g. today
- `mode`: `full_backfill` | `incremental` (incremental reads `state/.last_sync`)
## Outputs
- N × `state/extracted/YYYY-MM/<thread_id>/<msg_id>.json`
- Updated `state/.last_sync`
- Run summary in `state/runs/YYYY-MM-DD.extract.json` with counts: fetched, extracted, failed, low_confidence_intents, new_customer_domains, new_alias_collisions
## Judgment Rules (LLM layer)
The MCP tool handles 95% mechanically. The LLM layer handles:
1. **Intent classification ambiguity** (confidence < 0.6) read the message and call it
2. **Customer domain disambiguation** `support@notify.clientco.com` vs `wang@clientco.com` same customer? boss-confirm or auto-merge based on org-name presence
3. **Alias merging proposals** when the same human appears under 2+ identities, propose a merge with evidence (signature line match, project overlap)
4. **Dequote escape hatch** if regex strategies leave a clearly garbage `body_text_clean` (e.g., 90% punctuation), retry with LLM-based dequoting
## Failure Modes
| Failure | Behavior |
|---------|---------|
| Email server unreachable | Retry with backoff; if 3 retries fail, write failure to run summary, exit gracefully |
| Single message extraction fails | Skip + log, do not abort run |
| Quota / rate limit | Exponential backoff; checkpoint progress to resume next run |
| LLM call fails | Mark intent = `unknown`, low_confidence = true, continue |
## Sample I/O
**Input:** Run incremental from `state/.last_sync` = 2026-05-08T00:00:00Z
**Output (run summary):**
```json
{
"run_id": "2026-05-09T07:30:00Z-extract",
"fetched": 47,
"extracted_ok": 45,
"extraction_failed": 2,
"low_confidence_intents": 4,
"new_customer_domains": ["@newprospect.io"],
"new_alias_collisions": 1,
"duration_ms": 31200
}
```
## Dependencies
- MCP tool: `email-extractor` (see `mcp-tools/email-extractor.md`)
- State: read `state/people/aliases.json`, `state/customers/domain_map.json`