assistant-claw/atlas/skills/claw-email-parser/SKILL.md
Vega (Atlas iteration) 04e197c896 Hermes-compatible skill format + nuwa mirror prep + README rewrite
Three things in this commit:

1. Atlas skills now agentskills.io / Hermes-compatible
   - Each atlas/skills/claw-*/SKILL.md frontmatter enriched with version,
     author, license, and metadata.hermes block (tags, category,
     related_skills, boundaries)
   - New atlas/skills/DESCRIPTION.md per Hermes category convention
   - New atlas/INTEGRATION-hermes.md — step-by-step SOP to install Atlas
     onto hermes-agent runtime (cp skills, fetch nuwa upstream, configure
     env, wire cron, smoke test). Documents the branding override and
     self-improving-loop guardrail.

2. nuwa-skill mirror prep (waiting on org-repo creation)
   - scripts/mirror-nuwa-to-moments.sh — one-shot bare-clone + push --mirror
   - docs/decisions/0001-mirror-nuwa-skill.md — ADR explaining the why,
     the bot-token scope limitation, and the manual one-time repo creation
     step required at https://git.moments.top/repo/create

3. README rewrite
   - Atlas-forward navigation table ("想做什么 → 看哪里")
   - Quickstart sections for browsing, running tests locally, fetching
     nuwa upstream (public + air-gapped variants), and Hermes integration
   - Preserved all original Vega working agreements
   - Roadmap with explicit Atlas / Vega tracks

Bot account (multica-bot) lacks the write:organization scope, so it cannot create
the nuwa-skill repo via the API. After a human creates the empty repo at
git.moments.top/Moments.top/nuwa-skill, run scripts/mirror-nuwa-to-moments.sh
to populate it.
2026-05-09 17:21:31 +08:00


---
name: claw-email-parser
description: Wraps the email-extractor MCP tool; orchestrates fetch → extract → write canonical Email JSON. The thin LLM layer that decides what to do with low-confidence extractions.
version: 0.1.0
author: Moments / Atlas team
license: MIT
metadata:
  hermes:
    category: atlas
    tags: [email, imap, extraction, atlas, intake]
    related_skills: [claw-project-tracker, claw-people-observer, claw-customer-radar]
---
# claw-email-parser
## Purpose
Atlas's intake skill. Given a date range, pull and extract emails into canonical JSON, surface anything the rule-based extractor was uncertain about, and let the LLM make a judgment call (or punt to the boss).
## Inputs
- `since`: ISO 8601 date, e.g. `2025-05-09` (V0 default: 12 months ago)
- `until`: ISO 8601 date, e.g. today's date
- `mode`: `full_backfill` | `incremental` (incremental reads `state/.last_sync`)
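A minimal sketch of how these inputs could be resolved into a concrete date window. It assumes `state/.last_sync` holds a single ISO 8601 timestamp (as in the sample below) and that an explicit `since` is ignored in `incremental` mode; the function names `resolve_window` and `twelve_months_before` are hypothetical, not part of the skill's actual interface.

```python
from datetime import date, datetime, timezone
from typing import Optional, Tuple


def twelve_months_before(d: date) -> date:
    """Same calendar day one year earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # d is Feb 29 and the previous year is not a leap year
        return d.replace(year=d.year - 1, day=28)


def resolve_window(
    mode: str,
    since: Optional[date] = None,
    until: Optional[date] = None,
    last_sync: Optional[str] = None,
    today: Optional[date] = None,
) -> Tuple[date, date]:
    """Turn the skill inputs into a concrete (since, until) date range.

    `last_sync` is the raw text of state/.last_sync, assumed (hypothetically)
    to be one ISO 8601 timestamp such as "2026-05-08T00:00:00Z".
    """
    today = today or datetime.now(timezone.utc).date()
    until = until or today
    if mode == "incremental":
        if not last_sync:
            raise ValueError("incremental mode needs state/.last_sync")
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on
        since = datetime.fromisoformat(last_sync.strip().replace("Z", "+00:00")).date()
    elif mode == "full_backfill":
        since = since or twelve_months_before(today)  # V0 default
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if since > until:
        raise ValueError(f"empty window: {since} > {until}")
    return since, until
```

Passing `today` explicitly keeps the default window deterministic in tests; in a real run it falls back to the current UTC date.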
## Outputs
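- One canonical Email JSON file per extracted message: `state/extracted/YYYY-MM/<thread_id>/<msg_id>.json`
- Updated `state/.last_sync`
- Run summary in `state/runs/YYYY-MM-DD.extract.json` with the fields `fetched`, `extracted_ok`, `extraction_failed`, `low_confidence_intents`, `new_customer_domains`, and `new_alias_collisions` (plus `run_id` and `duration_ms`, as in the sample below)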
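The output layout can be expressed as two small path helpers. This is a sketch under assumptions: the `YYYY-MM` bucket is taken from the message's own send date, and IDs are passed through a hypothetical `safe_name` sanitizer, because RFC 5322 Message-IDs look like `<abc@host>` and contain characters that are awkward in file names.

```python
import re
from datetime import date, datetime
from pathlib import PurePosixPath

STATE_DIR = PurePosixPath("state")
# Hypothetical sanitizer: keep letters, digits, ".", "_", "@" and "-";
# everything else (e.g. "<", ">", "/") becomes "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9._@-]")


def safe_name(raw_id: str) -> str:
    # Strip leading dots so an ID like ".." can never become a parent-dir path.
    name = _UNSAFE.sub("_", raw_id).lstrip(".")
    return name or "_"


def extracted_path(sent_at: datetime, thread_id: str, msg_id: str) -> PurePosixPath:
    """state/extracted/YYYY-MM/<thread_id>/<msg_id>.json (month assumed from send date)."""
    return (STATE_DIR / "extracted" / f"{sent_at:%Y-%m}"
            / safe_name(thread_id) / f"{safe_name(msg_id)}.json")


def run_summary_path(run_day: date) -> PurePosixPath:
    """state/runs/YYYY-MM-DD.extract.json"""
    return STATE_DIR / "runs" / f"{run_day:%Y-%m-%d}.extract.json"
```

When actually writing these files, writing to a temporary file and then calling `os.replace` keeps a crash from leaving half-written JSON or a stale `state/.last_sync`.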
## Judgment Rules (LLM layer)
The MCP tool handles about 95% of the work mechanically. The LLM layer handles the rest:
1. **Intent classification ambiguity** (confidence < 0.6): read the message and decide the intent.
2. **Customer domain disambiguation**: are `support@notify.clientco.com` and `wang@clientco.com` the same customer? Either ask the boss to confirm or auto-merge, depending on whether the org name is present.
3. **Alias merging proposals**: when the same person appears under two or more identities, propose a merge with supporting evidence (matching signature lines, overlapping projects).
4. **Dequote escape hatch**: if the regex strategies leave `body_text_clean` as obvious garbage (e.g., 90% punctuation), retry with LLM-based dequoting.
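Rules 1 and 4 are purely threshold-based, so the decision to hand a message to the LLM can be sketched mechanically. The `Extraction` record below is a hypothetical subset of the canonical Email JSON, the intent labels in any example are made up, and "garbage" is approximated as the share of ASCII punctuation among non-whitespace characters; rules 2 and 3 need the alias and domain state and are not shown.

```python
import string
from dataclasses import dataclass
from typing import List

INTENT_CONFIDENCE_FLOOR = 0.6  # rule 1: below this, the LLM reads the message
GARBAGE_PUNCT_RATIO = 0.9      # rule 4: "e.g., 90% punctuation"


@dataclass
class Extraction:
    """Hypothetical subset of the canonical Email JSON fields."""
    msg_id: str
    intent: str
    intent_confidence: float
    body_text_clean: str


def punctuation_ratio(text: str) -> float:
    """Share of ASCII punctuation among the non-whitespace characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in string.punctuation for c in chars) / len(chars)


def llm_tasks(ex: Extraction) -> List[str]:
    """Which threshold-based judgment rules should send this message to the LLM."""
    tasks = []
    if ex.intent_confidence < INTENT_CONFIDENCE_FLOOR:
        tasks.append("classify_intent")
    if punctuation_ratio(ex.body_text_clean) >= GARBAGE_PUNCT_RATIO:
        tasks.append("llm_dequote")
    return tasks
```

An empty `body_text_clean` scores 0.0 here and is not flagged; whether an empty body should also trigger the dequote retry is a judgment call this sketch leaves open.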
## Failure Modes
| Failure | Behavior |
|---------|---------|
| Email server unreachable | Retry with backoff; if 3 retries fail, write failure to run summary, exit gracefully |
| Single message extraction fails | Skip + log, do not abort run |
| Quota / rate limit | Exponential backoff; checkpoint progress to resume next run |
| LLM call fails | Mark intent = `unknown`, low_confidence = true, continue |
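The table's retry and skip behaviors can be sketched as follows. Assumptions: "3 retries" means one initial attempt plus three retries, the backoff doubles from a 1-second base without jitter, and `extract` / `classify_intent` are hypothetical callables standing in for the MCP tool and the LLM call. Checkpointing for quota limits is not shown.

```python
import logging
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
log = logging.getLogger("claw-email-parser")


def with_backoff(fn: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); after each failure wait base_delay * 2**attempt and retry.

    The last exception propagates so the caller can record it in the run
    summary and exit gracefully.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def extract_all(messages: Iterable[Dict[str, Any]],
                extract: Callable[[Dict[str, Any]], Dict[str, Any]],
                classify_intent: Callable[[Dict[str, Any]], str],
                ) -> Tuple[List[Dict[str, Any]], int]:
    """Per-message failures are logged, counted and skipped; they never abort the run."""
    records: List[Dict[str, Any]] = []
    failed = 0
    for msg in messages:
        try:
            rec = extract(msg)
        except Exception:
            log.exception("extraction failed for %s", msg.get("msg_id"))
            failed += 1
            continue
        if rec.get("intent_confidence", 0.0) < 0.6:
            try:
                rec["intent"] = classify_intent(rec)
            except Exception:
                # LLM call failed: mark the record and keep going
                rec["intent"] = "unknown"
                rec["low_confidence"] = True
        records.append(rec)
    return records, failed
```

Injecting `sleep` makes the backoff schedule testable without real waiting; a production version would likely add random jitter so parallel runs do not retry in lockstep.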
## Sample I/O
**Input:** an incremental run from `state/.last_sync` = `2026-05-08T00:00:00Z`

**Output (run summary):**
```json
{
  "run_id": "2026-05-09T07:30:00Z-extract",
  "fetched": 47,
  "extracted_ok": 45,
  "extraction_failed": 2,
  "low_confidence_intents": 4,
  "new_customer_domains": ["@newprospect.io"],
  "new_alias_collisions": 1,
  "duration_ms": 31200
}
```
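One way to keep the summary file in this exact shape is a small dataclass whose fields mirror the sample. The consistency check encodes an assumed invariant (every fetched message is either extracted or failed, as 45 + 2 = 47 in the sample); the class name `RunSummary` is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RunSummary:
    """Fields mirror the sample run summary; field order matches the JSON."""
    run_id: str
    fetched: int = 0
    extracted_ok: int = 0
    extraction_failed: int = 0
    low_confidence_intents: int = 0
    new_customer_domains: List[str] = field(default_factory=list)
    new_alias_collisions: int = 0
    duration_ms: int = 0

    def check(self) -> None:
        # Assumed invariant: every fetched message is either extracted or failed.
        if self.extracted_ok + self.extraction_failed != self.fetched:
            raise ValueError(
                f"{self.extracted_ok} ok + {self.extraction_failed} failed "
                f"!= {self.fetched} fetched"
            )

    def to_json(self) -> str:
        self.check()
        return json.dumps(asdict(self), indent=2)
```

If the skill ever records other outcomes (for example, messages deliberately skipped), the invariant would need an extra field rather than being dropped.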
## Dependencies
- MCP tool: `email-extractor` (see `mcp-tools/email-extractor.md`)
- State: read `state/people/aliases.json`, `state/customers/domain_map.json`