Interacts with live Hunk diff review sessions via CLI. Inspects review focus, navigates files and hunks, reloads session contents, and adds inline review comments. Use when the user has a Hunk session running or wants to review diffs interactively.
Manage persistent coding sessions across Claude Code, Codex, Gemini, and Cursor engines. Use when orchestrating multi-engine coding agents, starting/sending/stopping sessions, running multi-agent council collaborations, cross-session messaging, ultraplan deep planning, ultrareview parallel code review, or switching models/tools at runtime. Triggers on "start a session", "send to session", "run council", "ultraplan", "ultrareview", "switch model", "multi-agent", "coding session", "session inbox", "cursor agent".
Analyze Claude Code session bloat — shows token count, context usage %, and bloat breakdown. Use when the user asks about session size, context usage, or when you notice the context window is getting full.
Query previous pi sessions to retrieve context, decisions, code changes, or other information. Use when you need to look up what happened in a parent session or any other session file.
3-tier agent memory system with 5-level compaction tree. OpenClaw version. Defines session start protocol, end-of-task checkpoints, and memory file management. MUST be followed every session.
Evaluate and score agent behavior against a golden reference. Use this skill whenever the user wants to run evaluation, check pass/fail status, understand metric scores, compare sessions for regressions, validate agent behavior, or score a trace from a file or a live session. Trigger on phrases like "eval this trace", "check my agent output", "did my agent do the right thing", "compare runs", "did my agent regress", "score session X", "evaluate against golden", "run evals". Works with both local trace files and live streaming sessions.

---

Evaluate agent behavior and explain what the scores mean.

## Determine the input type

First, figure out what to evaluate:

- **Trace file(s)**: user mentions a `.json` or `.jsonl` file path → use `evaluate_traces`
- **Sessions vs golden**: user has multiple live sessions and wants regression testing → use `evaluate_sessions`
- **Single live session**: user wants to score one session against a golden eval set → guide them to use `evaluate_sessions` with one session as golden

## Evaluating trace files

1. Get the file path(s). Check the extension: `.jsonl` → `trace_format: "otlp-json"` | `.json` → `"jaeger-json"` (default)
2. Ask if they have a golden eval set JSON. For `tool_trajectory_avg_score` (the default metric), an eval set is required: it provides the expected tool call sequence to compare against. If they don't have one yet, explain this and suggest starting with `hallucinations_v1`, or ask if they want to create a golden set from a reference run first.
3. Call `evaluate_traces` with the file(s), format, and eval set.
4. Present results as a score table (see Score interpretation below) and explain failures.

## Evaluating sessions (regression testing)

This workflow requires the server to be running with the `--dev` flag (which enables WebSocket and session streaming). Plain `agentevals serve` will not have sessions. If you get a connection error from any tool below, tell the user:

```bash
uv run agentevals serve --dev
```
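The extension check in the trace-file steps above could be sketched as follows. This is a minimal illustration and not part of agentevals: `trace_format_for` and `TRACE_FORMATS` are hypothetical names, and rejecting any other extension is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extension-to-format mapping as described in the trace-file steps.
# The names below are hypothetical, chosen for illustration only.
TRACE_FORMATS = {
    ".jsonl": "otlp-json",
    ".json": "jaeger-json",  # described as the default format
}


def trace_format_for(path: str) -> str:
    """Return the trace_format value for a trace file based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in TRACE_FORMATS:
        # Assumption: anything other than .json/.jsonl is treated as an error.
        raise ValueError(f"unsupported trace file extension: {suffix!r}")
    return TRACE_FORMATS[suffix]
```

The returned string is what would be passed as `trace_format` when calling `evaluate_traces`.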
Monitors context window health throughout a session and rides peak context quality for maximum output fidelity. Activates automatically after plan-interview and intent-framed-agent. Stays active through execution and hands off cleanly to simplify-and-harden and self-improvement when the wave completes naturally or exits via handoff. Use this skill whenever a multi-step agent task is underway and session continuity or context drift is a concern. Especially important for long-running tasks, complex refactors, or any work where degraded context would silently corrupt the output. Trigger even if the user doesn't say "context surfing": if an agent task is running across multiple steps with intent and a plan already established, this skill is live.

---

# Context Surfing

## Install

```bash
npx skills add pskoett/pskoett-ai-skills/skills/context-surfing
```

The agent rides the wave of peak context. When the wave crests, it commits. When it detects drift, it pulls out cleanly: saving state, handing off, and letting the next session catch the next wave. No wipeouts. No zombie sessions. Only intentional, high-fidelity execution.

---

## Mental Model
- 📁 scripts/
- 📁 templates/
- 📄 config.yml
- 📄 SKILL.md
Capture current session transcript to workspace history. Use at session end or when preserving conversation context.
- 📁 .claude-plugin/
- 📄 SKILL.md
Query Claude Code session analytics from ccrecall database. Use when user asks about token usage, session history, or wants to analyze their Claude Code usage patterns.
- 📁 scripts/
- 📄 README.md
- 📄 SKILL.md
Read a single Codex session/thread. Use when the thread id is known and you need to view or summarize the session's content.
- 📄 SKILL.md
- 📄 SKILL.md.meta.json
Wait for CI to settle across all repos in a Polygraph session, then report results and investigate failures. USE WHEN user says "await polygraph", "wait for polygraph ci", "polygraph ci status", "check polygraph ci", "watch polygraph session", "monitor polygraph".
Your next session starts cold. No memory of what you built, what broke, what you decided. Every signal you write is a gift to that future session. The richer the signal, the less time re-learning.