Research industry standards and best practices, identify viable approaches for a given technical or architectural problem, and produce a structured factual comparison against project-specific constraints. Reports options — does not decide.
- 📁 references/
- 📁 templates/
- 📄 .skill-source.json
- 📄 SKILL.md
Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
Use when writing test fixtures for @copilotkit/llmock — mock LLM responses, tool call sequences, error injection, multi-turn agent loops, embeddings, structured output, sequential responses, or debugging fixture mismatches.
Use when modifying the Codex SPECTRE install flow, SessionStart continuity, project skill syncing, registry injection, or Codex-specific runtime files.
Refactor bloated AGENTS.md, CLAUDE.md, or similar agent instruction files to follow progressive disclosure principles. Splits monolithic files into organized, linked documentation.
- 📁 references/
- 📄 _meta.json
- 📄 SKILL.md
Automatically update Clawdbot and all installed skills once daily. Runs via cron, checks for updates, applies them, and messages the user with a summary of what changed.
- 📁 references/
- 📁 scripts/
- 📄 SKILL.md
Systematic workflow for clustering biological samples, features, or any quantitative data matrix. Implements multiple clustering algorithms with rigorous validation, comparison, and interpretation to identify meaningful data groupings.
- 📁 assets/
- 📁 references/
- 📄 SKILL.md
Use this skill when working with Salesforce Agent Script — the scripting language for authoring Agentforce agents using the Atlas Reasoning Engine. Triggers include: creating, modifying, or comprehending Agent Script agents; working with AiAuthoringBundle files or .agent files; designing topic graphs or flow control; producing or updating an Agent Spec; validating Agent Script or diagnosing compilation errors; previewing agents or debugging behavioral issues; deploying, publishing, activating, or deactivating agents; deleting or renaming agents; authoring AiEvaluationDefinition test specs or running agent tests. This skill teaches Agent Script from scratch — AI models have zero prior training data on this language. Do NOT use for Apex development, Flow building, Prompt Template authoring, Experience Cloud configuration, or general Salesforce CLI tasks unrelated to Agent Script.
- 📁 skills/
- 📄 skill.json
- 📄 skill.md
Skill bundle for long-running Clawcolony agents. Use when joining the colony, deciding what to work on, reading mail, routing to domain skills, or starting a new session. NOT for one-shot tasks outside Clawcolony.
Evaluate and score agent behavior against a golden reference. Use this skill whenever the user wants to run evaluation, check pass/fail status, understand metric scores, compare sessions for regressions, validate agent behavior, or score a trace from a file or a live session. Trigger on phrases like "eval this trace", "check my agent output", "did my agent do the right thing", "compare runs", "did my agent regress", "score session X", "evaluate against golden", "run evals". Works with both local trace files and live streaming sessions.

---

Evaluate agent behavior and explain what the scores mean.

## Determine the input type

First, figure out what to evaluate:

- **Trace file(s)** — user mentions a `.json` or `.jsonl` file path → use `evaluate_traces`
- **Sessions vs golden** — user has multiple live sessions and wants regression testing → use `evaluate_sessions`
- **Single live session** — user wants to score one session against a golden eval set → guide them to use `evaluate_sessions` with one session as golden

## Evaluating trace files

1. Get the file path(s). Check the extension: `.jsonl` → `trace_format: "otlp-json"` | `.json` → `"jaeger-json"` (default)
2. Ask if they have a golden eval set JSON. For `tool_trajectory_avg_score` (the default metric), an eval set is required — it provides the expected tool call sequence to compare against. If they don't have one yet, explain this and suggest starting with `hallucinations_v1`, or ask if they want to create a golden set from a reference run first.
3. Call `evaluate_traces` with the file(s), format, and eval set.
4. Present results as a score table (see Score interpretation below) and explain failures.

## Evaluating sessions (regression testing)

This workflow requires the server to be running with the `--dev` flag (which enables WebSocket and session streaming). Plain `agentevals serve` will not have sessions. If you get a connection error from any tool below, tell the user:

```bash
uv run agentevals serve --dev
```
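The extension check in the trace-file steps above can be sketched in a few lines of Python. This is only an illustration of the stated rule (`.jsonl` maps to `"otlp-json"`, anything else falls back to the `"jaeger-json"` default). The helper name `pick_trace_format` is hypothetical and is not part of agentevals.

```python
from pathlib import Path


def pick_trace_format(path: str) -> str:
    """Hypothetical helper: choose a trace_format value from a file extension.

    Mirrors the rule in the skill text: `.jsonl` files are treated as
    "otlp-json"; everything else uses the default, "jaeger-json".
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".jsonl":
        return "otlp-json"
    return "jaeger-json"


if __name__ == "__main__":
    # Example file names are made up for illustration.
    for name in ("run.jsonl", "traces/run.json"):
        print(name, "->", pick_trace_format(name))
```

The returned string would then be passed as the `trace_format` value when calling `evaluate_traces`. How that call is made depends on the tool interface, which the text above does not spell out.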
Guide for creating effective skills that extend agent capabilities with specialized knowledge, workflows, or tool integrations. Use this skill when the user asks to: (1) create a new skill, (2) make a skill, (3) build a skill, (4) set up a skill, (5) initialize a skill, (6) scaffold a skill, (7) update or modify an existing skill, (8) validate a skill, (9) learn about skill structure, (10) understand how skills work, or (11) get guidance on skill design patterns. Trigger on phrases like "create a skill", "new skill", "make a skill", "skill for X", "how do I create a skill", or "help me build a skill".
Deep code scan for AI security issues — prompt injection, PII in prompts, hardcoded keys, unguarded agents.