- 📁 evals/
- 📁 references/
- 📄 SKILL.md
When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "should I test this," "which version is better," "test two versions," "statistical significance," or "how long should I run this test." Use this whenever someone is comparing two approaches and wants to measure which performs better. For tracking implementation, see analytics-tracking. For page-level conversion optimization, see page-cro.
Content experimentation and A/B testing guidance covering experiment design, hypotheses, metrics, sample size, statistical foundations, CMS-managed variants, and common analysis pitfalls. Use this skill when planning experiments, setting up variants, choosing success metrics, interpreting statistical results, or building experimentation workflows in a CMS or frontend stack.
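As a rough illustration of the sample-size step mentioned above, the sketch below uses the standard normal-approximation formula for comparing two conversion rates. The function name, parameter names, and defaults (5% significance, 80% power) are assumptions made for this example and are not taken from the skill itself.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-sided two-proportion test.

    Hypothetical helper: baseline_rate is the control conversion rate (e.g. 0.10),
    relative_lift is the smallest relative improvement worth detecting (e.g. 0.20).
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if relative_lift == 0:
        raise ValueError("relative_lift must be non-zero")

    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not 0 < p2 < 1:
        raise ValueError("lifted rate must stay between 0 and 1")

    # Critical values for the chosen significance level and power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Sum of the Bernoulli variances of the two variants.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def estimated_duration_days(n_per_variant, daily_visitors, variants=2):
    """Days needed if traffic is split evenly across all variants."""
    return math.ceil(n_per_variant * variants / daily_visitors)
```

Dividing the total required sample by daily traffic, as `estimated_duration_days` does, gives a first answer to "how long should I run this test"; in practice the result is often rounded up to whole weeks to cover weekly seasonality.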
Assay an experiment — deep analysis of results with cross-run comparison
Design A/B and multivariate tests. Use for sample size calculation, hypothesis testing, and CRO experimentation.
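One common way to test the hypothesis that two variants convert at different rates is a pooled two-proportion z-test. The sketch below is a minimal version under that assumption; the function name and argument names are hypothetical and not part of any skill listed here.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")

    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

A call such as `two_proportion_z_test(100, 1000, 130, 1000)` compares a 10% control rate against a 13% variant rate. The normal approximation is only reasonable when each group has a fair number of conversions and non-conversions.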
- 📄 autoresearch_helper.py
- 📄 SKILL.md
Autonomous experiment loop for optimization research. Use when the user wants to:
- Optimize a metric through systematic experimentation (ML training loss, test speed, bundle size, build time, etc.)
- Run an automated research loop: try an idea, measure it, keep improvements, revert regressions, repeat
- Set up autoresearch for any codebase with a measurable optimization target

Implements the autoresearch pattern with MAD-based confidence scoring, git branch isolation, and structured experiment logging.

---

# Autoresearch
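The description above does not say how the MAD-based confidence score is computed. One plausible reading, shown here purely as an assumption, is to treat the median absolute deviation (MAD) of repeated baseline measurements as the noise level and express a candidate's change in multiples of that noise. The function names are hypothetical and do not come from `autoresearch_helper.py`.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation: a noise estimate that resists outliers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def confidence_score(baseline_runs, candidate_runs):
    """Shift between the two medians, measured in units of baseline MAD.

    Assumed scoring rule for illustration only; larger values mean the
    change stands out more clearly from run-to-run noise.
    """
    delta = median(candidate_runs) - median(baseline_runs)
    noise = mad(baseline_runs)
    if noise == 0:
        # No measured noise: any non-zero shift is treated as fully confident.
        return math.inf if delta != 0 else 0.0
    return abs(delta) / noise
```

Under this sketch, a loop would keep a change only when the score clears some threshold chosen by the user; the threshold itself is not specified in the source.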
Analyze session replay patterns across experiment variants to understand user behavior differences. Use when the user wants to see how users interact with different experiment variants, identify usability issues, compare behavior patterns between control and test groups, or get qualitative insights to complement quantitative experiment results.
- 📁 .claude/
- 📁 src/
- 📁 tests/
- 📄 .gitignore
- 📄 .mcp.json
- 📄 CLAUDE.md
Orze is a filesystem-coordinated GPU experiment orchestrator. It runs the loop: **generate ideas → train → evaluate → learn → repeat**.
- 📁 references/
- 📁 runtime-profiles/
- 📄 SKILL.md
Generalised autonomous optimisation loop: soft RLVR for any artifact a user can measure. Web runtime package that selects memory in this order: connector-backed, project-pack, none. Never assumes subprocess access or unrestricted local files. Use this skill whenever a user wants to iteratively improve an artifact (code, prompts, documents, configs, designs, content) by running structured experiments, evaluating results against a multi-dimensional rubric, and learning from each attempt. Triggers include: "optimise this", "keep improving until it's good", "run experiments on", "autoresearch", "iterate on this overnight", "try different approaches and pick the best", or any request implying repeated evaluate-and-improve cycles.
Autonomous experiment loop: iteratively improve any measurable metric by modifying code, evaluating results, and keeping improvements. Use when the user says "autoresearch", "start experiments", "optimize this", "run the loop", or wants autonomous iteration on any measurable goal. Reads autoresearch.toml for config. Run `autoresearch init` first.

---

## Autoresearch: Autonomous Experiment Loop

You are an autonomous research agent. Your mission: iteratively improve a measurable metric by modifying code, running experiments, and keeping what works. You will run hundreds of experiments. Most will fail. That's expected. The wins compound.

---

### Phase 1: Pre-Flight

Before touching any code, validate the environment:

```bash
autoresearch doctor
```
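The keep-what-works cycle described in this skill's summary can be sketched in a few lines. This is only an illustration under simplifying assumptions: each idea is reduced to a label, `evaluate` is a hypothetical callback that returns the metric for that idea, and "reverting" simply means not updating the best score. It does not reproduce the `autoresearch` CLI, which the source describes as modifying real code.

```python
def run_loop(baseline_score, ideas, evaluate, lower_is_better=True):
    """Try each idea in turn, keep improvements, discard regressions.

    Returns the best score reached and a structured log of every attempt.
    """
    best = baseline_score
    log = []
    for idea in ideas:
        score = evaluate(idea)
        improved = score < best if lower_is_better else score > best
        if improved:
            best = score  # keep the change and raise the bar for later ideas
        # A regression leaves `best` untouched, which stands in for a revert.
        log.append({"idea": idea, "score": score, "kept": improved})
    return best, log


# Usage with made-up scores for three hypothetical ideas.
scores = {"cache results": 9.0, "add logging": 11.0, "batch requests": 8.0}
best, log = run_loop(10.0, list(scores), scores.get)
```

Logging every attempt, including the failures, matches the expectation stated above that most experiments will not be kept.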
Check running experiments, collect results, and present a research summary.