- 📁 evals/
- 📁 references/
- 📄 SKILL.md
When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "should I test this," "which version is better," "test two versions," "statistical significance," or "how long should I run this test." Use this whenever someone is comparing two approaches and wants to measure which performs better. For tracking implementation, see analytics-tracking. For page-level conversion optimization, see page-cro.
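The "how long should I run this test" question usually reduces to a sample-size estimate. The following is a minimal sketch, assuming a two-sided two-proportion z-test with equal traffic per variant. The function names, the default `alpha` and `power`, and the relative-lift parameter are illustrative assumptions, not part of the skill itself.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, min_detectable_lift,
                            alpha=0.05, power=0.8):
    """Approximate users needed per variant (hypothetical helper).

    Uses the normal-approximation formula for comparing two proportions.
    min_detectable_lift is relative: 0.2 means a 20% lift over baseline.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def days_to_run(n_per_variant, daily_visitors, variants=2):
    """Days needed if daily_visitors are split evenly across all variants."""
    return math.ceil(n_per_variant * variants / daily_visitors)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.05, min_detectable_lift=0.2)
    print(n, "users per variant;", days_to_run(n, daily_visitors=1000), "days")
```

Smaller detectable lifts raise the required sample size sharply, because the difference between the two rates is squared in the denominator.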
Assay an experiment: deep analysis of results with cross-run comparison.
- 📄 autoresearch_helper.py
- 📄 SKILL.md
Autonomous experiment loop for optimization research. Use when the user wants to:
- Optimize a metric through systematic experimentation (ML training loss, test speed, bundle size, build time, etc.)
- Run an automated research loop: try an idea, measure it, keep improvements, revert regressions, repeat
- Set up autoresearch for any codebase with a measurable optimization target

Implements the autoresearch pattern with MAD-based confidence scoring, git branch isolation, and structured experiment logging.

---

# Autoresearch
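The keep-or-revert step with MAD-based confidence could look roughly like the sketch below. The description does not give the scoring formula, so this is an assumption: a robust z-score that compares a candidate measurement against the median of repeated baseline runs, scaled by the median absolute deviation (MAD). The function names, the `threshold` of 2.0, and the `lower_is_better` flag are hypothetical and are not taken from `autoresearch_helper.py`.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation: a spread estimate that resists outliers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def confidence(baseline_runs, candidate_value, lower_is_better=True):
    """Robust z-score of the candidate's improvement over the baseline median.

    Assumed scoring rule, not the skill's documented formula.
    """
    center = median(baseline_runs)
    # 1.4826 rescales MAD to match the standard deviation for normal data.
    spread = 1.4826 * mad(baseline_runs)
    if lower_is_better:
        improvement = center - candidate_value
    else:
        improvement = candidate_value - center
    if spread == 0:
        # Noise-free baseline: any nonzero change is unambiguous.
        return 0.0 if improvement == 0 else math.copysign(math.inf, improvement)
    return improvement / spread


def decide(baseline_runs, candidate_value, threshold=2.0, lower_is_better=True):
    """Return "keep" when the improvement clears the noise threshold."""
    score = confidence(baseline_runs, candidate_value, lower_is_better)
    return "keep" if score >= threshold else "revert"


if __name__ == "__main__":
    build_times = [10.0, 10.2, 9.9, 10.1, 10.0]  # made-up baseline seconds
    print(decide(build_times, 9.0))
    print(decide(build_times, 9.95))
```

In a loop with git branch isolation, a "keep" result would correspond to retaining the experiment's commit and a "revert" result to discarding it. MAD is used in place of the standard deviation so that a single noisy baseline run does not inflate the noise estimate.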
Analyze session replay patterns across experiment variants to understand user behavior differences. Use when the user wants to see how users interact with different experiment variants, identify usability issues, compare behavior patterns between control and test groups, or get qualitative insights to complement quantitative experiment results.
- 📁 .claude/
- 📁 src/
- 📁 tests/
- 📄 .gitignore
- 📄 .mcp.json
- 📄 CLAUDE.md
Orze is a filesystem-coordinated GPU experiment orchestrator. It runs the loop: **generate ideas → train → evaluate → learn → repeat**.