- 📁 examples/
- 📁 references/
- 📁 scripts/
- 📄 SKILL.md
Zero-shot time series forecasting with Google's TimesFM foundation model. Use this skill when forecasting ANY univariate time series — sales, sensor readings, stock prices, energy demand, patient vitals, weather, or scientific measurements — without training a custom model. Supports both basic forecasting and advanced covariate forecasting (XReg) with dynamic and static exogenous variables. Automatically checks system RAM/GPU before loading the model, validates dataset fit before processing, supports CSV/DataFrame/array inputs, and returns point forecasts with calibrated prediction intervals. Includes a preflight system checker script that MUST be run before first use to verify the machine can load the model and handle your specific dataset.
- 📁 evals/
- 📁 references/
- 📄 SKILL.md
Core guide for using the Tinker API — installation, model selection, SDK basics, types, CLI, and hyperparameters. Use this skill whenever the user asks about getting started with Tinker, choosing a model, using the SDK, API types, CLI commands, or tuning hyperparameters. This is the foundational skill — trigger it for any general Tinker question.
Automatically evaluate and compare multiple AI models or agents without pre-existing test data. Generates test queries from a task description, collects responses from all target endpoints, auto-generates evaluation rubrics, runs pairwise comparisons via a judge model, and produces win-rate rankings with reports and charts. Supports checkpoint resume, incremental endpoint addition, and judge model hot-swap. Use when the user asks to compare, benchmark, or rank multiple models or agents on a custom task, or run an arena-style evaluation.

---

# Auto Arena Skill

End-to-end automated model comparison using the OpenJudge `AutoArenaPipeline`:

1. **Generate queries** — LLM creates diverse test queries from task description
2. **Collect responses** — query all target endpoints concurrently
3. **Generate rubrics** — LLM produces evaluation criteria from task + sample queries
4. **Pairwise evaluation** — judge model compares every model pair (with position-bias swap)
5. **Analyze & rank** — compute win rates, win matrix, and rankings
6. **Report & charts** — Markdown report + win-rate bar chart + optional matrix heatmap

## Prerequisites

```bash
# Install OpenJudge
pip install py-openjudge

# Extra dependency for auto_arena (chart generation)
pip install matplotlib
```

## Gather from user before running

| Info | Required? | Notes |
|------|-----------|-------|
| Task description | Yes | What the models/agents should do (set in config YAML) |
| Target endpoints | Yes | At least 2 OpenAI-compatible endpoints to compare |
| Judge endpoint | Yes | Strong model for pairwise evaluation (e.g. `gpt-4`, `qwen-max`) |
| API keys | Yes | Env vars: `OPENAI_API_KEY`, `DASHSCOPE_API_KEY`, etc. |
| Number of queries | No | Default: `20` |
| Seed queries | No | Example queries to guide generation style |
| System prompts | No | Per-endpoint system prompts |
| Output directory | No | Default: `./evaluation_results` |
| Report language | No | `"zh"` (default) or `"en"` |

## Quick start

### CLI
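The pairwise evaluation and ranking steps listed in the Auto Arena description (steps 4 and 5) can be illustrated with a small, library-independent sketch. This is not the OpenJudge implementation: the function name `win_rates` and the `(model_a, model_b, winner)` record shape are assumptions made for illustration. Each pair appears twice with the presentation order swapped, mirroring the position-bias swap mentioned above.

```python
from collections import defaultdict


def win_rates(results):
    """Compute win rates, a ranking, and a win matrix.

    `results` is an iterable of (model_a, model_b, winner) tuples, where
    `winner` equals model_a or model_b. This record shape is hypothetical
    and is not taken from OpenJudge.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    matrix = defaultdict(lambda: defaultdict(int))  # matrix[winner][loser]

    for model_a, model_b, winner in results:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        loser = model_b if winner == model_a else model_a
        wins[winner] += 1
        games[model_a] += 1
        games[model_b] += 1
        matrix[winner][loser] += 1

    rates = {model: wins[model] / games[model] for model in games}
    # Highest win rate first; ties are broken alphabetically by model name.
    ranking = sorted(rates, key=lambda model: (-rates[model], model))
    return rates, ranking, matrix


# Each pair is judged twice with the order swapped (position-bias swap).
example = [
    ("A", "B", "A"), ("B", "A", "A"),
    ("A", "C", "A"), ("C", "A", "A"),
    ("B", "C", "C"), ("C", "B", "C"),
]
rates, ranking, matrix = win_rates(example)
print(ranking)
print(rates)
```

A real run would also need a policy for ties and for judge verdicts that flip when the order is swapped; this sketch simply counts each verdict as one game.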
Build a ForgeCAD model while actively hunting for API friction — missing helpers, awkward patterns, bad defaults, verbose boilerplate. Use when asked to dogfood, stress-test the API, or build a model with the goal of improving ForgeCAD.
- 📁 agents/
- 📁 assets/
- 📁 references/
- 📄 SKILL.md
Inspect external prediction model implementations and adapt them to EasyTSF task contracts. Use when the user provides model code, class definitions, forward logic, or config fragments and wants Codex to classify the target task as `sequence_prediction`, `graph_prediction`, or `grid_prediction`, determine the current repository fit, and produce either a direct adaptation plan or a repository extension plan.
Add or update a model in the harness model registry. Use when the user wants to add a new AI model, update model pricing, or change default models for a harness.
Add a new diffusion model (text-to-image, text-to-video, image-to-video, text-to-audio, image editing) to vLLM-Omni, including Cache-DiT acceleration and parallelism support (TP, SP/USP, CFG-Parallel, HSDP). Use when integrating a new diffusion model, porting a diffusers pipeline or a custom model repo to vllm-omni, creating a new DiT transformer adapter, adding diffusion model support, or enabling multi-GPU parallelism and cache acceleration for an existing model.
Creates entity model documents with Mermaid.js ER diagrams and attribute tables defining entities, relationships, data types, and validation rules. Use when the user asks to "create an entity model", "design a data model", "draw an ERD", "define database schema", "model entities", or mentions entity-relationship diagram, ER diagram, database design, or data modeling.

---

# Entity Model

## Instructions

Create or update the entity model at `docs/entity_model.md` based on `docs/requirements.md`. The document contains an ER diagram and attribute tables.

## DO NOT

- Add attributes/columns to the Mermaid diagram
- Write prose descriptions like "Key attributes: name, email..."
- Create a "Relationships" table
- Skip the attribute tables

## Document Structure

```markdown
# Entity Model

## Entity Relationship Diagram

```mermaid
erDiagram
    ROOM_TYPE ||--o{ ROOM : "categorizes"
    GUEST ||--o{ RESERVATION : "makes"
```

### ENTITY_NAME

One sentence describing the entity.

| Attribute | Description | Data Type | Length/Precision | Validation Rules      |
|-----------|-------------|-----------|------------------|-----------------------|
| id        | ...         | Long      | 19               | Primary Key, Sequence |
| ...       | ...         | ...       | ...              | ...                   |

## Required Format for Each Entity
Guides ACT-R cognitive model construction: chunk types, production rules, subsymbolic parameters, and model validation.
Spawn conversations with other LLMs (Gemini, GPT, ChatGPT, Codex, o3, DeepSeek, Qwen, Grok, Mistral, etc.) and fold results back into your context. TRIGGER when: user asks to talk to, chat with, use, call, or spawn another LLM or model; user mentions Gemini, GPT, ChatGPT, Codex, o3, DeepSeek, Claude (as a sidecar target), Qwen, Grok, Mistral, or any non-current model by name; user asks to get a second opinion from another model; user wants parallel exploration with a different model; user says "sidecar", "fork", or "fold".
- 📁 viewer/
- 📄 .gitignore
- 📄 README.md
- 📄 setup.sh
Generate a 3D model from a text description using build123d and render it in the browser viewer. Use when asked to "render", "make a 3D model", "create a part", "design a", "model a", or any 3D modeling request.
This skill should be used when the user asks about "AI security", "ML pipeline attacks", "prompt injection", "model deserialization", "unsafe model loading", "Jupyter injection", "LLM security", or needs to identify AI/ML-specific vulnerabilities in codebases that use machine learning frameworks.
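As a rough illustration of what identifying "unsafe model loading" could involve, the sketch below walks a Python file's syntax tree and flags calls to loaders that rely on pickle-style deserialization. It is not this skill's actual method: the function name `find_risky_loads` and the `RISKY_CALLS` watch list are hypothetical and deliberately non-exhaustive, and the check only catches direct `module.function(...)` calls, so aliased imports or `from pickle import load` would be missed.

```python
import ast

# Hypothetical, non-exhaustive watch list chosen for illustration.
# These loaders can execute arbitrary code when given untrusted input
# (torch.load and joblib.load use pickle under the hood).
RISKY_CALLS = {
    ("pickle", "load"),
    ("pickle", "loads"),
    ("torch", "load"),
    ("joblib", "load"),
}


def find_risky_loads(source):
    """Return sorted (line_number, call_name) pairs for risky loader calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
        ):
            key = (node.func.value.id, node.func.attr)
            if key in RISKY_CALLS:
                findings.append((node.lineno, f"{key[0]}.{key[1]}"))
    return sorted(findings)


sample = (
    "import pickle\n"
    "model = pickle.load(open('model.pkl', 'rb'))\n"
    "import json\n"
    "config = json.load(open('config.json'))\n"
)
for line_number, call_name in find_risky_loads(sample):
    print(f"line {line_number}: {call_name}")
```

A finding here is only a lead for manual review; whether a call is exploitable depends on where the loaded file comes from.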