agent-evaluation

Category: Tools & Productivity | Uploader: mlflowmlflow | Downloads: 0 | Version: v1.0 (Latest)

Use this skill when you need to evaluate, improve, or optimize an existing LLM agent's output quality, including improving tool-selection accuracy, raising answer quality, reducing costs, or fixing cases where the agent gives wrong or incomplete responses. It evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. IMPORTANT: always load the instrumenting-with-mlflow-tracing skill as well before starting any work. Covers the end-to-end evaluation workflow or individual components (tracing setup, dataset creation, scorer definition, evaluation execution).
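The dataset → scorer → evaluation loop the skill automates can be sketched in plain Python. This is a schematic illustration of the workflow, not MLflow's actual API; the names `Example`, `exact_match`, and `evaluate` are hypothetical stand-ins for MLflow's evaluation datasets, scorers, and evaluation harness.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Example:
    inputs: str        # the question or task sent to the agent
    expectation: str   # the reference ("ground truth") answer

def exact_match(output: str, expectation: str) -> float:
    """Scorer: 1.0 if the agent's output matches the expectation exactly."""
    return 1.0 if output.strip() == expectation.strip() else 0.0

def evaluate(agent: Callable[[str], str],
             dataset: list[Example],
             scorers: dict[str, Callable[[str, str], float]]) -> dict[str, float]:
    """Run the agent on every example and average each scorer's result."""
    totals = {name: 0.0 for name in scorers}
    for ex in dataset:
        output = agent(ex.inputs)
        for name, scorer in scorers.items():
            totals[name] += scorer(output, ex.expectation)
    return {name: total / len(dataset) for name, total in totals.items()}

# Usage: a stub "agent" standing in for a real LLM agent
dataset = [Example("What is 2+2?", "4"),
           Example("Capital of France?", "Paris")]
results = evaluate(lambda q: "4" if "2+2" in q else "Paris",
                   dataset, {"exact_match": exact_match})
print(results)  # {'exact_match': 1.0}
```

In practice, MLflow replaces the stub pieces: datasets are versioned and stored, scorers can be LLM judges rather than string comparisons, and tracing captures each agent step so failures can be diagnosed, not just counted.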

Source: GitHub (https://github.com/mlflow/skills)

Directory Structure

Current level: tree/main/agent-evaluation/

  • 📁 assets/
    • 📄 evaluation_report_template.md 4.4 KB
  • 📁 references/
    • 📄 dataset-preparation.md 9.7 KB
    • 📄 scorers-constraints.md 6.0 KB
    • 📄 scorers.md 10.7 KB
    • 📄 setup-guide.md 6.1 KB
    • 📄 throughput-guide.md 5.5 KB
    • 📄 troubleshooting.md 23.9 KB
  • 📁 scripts/
    • 📁 utils/
      • 📄 __init__.py 365 B
      • 📄 env_validation.py 3.5 KB
    • 📄 analyze_results.py 19.0 KB
    • 📄 create_dataset_template.py 13.4 KB
    • 📄 list_datasets.py 8.7 KB
    • 📄 run_evaluation_template.py 9.7 KB
    • 📄 setup_mlflow.py 9.5 KB
    • 📄 validate_auth.py 7.3 KB
    • 📄 validate_environment.py 5.1 KB
    • 📄 validate_tracing_runtime.py 11.0 KB
  • 📄 SKILL.md 20.5 KB

SKILL.md
