Runs are aggregated automatically. Models with repeated runs show mean performance and error bars; models with only Run 1 show a single point.
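The aggregation described above could look roughly like the sketch below. It is not the dashboard's actual code: the record shape ({ model, run, score }) and the function name aggregateRuns are hypothetical, and the error bar is assumed to be the sample standard deviation, which the page does not specify.

```javascript
// Minimal sketch. Assumed (hypothetical) record shape: { model, run, score }.
function aggregateRuns(records) {
  // Collect every score seen for each model.
  const scoresByModel = new Map();
  for (const { model, score } of records) {
    if (!scoresByModel.has(model)) scoresByModel.set(model, []);
    scoresByModel.get(model).push(score);
  }

  const aggregated = [];
  for (const [model, scores] of scoresByModel) {
    const n = scores.length;
    const mean = scores.reduce((sum, s) => sum + s, 0) / n;

    // Assumption: error bar = sample standard deviation (n - 1 denominator).
    // A model with a single run gets no error bar, so it plots as one point.
    let errorBar = null;
    if (n > 1) {
      const variance =
        scores.reduce((acc, s) => acc + (s - mean) ** 2, 0) / (n - 1);
      errorBar = Math.sqrt(variance);
    }
    aggregated.push({ model, runs: n, mean, errorBar });
  }
  return aggregated;
}
```

A null errorBar lets the plotting layer distinguish a single-run model from a repeated-run model whose runs happen to agree exactly.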
Performance vs Cost Pareto Front
Cost is on the x-axis and the selected performance metric is on the y-axis. The Pareto front highlights models that are not dominated, meaning no other model is both cheaper and better on the selected metric.
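A Pareto front of this kind can be computed with a simple pairwise dominance check. The sketch below is illustrative only: the point shape ({ model, cost, score }) and the function names are hypothetical, and it assumes lower cost and higher score are better, with ties broken by requiring a strict improvement on at least one axis.

```javascript
// Minimal sketch. Assumed (hypothetical) point shape: { model, cost, score }.
// Assumption: lower cost is better, higher score is better.

// a dominates b when a is no worse on both axes and strictly better on one.
function dominates(a, b) {
  const noWorse = a.cost <= b.cost && a.score >= b.score;
  const strictlyBetter = a.cost < b.cost || a.score > b.score;
  return noWorse && strictlyBetter;
}

// Keep only points that no other point dominates, ordered by cost
// so the front can be drawn left to right. O(n^2) comparisons.
function paretoFront(points) {
  return points
    .filter((p) => !points.some((q) => dominates(q, p)))
    .sort((a, b) => a.cost - b.cost);
}
```

The quadratic scan is simple and should be adequate for a leaderboard with a modest number of models; a sort-and-sweep approach would scale better for very large sets.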
Leaderboard and token-cost summaries
Use these tables to inspect the ranked models and the total cost and token footprint, broken down by provider and model.
Token columns appear when token data is available in data.js. The current data includes cost per task and model, so cost is summarized now; token columns will populate automatically once token fields are added.
This table breaks each provider into individual models for the selected task, metric, model setting, and provider filter.
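One way to build such a summary while tolerating missing token data is sketched below. The row shape and field names (provider, model, cost, inputTokens, outputTokens) are assumptions for illustration; the actual schema of data.js is not shown here.

```javascript
// Minimal sketch. Assumed (hypothetical) row shape:
// { provider, model, cost, inputTokens?, outputTokens? }.
function summarizeByProviderModel(rows) {
  const summary = new Map();
  for (const row of rows) {
    // Composite key that stays unambiguous even if names contain separators.
    const key = JSON.stringify([row.provider, row.model]);
    if (!summary.has(key)) {
      summary.set(key, {
        provider: row.provider,
        model: row.model,
        cost: 0,
        tokens: null, // stays null until some row carries token fields
      });
    }
    const entry = summary.get(key);
    entry.cost += row.cost ?? 0;

    const hasTokens =
      typeof row.inputTokens === "number" ||
      typeof row.outputTokens === "number";
    if (hasTokens) {
      entry.tokens =
        (entry.tokens ?? 0) + (row.inputTokens ?? 0) + (row.outputTokens ?? 0);
    }
  }
  return [...summary.values()];
}

// Token columns are worth rendering only if at least one entry has token data.
function hasTokenData(summaryRows) {
  return summaryRows.some((r) => r.tokens !== null);
}
```

Keeping tokens as null rather than 0 when no token fields exist lets the table hide the token columns instead of showing misleading zeros.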