Tag: evals

2 articles available

Explains why token logprobs differ from the confidence a model states in natural language, and how to test multi-candidate prompts with seeds and evals.

Avoid model-name anchoring by defining success criteria, output format, and failure handling, then running evals on every change.

Aionda