Evaluator

class mixedvoices.evaluation.evaluator.Evaluator(eval_id: str, project_id: str, metric_names: List[str], test_cases: List[str], created_at: int | None = None, eval_runs: dict[str, EvalRun] | None = None)

Bases: object

An Evaluator is a reusable collection of test cases and metrics for testing model performance. It can be run multiple times across different versions to track performance.

property id: str

Get the id of the Evaluator

property info: Dict[str, Any]

Get the info of the evaluator as a dictionary

list_eval_runs(version_id: str | None = None) → List[EvalRun]

List of eval runs

load_eval_run(run_id: str) → EvalRun

Load an eval run from id

Parameters:

run_id (str) – The id of the eval run
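
The lookup behaviour of list_eval_runs and load_eval_run can be pictured with a small, self-contained sketch. FakeEvalRun and FakeEvaluator are hypothetical stand-ins, not mixedvoices classes. The attribute names run_id and version_id on the run object, and the idea that version_id narrows the list to runs of that version, are assumptions made for illustration. Only the two method signatures come from the reference above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeEvalRun:
    """Hypothetical stand-in for EvalRun; the real class holds more state."""

    run_id: str
    version_id: str


class FakeEvaluator:
    """Hypothetical stand-in that mirrors two Evaluator method signatures."""

    def __init__(self, eval_runs: Dict[str, FakeEvalRun]):
        # Runs are keyed by run id, like the eval_runs constructor argument.
        self._eval_runs = eval_runs

    def list_eval_runs(self, version_id: Optional[str] = None) -> List[FakeEvalRun]:
        runs = list(self._eval_runs.values())
        if version_id is None:
            return runs
        # Assumption: a version_id narrows the result to runs of that version.
        return [run for run in runs if run.version_id == version_id]

    def load_eval_run(self, run_id: str) -> FakeEvalRun:
        # In this sketch an unknown id raises KeyError.
        return self._eval_runs[run_id]


evaluator = FakeEvaluator(
    {
        "run-1": FakeEvalRun("run-1", "v1"),
        "run-2": FakeEvalRun("run-2", "v1"),
        "run-3": FakeEvalRun("run-3", "v2"),
    }
)
v1_runs = evaluator.list_eval_runs(version_id="v1")
one_run = evaluator.load_eval_run("run-3")
```

In this sketch, calling list_eval_runs() with no argument returns all three runs, while passing "v1" returns only the two runs recorded for that version.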

property metric_names: List[str]

List of metric names to be evaluated

property project_id: str

Get the name of the Project

run(version: Version, agent_class: Type[BaseAgent], agent_starts: bool | None, verbose: bool = True, **kwargs) → EvalRun

Runs the evaluator and saves the results.

Parameters:
  • version (Version) – The version of the project to evaluate

  • agent_class (Type[BaseAgent]) – The agent class to evaluate

  • agent_starts (Optional[bool]) – Whether the agent starts the conversation. If True, the agent starts the conversation. If False, the evaluator starts the conversation. If None, the starting side is chosen at random.

  • verbose (bool) – Whether to print testing conversation and scores. Defaults to True

  • **kwargs – Keyword arguments to pass to the agent class
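
The parameter descriptions above can be illustrated with a minimal sketch of how a three-valued agent_starts flag and forwarded keyword arguments might be handled. EchoAgent, resolve_agent_starts, and build_agent are hypothetical names invented for this example, and none of this is the library's implementation. It only mirrors the documented behaviour: True or False is used as given, None means a random choice, and **kwargs reach the agent class.

```python
import random
from typing import Any, Optional, Type


class EchoAgent:
    """Hypothetical agent class; stands in for a BaseAgent subclass."""

    def __init__(self, greeting: str = "Hello", temperature: float = 0.0):
        self.greeting = greeting
        self.temperature = temperature


def resolve_agent_starts(agent_starts: Optional[bool], rng: random.Random) -> bool:
    """Turn the three-valued flag into a concrete choice for one conversation."""
    if agent_starts is None:
        # None means "random choice" in the parameter description.
        return rng.choice([True, False])
    return agent_starts


def build_agent(agent_class: Type[Any], **kwargs: Any) -> Any:
    """Instantiate the agent class with the extra keyword arguments."""
    return agent_class(**kwargs)


rng = random.Random(0)  # seeded only so the sketch is repeatable
agent = build_agent(EchoAgent, greeting="Hi there", temperature=0.2)
first_speaker = "agent" if resolve_agent_starts(True, rng) else "evaluator"
```

Because agent_starts has no default in the signature, a caller has to pass True, False, or None explicitly. Passing None is a deliberate request for a random starting side.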

property test_cases: List[str]

List of test cases to be evaluated