Evaluator
- class mixedvoices.evaluation.evaluator.Evaluator(eval_id: str, project_id: str, metric_names: List[str], test_cases: List[str], created_at: int | None = None, eval_runs: dict[str, EvalRun] | None = None)
Bases: object
An Evaluator is a reusable collection of test cases and metrics for testing model performance. It can be run multiple times across different versions to track performance.
- property id: str
Get the id of the Evaluator
- property info: Dict[str, Any]
Get the info of the evaluator as a dictionary
- load_eval_run(run_id: str) → EvalRun
Load an eval run by its id
- Parameters:
run_id (str) – The id of the eval run
- property metric_names: List[str]
List of metric names to be evaluated
- property project_id: str
Get the name of the Project
- run(version: Version, agent_class: Type[BaseAgent], agent_starts: bool | None, verbose: bool = True, **kwargs) → EvalRun
Runs the evaluator and saves the results.
- Parameters:
version (Version) – The version of the project to evaluate
agent_class (Type[BaseAgent]) – The agent class to evaluate
agent_starts (Optional[bool]) – Whether the agent starts the conversation. If True, the agent starts the conversation; if False, the evaluator starts it; if None, the starter is chosen at random
verbose (bool) – Whether to print the test conversations and scores. Defaults to True
**kwargs – Keyword arguments to pass to the agent class
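The parameter semantics above can be illustrated with a minimal, self-contained sketch. It does not import mixedvoices and is not the library's implementation: `EchoAgent`, `resolve_agent_starts`, and `build_agent` are hypothetical names introduced here only to show how the three values of `agent_starts` and the forwarded `**kwargs` might be interpreted.

```python
import random
from typing import Any, Optional, Type


class EchoAgent:
    """Hypothetical stand-in for a BaseAgent subclass (not the library's class)."""

    def __init__(self, greeting: str = "Hello") -> None:
        self.greeting = greeting


def resolve_agent_starts(agent_starts: Optional[bool], rng: random.Random) -> bool:
    """Map the three documented values of agent_starts to a concrete choice."""
    if agent_starts is None:
        # None means "random choice" in the parameter description.
        return rng.choice([True, False])
    # True: the agent starts; False: the evaluator starts.
    return agent_starts


def build_agent(agent_class: Type[Any], **kwargs: Any) -> Any:
    """Forward extra keyword arguments to the agent class constructor."""
    return agent_class(**kwargs)


rng = random.Random(0)  # seeded only to make the sketch reproducible
agent = build_agent(EchoAgent, greeting="Hi there")
first_speaker = "agent" if resolve_agent_starts(True, rng) else "evaluator"
```

In this sketch, any keyword argument passed after the named parameters ends up in the agent's constructor, which is the behaviour the `**kwargs` entry describes; how the library chooses the starter when `agent_starts` is None is assumed here to be a uniform random pick.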
- property test_cases: List[str]
List of test cases to be evaluated