Metrics
Metric
- class mixedvoices.metrics.metric.Metric(name: str, definition: str, scoring: Literal['binary', 'continuous'], include_prompt: bool = False)
Bases: object
Define a custom metric.
- Parameters:
name (str) – The name of the metric.
definition (str) – The definition of the metric.
scoring (str) – The scoring type of the metric: ‘binary’ for PASS/FAIL, or ‘continuous’ for a 0-10 scale.
include_prompt (bool, optional) – Whether to include the agent prompt when evaluating the metric. For example, a hallucination check needs the agent prompt, while a conciseness check does not. Defaults to False.
- definition: str
The definition of the metric.
- property expected_values
Returns the expected values for the metric based on the scoring type.
- include_prompt: bool = False
Whether to include the agent prompt when evaluating the metric.
- name: str
The name of the metric.
- scoring: Literal['binary', 'continuous']
The scoring range of the metric. Can be ‘binary’ or ‘continuous’.
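As a rough illustration of the constructor arguments described above, the sketch below uses a stand-in dataclass, `MetricSketch`, that mirrors the documented signature. It is not the library class, and the validation in `__post_init__` is an assumption of this sketch. The two metrics (`politeness`, `policy_adherence`) are hypothetical. With mixedvoices installed, you would presumably import `Metric` from `mixedvoices.metrics.metric` and pass the same arguments.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class MetricSketch:
    """Stand-in mirroring the documented Metric signature (not the library class)."""

    name: str
    definition: str
    scoring: Literal["binary", "continuous"]
    include_prompt: bool = False

    def __post_init__(self) -> None:
        # Assumption of this sketch: reject anything other than the two
        # scoring types listed in the documentation.
        if self.scoring not in ("binary", "continuous"):
            raise ValueError(
                f"scoring must be 'binary' or 'continuous', got {self.scoring!r}"
            )


# Hypothetical custom metrics, for illustration only.
politeness = MetricSketch(
    name="Politeness",
    definition="The bot stays courteous even when the user is rude.",
    scoring="continuous",  # rated on a 0-10 scale
)

policy_adherence = MetricSketch(
    name="Policy Adherence",
    definition="The bot never offers discounts that are absent from its prompt.",
    scoring="binary",  # PASS/FAIL
    include_prompt=True,  # judging this requires the agent prompt
)
```

Setting `include_prompt=True` only for the second metric follows the guidance above: the evaluator needs the agent prompt to check claims against it, but not to judge tone.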
Default Metrics
The following metrics are provided by default:
- mixedvoices.metrics.metric.empathy: Metric
Evaluates the bot’s empathetic responses on a 0-10 scale. The bot should acknowledge what the user said and empathize by relating to their concerns.
- mixedvoices.metrics.metric.verbatim_repetition: Metric
Checks if the bot repeats itself word-for-word (PASS/FAIL). Similar but non-identical answers are acceptable.
- mixedvoices.metrics.metric.conciseness: Metric
Measures response length and clarity on a 0-10 scale. Responses should be under 50 words while maintaining completeness.
- mixedvoices.metrics.metric.hallucination: Metric
Detects if the bot makes claims not supported by its prompt (PASS/FAIL). Includes prompt content in evaluation.
- mixedvoices.metrics.metric.context_awareness: Metric
Evaluates if the bot maintains conversation context (PASS/FAIL). Should acknowledge and incorporate user’s previous statements.
- mixedvoices.metrics.metric.scheduling: Metric
Rates appointment scheduling effectiveness on a 0-10 scale. Checks for gathering info, time/date handling, and confirmation.
Helper Functions
- mixedvoices.metrics.definitions.get_all_default_metrics() → list[Metric]
Returns a list of all default metrics available in the system.
These metrics cover various aspects of conversational agent performance:
- Emotional intelligence (empathy, objection handling)
- Technical accuracy (hallucination, verbatim repetition)
- Conversation quality (conciseness, context awareness)
- Task completion (scheduling, adaptive QA)
Example
>>> from mixedvoices.metrics.definitions import get_all_default_metrics
>>> metrics = get_all_default_metrics()
>>> for metric in metrics:
...     print(f"{metric.name}: {metric.scoring}")
Empathy: continuous
Verbatim Repetition: binary
...
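Because each metric exposes `name` and `scoring`, the returned list can be split by scoring type, for example to run only the PASS/FAIL checks. The sketch below uses `SimpleNamespace` stand-ins instead of the real list so that it runs without the library. The helper `split_by_scoring` is hypothetical, and the display names "Conciseness" and "Hallucination" are assumed (only "Empathy" and "Verbatim Repetition" appear in the example output above).

```python
from types import SimpleNamespace

# Stand-ins for a few entries of the list returned by get_all_default_metrics().
# Scoring types follow the Default Metrics descriptions; the last two names
# are assumed display names.
metrics = [
    SimpleNamespace(name="Empathy", scoring="continuous"),
    SimpleNamespace(name="Verbatim Repetition", scoring="binary"),
    SimpleNamespace(name="Conciseness", scoring="continuous"),
    SimpleNamespace(name="Hallucination", scoring="binary"),
]


def split_by_scoring(metrics):
    """Return (binary_metrics, continuous_metrics), preserving input order."""
    binary = [m for m in metrics if m.scoring == "binary"]
    continuous = [m for m in metrics if m.scoring == "continuous"]
    return binary, continuous


binary, continuous = split_by_scoring(metrics)
print("PASS/FAIL:", [m.name for m in binary])
print("0-10 scale:", [m.name for m in continuous])
```

The helper relies only on the `scoring` attribute, so it should work the same way on real `Metric` objects, including custom ones.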