Metrics

Metric

class mixedvoices.metrics.metric.Metric(name: str, definition: str, scoring: Literal['binary', 'continuous'], include_prompt: bool = False)

Bases: object

Define a custom metric.

Parameters:

name (str) – The name of the metric.
definition (str) – The definition of the metric.
scoring (str) – The scoring range of the metric: ‘binary’ for PASS/FAIL or ‘continuous’ for a 0-10 scale.
include_prompt (bool, optional) – Whether to include the agent prompt when evaluating the metric. For example, checking for hallucination requires the agent prompt, whereas judging conciseness does not. Defaults to False.

definition: str: The definition of the metric.

property expected_values: Returns the expected values for the metric based on the scoring type.

include_prompt: bool = False: Whether to include the agent prompt when evaluating the metric.

name: str: The name of the metric.

scoring: Literal['binary', 'continuous']: The scoring range of the metric. Can be ‘binary’ or ‘continuous’.

to_dict(): Returns a dictionary representation of the metric.
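To try the constructor parameters without mixedvoices installed, the sketch below mirrors the documented signature with a hypothetical stand-in dataclass. It is not the library's source: the validation in `__post_init__`, the values returned by `expected_values` (assumed here to be `["PASS", "FAIL"]` for binary and the integers 0 to 10 for continuous), and the key layout of `to_dict()` are all assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Literal, Union


# Hypothetical stand-in that mirrors the documented signature of
# mixedvoices.metrics.metric.Metric. It is NOT the library's implementation.
@dataclass
class Metric:
    name: str
    definition: str
    scoring: Literal["binary", "continuous"]
    include_prompt: bool = False

    def __post_init__(self) -> None:
        # Assumption: only the two documented scoring types are accepted.
        if self.scoring not in ("binary", "continuous"):
            raise ValueError(
                f"scoring must be 'binary' or 'continuous', got {self.scoring!r}"
            )

    @property
    def expected_values(self) -> list[Union[str, int]]:
        # Assumption: PASS/FAIL for binary, whole numbers 0-10 for continuous.
        if self.scoring == "binary":
            return ["PASS", "FAIL"]
        return list(range(11))

    def to_dict(self) -> dict:
        # Assumption: the dictionary keys match the attribute names.
        return asdict(self)


# Judging policy adherence needs the agent prompt, so include_prompt=True.
policy_adherence = Metric(
    name="Policy Adherence",
    definition="PASS if every refund term the bot states appears in its prompt, otherwise FAIL.",
    scoring="binary",
    include_prompt=True,
)

# Politeness can be judged from the transcript alone, so the default False is kept.
politeness = Metric(
    name="Politeness",
    definition="Rate from 0 to 10 how courteous the bot's replies are.",
    scoring="continuous",
)
```

The two constructor calls use only the documented keyword arguments, so the same calls should also work against the real `mixedvoices.metrics.metric.Metric` once the package is installed.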

Default Metrics

The following metrics are provided by default:

mixedvoices.metrics.metric.empathy: Metric: Rates the bot’s empathy on a 0-10 scale. The bot should acknowledge what the user said and empathize by relating to their concerns.

mixedvoices.metrics.metric.verbatim_repetition: Metric: Checks whether the bot repeats itself word-for-word (PASS/FAIL). Similar but non-identical answers are acceptable.

mixedvoices.metrics.metric.conciseness: Metric: Measures response length and clarity on a 0-10 scale. Responses should be under 50 words while maintaining completeness.

mixedvoices.metrics.metric.hallucination: Metric: Detects whether the bot makes claims not supported by its prompt (PASS/FAIL). The agent prompt is included in the evaluation.

mixedvoices.metrics.metric.context_awareness: Metric: Evaluates whether the bot maintains conversation context (PASS/FAIL). The bot should acknowledge and incorporate the user’s previous statements.

mixedvoices.metrics.metric.scheduling: Metric: Rates appointment scheduling effectiveness on a 0-10 scale. Checks that the bot gathers the necessary information, handles dates and times correctly, and confirms the appointment.

mixedvoices.metrics.metric.adaptive_qa: Metric: Scores the relevance of the bot’s questions on a 0-10 scale. Questions should stay on topic and should not revisit items the user has already answered.

mixedvoices.metrics.metric.objection_handling: Metric: Rates how well the bot handles objections on a 0-10 scale. The bot should acknowledge the objection, empathize, and offer relevant solutions.

Helper Functions

mixedvoices.metrics.definitions.get_all_default_metrics() → list[Metric]

Returns a list of all default metrics available in the system.

These metrics cover various aspects of conversational agent performance:
- Emotional intelligence (empathy, objection handling)
- Technical accuracy (hallucination, verbatim repetition)
- Conversation quality (conciseness, context awareness)
- Task completion (scheduling, adaptive QA)
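The four groups above can be kept as plain data, for instance to decide which default metrics to run for a given concern. The sketch below keys each group by the metric's attribute name in mixedvoices.metrics.metric; the constant `METRIC_GROUPS` and the helper `group_of` are hypothetical names, not part of mixedvoices.

```python
# Hypothetical mapping of the four groups described above to the attribute
# names of the default metrics in mixedvoices.metrics.metric.
METRIC_GROUPS: dict[str, tuple[str, ...]] = {
    "Emotional intelligence": ("empathy", "objection_handling"),
    "Technical accuracy": ("hallucination", "verbatim_repetition"),
    "Conversation quality": ("conciseness", "context_awareness"),
    "Task completion": ("scheduling", "adaptive_qa"),
}


def group_of(metric_attr: str) -> str:
    """Return the group a default metric belongs to (hypothetical helper)."""
    for group, members in METRIC_GROUPS.items():
        if metric_attr in members:
            return group
    # Anything outside the eight documented defaults is rejected.
    raise KeyError(f"{metric_attr!r} is not a default metric")
```

Because the defaults are documented as module attributes, `getattr` on the imported `mixedvoices.metrics.metric` module should turn any of these names into the corresponding Metric object when the package is installed.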

Example

>>> from mixedvoices.metrics.definitions import get_all_default_metrics
>>> metrics = get_all_default_metrics()
>>> for metric in metrics:
...     print(f"{metric.name}: {metric.scoring}")
Empathy: continuous
Verbatim Repetition: binary
...
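Since every Metric exposes `name` and `scoring`, the returned list can be split by scoring type, for example to report PASS/FAIL metrics separately from 0-10 ones. The helper below is a hypothetical sketch that reads only those two documented attributes; `SimpleNamespace` objects stand in for real Metric instances so the snippet runs without mixedvoices.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Iterable


def names_by_scoring(metrics: Iterable[Any]) -> dict[str, list[str]]:
    """Group metric names by their scoring type ('binary' or 'continuous').

    Hypothetical helper: it only reads the documented `name` and `scoring`
    attributes, so it accepts real Metric objects or simple stand-ins.
    """
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for metric in metrics:
        grouped[metric.scoring].append(metric.name)
    return dict(grouped)


# Stand-ins for the two entries shown in the example output above.
sample = [
    SimpleNamespace(name="Empathy", scoring="continuous"),
    SimpleNamespace(name="Verbatim Repetition", scoring="binary"),
]
print(names_by_scoring(sample))
# {'continuous': ['Empathy'], 'binary': ['Verbatim Repetition']}
```

With mixedvoices installed, `names_by_scoring(get_all_default_metrics())` would apply the same grouping to the full default list.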