Attri AI

Quality & Evals

A/B testing, quality metrics, scorecards, and guardrail monitoring

Avg Quality Score

8.4/10

5.2% improvement

Active Experiments

12

Running: 8, Completed: 4

Guardrail Violations

23

18.5% vs last period

Avg Response Time

1.2s

8.3% optimization

Quality Scorecard

Quality Metrics vs Cost

A/B Testing Experiments

| Experiment | Status | Variants (traffic split) | Sample Size | Winner Metric | Variant A | Variant B | Improvement |
|---|---|---|---|---|---|---|---|
| Customer Support Prompt v2 vs v1 (started 2025-01-15) | running | A: Original Prompt; B: Optimized Prompt (50/50) | 1.25K | Quality Score | 7.8 | 8.6 | +10.3% |
| GPT-4 vs Claude 3 Sonnet (started 2025-01-18) | running | A: GPT-4 Turbo; B: Claude 3 Sonnet (50/50) | 980 | Cost per Quality | 0.12 | 0.09 | +25% |
| Temperature 0.7 vs 0.3 (started 2025-01-10) | completed | A: Temp 0.7; B: Temp 0.3 (50/50) | 2.10K | Consistency | 82.5 | 94.2 | +14.2% |
| Content Length: Long vs Short (started 2025-01-08) | completed | A: Long Context; B: Short Context (60/40) | 1.65K | Relevance | 8.9 | 7.2 | +23.6% |
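
The Improvement figures above are consistent with the gap between the two variants measured relative to the worse-performing one. The following is a minimal sketch under that assumption, not the dashboard's documented formula. The function name `relative_improvement` and the `lower_is_better` flag are hypothetical, and treating "Cost per Quality" as a lower-is-better metric is an inference from the numbers.

```python
def relative_improvement(a: float, b: float, lower_is_better: bool = False) -> float:
    """Percent gain of the better variant over the worse one, relative to the worse one.

    Assumption: this mirrors how the Improvement column is derived; the
    source does not state the formula.
    """
    # The "worse" value is the smaller one for higher-is-better metrics
    # and the larger one for lower-is-better metrics such as cost.
    worse = max(a, b) if lower_is_better else min(a, b)
    if worse == 0:
        raise ValueError("cannot compute a relative change against zero")
    return round(abs(a - b) / worse * 100, 1)


# Values taken from the A/B Testing Experiments table.
experiments = [
    ("Customer Support Prompt v2 vs v1", 7.8, 8.6, False),
    ("GPT-4 vs Claude 3 Sonnet", 0.12, 0.09, True),
    ("Temperature 0.7 vs 0.3", 82.5, 94.2, False),
    ("Content Length: Long vs Short", 8.9, 7.2, False),
]

for name, variant_a, variant_b, lower in experiments:
    print(f"{name}: +{relative_improvement(variant_a, variant_b, lower)}%")
```

In the last experiment Variant A scores higher than Variant B, so under this reading the +23.6% describes A's lead over B rather than a gain from switching to B.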

Quality Metrics Breakdown

| Category | Score | Samples | Total Cost | Cost per Quality Point |
|---|---|---|---|---|
| Accuracy | 8.9/10 | 12.50K | 1250.50 | 0.14 |
| Relevance | 8.7/10 | 12.50K | 1180.20 | 0.14 |
| Coherence | 8.5/10 | 12.50K | 1220.80 | 0.14 |
| Helpfulness | 8.2/10 | 12.50K | 1190.40 | 0.15 |
| Fluency | 9.1/10 | 12.50K | 1280.90 | 0.14 |
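
The source does not say how Cost per Quality Point is calculated or what currency Total Cost uses. The displayed values do match total cost divided by the score, scaled down by 1,000. The sketch below reproduces them under that inferred assumption. The `scale` parameter and the function name are hypothetical.

```python
def cost_per_quality_point(total_cost: float, score: float, scale: float = 1000.0) -> float:
    """Total cost per quality point, scaled down by `scale`.

    Assumption: the factor of 1,000 is inferred from the displayed
    figures and is not documented in the source.
    """
    if score <= 0:
        raise ValueError("score must be positive")
    return round(total_cost / (score * scale), 2)


# (score, total cost) pairs taken from the Quality Metrics Breakdown table.
metrics = {
    "Accuracy": (8.9, 1250.50),
    "Relevance": (8.7, 1180.20),
    "Coherence": (8.5, 1220.80),
    "Helpfulness": (8.2, 1190.40),
    "Fluency": (9.1, 1280.90),
}

for category, (score, total_cost) in metrics.items():
    print(f"{category}: {cost_per_quality_point(total_cost, score)}")
```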

Guardrail Violations

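| Guardrail | Type | Status | Violations | Total Checks | Violation Rate |
|---|---|---|---|---|---|
| PII Detection | privacy | active | 8 | 45.89K | 0.017% |
| Toxicity Filter | safety | active | 12 | 45.89K | 0.026% |
| Brand Guidelines | compliance | active | 3 | 45.89K | 0.007% |
| Context Length Limit | technical | active | 0 | 45.89K | 0.000% |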
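
A violation rate expressed as a percentage is the violation count divided by the total number of checks, multiplied by 100. The sketch below applies that to the counts in the table, assuming "45.89K" means roughly 45,890 checks per guardrail (the exact count is not given). The function name is hypothetical.

```python
def violation_rate_percent(violations: int, total_checks: int) -> float:
    """Violations as a percentage of total checks."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    return violations / total_checks * 100


# Assumption: 45.89K is treated as 45,890 checks for every guardrail.
TOTAL_CHECKS = 45_890

# Violation counts taken from the Guardrail Violations table.
violations_by_guardrail = {
    "PII Detection": 8,
    "Toxicity Filter": 12,
    "Brand Guidelines": 3,
    "Context Length Limit": 0,
}

for guardrail, count in violations_by_guardrail.items():
    rate = violation_rate_percent(count, TOTAL_CHECKS)
    print(f"{guardrail}: {rate:.3f}%")

# The per-guardrail counts add up to the 23 violations in the summary card.
print("Total violations:", sum(violations_by_guardrail.values()))
```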