Model Scores

Model | Performance Test Configs | Best Scores | Latest Scores
OpenAI GPT 1.0
(openai-gpt-1.0)
agentid_1
  • tcid_1
  • tcid_2
  • tcid_3
Best Scores
  • cx: 90
  • rag: 85
  • bias: 20
  • brand: 10
  • toxicity: 20
  • advice: 5
  • pii: 80
  • prompt_leak: 5
Latest Scores
  • cx: 10
  • rag: 50
OpenAI GPT 2.0
(openai-gpt-2.0)
agentid_1
  • tcid_1
  • tcid_2
  • tcid_3
Best Scores
  • cx: 96
  • rag: 89
Latest Scores
  • cx: 23
  • rag: 20
OpenAI GPT 3.0
(openai-gpt-3.0)
agentid_1
  • tcid_1
  • tcid_2
  • tcid_3
Best Scores
  • cx: 87
  • rag: 67
Latest Scores
  • cx: 12
  • rag: 32
Anthropic Sonnet
(anthropic-sonnet)
agentid_1
  • tcid_1
  • tcid_2
  • tcid_3
Best Scores
  • cx: 87
  • rag: 78
Latest Scores
  • cx: 20
  • rag: 20
Anthropic Claude 1.0
(anthropic-claude-1.0)
agentid_1
  • tcid_1
  • tcid_2
  • tcid_3
Best Scores
  • cx: 100
  • rag: 87
Latest Scores
  • cx: 35
  • rag: 20
Gemini v12.0
(gemini-v12.0)
agentid_1
  • tcid_1
  • tcid_2
  • tcid_4
Best Scores
  • cx: 99
  • rag: 76
Latest Scores
  • cx: 10
  • rag: 5
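
Each row above pairs a model and agent with its test configs, its best scores, and its latest scores. As a rough illustration of how one such row could be held in code and how the gap between best and latest might be computed, here is a minimal Python sketch. The class name `ModelScores`, its field names, and the `gaps` helper are assumptions made for this example, not part of any actual API behind this page; the values are copied from the openai-gpt-1.0 row.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelScores:
    """Hypothetical container for one row of the scores table."""
    model_id: str
    agent_id: str
    test_config_ids: List[str]
    best: Dict[str, int] = field(default_factory=dict)
    latest: Dict[str, int] = field(default_factory=dict)

    def gaps(self) -> Dict[str, int]:
        """Return best minus latest for every metric that has both values."""
        return {
            metric: self.best[metric] - self.latest[metric]
            for metric in self.best
            if metric in self.latest
        }


# Values taken from the openai-gpt-1.0 row of the table.
row = ModelScores(
    model_id="openai-gpt-1.0",
    agent_id="agentid_1",
    test_config_ids=["tcid_1", "tcid_2", "tcid_3"],
    best={
        "cx": 90, "rag": 85, "bias": 20, "brand": 10,
        "toxicity": 20, "advice": 5, "pii": 80, "prompt_leak": 5,
    },
    latest={"cx": 10, "rag": 50},
)

print(row.gaps())  # {'cx': 80, 'rag': 35}
```

Metrics that appear only under Best Scores (for example bias or pii in this row) are skipped rather than treated as zero, since the table does not say what a missing latest score means.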