SemEval-2026 (5)
Dashboard by COGNAC
Test
Dev
Enable the Best Ensemble
Zero-shot
All
Gemini
2.5-flash
2.0-flash
2.5-flash-lite
GPT
5.1
5-mini
5-nano
4o
4o-mini
4.1-mini
Deepseek
V3.2
Chain-of-Thought
All
Gemini
2.5-flash
2.0-flash
2.5-flash-lite
GPT
5.1
5-mini
5-nano
4o
4o-mini
4.1-mini
Deepseek
V3.2
Comparative
All
Gemini
2.5-flash
2.0-flash
2.5-flash-lite
GPT
5.1
5-mini
5-nano
4o
4o-mini
4.1-mini
Deepseek
V3.2
Accuracy: --
Correlation*: --
Average: --
*Values may differ slightly from those in the paper because of how the JavaScript implementation calculates Spearman correlation.
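One common reason two Spearman implementations disagree is tie handling. The sketch below is an illustration only, not the dashboard's actual code: the function names (`averageRanks`, `pearson`, `spearmanTieAware`, `spearmanShortcut`) are hypothetical, and it is an assumption that ties are the cause of the difference noted above. It compares the tie-aware definition (Pearson correlation of average ranks) with the shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)), which is exact only when there are no ties.

```javascript
// Assign 1-based ranks; tied values share the average of the positions they occupy.
function averageRanks(values) {
  const order = values.map((_, i) => i).sort((a, b) => values[a] - values[b]);
  const ranks = new Array(values.length);
  let i = 0;
  while (i < order.length) {
    let j = i;
    // Extend j to cover the whole run of equal values.
    while (j + 1 < order.length && values[order[j + 1]] === values[order[i]]) {
      j++;
    }
    const avg = (i + j) / 2 + 1; // mean of the 1-based positions i+1 .. j+1
    for (let k = i; k <= j; k++) {
      ranks[order[k]] = avg;
    }
    i = j + 1;
  }
  return ranks;
}

// Plain Pearson correlation of two equal-length numeric arrays.
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Tie-aware Spearman: Pearson correlation computed on average ranks.
function spearmanTieAware(x, y) {
  return pearson(averageRanks(x), averageRanks(y));
}

// Shortcut formula; matches the tie-aware value only when no ties are present.
function spearmanShortcut(x, y) {
  const rx = averageRanks(x);
  const ry = averageRanks(y);
  const n = x.length;
  let d2 = 0;
  for (let i = 0; i < n; i++) {
    d2 += (rx[i] - ry[i]) ** 2;
  }
  return 1 - (6 * d2) / (n * (n * n - 1));
}

// Made-up data with one tie in the second list (two values of 7).
const gold = [1, 2, 3, 4, 5];
const pred = [5, 6, 7, 8, 7];
console.log(spearmanTieAware(gold, pred).toFixed(4)); // 0.8208
console.log(spearmanShortcut(gold, pred).toFixed(4)); // 0.8250
```

On this toy input the two variants differ in the third decimal place, which is the kind of small gap the note describes. When neither list contains ties, both functions return the same value.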