Google DeepMind x Kaggle AGI Hackathon · Metacognition Track · April 2026 · prajna_v4_final.ipynb
| # | Model | Prajna | Pramana | Neti-Neti | Adhyasa | Sakshi | Conf=100 | Tier |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 (Anthropic) | 0.58 | ~0.65 | ~0.40 | ~0.34 | 42% | 0% | Tier 1 |
| 2 | GPT-5.4 (OpenAI) | 0.53 | ~0.60 | ~0.38 | ~0.30 | 33% | 0% | Tier 2 |
| 3 | Qwen3-235B (Alibaba) | 0.50 | ~0.58 | ~0.36 | ~0.28 | 31% | 8% | Tier 2 |
| 4 | Gemini 2.5 Pro (Google) | 0.48 | ~0.55 | ~0.34 | ~0.26 | 12% | 27% | Tier 2 |
| 5 | DeepSeek R1 (DeepSeek) | 0.46 | ~0.53 | ~0.33 | ~0.24 | 40% | 14% | Tier 2 |
| 6 | Gemini 2.5 Flash (Google) | 0.38 | ~0.45 | ~0.27 | ~0.20 | 27% | 26% | Tier 3 |
| 7 | Gemma-3-27B (Google, open) | 0.30 | ~0.36 | ~0.21 | ~0.16 | 23% | 5% | Tier 3 |
Conf=100 = rate of maximum-confidence (100/100) responses. Sakshi = genuine self-correction rate in phase 2. Per-axis scores (marked ~) are approximated from the reported composite and ordinal patterns.
Pramana > Neti-Neti > Adhyasa ordering holds across all 7 models without exception, suggesting a structural property of autoregressive generation.
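The ordering claim above can be checked mechanically. A minimal sketch, using the approximate per-axis scores transcribed from the leaderboard (the `~` values, not the raw benchmark data):

```python
# Per-axis scores (Pramana, Neti-Neti, Adhyasa) transcribed from the
# leaderboard above; values are approximate (~), as noted in the caption.
scores = {
    "Claude Sonnet 4.6": (0.65, 0.40, 0.34),
    "GPT-5.4":           (0.60, 0.38, 0.30),
    "Qwen3-235B":        (0.58, 0.36, 0.28),
    "Gemini 2.5 Pro":    (0.55, 0.34, 0.26),
    "DeepSeek R1":       (0.53, 0.33, 0.24),
    "Gemini 2.5 Flash":  (0.45, 0.27, 0.20),
    "Gemma-3-27B":       (0.36, 0.21, 0.16),
}

# Verify Pramana > Neti-Neti > Adhyasa for every model.
for model, (pramana, neti_neti, adhyasa) in scores.items():
    assert pramana > neti_neti > adhyasa, f"ordering violated for {model}"

print(f"ordering holds for all {len(scores)} models")
```

Running this against the tabled values confirms the ordering has no exceptions among the 7 models.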
Sakshi asks: after phase 2 reflection, what fraction of phase 1 errors were genuinely corrected? Low rates indicate self-persuasion: confidence rises without accuracy improvement.
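The two behavioral metrics can be sketched as follows. The record fields (`phase1_correct`, `phase2_correct`, `confidence`) are hypothetical names for illustration; the actual Prajna-Bench schema may differ:

```python
def sakshi_rate(records):
    """Fraction of phase-1 errors genuinely corrected after phase-2 reflection."""
    phase1_errors = [r for r in records if not r["phase1_correct"]]
    if not phase1_errors:
        return 0.0
    corrected = sum(r["phase2_correct"] for r in phase1_errors)
    return corrected / len(phase1_errors)

def conf100_rate(records, max_conf=100):
    """Fraction of responses reporting maximum confidence."""
    records = list(records)
    return sum(r["confidence"] == max_conf for r in records) / len(records)

# Toy run: 4 phase-1 errors, 1 genuinely corrected; 3 of 5 responses at max confidence.
toy = [
    {"phase1_correct": False, "phase2_correct": True,  "confidence": 80},
    {"phase1_correct": False, "phase2_correct": False, "confidence": 100},
    {"phase1_correct": False, "phase2_correct": False, "confidence": 100},
    {"phase1_correct": False, "phase2_correct": False, "confidence": 90},
    {"phase1_correct": True,  "phase2_correct": True,  "confidence": 100},
]
print(sakshi_rate(toy))   # 0.25
print(conf100_rate(toy))  # 0.6
```

A model with a high Conf=100 rate but a low Sakshi rate is the self-persuasion pattern the footnote describes: it reports maximum confidence while rarely fixing its own mistakes.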
Prajna-Bench · Raman369AI · Generated May 2026