Our Approach
TopClanker aggregates scores from established, peer-reviewed benchmarks published by the AI research community. We don't make up numbers. Every score links back to published results.
Think of it like Rotten Tomatoes for AI:
- 🎯 Benchmark Score: our "Tomatometer", a weighted aggregate of published academic benchmarks
- 🍿 Community Score: our "Popcorn Meter", user voting (coming soon)
Benchmarks We Use
Reasoning Models
MMLU (Massive Multitask Language Understanding)
57 subjects covering STEM, humanities, social sciences. Multiple-choice questions from elementary to professional level.
Source: Hendrycks et al., 2021
Weight in category: 40%
GPQA (Graduate-Level Google-Proof Q&A)
Diamond-level graduate questions in physics, biology, and chemistry. Tests expert-level reasoning.
Source: Rein et al., 2023
Weight in category: 30%
LMSYS Chatbot Arena
Real-world human preference ranking via blind pairwise comparisons. Over one million votes cast.
Source: LMSYS Org
Weight in category: 30%
Math Models
GSM8K (Grade School Math 8K)
8,500 grade-school level math word problems. Tests multi-step reasoning and arithmetic.
Source: Cobbe et al., 2021
Weight in category: 40%
MATH
12,500 competition mathematics problems with step-by-step solutions. Tests advanced mathematical reasoning.
Source: Hendrycks et al., 2021
Weight in category: 40%
AIME (Math Competition)
American Invitational Mathematics Examination problems. High-school competition level.
Weight in category: 20%
Research Models
MMLU (General Knowledge)
Same as reasoning, but weighted for breadth of knowledge.
Weight in category: 35%
MMMU (Multimodal Understanding)
Multimodal questions requiring visual reasoning and document understanding.
Weight in category: 30%
Citation Accuracy
Manual testing of fact-checking and source attribution.
Weight in category: 35%
Learning/Coding Models
HumanEval
164 hand-written programming problems. Tests code generation and correctness.
Source: Chen et al., 2021
Weight in category: 40%
SWE-bench Verified
Real-world software engineering tasks drawn from GitHub issues. Tests ability to fix bugs and write production code.
Source: Jimenez et al., 2024
Weight in category: 40%
Adaptive Performance
Testing context retention and learning from feedback.
Weight in category: 20%
Scoring Formula
Category Score Calculation
Category Score = Σ (Benchmark Score × Weight)
Example for Reasoning:
= (MMLU × 0.40) + (GPQA × 0.30) + (Arena Elo × 0.30)
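The weighted-sum calculation above can be sketched in Python. The benchmark scores here are made-up illustrative values, not real results:

```python
def category_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted aggregate: sum of (benchmark score × weight)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "category weights must sum to 1"
    return sum(scores[name] * weights[name] for name in weights)

# Reasoning-category weights from the table above; scores are hypothetical.
reasoning_weights = {"MMLU": 0.40, "GPQA": 0.30, "Arena": 0.30}
example_scores = {"MMLU": 88.0, "GPQA": 60.0, "Arena": 75.0}

print(round(category_score(example_scores, reasoning_weights), 2))  # 75.7
```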
Overall Score
The overall score combines the category scores with a few adjustments:
- Privacy rating (+5% for high privacy)
- Open source (+3% bonus for open models)
- Recency (newer benchmarks weighted slightly higher)
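One way to read these adjustments is as multipliers applied after the category score. This is a sketch under assumptions: the page doesn't specify whether the bonuses are multiplicative or additive, and the recency factor below is a placeholder:

```python
def overall_score(category: float, high_privacy: bool, open_source: bool,
                  recency_factor: float = 1.0) -> float:
    """Apply the listed adjustments to a category score (multiplicative form assumed)."""
    score = category * recency_factor  # assumed: newer benchmarks weighted slightly higher
    if high_privacy:
        score *= 1.05  # +5% for high privacy
    if open_source:
        score *= 1.03  # +3% bonus for open models
    return score

print(round(overall_score(80.0, high_privacy=True, open_source=False), 2))  # 84.0
```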
Privacy Rating
High Privacy
No training on user data, clear data retention policies, GDPR compliant, allows data deletion.
Medium Privacy
May train on user data with opt-out, 30-day retention, some data sharing with partners.
Low Privacy
Trains on user data by default, unclear retention, extensive data collection.
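The three tiers above amount to a simple decision rule. The field names below are hypothetical, and the real assessment is an editorial judgment that also weighs GDPR compliance and data deletion, not a mechanical check:

```python
from dataclasses import dataclass

@dataclass
class PrivacyPolicy:
    trains_on_user_data: bool  # by default
    opt_out_available: bool
    clear_retention: bool

def privacy_rating(p: PrivacyPolicy) -> str:
    """Simplified mapping of the rubric above to a tier."""
    if not p.trains_on_user_data and p.clear_retention:
        return "High"
    if p.opt_out_available:
        return "Medium"
    return "Low"

print(privacy_rating(PrivacyPolicy(trains_on_user_data=False,
                                   opt_out_available=True,
                                   clear_retention=True)))  # High
```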
Update Schedule
- Monthly: Update with new published benchmark results
- Immediately: Add new models when major releases occur
- Quarterly: Review and adjust category weights based on community feedback
Data Sources
- Official model release papers and technical reports
- LMSYS Chatbot Arena leaderboard (updated continuously)
- Papers with Code leaderboards
- Hugging Face Open LLM Leaderboard
- Independent third-party evaluations (when available)
Our Commitments
✓ No paid placements: Rankings are based solely on benchmark performance.
✓ Open methodology: This page explains exactly how we calculate scores.
✓ Source everything: Every claim links to published research.
✓ Community input: User voting will complement (not replace) benchmark scores.
Limitations & Caveats
- Benchmarks aren't perfect: They test specific capabilities, not all real-world performance.
- Scores change: Models get updated, new benchmarks emerge.
- Context matters: The "best" model depends on your use case.
- Gaming is possible: Labs can optimize for benchmarks. We use diverse tests to minimize this.
Questions or Feedback?
Think we're missing an important benchmark? Disagree with our weighting? Found an error?
Email us: rankings@topclanker.com