metadata
license: mit
tags:
- hle
- humanity-last-exam
- symbolic-reasoning
- rule-based
- no-llm
- no-neural-network
- wikipedia-only
datasets:
- cais/hle
metrics:
- accuracy
model-index:
- name: Verantyx-hle-4.6
  results:
  - task:
      type: question-answering
      name: Humanity's Last Exam
    dataset:
      type: cais/hle
      name: HLE
      split: test
    metrics:
    - type: accuracy
      value: 4.6
      name: Accuracy (%)
Verantyx HLE: 4.6%
A fully LLM-free symbolic solver for Humanity's Last Exam (HLE): no neural networks, no language models, only rule-based reasoning with Wikipedia as the sole knowledge source.
Score
| Split | Score | Method |
|---|---|---|
| Full 2500 questions | 115/2500 = 4.6% | atom_cross + knowledge_match + cross_decompose |
Approach
Verantyx solves HLE through structural decomposition:
- Atom Extraction: break questions and choices into atomic facts using 200+ regex patterns
- Wikipedia Knowledge: fetch relevant articles as the sole knowledge source
- Cross-Decompose: decompose each MCQ choice individually and cross-match it against Wikipedia facts
- Atom Relation Classification: LLM-free supports/contradicts/unknown classifier (60+ antonym pairs, negation detection, numeric cross-check)
- MCQ Always-Answer: HLE has no wrong-answer penalty, so every MCQ is answered; the fallback picks the choice with the best keyword overlap
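The relation classification step can be illustrated with a minimal sketch. This is not the project's code: the function names, the three antonym pairs, the negation list, and the stopword list are all hypothetical stand-ins (the solver is described as using 60+ antonym pairs), and the rule that any remaining overlap counts as support is a simplifying assumption.

```python
import re

# Hypothetical sample data; the real solver is described as using 60+ antonym pairs.
ANTONYMS = {("increase", "decrease"), ("hot", "cold"), ("acid", "base")}
NEGATIONS = {"not", "no", "never", "cannot"}
STOPWORDS = {"the", "a", "an", "of", "is", "are", "was", "in", "to", "and"}


def tokens(text):
    """Lowercase word tokens and number tokens."""
    return re.findall(r"[a-z]+|\d+(?:\.\d+)?", text.lower())


def classify(claim, fact):
    """Return 'supports', 'contradicts', or 'unknown' for a claim against a fact."""
    c, f = tokens(claim), tokens(fact)
    ignore = STOPWORDS | NEGATIONS
    c_words = {t for t in c if t.isalpha() and t not in ignore}
    f_words = {t for t in f if t.isalpha() and t not in ignore}
    if not (c_words & f_words):
        return "unknown"  # no shared content word, so nothing to compare

    # Numeric cross-check: both sides state numbers but none of them agree.
    c_nums = {t for t in c if not t.isalpha()}
    f_nums = {t for t in f if not t.isalpha()}
    if c_nums and f_nums and not (c_nums & f_nums):
        return "contradicts"

    # Antonym check: one side uses a word whose antonym appears on the other side.
    for a, b in ANTONYMS:
        if (a in c_words and b in f_words) or (b in c_words and a in f_words):
            return "contradicts"

    # Negation detection: exactly one side is negated.
    if any(t in NEGATIONS for t in c) != any(t in NEGATIONS for t in f):
        return "contradicts"

    return "supports"


print(classify("Water boils at 90 degrees", "Water boils at 100 degrees"))
print(classify("Gold is not a metal", "Gold is a metal"))
```

Both calls print `contradicts`: the first through the numeric cross-check, the second through negation detection. A claim that shares no content word with the fact yields `unknown` rather than a guess.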
Pipeline
Question → Fact Atomizer → Wikipedia Fetch → Atom Cross Solver
↓
Choice Scoring (supports/contradicts)
↓
Best Choice or Keyword Fallback
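The last two stages of the pipeline might look roughly like the sketch below. The function `pick_choice`, its input shapes, and the rule "supports minus contradicts, fall back when there is no unique positive winner" are assumptions made for illustration; the source only states that choices are scored by supports/contradicts and that the fallback uses keyword overlap.

```python
import re


def words(text):
    """Set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_choice(choices, verdicts, wiki_text):
    """Pick an answer label.

    choices:  {label: choice text}
    verdicts: {label: list of 'supports' / 'contradicts' / 'unknown'}
    Returns (label, 'scored') or (label, 'fallback').
    """
    scores = {
        label: verdicts.get(label, []).count("supports")
        - verdicts.get(label, []).count("contradicts")
        for label in choices
    }
    # Iterating labels in sorted order keeps tie-breaking deterministic.
    best = max(sorted(scores), key=lambda k: scores[k])
    ranked = sorted(scores.values(), reverse=True)
    if scores[best] > 0 and (len(ranked) == 1 or ranked[0] > ranked[1]):
        return best, "scored"

    # Keyword fallback: always answer, using overlap with the fetched text.
    wiki = words(wiki_text)
    overlap = {label: len(words(text) & wiki) for label, text in choices.items()}
    return max(sorted(overlap), key=lambda k: overlap[k]), "fallback"


choices = {"A": "Paris is the capital of France", "B": "Lyon is the capital of France"}
print(pick_choice(choices, {"A": ["supports"], "B": ["contradicts"]}, ""))
print(pick_choice(choices, {}, "Lyon is a city in France"))
```

The first call returns `('A', 'scored')`; the second has no verdicts, so it falls back to keyword overlap and returns `('B', 'fallback')`. The sketch assumes `choices` is non-empty.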
Solver Components
| Component | Fires | Description |
|---|---|---|
| cross_decompose | 122 | Per-choice decomposition + Wikipedia cross-match |
| knowledge_match | 18 | Direct atom-based knowledge matching |
| atom_cross | fallback | Normalized atom scoring with Wikipedia overlap |
Properties
- ✅ No LLM: zero language model inference (Qwen 7B fully removed)
- ✅ No neural network: pure rule-based symbolic reasoning
- ✅ No pattern detectors: DISABLE_PATTERN_DETECTORS=1
- ✅ No concept boost: DISABLE_CONCEPT_BOOST=1
- ✅ No penalty exploitation: answering every MCQ is legitimate because HLE scoring has no wrong-answer penalty
- ✅ Wikipedia-only knowledge: no pre-trained embeddings or cached answers
- ✅ Deterministic: the same input always produces the same output
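How the two `DISABLE_*` settings are consumed is not documented here. Assuming they are ordinary environment variables where `1` means "on", a check could be sketched as follows; the helper `flag_enabled` is hypothetical.

```python
import os


def flag_enabled(name, environ=None):
    """True when the named variable is set to "1" (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get(name, "0") == "1"


# Passing a dict instead of os.environ makes the check easy to exercise.
settings = {"DISABLE_PATTERN_DETECTORS": "1", "DISABLE_CONCEPT_BOOST": "1"}
print(flag_enabled("DISABLE_PATTERN_DETECTORS", settings))
print(flag_enabled("DISABLE_CONCEPT_BOOST", settings))
```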
Score History
| Version | Score | Method |
|---|---|---|
| v1 (with LLM) | 2.68% | mcq_direct (Qwen 7B) + cross_decompose |
| v2 (LLM-free partial) | 1.22% | Early LLM removal, limited coverage |
| v4 (LLM-free full) | 4.6% | atom_cross + MCQ always-answer + normalized scoring |
Stats
Total: 2500 questions
Correct: 115 (4.6%)
Time: 98 minutes (4 parallel workers)
Wiki hits: 2298
Knowledge match: 18
Cross decompose: 122 fired
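A few derived figures follow directly from these numbers. The short calculation below assumes that "Wiki hits" counts questions for which at least one article was retrieved, which the stats do not state explicitly.

```python
total = 2500      # questions
correct = 115
minutes = 98      # wall-clock time with 4 parallel workers
wiki_hits = 2298  # assumed to be a per-question count

accuracy_pct = 100 * correct / total    # 4.6
wiki_hit_pct = 100 * wiki_hits / total  # 91.92
questions_per_minute = total / minutes  # roughly 25.5

print(f"accuracy: {accuracy_pct:.1f}%")
print(f"wiki hit rate: {wiki_hit_pct:.2f}%")
print(f"throughput: {questions_per_minute:.1f} questions/minute")
```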
Links
- GitHub: Ag3497120/verantyx-v6
- ARC-AGI-2 Solver: kofdai/Verantyx-arc-agi2-7.4 (same philosophy)
- HLE Benchmark: cais/hle