File size: 3,406 Bytes
865ac14 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 | ---
license: mit
tags:
- hle
- humanity-last-exam
- symbolic-reasoning
- rule-based
- no-llm
- no-neural-network
- wikipedia-only
datasets:
- cais/hle
metrics:
- accuracy
model-index:
- name: Verantyx-hle-4.6
results:
- task:
type: question-answering
name: Humanity's Last Exam
dataset:
type: cais/hle
name: HLE
split: test
metrics:
- type: accuracy
value: 4.6
name: Accuracy (%)
---
# Verantyx HLE β 4.6%
**Fully LLM-free symbolic solver for Humanity's Last Exam (HLE)** β no neural networks, no language models, pure rule-based reasoning with Wikipedia as the only knowledge source.
## Score
| Split | Score | Method |
|---|---|---|
| Full 2500 questions | **115/2500 = 4.6%** | atom_cross + knowledge_match + cross_decompose |
## Approach
Verantyx solves HLE through **structural decomposition**:
1. **Atom Extraction** β Break questions and choices into atomic facts using 200+ regex patterns
2. **Wikipedia Knowledge** β Fetch relevant articles as the sole knowledge source
3. **Cross-Decompose** β Decompose each MCQ choice individually, cross-match against Wikipedia facts
4. **Atom Relation Classification** β LLM-free supports/contradicts/unknown classifier (60+ antonym pairs, negation detection, numeric cross-check)
5. **MCQε
¨εεη (Always Answer)** β HLE has no wrong-answer penalty; fallback uses best keyword overlap
## Pipeline
```
Question β Fact Atomizer β Wikipedia Fetch β Atom Cross Solver
β
Choice Scoring (supports/contradicts)
β
Best Choice or Keyword Fallback
```
### Solver Components
| Component | Fires | Description |
|---|---|---|
| **cross_decompose** | 122 | Per-choice decomposition + Wikipedia cross-match |
| **knowledge_match** | 18 | Direct atom-based knowledge matching |
| **atom_cross** | fallback | Normalized atom scoring with Wikipedia overlap |
## Properties
- β
**No LLM** β zero language model inference (Qwen 7B fully removed)
- β
**No neural network** β pure rule-based symbolic reasoning
- β
**No pattern detectors** β DISABLE_PATTERN_DETECTORS=1
- β
**No concept boost** β DISABLE_CONCEPT_BOOST=1
- β
**No wrong-answer penalty exploitation** β MCQε
¨εεη is valid since HLE scoring has no penalty
- β
**Wikipedia-only knowledge** β no pre-trained embeddings or cached answers
- β
**Deterministic** β same input always produces same output
## Score History
| Version | Score | Method |
|---|---|---|
| v1 (with LLM) | 2.68% | mcq_direct (Qwen 7B) + cross_decompose |
| v2 (LLM-free partial) | 1.22% | Early LLM removal, limited coverage |
| **v4 (LLM-free full)** | **4.6%** | **atom_cross + MCQε
¨εεη + normalized scoring** |
## Stats
```
Total: 2500 questions
Correct: 115 (4.6%)
Time: 98 minutes (4 parallel workers)
Wiki hits: 2298
Knowledge match: 18
Cross decompose: 122 fired
```
## Links
- **GitHub**: [Ag3497120/verantyx-v6](https://github.com/Ag3497120/verantyx-v6)
- **ARC-AGI-2 Solver**: [kofdai/Verantyx-arc-agi2-7.4](https://huggingface.co/kofdai/Verantyx-arc-agi2-7.4) (same philosophy)
- **HLE Benchmark**: [cais/hle](https://huggingface.co/datasets/cais/hle)
|