---
license: mit
tags:
- hle
- humanity-last-exam
- symbolic-reasoning
- rule-based
- no-llm
- no-neural-network
- wikipedia-only
datasets:
- cais/hle
metrics:
- accuracy
model-index:
- name: Verantyx-hle-4.6
  results:
  - task:
      type: question-answering
      name: Humanity's Last Exam
    dataset:
      type: cais/hle
      name: HLE
      split: test
    metrics:
    - type: accuracy
      value: 4.6
      name: Accuracy (%)
---

# Verantyx HLE — 4.6%

**Fully LLM-free symbolic solver for Humanity's Last Exam (HLE)** — no neural networks, no language models, pure rule-based reasoning with Wikipedia as the only knowledge source.

## Score

| Split | Score | Method |
|---|---|---|
| Full set (2500 questions) | **115/2500 = 4.6%** | atom_cross + knowledge_match + cross_decompose |

## Approach

Verantyx tackles HLE through **structural decomposition**:

1. **Atom Extraction** — Break questions and choices into atomic facts using 200+ regex patterns
2. **Wikipedia Knowledge** — Fetch relevant articles as the sole knowledge source
3. **Cross-Decompose** — Decompose each MCQ choice individually and cross-match it against Wikipedia facts
4. **Atom Relation Classification** — LLM-free supports/contradicts/unknown classifier (60+ antonym pairs, negation detection, numeric cross-check)
5. **Always Answer Every MCQ** — HLE has no wrong-answer penalty, so every question gets an answer; when no choice is supported, the fallback picks the choice with the best keyword overlap

## Pipeline

```
Question → Fact Atomizer → Wikipedia Fetch → Atom Cross Solver
                                                    ↓
                                  Choice Scoring (supports/contradicts)
                                                    ↓
                                     Best Choice or Keyword Fallback
```

### Solver Components

| Component | Times fired | Description |
|---|---|---|
| **cross_decompose** | 122 | Per-choice decomposition + Wikipedia cross-match |
| **knowledge_match** | 18 | Direct atom-based knowledge matching |
| **atom_cross** | fallback | Normalized atom scoring with Wikipedia overlap |

## Properties

- ✅ **No LLM** — zero language model inference (Qwen 7B fully removed)
- ✅ **No neural network** — pure rule-based symbolic reasoning
- ✅ **No pattern detectors** — `DISABLE_PATTERN_DETECTORS=1`
- ✅ **No concept boost** — `DISABLE_CONCEPT_BOOST=1`
- ✅ **No penalty exploitation** — answering every MCQ is legitimate because HLE scoring does not penalize wrong answers
- ✅ **Wikipedia-only knowledge** — no pre-trained embeddings or cached answers
- ✅ **Deterministic** — the same input always produces the same output

## Score History

| Version | Score | Method |
|---|---|---|
| v1 (with LLM) | 2.68% | mcq_direct (Qwen 7B) + cross_decompose |
| v2 (LLM-free, partial) | 1.22% | Early LLM removal, limited coverage |
| **v4 (LLM-free, full)** | **4.6%** | **atom_cross + always-answer MCQ + normalized scoring** |

## Stats

```
Total:            2500 questions
Correct:          115 (4.6%)
Time:             98 minutes (4 parallel workers)
Wiki hits:        2298
Knowledge match:  18
Cross decompose:  122 fired
```

## Links

- **GitHub**: [Ag3497120/verantyx-v6](https://github.com/Ag3497120/verantyx-v6)
- **ARC-AGI-2 Solver**: [kofdai/Verantyx-arc-agi2-7.4](https://huggingface.co/kofdai/Verantyx-arc-agi2-7.4) (same philosophy)
- **HLE Benchmark**: [cais/hle](https://huggingface.co/datasets/cais/hle)
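
## Illustrative Sketch

The Atom Relation Classification and always-answer steps described under Approach can be illustrated with a minimal Python sketch. This is not Verantyx's code. The tiny lexicons, the `MIN_SHARED` threshold, the scoring rule (supports minus contradicts), and every function name (`classify`, `score_choice`, `keyword_overlap`, `pick_answer`) are hypothetical choices made for illustration. Wikipedia fetching and regex atomization are replaced here by plain strings.

```python
import re
from typing import Dict, List, Set

# Hypothetical miniature lexicons; the README describes 60+ antonym pairs in the real solver.
ANTONYM_PAIRS = [("increase", "decrease"), ("larger", "smaller"), ("true", "false"), ("before", "after")]
NEGATIONS = {"not", "no", "never", "cannot"}
STOPWORDS = {"the", "a", "an", "of", "is", "are", "was", "at", "in", "on", "to", "and", "by", "it"}
MIN_SHARED = 2  # hypothetical: shared content words needed before judging a relation


def tokenize(text: str) -> List[str]:
    # Numbers (with optional decimals) or runs of letters, lowercased.
    return re.findall(r"\d+(?:\.\d+)?|[a-z]+", text.lower())


def content_words(tokens: List[str]) -> Set[str]:
    # Alphabetic tokens that are neither stopwords nor negation markers.
    return {t for t in tokens if t.isalpha() and t not in STOPWORDS and t not in NEGATIONS}


def numbers(tokens: List[str]) -> Set[float]:
    return {float(t) for t in tokens if t[0].isdigit()}


def classify(atom: str, fact: str) -> str:
    """Label how a fact relates to an atom: 'supports', 'contradicts' or 'unknown'."""
    ta, tf = tokenize(atom), tokenize(fact)
    wa, wf = content_words(ta), content_words(tf)
    if len(wa & wf) < MIN_SHARED:
        return "unknown"  # too little topical overlap to judge
    for x, y in ANTONYM_PAIRS:
        if (x in wa and y in wf) or (y in wa and x in wf):
            return "contradicts"  # opposite terms on the two sides
    na, nf = numbers(ta), numbers(tf)
    if na and nf and not (na & nf):
        return "contradicts"  # numeric cross-check: values never agree
    if any(t in NEGATIONS for t in ta) != any(t in NEGATIONS for t in tf):
        return "contradicts"  # negation on only one side
    return "supports"


def score_choice(choice: str, facts: List[str]) -> int:
    labels = [classify(choice, f) for f in facts]
    return labels.count("supports") - labels.count("contradicts")


def keyword_overlap(choice: str, facts: List[str]) -> int:
    fact_words: Set[str] = set().union(*(content_words(tokenize(f)) for f in facts))
    return len(content_words(tokenize(choice)) & fact_words)


def pick_answer(choices: Dict[str, str], facts: List[str]) -> str:
    """Always return a label, since HLE does not penalize wrong answers."""
    labels = sorted(choices)  # fixed order makes tie-breaking deterministic
    scores = {k: score_choice(choices[k], facts) for k in labels}
    best = max(labels, key=lambda k: scores[k])
    if scores[best] > 0:
        return best
    # Fallback: the choice sharing the most keywords with the fetched facts.
    return max(labels, key=lambda k: keyword_overlap(choices[k], facts))
```

For example, with the fact `"Water boils at 100 degrees Celsius at sea level pressure."`, the choice `"Water boils at 90 degrees Celsius"` is labeled `contradicts` by the numeric check while `"Water boils at 100 degrees Celsius"` is labeled `supports`, so `pick_answer` returns the second choice. When no choice scores above zero, the keyword-overlap fallback still produces an answer, and sorting the labels first keeps ties resolved the same way on every run.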