---
license: mit
tags:
  - hle
  - humanity-last-exam
  - symbolic-reasoning
  - rule-based
  - no-llm
  - no-neural-network
  - wikipedia-only
datasets:
  - cais/hle
metrics:
  - accuracy
model-index:
  - name: Verantyx-hle-4.6
    results:
      - task:
          type: question-answering
          name: Humanity's Last Exam
        dataset:
          type: cais/hle
          name: HLE
          split: test
        metrics:
          - type: accuracy
            value: 4.6
            name: Accuracy (%)
---

# Verantyx HLE — 4.6%

**Fully LLM-free symbolic solver for Humanity's Last Exam (HLE)** — no neural networks, no language models, pure rule-based reasoning with Wikipedia as the only knowledge source.

## Score

| Split | Score | Method |
|---|---|---|
| Full 2500 questions | **115/2500 = 4.6%** | atom_cross + knowledge_match + cross_decompose |

## Approach

Verantyx solves HLE through **structural decomposition**:

1. **Atom Extraction**: Break the question and each choice into atomic facts using 200+ regex patterns.
2. **Wikipedia Knowledge**: Fetch relevant articles, which serve as the sole knowledge source.
3. **Cross-Decompose**: Decompose each MCQ choice individually and cross-match its atoms against facts from Wikipedia.
4. **Atom Relation Classification**: Label each atom pair as supports, contradicts, or unknown with an LLM-free classifier (60+ antonym pairs, negation detection, numeric cross-check).
5. **Always Answer Every MCQ**: HLE has no wrong-answer penalty, so the solver never abstains; the fallback picks the choice with the best keyword overlap.
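
Steps 1 and 4 above can be illustrated with a minimal sketch. It is not the project's code: the `Atom` type, the two regex patterns, the three antonym pairs, and the function names are all hypothetical stand-ins for the 200+ patterns and 60+ antonym pairs the solver is described as using.

```python
import re
from typing import NamedTuple, Optional


class Atom(NamedTuple):
    subject: str
    relation: str
    value: str
    negated: bool


# Hypothetical patterns (assumption); the real atomizer reportedly has 200+.
PATTERNS = [
    ("is", re.compile(r"^(?P<subject>.+?) (?:is|was) (?P<neg>not )?(?P<value>.+)$", re.I)),
    ("has", re.compile(r"^(?P<subject>.+?) (?:has|had) (?P<neg>no )?(?P<value>.+)$", re.I)),
]

# Hypothetical antonym pairs (assumption); the real classifier reportedly has 60+.
ANTONYMS = {
    frozenset(pair)
    for pair in [("solid", "liquid"), ("largest", "smallest"), ("increase", "decrease")]
}


def extract_atom(sentence: str) -> Optional[Atom]:
    """Return the first pattern match as an Atom, or None if nothing matches."""
    text = sentence.strip().rstrip(".")
    for relation, pattern in PATTERNS:
        m = pattern.match(text)
        if m:
            return Atom(m["subject"].lower(), relation, m["value"].lower(), m["neg"] is not None)
    return None


def numbers(text: str) -> list:
    """Collect numeric literals for the numeric cross-check."""
    return re.findall(r"\d+(?:\.\d+)?", text)


def classify(claim: Atom, fact: Atom) -> str:
    """Label a choice atom against a knowledge atom without any model."""
    if claim.subject != fact.subject or claim.relation != fact.relation:
        return "unknown"
    same_polarity = claim.negated == fact.negated
    claim_nums, fact_nums = numbers(claim.value), numbers(fact.value)
    if claim_nums and fact_nums:
        agree = claim_nums == fact_nums  # numeric cross-check
    elif claim.value == fact.value:
        agree = True
    elif any(
        frozenset((a, b)) in ANTONYMS
        for a in claim.value.split()
        for b in fact.value.split()
    ):
        agree = False  # antonym pair found
    else:
        return "unknown"
    # Negation detection: a negated claim flips the verdict.
    return "supports" if agree == same_polarity else "contradicts"


fact = extract_atom("Water is liquid at 25 degrees.")
print(classify(extract_atom("Water is liquid at 25 degrees"), fact))      # supports
print(classify(extract_atom("Water is not liquid at 25 degrees"), fact))  # contradicts
print(classify(extract_atom("Mercury is a planet"), extract_atom("Mercury is liquid")))  # unknown
```

Returning `unknown` whenever the atoms cannot be compared keeps the classifier conservative: only explicit agreement, an antonym pair, a numeric mismatch, or a negation produces a verdict.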

## Pipeline

```
Question → Fact Atomizer → Wikipedia Fetch → Atom Cross Solver
                                              ↓
                                    Choice Scoring (supports/contradicts)
                                              ↓
                                    Best Choice or Keyword Fallback
```
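
The last two stages of the diagram (choice scoring, then best choice or keyword fallback) might look roughly like the sketch below. The scoring rule (supports minus contradicts), the tie handling, and the names `pick_choice` and `tokens` are assumptions made for illustration; the repository may score choices differently.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens used for keyword overlap."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_choice(choices: dict, verdicts: dict, wiki_text: str) -> tuple:
    """Pick a choice label and report which path produced it.

    choices:  label -> choice text
    verdicts: label -> list of "supports" / "contradicts" / "unknown"
    """
    # Assumed scoring rule: +1 per supporting atom, -1 per contradicting atom.
    scores = {
        label: verdicts.get(label, []).count("supports")
        - verdicts.get(label, []).count("contradicts")
        for label in choices
    }
    best = max(scores.values())
    leaders = sorted(label for label, score in scores.items() if score == best)
    if best > 0 and len(leaders) == 1:
        return leaders[0], "cross_match"

    # Fallback: never abstain, take the choice sharing the most tokens with
    # the fetched Wikipedia text. Iterating in sorted order keeps ties
    # deterministic (the first label wins).
    wiki = tokens(wiki_text)
    overlap = {label: len(tokens(text) & wiki) for label, text in choices.items()}
    return max(sorted(choices), key=lambda label: overlap[label]), "keyword_fallback"


choices = {
    "A": "Paris is the capital of France",
    "B": "Lyon is the capital of France",
}
print(pick_choice(choices, {"A": ["supports"], "B": ["contradicts"]}, ""))
# ('A', 'cross_match')
print(pick_choice(choices, {}, "Lyon is a city in France"))
# ('B', 'keyword_fallback')
```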

### Solver Components

| Component | Times fired | Description |
|---|---|---|
| **cross_decompose** | 122 | Per-choice decomposition + Wikipedia cross-match |
| **knowledge_match** | 18 | Direct atom-based knowledge matching |
| **atom_cross** | fallback | Normalized atom scoring with Wikipedia overlap |

## Properties

- ✅ **No LLM**: zero language model inference (Qwen 7B fully removed)
- ✅ **No neural network**: pure rule-based symbolic reasoning
- ✅ **No pattern detectors**: run with `DISABLE_PATTERN_DETECTORS=1`
- ✅ **No concept boost**: run with `DISABLE_CONCEPT_BOOST=1`
- ✅ **No penalty exploitation**: answering every MCQ is valid because HLE scoring has no wrong-answer penalty
- ✅ **Wikipedia-only knowledge**: no pre-trained embeddings or cached answers
- ✅ **Deterministic**: the same input always produces the same output
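
How the solver reads the two `DISABLE_*` switches is not documented here. As a sketch under the assumption that they are ordinary environment variables where `1` means "off", a stage list could be gated as follows. The helper names and the stage names `pattern_detectors` and `concept_boost` are hypothetical.

```python
import os


def flag_enabled(name: str, environ=os.environ) -> bool:
    """Treat the value "1" as set; anything else (or absence) as unset."""
    return environ.get(name, "0") == "1"


def active_stages(environ=os.environ) -> list:
    """Return the stages that would run under the given environment."""
    # The three components named in this card always run.
    stages = ["atom_cross", "knowledge_match", "cross_decompose"]
    # Hypothetical optional stages, switched off by the documented flags.
    if not flag_enabled("DISABLE_PATTERN_DETECTORS", environ):
        stages.append("pattern_detectors")
    if not flag_enabled("DISABLE_CONCEPT_BOOST", environ):
        stages.append("concept_boost")
    return stages


reported_config = {"DISABLE_PATTERN_DETECTORS": "1", "DISABLE_CONCEPT_BOOST": "1"}
print(active_stages(reported_config))
# ['atom_cross', 'knowledge_match', 'cross_decompose']
```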

## Score History

| Version | Score | Method |
|---|---|---|
| v1 (with LLM) | 2.68% | mcq_direct (Qwen 7B) + cross_decompose |
| v2 (LLM-free partial) | 1.22% | Early LLM removal, limited coverage |
| **v4 (LLM-free full)** | **4.6%** | **atom_cross + always-answer MCQ + normalized scoring** |

## Stats

```
Total: 2500 questions
Correct: 115 (4.6%)
Time: 98 minutes (4 parallel workers)
Wiki hits: 2298
Knowledge match: 18
Cross decompose: 122 fired
```
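
The figures above can be cross-checked with a few lines of arithmetic. The derived rates rest on two assumptions that the log does not confirm: that "Wiki hits" counts questions with at least one fetched article, and that the 98 minutes is wall-clock time shared evenly by the 4 workers.

```python
# Values copied from the stats block above.
total, correct = 2500, 115
wiki_hits, minutes, workers = 2298, 98, 4

accuracy = 100 * correct / total          # percent of questions answered correctly
wiki_hit_rate = 100 * wiki_hits / total   # assumes one hit per question at most
throughput = total / minutes              # questions per wall-clock minute
seconds_per_question = 60 * minutes * workers / total  # per worker, assuming even load

print(f"accuracy: {accuracy:.1f}%")                      # accuracy: 4.6%
print(f"wiki hit rate: {wiki_hit_rate:.2f}%")            # wiki hit rate: 91.92%
print(f"throughput: {throughput:.1f} questions/min")     # throughput: 25.5 questions/min
print(f"per-worker time: {seconds_per_question:.1f} s")  # per-worker time: 9.4 s
```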

## Links

- **GitHub**: [Ag3497120/verantyx-v6](https://github.com/Ag3497120/verantyx-v6)
- **ARC-AGI-2 Solver**: [kofdai/Verantyx-arc-agi2-7.4](https://huggingface.co/kofdai/Verantyx-arc-agi2-7.4) (same philosophy)
- **HLE Benchmark**: [cais/hle](https://huggingface.co/datasets/cais/hle)