---
license: mit
tags:
  - hle
  - humanity-last-exam
  - symbolic-reasoning
  - rule-based
  - no-llm
  - no-neural-network
  - wikipedia-only
datasets:
  - cais/hle
metrics:
  - accuracy
model-index:
  - name: Verantyx-hle-4.6
    results:
      - task:
          type: question-answering
          name: Humanity's Last Exam
        dataset:
          type: cais/hle
          name: HLE
          split: test
        metrics:
          - type: accuracy
            value: 4.6
            name: Accuracy (%)
---

# Verantyx HLE: 4.6%

**Fully LLM-free symbolic solver for Humanity's Last Exam (HLE)**: no neural networks, no language models, pure rule-based reasoning with Wikipedia as the only knowledge source.

## Score

| Split | Score | Method |
|---|---|---|
| Full 2500 questions | **115/2500 = 4.6%** | atom_cross + knowledge_match + cross_decompose |

## Approach

Verantyx solves HLE through **structural decomposition**:

1. **Atom Extraction**: Break questions and choices into atomic facts using 200+ regex patterns
2. **Wikipedia Knowledge**: Fetch relevant articles as the sole knowledge source
3. **Cross-Decompose**: Decompose each MCQ choice individually, then cross-match it against Wikipedia facts
4. **Atom Relation Classification**: LLM-free supports/contradicts/unknown classifier (60+ antonym pairs, negation detection, numeric cross-check)
5. **MCQ Always Answer**: HLE has no wrong-answer penalty, so every MCQ gets an answer; the fallback picks the choice with the best keyword overlap
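
As a rough illustration of step 4, the sketch below shows one way a rule-based supports/contradicts/unknown classifier could combine negation detection, antonym pairs, and a numeric cross-check. It is not the solver's actual code: the names `ANTONYMS`, `NEGATIONS`, `tokens`, and `classify`, the three example antonym pairs, and the two-token overlap threshold are all assumptions made for this example.

```python
import re

# Hypothetical antonym pairs for illustration; the solver is described as using 60+.
ANTONYMS = {("increase", "decrease"), ("before", "after"), ("largest", "smallest")}
# Hypothetical negation cue words.
NEGATIONS = {"not", "no", "never"}


def tokens(text):
    """Return lowercase word tokens and number tokens found in text."""
    return re.findall(r"[a-z]+|\d+(?:\.\d+)?", text.lower())


def classify(claim, fact):
    """Classify a claim against a fact as 'supports', 'contradicts', or 'unknown'."""
    cs, fs = set(tokens(claim)), set(tokens(fact))

    # Too little shared vocabulary (assumed threshold: 2 tokens) means no verdict.
    shared = (cs & fs) - NEGATIONS
    if len(shared) < 2:
        return "unknown"

    # Negation detection: exactly one side carries a negation cue.
    if bool(cs & NEGATIONS) != bool(fs & NEGATIONS):
        return "contradicts"

    # Antonym check: one side uses a word, the other side uses its opposite.
    for a, b in ANTONYMS:
        if (a in cs and b in fs) or (b in cs and a in fs):
            return "contradicts"

    # Numeric cross-check: both sides mention numbers but share none of them.
    claim_nums = {t for t in cs if t[0].isdigit()}
    fact_nums = {t for t in fs if t[0].isdigit()}
    if claim_nums and fact_nums and not (claim_nums & fact_nums):
        return "contradicts"

    return "supports"
```

This toy version counts stopwords toward the overlap and ignores word order, so it is far cruder than a 200-pattern atomizer; it only shows how the three signals can be layered without any model inference.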

## Pipeline

```
Question → Fact Atomizer → Wikipedia Fetch → Atom Cross Solver
                                              ↓
                                    Choice Scoring (supports/contradicts)
                                              ↓
                                    Best Choice or Keyword Fallback
```
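
The last two stages of the pipeline can be sketched as follows: each choice is scored by counting supporting and contradicting verdicts, and if no choice ends up with a positive score, a keyword-overlap fallback still returns an answer. This is a minimal sketch under stated assumptions, not the project's implementation: `pick_choice`, `words`, the supports-minus-contradicts score, and the alphabetical tie-break are all hypothetical.

```python
import re


def words(text):
    """Return the set of lowercase alphanumeric tokens in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_choice(choices, verdicts, wiki_text):
    """Pick an answer label.

    choices:   dict mapping label -> choice text
    verdicts:  dict mapping label -> list of 'supports'/'contradicts'/'unknown'
    wiki_text: concatenated text of the fetched Wikipedia articles
    Returns (label, mode) where mode is 'scored' or 'fallback'.
    """
    # Assumed scoring rule: supports count minus contradicts count.
    scores = {}
    for label in choices:
        v = verdicts.get(label, [])
        scores[label] = v.count("supports") - v.count("contradicts")

    # Iterating over sorted labels makes ties resolve to the first label,
    # so the same input always yields the same output.
    best = max(sorted(scores), key=lambda k: scores[k])
    if scores[best] > 0:
        return best, "scored"

    # Fallback: always answer, using the largest keyword overlap with Wikipedia.
    wiki = words(wiki_text)
    best = max(sorted(choices), key=lambda k: len(words(choices[k]) & wiki))
    return best, "fallback"


if __name__ == "__main__":
    choices = {"A": "Paris is in Germany", "B": "Paris is in France"}
    print(pick_choice(choices, {"A": ["contradicts"], "B": ["supports"]}, ""))
    print(pick_choice(choices, {}, "Paris is the capital of France"))
```

Because the fallback always returns a label, every multiple-choice question receives an answer, which is the behavior the "always answer" step relies on.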

### Solver Components

| Component | Fires | Description |
|---|---|---|
| **cross_decompose** | 122 | Per-choice decomposition + Wikipedia cross-match |
| **knowledge_match** | 18 | Direct atom-based knowledge matching |
| **atom_cross** | fallback | Normalized atom scoring with Wikipedia overlap |

## Properties

- ✅ **No LLM**: zero language model inference (Qwen 7B fully removed)
- ✅ **No neural network**: pure rule-based symbolic reasoning
- ✅ **No pattern detectors**: `DISABLE_PATTERN_DETECTORS=1`
- ✅ **No concept boost**: `DISABLE_CONCEPT_BOOST=1`
- ✅ **No penalty exploitation**: answering every MCQ is valid because HLE scoring has no wrong-answer penalty
- ✅ **Wikipedia-only knowledge**: no pre-trained embeddings or cached answers
- ✅ **Deterministic**: the same input always produces the same output

## Score History

| Version | Score | Method |
|---|---|---|
| v1 (with LLM) | 2.68% | mcq_direct (Qwen 7B) + cross_decompose |
| v2 (LLM-free partial) | 1.22% | Early LLM removal, limited coverage |
| **v4 (LLM-free full)** | **4.6%** | **atom_cross + MCQ always-answer + normalized scoring** |

## Stats

```
Total: 2500 questions
Correct: 115 (4.6%)
Time: 98 minutes (4 parallel workers)
Wiki hits: 2298
Knowledge match: 18
Cross decompose: 122 fired
```

## Links

- **GitHub**: [Ag3497120/verantyx-v6](https://github.com/Ag3497120/verantyx-v6)
- **ARC-AGI-2 Solver**: [kofdai/Verantyx-arc-agi2-7.4](https://huggingface.co/kofdai/Verantyx-arc-agi2-7.4) (same philosophy)
- **HLE Benchmark**: [cais/hle](https://huggingface.co/datasets/cais/hle)