Hungarian Date Converter - LoRA Adapter (mT5-base)
This repository contains LoRA adapters for a modified mT5-base model (GaborMadarasz/hut5-base), fine-tuned for Hungarian date conversion. The adapters convert written-out dates to numeric format and vice versa.
Model Details
- Base Model: GaborMadarasz/hut5-base (mT5-base)
- Adapter Type: LoRA (Low-Rank Adaptation)
- Task: Bidirectional date conversion in Hungarian
- word2date: Written date → Numeric format (e.g., "ezerkilencszáztizenegy május tizenöt" → "1911. május 15.")
- date2word: Numeric date → Written format (e.g., "1978. október 1." → "ezerkilencszázhetvennyolc október első")
- Developer: SilentSynapse
- Release Date: 2026
Model Architecture
LoRA Configuration
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v", "k", "o"],
    task_type="SEQ_2_SEQ_LM",
)
```
| Parameter | Value |
|---|---|
| Trainable Parameters | 3,538,944 (1.43% of total) |
| Total Parameters | 247,848,192 |
| Adapter Size | ~14 MB |
| Base Model Size | ~950 MB (not included) |
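The trainable-parameter figures above can be reproduced by attaching the configuration to the base model. A minimal sketch, assuming peft and transformers are installed (see Requirements below):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base = AutoModelForSeq2SeqLM.from_pretrained("GaborMadarasz/hut5-base")
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q", "v", "k", "o"], task_type="SEQ_2_SEQ_LM",
)
peft_model = get_peft_model(base, config)
# Reports trainable vs. total parameters, matching the table above, e.g.:
# trainable params: 3,538,944 || all params: 247,848,192 || trainable%: 1.43
peft_model.print_trainable_parameters()
```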
Training Configuration
| Parameter | Value |
|---|---|
| Batch Size | 6 (gradient accumulation: 3, effective: 18) |
| Learning Rate | 1.5e-4 |
| Epochs | 2 |
| Max Length | 128 (input), 64 (target) |
| Precision | FP16 |
| Hardware | NVIDIA RTX 3060 12GB |
| Framework | Hugging Face Transformers + PEFT |
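For reference, the table maps onto Hugging Face Seq2SeqTrainingArguments roughly as follows. The original training script is not included, so this is an illustrative sketch; output_dir is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the table above onto Trainer arguments;
# the actual training script is not part of this repository.
training_args = Seq2SeqTrainingArguments(
    output_dir="hut5-date-converter-lora",  # placeholder path
    per_device_train_batch_size=6,
    gradient_accumulation_steps=3,          # effective batch size: 6 * 3 = 18
    learning_rate=1.5e-4,
    num_train_epochs=2,
    fp16=True,
    generation_max_length=64,               # matches the 64-token target limit
)
```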
Training Data
- Training Samples: 756,000
- Evaluation Samples: 1,000
- Language: Hungarian
- Source: Hungarian Wikipedia articles containing date patterns
- Dataset:
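Each sample pairs a task-prefixed input with its target string, matching the prompt format used for inference below. The helper here is a hypothetical illustration; the actual preprocessing code is not published:

```python
# Hypothetical illustration of how a training pair is formed;
# the published model expects this "mode: text" prompt format.
def make_example(mode, source, target):
    return {"input": f"{mode}: {source}", "target": target}

make_example("word2date", "ezerkilencszáztizenegy május tizenöt", "1911. május 15.")
# {'input': 'word2date: ezerkilencszáztizenegy május tizenöt',
#  'target': '1911. május 15.'}
```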
Evaluation Results
Metrics (64-token target limit)
| Metric | Overall | word2date | date2word |
|---|---|---|---|
| Exact Match | 75.70% | 85.69% | 65.01% |
| Word Error Rate (WER) | 1.04% | 0.61% | 1.56% |
| Char Error Rate (CER) | 0.37% | 0.20% | 0.54% |
| ROUGE-1 F1 | 0.9956 | - | - |
| ROUGE-2 F1 | 0.9949 | - | - |
| ROUGE-L F1 | 0.9956 | - | - |
Performance Notes
- word2date scores higher because numeric outputs are short and token-efficient
- date2word scores lower because long written-out Hungarian number words can exhaust the 64-token target limit
- Low WER/CER indicates that most errors are minor character-level slips rather than structural failures
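A minimal sketch of how exact match and WER/CER can be computed, assuming the jiwer package (an extra dependency, not in the Requirements below); this is not necessarily the evaluation script used for the numbers above:

```python
import jiwer

# Hypothetical predictions/references; the second prediction contains a
# single-character slip ("elsö" vs. "első") to illustrate a typo-level error
predictions = ["1911. május 15.", "ezerkilencszázhetvennyolc október elsö"]
references  = ["1911. május 15.", "ezerkilencszázhetvennyolc október első"]

exact_match = sum(p == r for p, r in zip(predictions, references)) / len(references)
wer = jiwer.wer(references, predictions)  # word error rate over the set
cer = jiwer.cer(references, predictions)  # character error rate over the set
print(f"EM: {exact_match:.2%}  WER: {wer:.2%}  CER: {cer:.2%}")
```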
Usage
Requirements
```bash
pip install transformers peft torch
```
Basic Inference
```python
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

# Load base model and adapter
model_id = "GaborMadarasz/hut5-base"
adapter_path = "SilentSynapse/hut5-date-converter"

base_model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload().to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inference function
def convert_date(text, mode="word2date"):
    """
    mode: "word2date" or "date2word"
    """
    prompt = f"{mode}: {text}"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=64,
            num_beams=4,
            early_stopping=True
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Examples
print(convert_date("ezerkilencszáztizenegy május tizenöt", "word2date"))
# Output: 1911. május 15.

print(convert_date("1978. október 5.", "date2word"))
# Output: ezerkilencszázhetvennyolc október öt
```
Batch Inference
```python
def convert_batch(texts, modes):
    prompts = [f"{m}: {t}" for m, t in zip(modes, texts)]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
    with torch.no_grad():
        outputs = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

texts = ["ezerkilencszáztizenegy május tizenöt", "1978. október 5."]
modes = ["word2date", "date2word"]
results = convert_batch(texts, modes)
```
CPU Inference (no GPU)
```python
# Same as above, but keep tensors on the CPU and use the default float32
base_model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload().eval()
# In convert_date, also drop the .to("cuda") call on the tokenized inputs
```
Limitations
- Output Length: The model is optimized for 64-token outputs; longer texts may be truncated.
- date2word Performance: Written-date conversion has lower accuracy (65% EM) because of complex Hungarian number words.
- Date Formats: Follows Hungarian conventions only (e.g., "1991. május 15."). Other formats (ISO 8601, US style) are not supported.
- Context Preservation: The model converts dates while preserving surrounding text, but very long passages may lose information due to the token limit.
- Language: Hungarian only; not suitable for other languages.
Edge Cases:
- Roman numerals (e.g., "XIX. század") are not converted
- Date ranges (1978–1991) may have inconsistent conversion
Intended Use Cases
- Date normalization in Hungarian text processing pipelines (see the sketch after this list)
- Information extraction from historical documents
- Text preprocessing for NLP tasks requiring standardized dates
- Digital humanities projects involving Hungarian archives
- Accessibility: converting dates to formats suitable for screen readers
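As an illustration of the first use case, a hypothetical pipeline might locate numeric dates with a regular expression and hand each match to the converter. The model itself does not detect dates (see Out-of-Scope Use Cases), so detection is handled here by a regex that is an assumption and covers only the standard "1991. május 15." pattern; convert_date is the function defined under Basic Inference above.

```python
import re

# Hypothetical sketch: locate numeric Hungarian dates of the form
# "1991. május 15." and replace each with its written-out form.
HU_MONTHS = ("január|február|március|április|május|június|"
             "július|augusztus|szeptember|október|november|december")
DATE_RE = re.compile(rf"\b\d{{4}}\. (?:{HU_MONTHS}) \d{{1,2}}\.")

def verbalize_dates(text):
    # convert_date is the function defined under Basic Inference above
    return DATE_RE.sub(lambda m: convert_date(m.group(0), "date2word"), text)

print(verbalize_dates("A szerződés 1991. május 15. napján kelt."))
```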
Out-of-Scope Use Cases
- Non-Hungarian language date conversion
- Non-Gregorian calendar systems
- Time/timestamp conversion (only dates are supported)
- Mathematical operations on dates
- Named entity recognition (the model does not identify dates, only converts them)
Future Improvements
- 256-token target support for longer texts
- Additional date format support (ISO 8601, US, EU standards)
- Improved date2word performance (more training epochs, larger dataset)
- Quantization support (INT8, INT4) for lower VRAM usage
License
MIT License
Authors
- SilentSynapse
References
- Base Model: GaborMadarasz/hut5-base
- Dataset:
- PEFT Library: Hugging Face PEFT
- mT5 Paper: Multilingual T5
- LoRA Paper: Low-Rank Adaptation
Version History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-04 | Initial LoRA adapter release |