Evaluation Guidebook
Display benchmark evaluation data for LLMs
LLM evaluation
A space to view and inspect all the tasks in lighteval
Explore and discover all leaderboards from the HF community
Display and inspect log files
Launch and monitor model evaluation jobs
Generate a command to run model evaluations
Compare tokenization lengths across languages