Models

74,009

Active filters: reinforcement-learning

nvidia/GEAR-SONIC

Reinforcement Learning • Updated Apr 11 • 47

zghhui/OmniNFT

Any-to-Any • Updated 18 days ago • 59 • 37

Adilbai/stock-trading-rl-agent

Reinforcement Learning • Updated Jan 8 • 116 • 149

exla-ai/openpie-0.6

Robotics • Updated Feb 4 • 33 • 25

PhysicsWallahAI/Aryabhata-2.0

Text Generation • 21B • Updated 3 days ago • 331 • 3

osanseviero/test_sb3

Reinforcement Learning • Updated May 4, 2022 • 2 • 7

One-RL-to-See-Them-All/Orsta-7B

Image-Text-to-Text • 8B • Updated Jun 4, 2025 • 25 • 13

infly/inf-retriever-v1-pro

Reinforcement Learning • 7B • Updated Feb 2 • 1.6k • 7

cyankiwi/INTELLECT-3-AWQ-4bit

Text Generation • 19B • Updated about 1 month ago • 24 • 4

hardware-pathon-ai/unitree-g1-phase1-locomotion

Reinforcement Learning • Updated Jan 15 • 1

mendicant04/DermoGPT-RL

Image-Text-to-Text • 9B • Updated 22 days ago • 644 • 7

cagataydev/sac-unitree-g1-mujoco

Reinforcement Learning • Updated Feb 24 • 2 • 1

OpenEnvisionLab/Auto-Rubric-as-Reward

Text-to-Image • Updated 25 days ago • 3

simplex-ai-inc/LiteResearcher-4B

Text Generation • 4B • Updated Apr 22 • 96 • 3

Kabs-123/clustermind-lora

Reinforcement Learning • Updated Apr 26 • 2 • 1

youngzhong/SOD-GRPO_teacher-4B

Text Generation • 4B • Updated 15 days ago • 193 • 2

lushy/ppo-Huggy

Reinforcement Learning • Updated 24 days ago • 88 • 1

twnlp/ChineseErrorCorrector4-4B

Text Generation • 4B • Updated 3 days ago • 546 • 5

Philip-MIT/SOLE-R1-8B

Image-Text-to-Text • 770k • Updated 6 days ago • 157 • 1

zlab-princeton/Vero-Qwen35-9B

Image-Text-to-Text • 9B • Updated 11 days ago • 28 • 1

Odegua/td3-PandaReachDense-v3

Reinforcement Learning • Updated 8 days ago • 33 • 1

0xemkey/rlt_lift_finetuned_pi05

Reinforcement Learning • Updated 7 days ago • 1

poolside-laguna-hackathon/laguna-xs2-two-hacks

Text Generation • Updated 7 days ago • 1

issll/ppo-LunarLander-v3

Reinforcement Learning • Updated 7 days ago • 27 • 1

Srgreen/ppo-LunarLander-v2-complete

Reinforcement Learning • Updated 6 days ago • 1

huggingkhalil/ppo-LunarLander-v3

Reinforcement Learning • Updated 5 days ago • 28 • 1

MoistPotato/ppo-LunarLander-v3

Reinforcement Learning • Updated 4 days ago • 26 • 1

TingheOliver/Explore-Execute-Chain-Qwen

Reinforcement Learning • Updated 3 days ago • 1

morpeN1/mimic-sepsis-cql

Reinforcement Learning • Updated 2 days ago • 18 • 1

afsee/ppo-SnowballTarget

Reinforcement Learning • Updated 1 day ago • 21 • 1