Ashish Mishra

ashbuilds

AI & ML interests

None yet

Recent Activity

upvoted a paper 12 days ago

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

liked a model 19 days ago

tencent/HY-WorldPlay

liked a model 21 days ago

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8

Organizations

None yet

upvoted a paper 12 days ago

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Paper • 2512.19678 • Published 14 days ago • 29

upvoted an article 25 days ago

Codex is Open Sourcing AI models

Article • 26 days ago • 53

upvoted a paper about 1 month ago

What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

Paper • 2512.00425 • Published Nov 29, 2025 • 50

upvoted an article 3 months ago

Smol2Operator: Post-Training GUI Agents for Computer Use

Article • Sep 23, 2025 • 134

upvoted a collection 3 months ago

Granite Docling

5 items • Updated Nov 17, 2025 • 60

upvoted an article 4 months ago

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

Article • Sep 11, 2025 • 176

upvoted an article 6 months ago

Creating custom kernels for the AMD MI300

Article • Jul 9, 2025 • 52

upvoted a collection 7 months ago

Holo1

Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10, 2025 • 48

upvoted an article 7 months ago

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

Article • Jun 3, 2025 • 71

upvoted a collection 8 months ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated 6 days ago • 549

upvoted a paper 9 months ago

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9

upvoted 2 papers 10 months ago

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published Mar 6, 2025 • 72

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback

Paper • 2502.15027 • Published Feb 20, 2025 • 7

upvoted 2 papers 12 months ago

Transformer^2: Self-adaptive LLMs

Paper • 2501.06252 • Published Jan 9, 2025 • 54

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10, 2025 • 52

upvoted a collection about 1 year ago

DeepSeek-V3

4 items • Updated Nov 27, 2025 • 278

upvoted a paper about 1 year ago

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 48