Any-to-Any
Safetensors
Transformers
LongCat-Next
longcat_next
text-generation
multimodal
custom_code
Instructions to use meituan-longcat/LongCat-Next with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use meituan-longcat/LongCat-Next with Transformers:
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("meituan-longcat/LongCat-Next", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- README.md +2 -2
- assets/{overview.png → overview.jpg} +2 -2
README.md
CHANGED
@@ -21,7 +21,7 @@ tags:
 <a href="https://longcat.chat/longcat-next/intro" target="_blank" style="margin: 2px;">
 <img alt="Blog" src="https://img.shields.io/badge/Blog-LongCatNext-white?logo=safari&logoColor=white&color=purple" style="display: inline-block; vertical-align: middle;"/>
 </a>
-<a href="https://huggingface.co/meituan-longcat" target="_blank" style="margin: 2px;">
 <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCatNext-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
 </a>
 <a href="https://github.com/meituan-longcat/LongCat-Next" target="_blank" style="margin: 2px;">
@@ -59,7 +59,7 @@ tags:
 
 ## Model Introduction
 
-
 
 
 We develop **LongCat-Next**, a native multimodal model that processes text, vision, and audio under a single autoregressive objective with minimal inductive bias beyond the language paradigm. As an industrial-strength foundation model with A3B model size, it excels at seeing, creating, and talking, achieving strong performance across a wide range of multimodal benchmarks. In particular, leveraging semantically complete discrete representations, it surpasses the long-standing performance ceiling of discrete vision modeling on understanding tasks, and provides a unified solution for visual understanding and generation. This success demonstrates that discrete tokens can universally represent multimodal signals and be deeply internalized within a single discrete embedding space. We further provide extensive experiments to analyze this unified discrete training paradigm and uncover several interesting findings.
assets/{overview.png → overview.jpg}
RENAMED
File without changes