Audio-Text-to-Text
Transformers
Safetensors
qwen2_5_omni
text-to-audio
music
audio
quantized
fp8
compressed-tensors
Instructions to use Civitai/acestep-transcriber-FP8 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Civitai/acestep-transcriber-FP8 with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForTextToWaveform processor = AutoProcessor.from_pretrained("Civitai/acestep-transcriber-FP8") model = AutoModelForTextToWaveform.from_pretrained("Civitai/acestep-transcriber-FP8") - Notebooks
- Google Colab
- Kaggle
metadata
license: mit
pipeline_tag: audio-text-to-text
library_name: transformers
base_model: ACE-Step/acestep-transcriber
base_model_relation: quantized
tags:
- music
- audio
- quantized
- fp8
- compressed-tensors
ACE-Step Transcriber
This is an FP8 Dynamic quantized variant of ACE-Step/acestep-transcriber.
Description
ACE-Step Transcriber is the annotation model used by ACE-Step v1.5 for training data labeling. It is a powerful multilingual audio transcription model capable of transcribing both speech and singing voice with high accuracy.
Key Features
- 🌍 50+ Languages Support - Covers major world languages and regional dialects
- 🎤 Speech Transcription - Accurately transcribes spoken content
- 🎵 Singing Voice Transcription - Specialized in lyrics transcription with musical structure annotations
- 🏷️ Structure Annotation - Automatically identifies song sections (verse, chorus, bridge, etc.)
Usage
The usage is the same as Qwen2.5 Omni-7B.
Prompt Format
Use the following prompt to transcribe audio:
*Task* Transcribe this audio in detail
<audio>
Output Format
The model outputs structured content in the following format:
# Languages
<language_code>
# Lyrics
[Section Tag - Optional Instrument]
<transcribed content>
...
Example Output
# Languages
en
# Lyrics
[Intro - Acoustic Guitar]
[Verse 1]
Walking down the empty street tonight
Stars are shining oh so bright
...
[Chorus]
This is where we belong
Singing our favorite song
...
Supported Section Tags
[Intro],[Outro][Verse 1],[Verse 2], etc.[Chorus],[Pre-Chorus],[Post-Chorus][Bridge][Guitar Interlude],[Instrumental][Spoken]
Supported Languages (50+)
The model supports transcription in over 50 languages, including but not limited to:
| Region | Languages |
|---|---|
| East Asia | Chinese (zh), Japanese (ja), Korean (ko) |
| Southeast Asia | Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino (tl) |
| South Asia | Hindi (hi), Bengali (bn), Tamil (ta), Urdu (ur) |
| Europe | English (en), German (de), French (fr), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Polish (pl), Dutch (nl), Greek (el), Turkish (tr) |
| Middle East | Arabic (ar), Hebrew (he), Persian (fa) |
| Others | And many more regional languages... |
Use Cases
- Music Production - Transcribe reference tracks for lyrics extraction
- Dataset Creation - Generate high-quality labeled data for music AI models
- Accessibility - Create subtitles and captions for audio content
- Music Analysis - Extract structural information from songs