Is it possible to prompt the model with instructions, like Unmute?
#1
by deathknight0 - opened
Thank you for your release. Just wondering if this repo supports prompting the model natively - like what you implemented in Unmute.
Hi, what would that mean for a speech-to-text model? It is possible to feed in a prefix of audio plus its transcript, which would condition the model to transcribe words in certain ways. For that you'd want to hack around in the PyTorch implementation, see here; we don't provide a high-level interface for that atm. But if you mean something like "transcribe every proper name in upper case", that is not possible.
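To make the "prefix of audio + transcript" idea concrete, here is a minimal toy sketch of one way it could work, assuming a streaming decoder that consumes one audio frame per step and emits one text token per step. None of these names come from the Kyutai codebase: `transcribe_with_prefix`, `toy_step`, and `PAD` are hypothetical, and the stand-in "model" is plain integer arithmetic, so this only illustrates the control flow (force the known transcript during the prefix, then decode freely), not the actual implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical step signature: (audio_frame, previous_text_token) -> next_text_token
StepFn = Callable[[int, int], int]

PAD = 0  # hypothetical "no text yet" token


def transcribe_with_prefix(
    step: StepFn,
    prefix_frames: Sequence[int],
    prefix_tokens: Sequence[int],
    frames: Sequence[int],
) -> List[int]:
    """Condition decoding on a known (audio, transcript) prefix."""
    if len(prefix_frames) != len(prefix_tokens):
        raise ValueError("prefix audio and prefix transcript must be aligned")

    prev = PAD
    # Phase 1: feed the prefix audio, but discard the model's own prediction
    # and force the known transcript token instead (teacher forcing).
    for frame, forced in zip(prefix_frames, prefix_tokens):
        step(frame, prev)
        prev = forced

    # Phase 2: decode the real audio freely, starting from the forced context.
    out: List[int] = []
    for frame in frames:
        prev = step(frame, prev)
        out.append(prev)
    return out


def toy_step(frame: int, prev_token: int) -> int:
    # Stand-in for a real model step: the output depends on the previous
    # token, so the forced prefix visibly changes what gets decoded.
    return frame + prev_token


with_prefix = transcribe_with_prefix(toy_step, [1, 2], [10, 20], [3, 4])
without_prefix = transcribe_with_prefix(toy_step, [], [], [3, 4])
```

In this toy the two results differ only because decoding started from a different text context, which is the kind of conditioning the reply describes. A free-form instruction such as "upper-case every proper name" has no equivalent here, since the prefix can only show the model example transcriptions, not tell it what to do.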