---
license: apache-2.0
base_model: meta-llama/Meta-Llama-3-70B-Instruct
library_name: transformers
language:
- en
tags:
- quantllm
---
# 📦 Meta-Llama-3-70B-Instruct-4bit-gguf

**meta-llama/Meta-Llama-3-70B-Instruct** converted to **GGUF** format

[![QuantLLM](https://img.shields.io/badge/🚀_Made_with-QuantLLM-orange?style=for-the-badge)](https://github.com/codewithdark-git/QuantLLM)
[![Format](https://img.shields.io/badge/Format-GGUF-blue?style=for-the-badge)]()

⭐ Star QuantLLM on GitHub
---

## 📖 About This Model

This model is **[meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)** converted to GGUF format.

| Property | Value |
|----------|-------|
| **Base Model** | [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) |
| **Format** | GGUF |
| **Quantization** | None (Full Precision) |
| **License** | apache-2.0 |
| **Created With** | [QuantLLM](https://github.com/codewithdark-git/QuantLLM) |

## 🚀 Quick Start

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("QuantLLM/Meta-Llama-3-70B-Instruct-4bit-gguf")
tokenizer = AutoTokenizer.from_pretrained("QuantLLM/Meta-Llama-3-70B-Instruct-4bit-gguf")

# Generate text
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With QuantLLM

```python
from quantllm import TurboModel

# Load with automatic optimization
model = TurboModel.from_pretrained("QuantLLM/Meta-Llama-3-70B-Instruct-4bit-gguf")

# Generate
response = model.generate("Write a poem about coding")
print(response)
```

### Requirements

```bash
pip install transformers torch
```

## 📊 Model Details

| Property | Value |
|----------|-------|
| **Original Model** | [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) |
| **Format** | GGUF |
| **Quantization** | Full Precision |
| **License** | `apache-2.0` |
| **Export Date** | 2026-04-24 |
| **Exported By** | [QuantLLM v2.0](https://github.com/codewithdark-git/QuantLLM) |

---

## 🚀 Created with QuantLLM
[![QuantLLM](https://img.shields.io/badge/🚀_QuantLLM-Ultra--fast_LLM_Quantization-orange?style=for-the-badge)](https://github.com/codewithdark-git/QuantLLM)

**Convert any model to GGUF, ONNX, or MLX in one line!**

```python
from quantllm import turbo

# Load any HuggingFace model
model = turbo("meta-llama/Meta-Llama-3-70B-Instruct")

# Export to any format
model.export("gguf", quantization="Q4_K_M")

# Push to HuggingFace
model.push("your-repo", format="gguf")
```

**[📚 Documentation](https://github.com/codewithdark-git/QuantLLM#readme)** · **[🐛 Report Issue](https://github.com/codewithdark-git/QuantLLM/issues)** · **[💡 Request Feature](https://github.com/codewithdark-git/QuantLLM/issues)**
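
The Transformers quick start passes only the repository name. If the repository holds nothing but a `.gguf` checkpoint, Transformers generally also needs the `gguf_file` argument, and GGUF loading typically requires the `gguf` package (`pip install gguf`) in addition to the listed requirements. This card does not state the checkpoint's filename, so the sketch below looks it up from the repository listing instead of guessing it. The helper names `pick_gguf_file` and `load_from_gguf` are hypothetical, and the sketch assumes the repository contains exactly one `.gguf` file.

```python
from typing import Iterable

REPO_ID = "QuantLLM/Meta-Llama-3-70B-Instruct-4bit-gguf"


def pick_gguf_file(filenames: Iterable[str]) -> str:
    """Return the single .gguf filename in a repo listing, or raise ValueError."""
    candidates = sorted(name for name in filenames if name.lower().endswith(".gguf"))
    if len(candidates) != 1:
        # Assumption: one checkpoint per repo; pick manually if there are several.
        raise ValueError(f"expected exactly one .gguf file, found {candidates}")
    return candidates[0]


def load_from_gguf(repo_id: str = REPO_ID):
    """Download and load the checkpoint (needs network access and a lot of memory)."""
    # Imported lazily so pick_gguf_file can be used without these packages installed.
    from huggingface_hub import list_repo_files
    from transformers import AutoModelForCausalLM, AutoTokenizer

    gguf_file = pick_gguf_file(list_repo_files(repo_id))
    tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
    model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
    return model, tokenizer
```

Calling `model, tokenizer = load_from_gguf()` returns objects that can be used exactly as in the quick start. Transformers converts GGUF weights back to regular PyTorch tensors on load, so expect memory use well above the size of the file on disk for a 70B model.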