Create README.md #2 by metascroy
# Exporting to ExecuTorch

⚠️ Note: These instructions only work on Arm-based machines. Running them on x86_64 will fail.

We can run the 3-bit quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch), the PyTorch solution for mobile deployment.

To set up ExecuTorch with TorchAO lowbit kernels, run the following commands:
```shell
git clone https://github.com/pytorch/executorch.git
pushd executorch
git submodule update --init --recursive
python install_executorch.py
USE_CPP=1 TORCHAO_BUILD_KLEIDIAI=1 pip install third-party/ao
popd
```

(The above commands work on an Arm-based Mac. On Arm-based Linux, define the following environment variables before pip installing third-party/ao: BUILD_TORCHAO_EXPERIMENTAL=1 TORCHAO_BUILD_CPU_AARCH64=1 TORCHAO_BUILD_KLEIDIAI=1 TORCHAO_ENABLE_ARM_NEON_DOT=1 TORCHAO_PARALLEL_BACKEND=OPENMP.)
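On Arm-based Linux, the install step would look like the sketch below. This is an assumption assembled from the environment variables listed above, not a command taken verbatim from the TorchAO docs; run it from inside the executorch checkout in place of the plain `pip install` line.

```shell
# Sketch of the Linux variant of the TorchAO install step, using the
# environment variables from the note above (assumed, not verified here).
BUILD_TORCHAO_EXPERIMENTAL=1 \
TORCHAO_BUILD_CPU_AARCH64=1 \
TORCHAO_BUILD_KLEIDIAI=1 \
TORCHAO_ENABLE_ARM_NEON_DOT=1 \
TORCHAO_PARALLEL_BACKEND=OPENMP \
USE_CPP=1 pip install third-party/ao
```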

Now we export the model to ExecuTorch, using the TorchAO lowbit kernel backend.
(Do not run these commands from a directory containing the ExecuTorch repo you cloned during setup; otherwise Python will import the local source tree instead of the installed package.)

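If you are unsure whether Python will pick up the installed package or a local checkout, you can check where `executorch` resolves from (a quick sanity check, not part of the official flow):

```shell
# Print where Python resolves the executorch package from; if this points into
# your cloned repo rather than site-packages, cd elsewhere before exporting.
python -c "import importlib.util as u; s = u.find_spec('executorch'); print(s.origin if s else 'executorch not found')"
```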
```shell
# 1. Download QAT'd weights from HF
HF_DIR=lvj/Phi-4-mini-instruct-parq-3b-weight-4b-embed-shared
WEIGHT_DIR=$(hf download ${HF_DIR})

# 2. Rename the weight keys to ones that ExecuTorch expects
python -m executorch.examples.models.phi_4_mini.convert_weights $WEIGHT_DIR pytorch_model_converted.bin

# 3. Download the model config from the ExecuTorch repo
curl -L -o phi_4_mini_config.json https://raw.githubusercontent.com/pytorch/executorch/main/examples/models/phi_4_mini/config/config.json

# 4. Export the model to an ExecuTorch pte file
python -m executorch.examples.models.llama.export_llama \
  --model "phi_4_mini" \
  --checkpoint pytorch_model_converted.bin \
  --params phi_4_mini_config.json \
  --output_name phi4_model_3bit.pte \
  -kv \
  --use_sdpa_with_kv_cache \
  --use-torchao-kernels \
  --max_context_length 1024 \
  --max_seq_length 256 \
  --dtype fp32 \
  --metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}'

# 5. (Optional) Upload the pte file to Hugging Face
# hf upload ${HF_DIR} phi4_model_3bit.pte
```
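The `--metadata` flag takes a JSON string, and a malformed value will fail at export time. An optional sanity check (not part of the official flow) is to confirm the string parses and carries the expected token ids:

```shell
# Optional check: parse the --metadata JSON from the export command above and
# echo back the bos id and eos id list it contains.
python -c 'import json; m = json.loads("""{"get_bos_id":199999, "get_eos_ids":[200020,199999]}"""); print(m["get_bos_id"], m["get_eos_ids"])'
```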

Once you have the *.pte file, you can run it inside our [iOS demo app](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple) in a [few easy steps](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple#build-and-run).