Files changed (1)

1. README.md ADDED (+51 -0)
# Exporting to ExecuTorch

⚠️ Note: These instructions only work on Arm-based machines. Running them on x86_64 will fail.

We can run the 3-bit quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch), the PyTorch solution for mobile deployment.

To set up ExecuTorch with the TorchAO lowbit kernels, run the following commands:
```shell
git clone https://github.com/pytorch/executorch.git
pushd executorch
git submodule update --init --recursive
python install_executorch.py
USE_CPP=1 TORCHAO_BUILD_KLEIDIAI=1 pip install third-party/ao
popd
```

The `pip install third-party/ao` command above works as-is on an Arm-based Mac. On Arm-based Linux, define the following environment variables before running that step: `BUILD_TORCHAO_EXPERIMENTAL=1 TORCHAO_BUILD_CPU_AARCH64=1 TORCHAO_BUILD_KLEIDIAI=1 TORCHAO_ENABLE_ARM_NEON_DOT=1 TORCHAO_PARALLEL_BACKEND=OPENMP`.
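
For example, the Linux install step might look roughly like this (a sketch based on the variables listed above; verify the exact flag set, including whether `USE_CPP=1` is still needed, against your TorchAO version):

```shell
# Arm-based Linux: enable the TorchAO lowbit/KleidiAI build flags for this install.
# Run from inside the executorch checkout, in place of the macOS install line above.
# Prefixing the variables scopes them to this single command.
BUILD_TORCHAO_EXPERIMENTAL=1 \
TORCHAO_BUILD_CPU_AARCH64=1 \
TORCHAO_BUILD_KLEIDIAI=1 \
TORCHAO_ENABLE_ARM_NEON_DOT=1 \
TORCHAO_PARALLEL_BACKEND=OPENMP \
pip install third-party/ao
```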

Now we export the model to ExecuTorch using the TorchAO lowbit kernel backend.
(Do not run these commands from the directory containing the ExecuTorch repo you cloned during setup, or Python will pick up the local sources in the repo instead of the installed package. See the sketch below for one way to avoid this.)
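
For instance, you can create a fresh working directory outside the clone and run everything that follows from there (the directory name is just an illustration):

```shell
# Work outside the executorch checkout so the installed package is imported,
# not the sources in the cloned repo. The directory name here is arbitrary.
mkdir -p ~/phi4-export
cd ~/phi4-export
```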

```shell
# 1. Download QAT'd weights from HF
HF_DIR=lvj/Phi-4-mini-instruct-parq-3b-weight-4b-embed-shared
WEIGHT_DIR=$(hf download ${HF_DIR})

# 2. Rename the weight keys to the ones that ExecuTorch expects
python -m executorch.examples.models.phi_4_mini.convert_weights $WEIGHT_DIR pytorch_model_converted.bin

# 3. Download the model config from the ExecuTorch repo
curl -L -o phi_4_mini_config.json https://raw.githubusercontent.com/pytorch/executorch/main/examples/models/phi_4_mini/config/config.json

# 4. Export the model to an ExecuTorch .pte file
python -m executorch.examples.models.llama.export_llama \
  --model "phi_4_mini" \
  --checkpoint pytorch_model_converted.bin \
  --params phi_4_mini_config.json \
  --output_name phi4_model_3bit.pte \
  -kv \
  --use_sdpa_with_kv_cache \
  --use-torchao-kernels \
  --max_context_length 1024 \
  --max_seq_length 256 \
  --dtype fp32 \
  --metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}'

# 5. (optional) Upload the pte file to Hugging Face
# hf upload ${HF_DIR} phi4_model_3bit.pte
```

Once you have the `.pte` file, you can run it inside our [iOS demo app](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple) in a [few easy steps](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/apple#build-and-run).