# 🐛 How to Debug Your HF Space

## Your Situation

- ✅ Deployed successfully
- ⏳ Took a long time to respond
- ❌ Finally showed an error

---

## 🎯 Step-by-Step Debugging

### Step 1: Run Local Diagnosis (30 seconds)

```powershell
# Check your HF Space status
python debug_hf_space.py
```

This will tell you:

- ✅ If the Space is running
- ✅ What hardware it's using (CPU vs GPU)
- ✅ If model files are uploaded
- ✅ Common issues

### Step 2: Get the Actual Error (MOST IMPORTANT)

Go to your Space and copy the error:

1. **Visit:** https://huggingface.co/spaces/nocapdev/my-gradio-momask
2. **Click:** "Logs" tab (top right)
3. **Scroll** to the bottom
4. **Copy** the last 30-50 lines

**What to look for:**

- Lines with `ERROR` or `Exception`
- Lines with `Traceback`
- The very last error message

### Step 3: Common Error Patterns

#### Error A: "Model checkpoints not found"

```
ERROR: Model checkpoints not found!
Looking for: ./checkpoints
FileNotFoundError: [Errno 2] No such file or directory
```

**Cause:** Model files weren't uploaded to the HF Space

**Solution:** Upload the checkpoints (see Solution 1 below)

#### Error B: "CUDA out of memory"

```
RuntimeError: CUDA out of memory
torch.cuda.OutOfMemoryError
```

**Cause:** Model too large for GPU memory

**Solution:** Use a larger GPU or optimize the model

#### Error C: "Killed" or "SIGKILL"

```
Killed
Process finished with exit code 137
```

**Cause:** Out of RAM (CPU memory)

**Solution:** Upgrade the Space's RAM or optimize the code

#### Error D: Stuck at "Generating motion tokens..."

```
[1/4] Generating motion tokens...
[No more output for 20+ minutes]
```

**Cause:** Using CPU (very slow, not an error!)

**Solution:** Wait 20-30 minutes OR upgrade to GPU

---

## 🔧 Solutions for Common Issues

### Solution 1: Upload Model Checkpoints

**If error shows:** `Model checkpoints not found`

#### Option A: Upload via Git (for files <10GB)

```bash
# Clone your Space
git clone https://huggingface.co/spaces/nocapdev/my-gradio-momask
cd my-gradio-momask

# Install Git LFS (one time)
git lfs install

# Track large files
git lfs track "checkpoints/**/*.tar"
git lfs track "checkpoints/**/*.pth"
git lfs track "checkpoints/**/*.npy"

# Copy your checkpoints
# FROM: C:\Users\purva\OneDrive\Desktop\momaskhg\checkpoints
# TO: current directory
cp -r /path/to/checkpoints ./

# Commit and push
git add .gitattributes
git add checkpoints/
git commit -m "Add model checkpoints"
git push
```

#### Option B: Upload via HF Web UI

1. Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask/tree/main
2. Click "Add file" → "Upload files"
3. Drag your `checkpoints/` folder
4. Click "Commit"

**Note:** This works for files <50MB. For larger files, use Git LFS.

#### Option C: Host Models Separately

Upload the models to the HF Model Hub, then download them in `app.py`:

```python
import os

from huggingface_hub import snapshot_download

# Add to app.py before initializing the generator
if not os.path.exists('./checkpoints'):
    print("Downloading models from HF Hub...")
    snapshot_download(
        repo_id="YOUR_USERNAME/momask-models",
        local_dir="./checkpoints"
    )
```

---

### Solution 2: Upgrade Hardware (for speed)

If you're on CPU and it's too slow:

1. Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask/settings
2. Scroll to "Hardware"
3. Select:
   - **T4 small** (~$0.60/hour) - Good for this app
   - **A10G small** (~$3/hour) - Faster
4. Click "Save"
5. Wait for rebuild (~2 minutes)

---

### Solution 3: Test Locally First

Before debugging on HF, test locally:

```powershell
# 1. Test your setup
python test_local.py

# 2. Run app locally
python app.py

# 3. Visit http://localhost:7860
# 4. Try a prompt
# 5. Check terminal for errors
```

**If it works locally but fails on HF:**

- The models were probably not uploaded to the HF Space
- Or the HF Space is using different Python/package versions

---

## 📋 Debugging Checklist

Run through this checklist:

### ✅ Pre-deployment

- [ ] `python test_local.py` passes
- [ ] App works locally at http://localhost:7860
- [ ] Models in `./checkpoints/` directory
- [ ] `python pre_deploy_check.py` shows 8/8 PASS

### ✅ Post-deployment

- [ ] Space shows "Running" status
- [ ] Logs show "Using device: cpu/cuda"
- [ ] Logs show "Models loaded successfully!"
- [ ] No error messages in logs

### ✅ During generation

- [ ] Logs show "[1/4] Generating motion tokens..."
- [ ] Logs show progress through [2/4], [3/4], [4/4]
- [ ] No "Killed" or "SIGKILL" messages

---

## 🎯 Quick Diagnosis Commands

```powershell
# Check HF Space status
python debug_hf_space.py

# Test local setup
python test_local.py

# Validate before deploy
python pre_deploy_check.py

# Deploy with latest fixes
python deploy.py
```

---

## 📊 Expected Logs (Healthy Run)

### Startup (should see this):

```
Using device: cpu (or cuda)
Loading models...
✓ VQ model loaded
✓ Transformer loaded
✓ Residual model loaded
✓ Length estimator loaded
Models loaded successfully!
Running on local URL: http://0.0.0.0:7860
```

### During generation (should see this):

```
======================================================================
Generating motion for: 'a person walks forward'
======================================================================
[1/4] Generating motion tokens...
✓ Generated 80 frames
[2/4] Converting to BVH format...
✓ BVH conversion complete
[3/4] Rendering video...
✓ Video saved to ./gradio_outputs/motion_12345.mp4
[4/4] Complete!
======================================================================
```

---

## 🆘 Still Stuck?

### Share these with me:

1. **Output from:**

   ```powershell
   python debug_hf_space.py
   ```

2. **Last 50 lines from HF Space Logs**
   - Go to Logs tab
   - Copy from bottom
   - Include any ERROR or Traceback

3. **What you see in the browser**
   - Screenshot of the error
   - Or copy the error message

Then I can give you the exact fix!

---

## 💡 Most Likely Issues (90% of cases)

1. **CPU is slow** (not an error!)
   - Logs show: "Using device: cpu"
   - Solution: Wait 20-30 minutes OR upgrade to GPU

2. **Models not uploaded**
   - Logs show: "Model checkpoints not found"
   - Solution: Upload checkpoints to HF Space

3. **Out of memory**
   - Logs show: "Killed" or "SIGKILL"
   - Solution: Upgrade to more RAM

Run `python debug_hf_space.py` first; it will identify which one applies.
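
If you'd rather check pasted log text by hand, the log signatures in this guide can be matched with a few lines of Python. This is only a minimal sketch: the function name `classify_log`, the labels it returns, and the exact matching rules are assumptions made for illustration, not part of `debug_hf_space.py`.

```python
import re

# Signatures copied from the error patterns in this guide.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = [
    ("missing_checkpoints", re.compile(r"Model checkpoints not found")),
    ("gpu_out_of_memory", re.compile(r"CUDA out of memory|OutOfMemoryError")),
    ("killed_out_of_ram", re.compile(r"\bKilled\b|SIGKILL|exit code 137")),
]


def classify_log(log_text: str) -> str:
    """Return a label for the first known pattern found in log_text."""
    for label, pattern in PATTERNS:
        if pattern.search(log_text):
            return label
    # No explicit error: a CPU run that started generating but never
    # reached the final step is treated as "slow", not failed.
    started = "[1/4] Generating motion tokens..." in log_text
    finished = "[4/4] Complete!" in log_text
    if "Using device: cpu" in log_text and started and not finished:
        return "cpu_slow"
    return "unknown"


# Example: paste the last 50 lines of your Space logs into this string.
sample = "Using device: cpu\n[1/4] Generating motion tokens..."
print(classify_log(sample))
```

Explicit errors are checked before the CPU rule, so a run that was killed while on CPU is reported as out of RAM rather than as slow.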