# 🐛 How to Debug Your HF Space

## Your Situation
- ✅ Deployed successfully
- ⏳ Took a long time to respond
- ❌ Eventually showed an error

---

## 🎯 Step-by-Step Debugging

### Step 1: Run Local Diagnosis (30 seconds)

```powershell
# Check your HF Space status
python debug_hf_space.py
```

This will tell you:
- ✅ If Space is running
- ✅ What hardware it's using (CPU vs GPU)
- ✅ If model files are uploaded
- ✅ Common issues
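
If `debug_hf_space.py` is not at hand, roughly the same facts can be read through `huggingface_hub`. The sketch below is not that script. It assumes `huggingface_hub` is installed, that the Space is public or you are logged in, and that checkpoints live under `checkpoints/` in the Space repo. The helper name `summarize_space` is made up for illustration.

```python
def summarize_space(api, repo_id, checkpoint_prefix="checkpoints/"):
    """Collect status, hardware, and checkpoint count for one Space.

    `api` is expected to be a huggingface_hub.HfApi instance.
    """
    runtime = api.get_space_runtime(repo_id)
    files = api.list_repo_files(repo_id, repo_type="space")
    checkpoint_files = [f for f in files if f.startswith(checkpoint_prefix)]
    hardware = runtime.hardware  # can be None while the Space is not running
    return {
        "stage": str(runtime.stage),
        "hardware": str(hardware) if hardware else "unknown",
        # CPU tiers have "cpu" in their name (e.g. "cpu-basic")
        "on_gpu": bool(hardware) and "cpu" not in str(hardware).lower(),
        "checkpoint_files": len(checkpoint_files),
    }


# Usage (needs network access and `pip install huggingface_hub`):
#   from huggingface_hub import HfApi
#   print(summarize_space(HfApi(), "nocapdev/my-gradio-momask"))
```

A `checkpoint_files` count of 0 points at Error A below; `on_gpu` being false points at the slow-CPU case.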

### Step 2: Get the Actual Error (MOST IMPORTANT)

Go to your Space and copy the error:

1. **Visit:** https://huggingface.co/spaces/nocapdev/my-gradio-momask
2. **Click:** "Logs" tab (top right)
3. **Scroll** to the bottom
4. **Copy** the last 30-50 lines

**What to look for:**
- Lines with `ERROR` or `Exception`
- Lines with `Traceback`
- The very last error message
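
If the log is long, a small filter can pull out the suspicious lines for you. This is a minimal sketch that assumes you pasted the log text into a string or file. The function name `error_lines` is hypothetical, and the match is case-insensitive so that names like `FileNotFoundError` are caught too.

```python
import re

# Case-insensitive so "ERROR", "RuntimeError", and "Traceback" all match.
ERROR_MARKERS = re.compile(r"error|exception|traceback", re.IGNORECASE)


def error_lines(log_text, tail=50):
    """Return (line_number, text) pairs for suspicious lines in the last `tail` lines."""
    lines = log_text.splitlines()
    start = max(0, len(lines) - tail)
    return [
        (number, line)
        for number, line in enumerate(lines[start:], start=start + 1)
        if ERROR_MARKERS.search(line)
    ]


# Usage: paste the copied log into a file, then
#   with open("space_log.txt", encoding="utf-8") as f:
#       for number, line in error_lines(f.read()):
#           print(number, line)
```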

### Step 3: Common Error Patterns

#### Error A: "Model checkpoints not found"
```
ERROR: Model checkpoints not found!
Looking for: ./checkpoints
FileNotFoundError: [Errno 2] No such file or directory
```

**Cause:** The model files were not uploaded to the HF Space.

**Solution:** Upload the checkpoints (see Solution 1 below).

#### Error B: "CUDA out of memory"
```
RuntimeError: CUDA out of memory
torch.cuda.OutOfMemoryError
```

**Cause:** The model needs more GPU memory (VRAM) than the Space's GPU provides.

**Solution:** Switch to a GPU with more memory, or reduce the model's memory use.

#### Error C: "Killed" or "SIGKILL"
```
Killed
Process finished with exit code 137
```

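**Cause:** The process ran out of system RAM (CPU memory) and was killed. Exit code 137 is 128 + 9, where 9 is SIGKILL.

**Solution:** Upgrade the Space to hardware with more RAM, or reduce memory use in the code.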

#### Error D: Stuck at "Generating motion tokens..."
```
[1/4] Generating motion tokens...
[No more output for 20+ minutes]
```

**Cause:** The Space is running on CPU, which is very slow for this step. This is not an error.

**Solution:** Wait 20-30 minutes, or upgrade to a GPU (see Solution 2).
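
The four patterns above can be checked mechanically. The following is a rough sketch, assuming the log text contains the exact phrases shown in this guide. The labels and the `diagnose` helper are illustrative, and plain substring matching can produce false positives (for example, the word "Killed" in an unrelated message).

```python
# Substrings taken from the error patterns A-C in this guide.
KNOWN_PATTERNS = {
    "A: checkpoints missing": ("Model checkpoints not found",),
    "B: GPU out of memory": ("CUDA out of memory", "OutOfMemoryError"),
    "C: process killed (out of RAM)": ("Killed", "SIGKILL", "exit code 137"),
}


def diagnose(log_text):
    """Return the labels of all known patterns that appear in the log."""
    found = [
        label
        for label, needles in KNOWN_PATTERNS.items()
        if any(needle in log_text for needle in needles)
    ]
    # Pattern D: step 1 started, step 2 never appeared, and nothing crashed.
    stuck = "Generating motion tokens" in log_text and "[2/4]" not in log_text
    if not found and stuck:
        found.append("D: still generating (likely slow CPU, not a crash)")
    return found
```

An empty result means none of the known patterns matched, so the last traceback in the log is the thing to read.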

---

## 🔧 Solutions for Common Issues

### Solution 1: Upload Model Checkpoints

**If error shows:** `Model checkpoints not found`

#### Option A: Upload via Git (for files <10GB)

```bash
# Clone your Space
git clone https://huggingface.co/spaces/nocapdev/my-gradio-momask
cd my-gradio-momask

# Set up Git LFS hooks (one time; Git LFS itself must already be installed)
git lfs install

# Track large files
git lfs track "checkpoints/**/*.tar"
git lfs track "checkpoints/**/*.pth"
git lfs track "checkpoints/**/*.npy"

# Copy your checkpoints
# FROM: C:\Users\purva\OneDrive\Desktop\momaskhg\checkpoints
# TO: current directory
cp -r /path/to/checkpoints ./

# Commit and push
git add .gitattributes
git add checkpoints/
git commit -m "Add model checkpoints"
git push
```

#### Option B: Upload via HF Web UI

1. Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask/tree/main
2. Click "Add file" → "Upload files"
3. Drag your `checkpoints/` folder
4. Click "Commit"

**Note:** This works for files <50MB. For larger files, use Git LFS.

#### Option C: Host Models Separately

Upload models to HF Model Hub, then download in app.py:

```python
import os

from huggingface_hub import snapshot_download

# Add to app.py before initializing the generator.
# Downloads the checkpoints on first start if they are not in the Space repo.
if not os.path.exists("./checkpoints"):
    print("Downloading models from HF Hub...")
    snapshot_download(
        repo_id="YOUR_USERNAME/momask-models",  # replace with your model repo
        local_dir="./checkpoints",
    )
```
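
The upload half of this option can also be scripted. The sketch below assumes you are logged in (for example via `huggingface-cli login`) and that the local `./checkpoints` folder should become the root of the model repo, so that the download step puts it back at `./checkpoints`. `upload_checkpoints` is a hypothetical helper name, and `YOUR_USERNAME/momask-models` is the same placeholder used in the download code.

```python
def upload_checkpoints(api, repo_id, folder="./checkpoints"):
    """Create the model repo if needed, then push the local checkpoint folder.

    `api` is expected to be a huggingface_hub.HfApi instance.
    """
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    return api.upload_folder(
        repo_id=repo_id,
        repo_type="model",
        folder_path=folder,
        commit_message="Add model checkpoints",
    )


# Usage (run once from your local machine):
#   from huggingface_hub import HfApi
#   upload_checkpoints(HfApi(), "YOUR_USERNAME/momask-models")
```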

---

### Solution 2: Upgrade Hardware (for speed)

If the Space runs on CPU and generation is too slow:

1. Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask/settings
2. Scroll to "Hardware"
3. Select one (hourly prices change; the settings page shows the current rate):
   - **T4 small** (~$0.60/hour) - Good for this app
   - **A10G small** (~$3/hour) - Faster
4. Click "Save"
5. Wait for rebuild (~2 minutes)
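
The same change can be requested from Python. This is a sketch under assumptions: it presumes you are logged in with write access to the Space, that paid hardware is enabled on your account, and that `"t4-small"` is the identifier for the "T4 small" option. `request_gpu` is a hypothetical helper name. Billing starts once the hardware is assigned.

```python
def request_gpu(api, repo_id, hardware="t4-small"):
    """Ask the Hub to move the Space to the given hardware tier.

    `api` is expected to be a huggingface_hub.HfApi instance.
    Returns the hardware the Space is now scheduled to run on.
    """
    runtime = api.request_space_hardware(repo_id, hardware)
    return runtime.requested_hardware


# Usage:
#   from huggingface_hub import HfApi
#   print(request_gpu(HfApi(), "nocapdev/my-gradio-momask"))
```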

---

### Solution 3: Test Locally First

Before debugging on HF, test locally:

```powershell
# 1. Test your setup
python test_local.py

# 2. Run app locally
python app.py

# 3. Visit http://localhost:7860
# 4. Try a prompt
# 5. Check terminal for errors
```

**If it works locally but fails on HF:**
- The models were probably not uploaded to the HF Space
- Or the HF Space is using different Python or package versions
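
To check for a version mismatch, print the same report locally and at the top of `app.py`, then compare your terminal output with the Space logs. This is a minimal sketch; the package list (`torch`, `gradio`, `numpy`) is an assumption about what this app depends on, so adjust it to your `requirements.txt`.

```python
import platform
from importlib import metadata


def environment_report(packages=("torch", "gradio", "numpy")):
    """Return the Python version and the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


print(environment_report())
```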

---

## 📋 Debugging Checklist

Run through this checklist:

### ✅ Pre-deployment
- [ ] `python test_local.py` passes
- [ ] App works locally at http://localhost:7860
- [ ] Models in `./checkpoints/` directory
- [ ] `python pre_deploy_check.py` shows 8/8 PASS

### ✅ Post-deployment
- [ ] Space shows "Running" status
- [ ] Logs show "Using device: cpu/cuda"
- [ ] Logs show "Models loaded successfully!"
- [ ] No error messages in logs

### ✅ During generation
- [ ] Logs show "[1/4] Generating motion tokens..."
- [ ] Logs show progress through [2/4], [3/4], [4/4]
- [ ] No "Killed" or "SIGKILL" messages

---

## 🎯 Quick Diagnosis Commands

```powershell
# Check HF Space status
python debug_hf_space.py

# Test local setup
python test_local.py

# Validate before deploy
python pre_deploy_check.py

# Deploy with latest fixes
python deploy.py
```

---

## 📊 Expected Logs (Healthy Run)

### Startup (should see this):
```
Using device: cpu  (or cuda)
Loading models...
✓ VQ model loaded
✓ Transformer loaded
✓ Residual model loaded
✓ Length estimator loaded
Models loaded successfully!
Running on local URL: http://0.0.0.0:7860
```

### During generation (should see this):
```
======================================================================
Generating motion for: 'a person walks forward'
======================================================================
[1/4] Generating motion tokens...
✓ Generated 80 frames
[2/4] Converting to BVH format...
✓ BVH conversion complete
[3/4] Rendering video...
✓ Video saved to ./gradio_outputs/motion_12345.mp4
[4/4] Complete!
======================================================================
```
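
When a run stops early, it helps to know the last stage it reached. The sketch below assumes the log uses the `[n/4]` markers shown above; `last_stage` is an illustrative helper, not part of the app.

```python
import re

STAGE_MARKER = re.compile(r"\[(\d)/4\]")


def last_stage(log_text):
    """Return the number of the most recent [n/4] marker in the log, or 0 if none."""
    stages = [int(match.group(1)) for match in STAGE_MARKER.finditer(log_text)]
    return stages[-1] if stages else 0
```

A result of 1 with no further output matches Error D (slow CPU); 4 means the run completed.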

---

## 🆘 Still Stuck?

### Share these with me:

1. **Output from:**
   ```powershell
   python debug_hf_space.py
   ```

2. **Last 50 lines from HF Space Logs**
   - Go to Logs tab
   - Copy from bottom
   - Include any ERROR or Traceback

3. **What you see in the browser**
   - Screenshot of the error
   - Or copy the error message

Then I can give you the exact fix!

---

## 💡 Most Likely Issues (90% of cases)

1. **CPU is slow** (not an error!)
   - Logs show: "Using device: cpu"
   - Solution: Wait 20-30 minutes OR upgrade to GPU

2. **Models not uploaded**
   - Logs show: "Model checkpoints not found"
   - Solution: Upload checkpoints to HF Space

3. **Out of memory**
   - Logs show: "Killed" or "SIGKILL"
   - Solution: Upgrade the Space to hardware with more RAM

Run `python debug_hf_space.py` first - it will identify which one!