The two scripts you provided are nearly identical in structure, but the second script contains significant **safety enhancements** and **robustness fixes** for the `della_magprune` method.
Here are the specific differences:
### 1. Parameter Validation (Safety Guards)
The second script adds a "Safety Guard" block at the start of the `della_magprune` function. This prevents the function from crashing or producing invalid results if the input parameters (`density` or `epsilon`) are mathematically impossible.
* **Density Clipping:** It ensures `density` stays within a valid range (between `1e-4` and `1-1e-4`).
* **Epsilon Adjustment:** It automatically shrinks `epsilon` if it is too large. Since the algorithm calculates probabilities as `density +/- epsilon`, an epsilon that is too large would result in probabilities greater than 1 or less than 0. The second script forces epsilon to be within a safe bound.
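
A minimal sketch of what such a guard could look like. The helper name `guard_params`, the `margin` constant, and the exact epsilon bound (`min(density, 1 - density)` less a small margin) are assumptions for illustration; the text above only states that density is clipped to `[1e-4, 1 - 1e-4]` and that epsilon is shrunk to a safe bound.

```python
def guard_params(density: float, epsilon: float, margin: float = 1e-4):
    """Clip density and shrink epsilon so density +/- epsilon stays inside (0, 1).

    Hypothetical helper: the name and the epsilon bound are illustrative
    assumptions, not taken from either script.
    """
    # Density clipping, using the range stated in the text.
    density = min(max(density, margin), 1.0 - margin)
    # Largest epsilon that keeps density - epsilon and density + epsilon
    # strictly inside (0, 1). This particular bound is an assumption.
    max_eps = max(min(density, 1.0 - density) - margin / 2, 0.0)
    epsilon = min(max(epsilon, 0.0), max_eps)
    return density, epsilon


# An epsilon of 0.3 is too large for a density of 0.9, so it is shrunk.
d, e = guard_params(0.9, 0.3)
print(d, e)
```

Parameters that are already valid, such as `guard_params(0.5, 0.1)`, pass through unchanged under this sketch.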
### 2. Division by Zero Protection
In the rank normalization step of `della_magprune`:
* **Script 1:** `rank_norm = ((ranks - min_ranks) / (max_ranks - min_ranks))`
* **Script 2:** `rank_norm = ((ranks - min_ranks) / (max_ranks - min_ranks).clamp(min=1e-8))`
* **Impact:** If `max_ranks == min_ranks` (for example, when a tensor has only one unique value), the first script computes `0 / 0`, which PyTorch evaluates to `NaN` rather than raising an error. The second script uses `.clamp(min=1e-8)` so the denominator is never zero and the result is `0` instead.
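
The difference between the two expressions can be reproduced on a toy input. The `ranks` tensor below is made up for illustration (a single-element row, where the minimum and maximum rank coincide); how the real scripts build `ranks` is not shown here.

```python
import torch

# Toy input: one row with one element, so min rank == max rank.
ranks = torch.tensor([[1.0]])
min_ranks = ranks.min(dim=1, keepdim=True).values
max_ranks = ranks.max(dim=1, keepdim=True).values

# Script 1 form: 0 / 0 evaluates to NaN instead of raising.
unsafe = (ranks - min_ranks) / (max_ranks - min_ranks)

# Script 2 form: the clamped denominator turns this into 0 / 1e-8.
safe = (ranks - min_ranks) / (max_ranks - min_ranks).clamp(min=1e-8)

print(torch.isnan(unsafe).any().item(), safe)
```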
### 3. Probability Clipping
In the final step of generating the mask:
* **Script 1:** `probs = (density - epsilon) + rank_norm * 2 * epsilon`
* **Script 2:** `probs = (density - epsilon) + rank_norm * 2 * epsilon` followed by `torch.bernoulli(probs.clamp(0, 1))`
* **Impact:** Even with the epsilon guards, floating-point errors could theoretically push a probability slightly outside the $[0, 1]$ range. The second script adds `.clamp(0, 1)` to the Bernoulli input to ensure PyTorch does not throw an error.
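
A small sketch of the clipping step. To make the effect visible, the parameters below deliberately violate `density + epsilon < 1`, which is a larger error than the floating-point drift described above; the values and the `rank_norm` tensor are illustrative assumptions.

```python
import torch

# Illustrative values: density + epsilon exceeds 1 on purpose.
density, epsilon = 0.9, 0.2
rank_norm = torch.linspace(0.0, 1.0, steps=5)

# Same formula as both scripts; the largest entry ends up above 1.
probs = (density - epsilon) + rank_norm * 2 * epsilon

# Script 2 form: clamp into [0, 1] before sampling the keep/drop mask.
clipped = probs.clamp(0, 1)
torch.manual_seed(0)
mask = torch.bernoulli(clipped)

print(probs.max().item() > 1, clipped.max().item())
```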
### 4. Logic Flow in `della_magprune`
* **Script 1** has a check: `if density + epsilon >= 1 or density - epsilon <= 0: raise ValueError(...)`. This causes the script to **crash** if the parameters are bad.
* **Script 2** removes that `ValueError` and replaces it with the "Safety Guard" logic mentioned above. Instead of crashing, it **corrects** the values and continues running.
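
For contrast, the strict behaviour of Script 1 can be sketched as a standalone check. The condition is the one quoted above; the function name `strict_check` and the error message are placeholders, since the original message is elided.

```python
def strict_check(density: float, epsilon: float) -> None:
    # Condition as quoted from Script 1; the message text is a placeholder.
    if density + epsilon >= 1 or density - epsilon <= 0:
        raise ValueError("density +/- epsilon must lie strictly inside (0, 1)")


strict_check(0.5, 0.1)  # valid parameters pass silently

try:
    strict_check(0.9, 0.2)  # 0.9 + 0.2 >= 1, so this is rejected
except ValueError as exc:
    print("rejected:", exc)
```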
### Summary Table
| Feature | Script 1 | Script 2 |
| :--- | :--- | :--- |
| **Bad Inputs** | Crashes with `ValueError` | Automatically fixes/clips values |
| **Single-value Tensors** | May produce `NaN` (Div by 0) | Safe (Clamped denominator) |
| **Bernoulli Stability** | Risk of out-of-bounds error | Guaranteed $[0, 1]$ range |
| **Reliability** | Experimental/Strict | Production-ready/Robust |
**Recommendation:** Use the **second script**. It is a more mature version of the code designed to handle edge cases and prevent runtime failures during automated optimization or training.
## Note
This patch is required for Gemma 4 31B merges.