The two scripts you provided are nearly identical in structure, but the second script contains significant safety enhancements and robustness fixes for the della_magprune method.
Here are the specific differences:
1. Parameter Validation (Safety Guards)
The second script adds a "Safety Guard" block at the start of the della_magprune function. This prevents the function from crashing or producing invalid results if the input parameters (density or epsilon) are mathematically impossible.
- Density Clipping: It ensures `density` stays within a valid range (between `1e-4` and `1-1e-4`).
- Epsilon Adjustment: It automatically shrinks `epsilon` if it is too large. Since the algorithm calculates probabilities as `density +/- epsilon`, an epsilon that is too large would result in probabilities greater than 1 or less than 0. The second script forces epsilon to be within a safe bound.
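The guard described above could look roughly like the following. This is a minimal sketch, not the second script's actual code: the helper name `sanitize_density_epsilon` is hypothetical, and the epsilon cap (the largest value that keeps both `density - epsilon` and `density + epsilon` inside $[0, 1]$) is an assumption, since the exact bound is not quoted here.

```python
def sanitize_density_epsilon(density: float, epsilon: float,
                             margin: float = 1e-4) -> tuple:
    """Hypothetical helper: clip density, then shrink epsilon to fit."""
    # Keep density inside [1e-4, 1 - 1e-4], as described in the text.
    density = min(max(density, margin), 1.0 - margin)
    # Assumed bound: the largest epsilon that keeps density +/- epsilon in [0, 1].
    max_epsilon = min(density, 1.0 - density)
    epsilon = min(max(epsilon, 0.0), max_epsilon)
    return density, epsilon


# In-range inputs pass through unchanged; out-of-range inputs are corrected.
print(sanitize_density_epsilon(0.5, 0.1))
print(sanitize_density_epsilon(0.9, 0.5))
```

Correcting silently keeps a long run alive, at the cost that the caller never learns the requested values were changed; logging the adjustment would be a reasonable addition.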
2. Division by Zero Protection
In the rank normalization step of della_magprune:
- Script 1: `rank_norm = ((ranks - min_ranks) / (max_ranks - min_ranks))`
- Script 2: `rank_norm = ((ranks - min_ranks) / (max_ranks - min_ranks).clamp(min=1e-8))`
- Impact: If a tensor has only one unique value (meaning `max_ranks == min_ranks`), the first script would divide by zero and produce `NaN` values. The second script uses `.clamp(min=1e-8)` to ensure the denominator is never zero.
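The effect of the clamped denominator can be illustrated without PyTorch. The sketch below uses plain Python lists and a hypothetical function name `normalize_ranks`; it mirrors the idea of `.clamp(min=1e-8)` and is not the script's own code. Note one difference: an unclamped `0 / 0` on PyTorch float tensors yields `NaN`, whereas plain Python raises `ZeroDivisionError`.

```python
def normalize_ranks(ranks: list, eps: float = 1e-8) -> list:
    """Scale ranks to [0, 1]; a floor on the denominator avoids 0/0."""
    lo, hi = min(ranks), max(ranks)
    # Same idea as (max_ranks - min_ranks).clamp(min=1e-8).
    denom = max(hi - lo, eps)
    return [(r - lo) / denom for r in ranks]


print(normalize_ranks([0, 1, 2, 3]))  # ordinary case: spread over [0, 1]
print(normalize_ranks([2, 2, 2]))     # degenerate case: all zeros, no error
```

In the degenerate case every normalized rank becomes 0, so every entry ends up with the lowest probability, `density - epsilon`.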
3. Probability Clipping
In the final step of generating the mask:
- Script 1: `probs = (density - epsilon) + rank_norm * 2 * epsilon`
- Script 2: `probs = (density - epsilon) + rank_norm * 2 * epsilon` followed by `torch.bernoulli(probs.clamp(0, 1))`
- Impact: Even with the epsilon guards, floating-point errors could theoretically push a probability slightly outside the $[0, 1]$ range. The second script adds `.clamp(0, 1)` to the Bernoulli input to ensure PyTorch does not throw an error.
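To make the formula concrete, here is a small plain-Python sketch of the probability computation and the clamp. The function names `keep_probabilities` and `sample_mask` are hypothetical, and `random.Random` stands in for `torch.bernoulli` so the example runs without PyTorch.

```python
import random


def keep_probabilities(rank_norm, density, epsilon):
    # Formula quoted above: values span [density - epsilon, density + epsilon].
    raw = [(density - epsilon) + r * 2 * epsilon for r in rank_norm]
    # Stand-in for probs.clamp(0, 1).
    return [min(max(p, 0.0), 1.0) for p in raw]


def sample_mask(probs, seed=0):
    rng = random.Random(seed)
    # Stand-in for torch.bernoulli: keep an entry with probability p.
    return [1 if rng.random() < p else 0 for p in probs]


# Deliberately bad parameters: the raw top probability would be about 1.2.
probs = keep_probabilities([0.0, 0.5, 1.0], density=0.9, epsilon=0.3)
print(probs)
print(sample_mask(probs))
```

With `rank_norm = 0` the formula gives `density - epsilon` and with `rank_norm = 1` it gives `density + epsilon`, which is why an oversized epsilon leaves the valid range at both ends.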
4. Logic Flow in della_magprune
- Script 1 has a check: `if density + epsilon >= 1 or density - epsilon <= 0: raise ValueError(...)`. This causes the script to crash if the parameters are bad.
- Script 2 removes that `ValueError` and replaces it with the "Safety Guard" logic mentioned above. Instead of crashing, it corrects the values and continues running.
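For contrast, the strict behaviour attributed to Script 1 can be reproduced in isolation. The condition below is the one quoted above; the function name `check_strict` and the error message are placeholders, since the original message is not shown.

```python
def check_strict(density: float, epsilon: float) -> None:
    # Condition as quoted from Script 1; the message text is a placeholder.
    if density + epsilon >= 1 or density - epsilon <= 0:
        raise ValueError(
            f"invalid density/epsilon: density={density}, epsilon={epsilon}"
        )


check_strict(0.5, 0.1)  # valid: returns without error

try:
    check_strict(0.9, 0.2)  # 0.9 + 0.2 >= 1, so this raises
except ValueError as err:
    print("rejected:", err)
```

Failing fast like this surfaces a bad configuration immediately, which is useful when a human chooses the parameters; it is less convenient when an automated search proposes them, because one invalid trial aborts the run unless the caller catches the exception.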
Summary Table
| Feature | Script 1 | Script 2 |
|---|---|---|
| Bad Inputs | Crashes with ValueError | Automatically fixes/clips values |
| Single-value Tensors | May produce NaN (Div by 0) | Safe (Clamped denominator) |
| Bernoulli Stability | Risk of out-of-bounds error | Guaranteed $[0, 1]$ range |
| Reliability | Experimental/Strict | Production-ready/Robust |
Recommendation: Use the second script. It is a more mature version of the code designed to handle edge cases and prevent runtime failures during automated optimization or training.
Note
This patch is required for Gemma 4 31B merges.