model_tools / sparsify_v3_notes.md


The two scripts you provided are nearly identical in structure, but the second script contains significant safety enhancements and robustness fixes for the della_magprune method.

Here are the specific differences:

1. Parameter Validation (Safety Guards)

The second script adds a "Safety Guard" block at the start of the della_magprune function. This prevents the function from crashing or producing invalid results if the input parameters (density or epsilon) are mathematically impossible.

Density Clipping: It keeps `density` within a valid range (between 1e-4 and 1 - 1e-4).
Epsilon Adjustment: It automatically shrinks `epsilon` if it is too large. The algorithm draws probabilities from the interval between `density - epsilon` and `density + epsilon`, so an oversized `epsilon` would yield probabilities above 1 or below 0. The second script caps `epsilon` so that this interval stays within bounds.
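
A minimal sketch of such a guard, in plain Python. The function name `sanitize_params`, the `margin` parameter, and the exact epsilon cap are assumptions made for illustration; the notes only state the density range of 1e-4 to 1 - 1e-4 and that epsilon is forced into a safe bound.

```python
def sanitize_params(density: float, epsilon: float, margin: float = 1e-4):
    """Hypothetical guard: clip density, then shrink epsilon to fit.

    The density range [margin, 1 - margin] matches the notes. The epsilon
    cap is an assumption, chosen so that density - epsilon > 0 and
    density + epsilon < 1 always hold.
    """
    # Keep density strictly inside (0, 1).
    density = min(max(density, margin), 1.0 - margin)
    # Largest epsilon that keeps both ends of the probability interval
    # inside (0, 1), leaving half of the margin as headroom.
    max_epsilon = min(density, 1.0 - density) - margin / 2
    epsilon = min(max(epsilon, 0.0), max_epsilon)
    return density, epsilon
```

With this sketch, `sanitize_params(0.9, 0.5)` leaves the density untouched and reduces epsilon to just under 0.1, while valid inputs such as `(0.5, 0.1)` pass through unchanged.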

2. Division by Zero Protection

In the rank normalization step of della_magprune:

Script 1: `rank_norm = ((ranks - min_ranks) / (max_ranks - min_ranks))`
Script 2: `rank_norm = ((ranks - min_ranks) / (max_ranks - min_ranks).clamp(min=1e-8))`
Impact: If a tensor has only one unique value (meaning `max_ranks == min_ranks`), the first script computes 0 / 0 and produces `NaN` values. The second script uses `.clamp(min=1e-8)` so the denominator is never zero, and the normalized rank comes out as 0 instead.
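
The effect of the clamped denominator can be illustrated on a single row of ranks. This is a plain-Python analogue, not the tensor code from either script: `normalize_ranks` and `min_denominator` are hypothetical names. In plain Python an unguarded 0 / 0 raises `ZeroDivisionError` rather than yielding `NaN` as it does for PyTorch float tensors, but the guard works the same way.

```python
def normalize_ranks(ranks, min_denominator=1e-8):
    """Scale one row of ranks to [0, 1] without dividing by zero."""
    lo, hi = min(ranks), max(ranks)
    # Plain-Python counterpart of (max_ranks - min_ranks).clamp(min=1e-8).
    denominator = max(hi - lo, min_denominator)
    return [(r - lo) / denominator for r in ranks]


print(normalize_ranks([1.0, 2.0, 3.0]))  # [0.0, 0.5, 1.0]
print(normalize_ranks([1.0]))            # [0.0], no division by zero
```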

3. Probability Clipping

In the final step of generating the mask:

Script 1: `probs = (density - epsilon) + rank_norm * 2 * epsilon`
Script 2: the same `probs` expression, followed by `torch.bernoulli(probs.clamp(0, 1))`
Impact: Even with the epsilon guards, floating-point error could theoretically push a probability slightly outside the $[0, 1]$ range. The second script adds `.clamp(0, 1)` to the Bernoulli input so that `torch.bernoulli` never receives an out-of-range value.
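
As a rough illustration of the clipped sampling step, the sketch below reproduces the probability formula for one row using only the standard library. The name `sample_mask` and the use of `random.Random` in place of `torch.bernoulli` are assumptions made to keep the example self-contained.

```python
import random


def sample_mask(rank_norm, density, epsilon, rng=None):
    """Draw a 0/1 keep-mask for one row from rank-based probabilities."""
    rng = rng or random.Random()
    mask = []
    for r in rank_norm:
        # Same formula as in the notes: low ranks get density - epsilon,
        # high ranks get density + epsilon.
        p = (density - epsilon) + r * 2 * epsilon
        # Plain-Python counterpart of probs.clamp(0, 1).
        p = min(max(p, 0.0), 1.0)
        mask.append(1 if rng.random() < p else 0)
    return mask
```

With deliberately bad parameters such as `density=0.5, epsilon=0.6`, the raw probabilities for normalized ranks 0 and 1 would be -0.1 and 1.1. After clipping they become 0 and 1, so the lowest-ranked element is always dropped and the highest-ranked element is always kept.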

4. Logic Flow in `della_magprune`

Script 1 has a check: `if density + epsilon >= 1 or density - epsilon <= 0: raise ValueError(...)`. The script therefore aborts whenever the parameters are out of range.
Script 2 removes that `ValueError` and replaces it with the "Safety Guard" logic mentioned above. Instead of aborting, it corrects the values and continues running.
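
For comparison, the strict behaviour of Script 1 can be sketched as a standalone check. The condition is taken from the notes; the function name `check_params_strict` and the error message are placeholders, since the original message is elided.

```python
def check_params_strict(density: float, epsilon: float) -> None:
    """Script 1 style: reject out-of-range parameters instead of fixing them."""
    if density + epsilon >= 1 or density - epsilon <= 0:
        # Placeholder text: the original message is not given in the notes.
        raise ValueError(
            f"density={density}, epsilon={epsilon} give probabilities outside (0, 1)"
        )


try:
    check_params_strict(0.9, 0.2)  # 0.9 + 0.2 >= 1, so this is rejected
except ValueError as err:
    print(f"rejected: {err}")
```

A caller running many merges unattended would need to catch this exception for every parameter combination, which is the situation the corrective guard in Script 2 avoids.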

Summary Table

| Feature | Script 1 | Script 2 |
| --- | --- | --- |
| Bad Inputs | Crashes with `ValueError` | Automatically fixes/clips values |
| Single-value Tensors | May produce `NaN` (division by zero) | Safe (clamped denominator) |
| Bernoulli Stability | Risk of out-of-bounds error | Guaranteed $[0, 1]$ range |
| Reliability | Experimental/Strict | Production-ready/Robust |

Recommendation: Use the second script. It is a more mature version of the code designed to handle edge cases and prevent runtime failures during automated optimization or training.

Note

This patch is required for Gemma 4 31B merges.