BAD Classifier (FairSteer): Llama-2-7b-chat-hf Layer 25 (RAW Mode)
This Biased Activation Detector (BAD) was trained using the FairSteer methodology.
Model Metadata
- Base Model: meta-llama/Llama-2-7b-chat-hf
- Optimal Layer: 25
- Validation Accuracy: 75.76%
- Extraction Mode: RAW (Directly matches FairSteer GitHub logic)
- Protocol: 1:1 Balanced Undersampling (Scenario-Grouped)
Usage
- Extract the residual stream activation [:, -1, :] (the last token position) from Layer 25.
- Pass the raw activation directly to the classifier stored in model.safetensors.
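The two usage steps can be sketched roughly as follows. This is a minimal illustration and not the FairSteer reference code. It assumes the classifier is a single linear layer followed by a sigmoid, and that "Layer 25" corresponds to index 25 of the hidden-state tuple a Hugging Face model returns with output_hidden_states=True; neither is confirmed by this card. The function names and the random stand-in tensors are hypothetical. The tensor names inside model.safetensors are not documented here, so inspect the file's keys (for example with safetensors.torch.load_file) before wiring in the real weights.

```python
import torch

LAYER = 25  # "Optimal Layer" from the metadata above


def last_token_activation(hidden_states, layer=LAYER):
    """Return the activation at the final sequence position of one layer.

    hidden_states: sequence of tensors shaped (batch, seq_len, hidden_dim),
    such as the tuple a Hugging Face model returns when called with
    output_hidden_states=True.
    Assumption: "Layer 25" maps to index 25 of that tuple.
    """
    return hidden_states[layer][:, -1, :]


def probe_probability(activation, weight, bias):
    """Assumed linear probe: sigmoid(activation @ weight + bias).

    No normalisation is applied, mirroring the RAW extraction mode.
    Which class the output probability refers to is not stated in the card.
    """
    logits = activation.float() @ weight.float() + bias.float()
    return torch.sigmoid(logits)


# Random stand-ins so the sketch runs without downloading the base model.
# Llama-2-7b has 32 decoder layers, so the tuple would hold 33 tensors
# (embedding output plus one per layer); hidden_dim is shrunk for brevity.
torch.manual_seed(0)
hidden_dim = 16
fake_hidden_states = tuple(torch.randn(2, 5, hidden_dim) for _ in range(33))
weight = torch.randn(hidden_dim)  # hypothetical probe weights
bias = torch.zeros(())            # hypothetical probe bias

activation = last_token_activation(fake_hidden_states)
probs = probe_probability(activation, weight, bias)
print(activation.shape, probs.shape)
```

Note that [:, -1, :] selects the final position of every sequence in the batch. With right-padded batches that position can be a padding token, so single-prompt inference or left padding is the safer choice.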