bitlabsdb/bad-probe-llama-2-7b-chat-hf-L25-raw

interpretability

BAD Classifier (FairSteer): Llama-2-7b-chat-hf Layer 25 (RAW Mode)

This Biased Activation Detector (BAD) was trained using the FairSteer methodology.

Model Metadata

Base Model: meta-llama/Llama-2-7b-chat-hf
Optimal Layer: 25
Validation Accuracy: 75.76%
Extraction Mode: RAW (Directly matches FairSteer GitHub logic)
Protocol: 1:1 Balanced Undersampling (Scenario-Grouped)

Usage

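Extract the residual-stream activation at the last token position, [:, -1, :], from Layer 25.
Pass the raw activation, without further preprocessing, directly to the classifier whose weights are stored in model.safetensors.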
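
The two steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions the card does not confirm: that "Layer 25" corresponds to `hidden_states[25]` in the Hugging Face output (where index 0 is the embedding output), that the probe is a single-logit linear classifier followed by a sigmoid, and that the weights file sits at the local path `model.safetensors`. The helper names `last_token_activation`, `load_probe_tensors`, and `probe_probability` are hypothetical. The tensor names inside the file are not documented, so the sketch only loads them for inspection.

```python
import math

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # base model named in the card
LAYER = 25                                   # layer named in the card


def last_token_activation(prompt, model, tokenizer, layer=LAYER):
    """Return the layer's activation at the final token, shape (1, hidden)."""
    import torch  # imported lazily so the pure-Python helper below works without it

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so index `layer` is the output
    # of the layer-th decoder block. ASSUMPTION: this matches the card's "Layer 25".
    # A single unpadded prompt is used so that position -1 is the last real token.
    return out.hidden_states[layer][:, -1, :].float().cpu()


def load_probe_tensors(path="model.safetensors"):
    """Load every tensor in the probe file; inspect keys and shapes before use."""
    from safetensors.torch import load_file

    return load_file(path)


def probe_probability(activation, weight, bias=0.0):
    """Numerically stable sigmoid(w . x + b) for one activation vector.

    ASSUMPTION: the probe is a single-logit linear classifier. The card does
    not state which class a high score corresponds to.
    """
    z = sum(w * x for w, x in zip(weight, activation)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Example wiring (not run here; needs the gated Llama-2 weights and the probe file):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained(MODEL_ID)
#   model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
#   x = last_token_activation("some prompt", model, tok)[0].tolist()
#   tensors = load_probe_tensors()
#   print({name: tuple(t.shape) for name, t in tensors.items()})
```

Printing the tensor names and shapes first is deliberate: if the file turns out to hold a two-class head or more than one layer, the sigmoid step would need to be replaced with the matching forward pass.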
