Which Model's Guardrails Fail First? — Cross-Model Refusal Benchmark v0
12 prompts × 5 frontier models × 3 runs (raw, harness-passthrough, perturbed). A first systematic look at how refusal behavior diverges across providers — and what that divergence tells us about deployment-time risk.
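As a rough illustration of the size of that design, the sketch below enumerates the full prompt × model × run grid. It assumes exactly one trial per cell, and the prompt and model identifiers are hypothetical placeholders; only the three run labels come from the text above.

```python
from itertools import product

# Hypothetical placeholder identifiers: the actual prompts and model
# names are not listed in the text above.
PROMPTS = [f"prompt_{i:02d}" for i in range(1, 13)]  # 12 prompts
MODELS = [f"model_{c}" for c in "ABCDE"]             # 5 models
RUNS = ["raw", "harness-passthrough", "perturbed"]   # 3 run labels from the text


def build_grid():
    """Return one trial record per (prompt, model, run) combination."""
    return [
        {"prompt": p, "model": m, "run": r}
        for p, m, r in product(PROMPTS, MODELS, RUNS)
    ]


grid = build_grid()
print(len(grid))  # 12 * 5 * 3 = 180
```

Under the one-trial-per-cell assumption, the benchmark yields 180 responses to score, or 36 per model.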