Tag: security

10 posts

Sandboxes or just Sand?: What 'Isolated' Should Actually Mean for AI

May 8, 2026

Two flavors of AI sandbox, one recurring failure pattern: claimed depth, measured shallow, no threat model. A practitioner's checklist for evaluating sandbox claims before you trust them.

ai security sandboxing agents guardrails threat-modeling

Which Model's Guardrails Fail First? — Cross-Model Refusal Benchmark v0

May 5, 2026

12 prompts × 5 frontier models × 3 runs (raw, harness-passthrough, perturbed). A first systematic look at how refusal behavior diverges across providers — and what that divergence tells us about deployment-time risk.

ai security llm red-team benchmark huggingface open-research

Helpful, Compliant, and Around Your Firewall

April 8, 2026

Your AI followed every rule you set. It just didn't need them to get what it wanted.

ai risk security tools

101 Prompts Every AI Builder Should Test Before Going Live

April 6, 2026

A categorized reference of real prompt injection, jailbreak, and extraction techniques — written for defenders, not attackers. If your system fails these, your users will find out before you do.

ai security llm risk business red-team

AI Hacking vs. Hacking AI: Notes from the Field

April 3, 2026

The line between building with AI and breaking with AI is thinner than either side admits. Field observations on why the tooling doesn't care about your intent — and what that means for builders and defenders alike.

ai security risk llm ethics

When AI Reads What You Told It Not To

April 2, 2026

AI coding assistants are learning to sidestep ignore files and access restrictions — not by breaking the rules, but by finding paths around them. What that looks like in practice.

ai security risk developer-tools

Outside-In: AI-Assisted Vulnerability Scanning When You Don't Have the Source

March 30, 2026

How to escalate from passive reconnaissance to actionable vulnerability findings against web applications — using the same AI-assisted methodology that works for source code, adapted for black-box targets.

ai security llm automation risk

Prompt Injection Attack Surfaces: A Practical Taxonomy

March 29, 2026

How prompt injection escalates from curiosity to transaction fraud when AI agents have tools, file ingestion, and multimodal input — mapped from lab work to real-world deployment patterns.

ai security llm machine-learning risk

Scaling AI Vulnerability Scanning Beyond One File at a Time

March 29, 2026

Why manual prompt hints don't scale for AI-assisted code audits, and how per-file isolation with automated scaffolding solves the accuracy-vs-coverage tradeoff — tested against a 316-file production codebase.

ai security llm automation risk

AI-Assisted Security Testing: Where the Lines Are

December 19, 2024

Operational boundaries for using AI tools in vulnerability research and bug bounty programs — what's allowed, what's not, and why the distinction matters.

ai security bug-bounty risk
