Guardrail API — Frequently Asked Questions

This page explains Guardrail in plain language. It's written for people who need to make decisions about AI safety and risk, even if they don't live in code every day: executives, security leaders, product owners, and advisors.

For deeper technical documentation, see the documentation page, or contact the enterprise team if you need to talk through a specific environment.

For non-technical decision makers

What does the Guardrail API actually do?

Guardrail sits in the middle between your applications and your AI models. Every time an app tries to ask the model a question, the request goes through Guardrail first. Guardrail checks the request against safety and policy rules, and only lets safe, compliant traffic through. If something looks risky, it can block it or ask for clarification before the model ever sees it.

Is this a firewall?

In effect, yes: a firewall built specifically for AI language models. Traditional firewalls inspect network traffic such as IP addresses and ports. Guardrail inspects the actual language going into and coming out of your models, focusing on threats like prompt injection, data leaks, policy violations, and abuse of tools or agents.

Does Guardrail replace our AI models or vendors?

No. Guardrail doesn't replace your models. You keep using OpenAI, Azure, Google, Anthropic, or your own internal models. Guardrail is a safety and governance layer that sits in front of them. It standardizes how you apply rules, how you log decisions, and how you see what's happening across all of your models.

Will this slow everything down?

Guardrail is designed to run in real time, in-band with your traffic. For normal, clearly safe requests, the overhead is small—similar to putting an API gateway or authentication layer in front of a service. When something is ambiguous, Guardrail can take longer because it may need to ask a verifier to review the intent. That's by design: suspicious traffic should get more attention than routine requests.

What kinds of problems does this actually prevent?

  • Prompt injection: users or external content trying to trick the model into ignoring your instructions.
  • Data exfiltration: prompts that try to make the model reveal sensitive or internal data.
  • Policy violations: requests or responses that would violate GDPR, HIPAA, internal usage policies, or sector-specific rules.
  • Unsafe tool use: attempts to get an AI agent to run dangerous actions by hiding instructions in text, files, or web content.

We already have security tools. Why do we need this?

Existing tools like firewalls, WAFs, and endpoint protection don't understand the content of AI prompts and responses. They see network traffic, not intent. Guardrail fills that gap. It's designed to work alongside your existing controls and plug into your logging and SIEM stack, not to replace them.

For security, risk, and compliance leaders

What do you mean by 'clarify-first' decisions?

Clarify-first means we don't guess about risky intent. If a prompt could be harmless or harmful depending on context, Guardrail treats it as unclear. Instead of silently blocking or allowing it, Guardrail asks for clarification or routes it to a verifier for intent assessment. Only once the intent is clear does it move forward or get blocked.
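As a rough sketch (the thresholds, names, and scoring are illustrative, not Guardrail's actual internals), the clarify-first idea amounts to a three-way decision rather than a binary allow/block:

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    CLARIFY = "clarify"   # unclear intent: ask, don't guess

def decide(risk_score: float,
           clear_safe: float = 0.2,
           clear_risk: float = 0.8) -> Decision:
    """Clarify-first: only clearly safe or clearly risky traffic is decided
    outright; everything in between is routed for clarification or to a
    verifier instead of being silently allowed or blocked."""
    if risk_score <= clear_safe:
        return Decision.ALLOW
    if risk_score >= clear_risk:
        return Decision.BLOCK
    return Decision.CLARIFY
```

The point of the middle band is exactly the "could be harmless or harmful depending on context" case described above: it gets a question, not a guess.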

How does Guardrail help with regulations like GDPR, HIPAA, or the EU AI Act?

Guardrail uses policy packs—structured rule sets—to encode regulatory and internal requirements. These packs can express rules like "don't output unredacted PII," "don't route this data outside a region," or "don't answer this class of questions for this user group." Over time, those packs can evolve as regulations change, without you having to re-implement the logic in every application.
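To make "structured rule sets" concrete, here is a minimal sketch of what a policy pack could look like as data, evaluated against request metadata. The pack format, rule ids, and field names are assumptions for illustration, not Guardrail's real schema:

```python
# Hypothetical policy pack: regulatory requirements expressed as data,
# so they can evolve without re-implementing logic in every application.
POLICY_PACK = {
    "name": "gdpr-baseline",
    "rules": [
        {"id": "pii-egress", "deny_if": {"contains_pii": True, "redacted": False}},
        {"id": "region-pin", "deny_if": {"destination_region": "non-eu"}},
    ],
}

def violated_rules(pack: dict, request_meta: dict) -> list:
    """Return the ids of rules whose deny_if conditions all match the
    request's metadata."""
    hits = []
    for rule in pack["rules"]:
        conditions = rule["deny_if"]
        if all(request_meta.get(key) == value for key, value in conditions.items()):
            hits.append(rule["id"])
    return hits
```

When a regulation changes, you update the pack; the evaluation logic and the applications behind it stay the same.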

What is the 'dual-arm' model you mention?

Guardrail treats traffic into the model (ingress) and traffic coming out of the model (egress) as two separate "arms." Each arm can enforce its own rules and remain effective even if the other arm has a problem. That means an issue in output filtering doesn't silently disable input checks, and vice versa. It's about reducing single points of failure in your AI safety layer.
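A toy sketch of the dual-arm idea (the specific checks are placeholders, and real deployments would use far richer detection): each arm enforces its own rules, and a fault in the egress arm fails closed rather than silently letting output through.

```python
def ingress_check(prompt: str) -> bool:
    """Arm 1: screen traffic going into the model (e.g. injection phrases)."""
    return "ignore previous instructions" not in prompt.lower()

def egress_check(response: str) -> bool:
    """Arm 2: screen traffic coming out of the model (e.g. leaked secrets)."""
    return "internal-secret" not in response.lower()

def guarded_call(prompt: str, model) -> str:
    """Each arm enforces its own rules independently. If the egress arm
    errors out, we fail closed instead of disabling output filtering."""
    if not ingress_check(prompt):
        return "[blocked at ingress]"
    try:
        response = model(prompt)
        output_ok = egress_check(response)
    except Exception:
        return "[egress arm unavailable: failing closed]"
    return response if output_ok else "[blocked at egress]"
```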

What does the verifier do, and does it run untrusted code?

The verifier is a separate service that uses LLMs to assess intent and risk. It never executes untrusted code or tools. When the core runtime is unsure whether a request is safe, it sends a description of the situation to the verifier. The verifier's only job is to answer questions like "Is running this request likely to be harmful or violate policy?"—not to carry out the request itself.
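The separation can be sketched like this (field names and the risky-tool list are illustrative stand-ins, and the real verifier uses LLMs rather than a lookup): the runtime sends a description of the situation, and the verifier returns only a judgment.

```python
def describe_request(prompt: str, requested_tool) -> dict:
    """The runtime sends the verifier a *description* of the situation,
    never the ability to act: text about the request, not the tools."""
    return {"prompt_excerpt": prompt[:200], "requested_tool": requested_tool}

def assess(description: dict) -> dict:
    """Stand-in for the verifier's LLM-based assessment. Its only output is
    a judgment: is carrying out this request likely harmful or in violation
    of policy? It never executes the request itself."""
    risky_tools = {"shell", "delete_file"}
    harmful = description["requested_tool"] in risky_tools
    return {
        "harmful": harmful,
        "reason": "request targets a risky tool" if harmful else "no risk signal",
    }
```

Because the verifier only ever sees descriptions and returns judgments, compromising it cannot directly trigger tool execution.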

What kind of audit trail does Guardrail provide?

Every decision can be tagged with IDs, reasons, and headers that your logging and observability stack can ingest. That means you can answer questions like "Why was this request blocked?" or "Who changed this policy and when?" without digging through raw model logs. The goal is to make AI behavior explainable to auditors and risk teams in a way that fits existing processes.
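The shape of such a decision record might look like the following sketch (field names are illustrative, not Guardrail's actual log schema): one structured line per decision, ready for a logging or SIEM pipeline to ingest.

```python
import datetime
import json
import uuid

def audit_record(decision: str, reason: str, policy_id: str, request_id: str) -> str:
    """Emit one self-describing JSON line per decision so auditors can
    answer 'why was this blocked?' without reading raw model logs."""
    record = {
        "decision_id": str(uuid.uuid4()),
        "request_id": request_id,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "decision": decision,     # e.g. allow / block / clarify
        "reason": reason,         # why, in terms of the policy that fired
        "policy_id": policy_id,   # which policy made the call
    }
    return json.dumps(record)
```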

For engineers and architects

How does Guardrail sit in the architecture?

In most setups, your applications call Guardrail instead of calling an LLM endpoint directly. Guardrail then forwards only safe, policy-aligned traffic to your chosen providers. From an engineer's perspective, it looks a lot like an API gateway focused on language and safety rather than just routing.

What does integration look like in code?

Integration is usually a small change to where your app points its chat or completion requests. Instead of hitting api.provider.com, you hit api.guardrailapi.com (or your internal deployment), and you include your Guardrail tenant identifiers plus any metadata needed for policy decisions. The rest of your stack (model selection, prompts, tools) can stay largely as it is, and Guardrail's features can be introduced progressively.
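As a minimal sketch of that repointing, using only the Python standard library: the endpoint path and the tenant header name below are assumptions for illustration, not Guardrail's documented API.

```python
import json
import urllib.request

GUARDRAIL_BASE = "https://api.guardrailapi.com"  # or your internal deployment

def build_chat_request(messages: list, tenant_id: str) -> urllib.request.Request:
    """Same payload you would send to a provider; only the base URL and the
    tenant metadata change. Path and header name are illustrative."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{GUARDRAIL_BASE}/v1/chat/completions",   # hypothetical endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Guardrail-Tenant": tenant_id,       # hypothetical header name
        },
        method="POST",
    )
```

In practice the only code-level difference from calling a provider directly is the base URL and the tenant/metadata headers; the request body is unchanged.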

Can it support multiple model providers?

Yes. Guardrail is designed to front multiple providers (for example, OpenAI, Azure, Google, Anthropic, or internal models). It's common to use Guardrail as a single enforcement and observability layer while the underlying models evolve over time.

What happens if Guardrail is unavailable?

In high-stakes environments, you typically run Guardrail as part of your core platform, with the same availability and redundancy expectations as your gateways and identity providers. Exact failure behavior—fail-open vs fail-closed—is a design choice for each deployment and is something we expect to discuss explicitly during architecture reviews.
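One way to keep that choice explicit in code (a sketch, not a prescribed pattern) is to make the failure mode a named configuration value rather than an accident of whichever error path happens to run:

```python
def call_via_guardrail(prompt: str, guardrail, direct_model, fail_closed: bool = True):
    """Make fail-open vs fail-closed an explicit deployment decision."""
    try:
        return guardrail(prompt)
    except ConnectionError:
        if fail_closed:
            # High-stakes default: no unchecked traffic reaches the model.
            raise RuntimeError("Guardrail unavailable; failing closed")
        # Fail-open: knowingly bypass checks to preserve availability.
        return direct_model(prompt)
```

Either choice can be right; what matters is that it is written down, reviewed, and tested, rather than discovered during an outage.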

Where can I find more technical detail?

The high-level concepts, pseudo-code examples, and ecosystem overview live on the documentation page. For enterprise deployments, we expect to provide deeper architecture guides, threat models, and reference configurations under NDA as the product matures.