Private AI for Law Firms: Keep Privilege Intact
How to give your lawyers every leading model without sending a single privileged document to a third party.
Generative AI has become indispensable for document review, contract analysis and legal research. But for a law firm, the convenience of pasting a privileged document into a public chatbot carries a specific, under-appreciated risk: it can be construed as disclosure to a third party — and a court may treat that disclosure as a waiver of attorney-client privilege.
This guide explains the privilege problem in concrete terms, then walks through the architecture that lets a firm use every leading model — Claude, GPT, Gemini, and self-hosted open models — without ever letting client data leave infrastructure the firm controls.
The Risk: Why public AI quietly waives privilege
Attorney-client privilege protects confidential communications made for the purpose of obtaining legal advice. The doctrine is fragile: privilege can be waived by voluntary disclosure to anyone outside the privileged relationship. When a lawyer pastes a confidential memo into a consumer AI tool, the document is transmitted to, processed by, and frequently retained on a vendor's servers.
- The document leaves the firm's custody and control.
- The vendor — a third party — now holds the privileged content.
- Default consumer terms may permit the provider to retain inputs or use them to improve models.
- There is typically no audit record of exactly what the model saw.
The moment a privileged document is pasted into a public model, it leaves your control — and opposing counsel can argue the privilege was waived. The safest legal posture is to ensure inference never happens outside infrastructure the firm owns or controls.
Definitions: What "private AI" actually means
"Private AI" is an overloaded term. Vendors use it to mean very different things, and the differences matter enormously for privilege. Three models are common:
| Deployment model | Where data goes | Privilege posture |
|---|---|---|
| Consumer chatbot (free/Pro tier) | Vendor servers; may be retained / used for training | Weak — treat as public disclosure |
| Enterprise API with zero-retention terms | Vendor servers; not retained, not trained on | Stronger — contractual, but data still transits a third party |
| Self-hosted / VPC inference on your own infrastructure | Never leaves your boundary | Strongest — no third-party custody |
A defensible firm posture combines the two stronger rows of this table: open-weight models (Llama, Mistral, Qwen) self-hosted for the most sensitive work, and closed APIs reached only over isolated egress with contractual zero-retention terms for everything else.
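The two-tier posture described above can be expressed as a small routing rule. This is a minimal sketch, not a prescribed implementation: the `Tier` enum, the `Request` type and the `privileged` flag are hypothetical names introduced here, and how a firm decides that a request is privileged is assumed to happen upstream.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Open-weight model running inside the firm's own boundary.
    SELF_HOSTED = "self_hosted"
    # Closed API reached over isolated egress with zero-retention terms.
    ZERO_RETENTION_API = "zero_retention_api"


@dataclass(frozen=True)
class Request:
    matter_id: str
    privileged: bool  # hypothetical flag, set by the firm's own classification


def choose_tier(request: Request) -> Tier:
    """Send privileged work to self-hosted inference; the rest may use the API tier."""
    if request.privileged:
        return Tier.SELF_HOSTED
    return Tier.ZERO_RETENTION_API
```

Note that the consumer-chatbot tier is deliberately absent from the enum, so no code path can route a request there.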
Architecture: A reference architecture for privilege-grade AI
The goal is simple to state and harder to engineer: the model can answer questions over the firm's documents, but the documents never leave a boundary the firm controls, and every interaction is logged. A practical pipeline looks like this:
- Ingestion — pleadings, contracts and case files are indexed into a private vector store (e.g. Qdrant + PostgreSQL) running inside the firm's environment, encrypted at rest with keys the firm holds.
- Authorisation — every query passes a zero-trust gateway that resolves the user, their role, and the specific matter they are cleared to see before any retrieval runs.
- Retrieval — vector search returns only document chunks that pass row-level permission checks scoped to client and matter, so one client's files can never surface in another's query.
- Sanitisation — names, IDs and contact details are masked at the chunk level before any context is assembled for a model.
- Inference — open models run on GPUs in the firm's region; closed APIs are reached over isolated egress with zero-retention terms.
- Audit — every query, document touch and token is written to a tamper-evident log with a content hash, producing the defensible record regulators and opposing counsel expect.
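As an illustration of the sanitisation step in the pipeline above, the sketch below masks names, email addresses and phone numbers in a chunk before it is placed into a prompt. It is a simplified example under stated assumptions: the regular expressions are rough approximations, `known_names` is a hypothetical list of party names supplied from the matter record, and a production system would likely use a dedicated PII-detection component instead.

```python
import re

# Rough patterns for illustration only; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_chunk(text: str, known_names: list[str]) -> str:
    """Replace known names, emails and phone numbers with placeholders."""
    # Mask longer names first so a full name is replaced before a partial one.
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

Masking at the chunk level, before context assembly, means the model never receives the raw identifiers, whichever inference tier handles the request.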
The strongest guarantee is architectural. If cross-matter access is impossible because the retrieval layer enforces permission filters at query time, you are not relying on a promise that staff will behave — you are relying on the system being unable to misbehave.
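To make the idea of query-time permission filtering concrete, here is a minimal in-memory sketch. The `Chunk`, `User` and `retrieve` names are hypothetical, and similarity scores are assumed to be supplied by the vector search. In a real deployment the same client-and-matter filter would be pushed into the vector store's query (Qdrant, for example, supports filtering on stored payload fields) rather than applied in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    client_id: str
    matter_id: str
    text: str
    score: float  # similarity score, assumed to come from the vector search


@dataclass(frozen=True)
class User:
    user_id: str
    cleared: frozenset  # (client_id, matter_id) pairs this user may access


def retrieve(user, client_id, matter_id, candidates, k=3):
    """Return the top-k chunks for one matter, or refuse if the user is not cleared."""
    scope = (client_id, matter_id)
    if scope not in user.cleared:
        raise PermissionError(f"{user.user_id} is not cleared for {scope}")
    # Only chunks tagged with exactly this client and matter are eligible.
    in_scope = [c for c in candidates if (c.client_id, c.matter_id) == scope]
    return sorted(in_scope, key=lambda c: c.score, reverse=True)[:k]
```

Because the scope check and the filter run before ranking, a higher-scoring chunk from another client's matter can never appear in the results.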
Economics: Is private AI actually affordable for a 30-lawyer firm?
The common objection is cost. In practice, a self-hosted deployment for a small-to-mid firm is far cheaper than most partners assume. As a public reference point, a documented engagement to self-host a 70B-parameter open model for a law firm (Llama 3 70B on vLLM with a vector store) ran at roughly $1,200/month in hosting on top of a one-time setup. The setup, not the compute, is where the value and the risk concentrate.
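For a sense of scale, the hosting figure can be spread across headcount. This is illustrative arithmetic only: the source does not state how many lawyers the referenced engagement served, so the 30-lawyer headcount is an assumption taken from the question in the heading, and the one-time setup is excluded because no figure is given for it.

```python
def hosting_cost_per_lawyer(monthly_hosting_usd: float, lawyers: int) -> float:
    """Monthly hosting cost divided evenly across lawyers (setup cost excluded)."""
    if lawyers <= 0:
        raise ValueError("lawyers must be a positive integer")
    return monthly_hosting_usd / lawyers


# Assumed inputs: roughly $1,200/month hosting, 30 lawyers.
print(hosting_cost_per_lawyer(1200, 30))  # 40.0
```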
That is the gap our compliance offering is built for: a managed setup plus ongoing advisory, where the firm pays its cloud provider directly and we never take custody of regulated data.
Checklist: A privilege-preservation checklist
- Inference for privileged matters never leaves infrastructure the firm controls.
- Encryption keys are held by the firm, not the vendor.
- Matter-level access controls are enforced in the retrieval layer, not just in the UI.
- Every AI interaction is written to an immutable, tamper-evident audit log.
- A signed data processing agreement (DPA), and a business associate agreement (BAA) where health data is involved, is in place with any processor.
- Closed-API usage is governed by contractual zero-retention and no-training terms.
- Staff have a clear, written policy on which tools are approved for privileged work.
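One common way to make the audit log in this checklist tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is an assumption about how that could be done, not a description of any specific product: `AuditLog`, its field names and the `GENESIS` constant are hypothetical, and only a hash of the query is stored so the log itself holds no privileged text.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def append(self, user_id: str, matter_id: str, query: str, doc_ids: list) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {
            "user_id": user_id,
            "matter_id": matter_id,
            "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
            "doc_ids": list(doc_ids),
        }
        entry = {"record": record, "prev_hash": prev_hash,
                 "hash": _digest(prev_hash, record)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this only detects tampering if the latest hash is also anchored somewhere the application cannot rewrite, such as write-once storage; that anchoring step is outside this sketch.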
Frequently asked questions
Using a public AI tool can waive privilege. Privilege can be waived by voluntary disclosure to a third party. Pasting privileged content into a public, consumer AI tool transmits that content to a vendor's servers and may be construed as disclosure. The risk is avoided when inference happens on infrastructure the firm controls, with no third-party custody of the data.
Private AI is affordable for a small-to-mid firm. A self-hosted open-model deployment typically costs in the low thousands of dollars per month in hosting, plus a one-time setup. The firm pays its cloud provider directly; the larger value is in the secure setup and ongoing advisory, not the raw compute.
Zero-retention terms are a contractual promise that a closed-API vendor will not store or train on your inputs — the data still transits the vendor. Self-hosting means inference runs on your own infrastructure and the data never leaves your boundary. For the most sensitive privileged work, self-hosting gives the strongest posture; zero-retention API access is a reasonable second tier for less sensitive tasks.
Matter-level isolation must be enforced in the retrieval layer. The vector store returns document chunks only after row-level permission checks against client and matter scope, so cross-matter access is impossible by construction rather than by policy.