Insurance claims pilot on GCP TDX: what we verified

Can a realistic insurance workflow run in a confidential AI environment and produce evidence that is independently verifiable afterward? Here is what we found.

Borys Tsyrulnikov · April 2026

This pilot was designed to answer a simple question: can a realistic insurance workflow run in a confidential AI environment and produce evidence that is independently verifiable afterward?

The result was encouraging, but the useful part is the exact scope.

In the pilot, an insurance-claims workflow ran on GCP TDX. Three confidential requests completed successfully. Three AIR receipts were collected and verified offline.

That is the factual outcome. It is strong because it is specific. It is also important not to overstate it.

What we verified

First, the workflow executed inside a confidential-computing environment rather than a normal application container. That is environment trust.

Second, the workload identity verification path was active. The verification chain checked policy-bound workload properties rather than relying only on a generic "trusted host" assumption.
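A policy-bound check of this kind can be sketched in a few lines. The claim names and policy fields below are illustrative, not the pilot's actual attestation schema; the point is only that verification compares concrete, attested workload properties against an explicit policy instead of trusting the host wholesale.

```python
# Illustrative sketch: verifying attested workload claims against a policy.
# Claim and policy field names are hypothetical, not the pilot's real schema.

ATTESTATION_CLAIMS = {
    "tee_type": "TDX",
    "workload_image_digest": "sha256:abc123",
    "debug_enabled": False,
}

POLICY = {
    "tee_type": "TDX",
    "workload_image_digest": "sha256:abc123",
    "debug_enabled": False,
}

def check_workload_policy(claims: dict, policy: dict) -> list[str]:
    """Return a list of policy violations; an empty list means the workload passes."""
    violations = []
    for key, expected in policy.items():
        actual = claims.get(key)
        if actual != expected:
            violations.append(f"{key}: expected {expected!r}, got {actual!r}")
    return violations

violations = check_workload_policy(ATTESTATION_CLAIMS, POLICY)
print("workload verified" if not violations else violations)
```

Every property the policy names must match the attested claims exactly; anything missing or different is reported as a violation rather than silently ignored.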

Third, the receipt path worked end to end. Each completed confidential request produced an AIR receipt that could be verified offline. That matters because it turns the pilot into evidence, not just a runtime demo.
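The shape of an offline receipt check can be sketched as follows. This is a stand-in, not the AIR format: field names are invented, and an HMAC replaces the asymmetric signature a real enclave-rooted receipt would carry, purely to keep the sketch self-contained. The two steps are the essential ones: the receipt must commit to the exact payload, and that commitment must be signed by a key the verifier trusts.

```python
import hashlib
import hmac
import json

# Hypothetical sketch of an offline receipt check. Field names are invented,
# and an HMAC stands in for the asymmetric enclave signature a real receipt
# would carry, so the example stays stdlib-only.

def make_receipt(payload: dict, key: bytes) -> dict:
    """Issuer side: commit to one inference event and sign the commitment."""
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    sig = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "payload_sha256": digest, "signature": sig}

def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Verifier side: recompute the commitment, then check the signature."""
    digest = hashlib.sha256(
        json.dumps(receipt["payload"], sort_keys=True).encode()
    ).hexdigest()
    if digest != receipt["payload_sha256"]:
        return False  # payload was altered after issuance
    expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])

key = b"enclave-demo-key"
receipt = make_receipt({"request_id": "claim-001"}, key)
print(verify_receipt(receipt, key))  # True for an untampered receipt
```

Note that `verify_receipt` needs only the receipt and the trusted key: no connection to the runtime environment is required, which is what makes the check genuinely offline.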

Fourth, the model identity path was bound through the current model-hash design. The receipt and verification flow covered the model identity claim used by the pilot configuration.
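A model-hash binding of this kind reduces, on the verifier's side, to recomputing a digest over the model artifact and comparing it to the claim in the receipt. The sketch below is a generic illustration of that step, not the pilot's actual hashing scheme; the function name and the `sha256:` prefix convention are assumptions.

```python
import hashlib

# Illustrative sketch: hash a model artifact and compare it to the identity
# claim carried in a receipt. The hashing scheme here is assumed, not the
# pilot's actual design.

def model_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 over the model file, streamed so large artifacts fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()

def model_identity_matches(path: str, claimed: str) -> bool:
    """Does the artifact on disk match the receipt's model identity claim?"""
    return model_hash(path) == claimed
```

The design choice worth noticing is that the hash covers the artifact bytes, not the model's behavior: it proves which weights were loaded, which is exactly the (bounded) model identity claim the pilot made.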

The complete evidence loop: confidential execution, receipt generation, and offline verification.

What we did not claim

We did not claim that the pilot proves all confidential AI risks are solved.

We did not claim cryptographic proof of deletion.

We did not claim that hardware attestation alone proves the exact model behavior.

We did not claim that one pilot equals full production readiness across every platform and every inference runtime.

Those distinctions matter because pilot evidence should reduce uncertainty, not create new ambiguity.

The right interpretation is not "pilot complete, problem solved." The right interpretation is that the trust model is concrete enough to support a real workflow, not just a toy demo.

The complete evidence loop

The most useful takeaway is that the pilot demonstrated a complete evidence loop:

- confidential execution of the workflow inside a TDX environment,
- generation of an AIR receipt for each completed request, and
- offline verification of those receipts after the fact.

That is the kind of result a security or compliance stakeholder can inspect concretely.

For regulated workflows, this is more useful than a generic benchmark or a control-plane screenshot. It shows that a workflow can produce a verifiable artifact tied to the actual inference event.

What "confidential AI" should mean in practice

It also helps clarify what "confidential AI" should mean in practice. It is not enough to say the workload used confidential infrastructure. The higher-value question is whether the workflow also produces evidence that can be checked independently later.

That is what the pilot was really testing.

So the right summary is this: a realistic claims workflow ran on confidential infrastructure, and every completed request produced a receipt that could be verified offline, within the explicitly bounded scope described above.

For teams evaluating confidential AI, that is a much more useful milestone than a generic promise about privacy.