Combining NVIDIA DGX Spark + Apple Mac Studio for 4x Faster LLM Inference with EXO 1.0

We recently received early access to 2 NVIDIA DGX Spark™ units. NVIDIA calls it the world's smallest AI supercomputer. It has ~100 TFLOPs of FP16 performance with 128GB of CPU-GPU coherent memory at 273 GB/s.

With EXO, we've been running LLMs on clusters of Apple Mac Studios with M3 Ultra chips. The Mac Studio has 512GB of unified memory at 819 GB/s, but the GPU only has ~26 TFLOPs of FP16 performance.

The DGX Spark has roughly 4x the compute; the Mac Studio has 3x the memory bandwidth.

What if we combined them? What if we used DGX Spark for what it does best and Mac Studio for what it does best, in the same inference request?

NVIDIA DGX Spark™ early access units (with quality control supervisor)

Mac Studio M3 Ultra stack used for LLM inference with EXO

What Determines LLM Inference Performance?

What you see as a user boils down to two numbers:

  • TTFT (time‑to‑first‑token): delay from sending a prompt to seeing the first token.
  • TPS (tokens per second): cadence of tokens after the first one appears.

Everything we do in the system exists to improve those two numbers. The reason they're hard to optimize together is that they're governed by two different phases of the same request: prefill and decode.

The lifecycle of a request (from the user's point of view)

  1. You send a prompt.
  2. You wait. Nothing appears. This is the prefill phase, and it determines TTFT.
  3. The first token appears.
  4. A stream of tokens follows. This is the decode phase, and it determines TPS.
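As a minimal sketch of how the two numbers fall out of this lifecycle, the function below times any iterable of streamed tokens. The name `measure_stream` and the injectable `clock` parameter are hypothetical and are not part of EXO.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, tokens_per_second, token_count) for one response.

    TTFT is the wait before the first token (prefill). TPS is the cadence of
    the tokens that follow the first one (decode).
    """
    start = clock()          # the prompt is "sent" here
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now      # first token appears: end of prefill
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = last - first
    # Only the tokens after the first one belong to the decode cadence.
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return ttft, tps, count
```

Passing a fake `clock` makes the measurement deterministic, which is convenient when checking the arithmetic without running a model.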

What's happening under the hood in those two phases, and why do they behave so differently?

Figure 1: Request lifecycle showing prefill phase (yellow, determines TTFT) followed by decode phase (blue, determines TPS)

Prefill is compute-bound

Prefill processes the prompt and builds a KV cache for each transformer layer. The KV cache consists of a bunch of vectors for each token in the prompt.

These vectors are stored during prefill so we don't need to recompute them during decode.

For large contexts, the amount of compute grows quadratically with the prompt length (Θ(s²)) since every token needs to attend to all the other tokens in the prompt.

With modern techniques like Flash Attention, the data moved can be made to grow linearly with the prompt length (Θ(s)).

So the ratio between the compute and the data moved, i.e. the arithmetic intensity, is linear in the prompt length.

This makes prefill with large contexts compute-bound.
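One common way to make "compute-bound" concrete is the roofline model, which the article does not name. A kernel is compute-bound once its arithmetic intensity (FLOPs per byte moved) exceeds the device's ratio of peak FLOP/s to memory bandwidth. The sketch below uses the device figures quoted earlier; the `Device` class and the example intensities are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    peak_flops: float     # FP16 FLOP/s, figure quoted in the article
    mem_bandwidth: float  # bytes/s, figure quoted in the article

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOPs/byte) above which compute is the limit."""
        return self.peak_flops / self.mem_bandwidth

    def bound(self, intensity: float) -> str:
        """Classify a workload with the given FLOPs-per-byte intensity."""
        return "compute-bound" if intensity >= self.ridge_point else "memory-bound"


DGX_SPARK = Device("DGX Spark", peak_flops=100e12, mem_bandwidth=273e9)
M3_ULTRA = Device("M3 Ultra", peak_flops=26e12, mem_bandwidth=819e9)

if __name__ == "__main__":
    for dev in (DGX_SPARK, M3_ULTRA):
        print(f"{dev.name}: ridge point ~{dev.ridge_point:.0f} FLOPs/byte")
```

Because prefill intensity grows linearly with prompt length, a long enough prompt crosses the ridge point on either machine. It crosses much sooner on the M3 Ultra, which is another way of saying that the M3 Ultra runs out of compute first.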

Decode is memory-bound

Decode is the auto‑regressive loop after prefill. Each step generates one token by attending against the entire KV cache built so far.

In decode, we are doing vector-matrix multiplications which have lower arithmetic intensity than matrix-matrix multiplications.

This makes decode memory-bound.
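A rough back-of-the-envelope bound follows from this. If every decode step has to stream all model weights from memory once, tokens per second cannot exceed memory bandwidth divided by the weight size. The sketch below assumes an 8-billion-parameter model stored in FP16 (2 bytes per parameter) and ignores KV-cache reads; the function name is hypothetical.

```python
def decode_tps_upper_bound(
    mem_bandwidth_bytes_per_s: float,
    n_params: float,
    bytes_per_param: float,
) -> float:
    """Upper bound on decode tokens/s when each token reads all weights once.

    Assumption: batch size 1, weights dominate memory traffic, KV reads ignored.
    """
    weight_bytes = n_params * bytes_per_param
    return mem_bandwidth_bytes_per_s / weight_bytes


if __name__ == "__main__":
    # Bandwidth figures are the ones quoted in the article.
    for name, bandwidth in (("M3 Ultra", 819e9), ("DGX Spark", 273e9)):
        bound = decode_tps_upper_bound(bandwidth, n_params=8e9, bytes_per_param=2)
        print(f"{name}: at most ~{bound:.0f} tokens/s")
```

Under these assumptions the bound scales only with bandwidth, so the 3x bandwidth gap between the two machines carries over directly to the decode bound.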

Use different hardware for each phase

Once you separate the phases, the hardware choice is clear.

  • Prefill → high compute device.
  • Decode → high memory-bandwidth device.

Prefill on DGX Spark, transfer KV, decode on M3 Ultra

If you prefill on one device and decode on another, you must send the KV cache across the network. The naive approach is to run prefill, wait for it to finish, transfer the KV cache, then start decode.

Figure 2: Naive split showing prefill (yellow), KV transfer (green), then decode (blue)

This adds a communication cost between the two phases. If the transfer time is too large, you lose the benefit.

Overlap communication with compute

The KV cache doesn't have to arrive as one blob at the end. It can arrive layer by layer.

As soon as Layer 1's prefill completes, two things happen simultaneously. Layer 1's KV starts transferring to the M3 Ultra, and Layer 2's prefill begins on the DGX Spark. The communication for each layer overlaps with the computation of subsequent layers.

Figure 3: Layer-by-layer pipeline showing prefill (yellow) and KV transfer (green) overlapping across layers. Decode (blue) starts immediately when all layers complete.
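The difference between the naive split and the layer-by-layer pipeline can be sketched with a small timing model. This is an illustration under simplifying assumptions, not EXO's scheduler: each layer's KV is sent as soon as that layer's prefill finishes, and the link carries one layer's KV at a time. The per-layer times are made-up numbers.

```python
from typing import Sequence


def naive_time(t_comp: Sequence[float], t_send: Sequence[float]) -> float:
    """Prefill every layer, then transfer every layer's KV, with no overlap."""
    return sum(t_comp) + sum(t_send)


def overlapped_time(t_comp: Sequence[float], t_send: Sequence[float]) -> float:
    """Time until the last layer's KV has arrived when transfers overlap compute."""
    compute_done = 0.0  # when the current layer's prefill finishes
    link_free = 0.0     # when the link finishes its previous transfer
    for comp, send in zip(t_comp, t_send):
        compute_done += comp
        start = max(compute_done, link_free)  # wait for both the data and the link
        link_free = start + send
    return link_free


if __name__ == "__main__":
    layers = 4
    print(naive_time([1.0] * layers, [0.5] * layers))       # 6.0
    print(overlapped_time([1.0] * layers, [0.5] * layers))  # 4.5
    print(overlapped_time([1.0] * layers, [2.0] * layers))  # 9.0
```

When each transfer is shorter than each layer's compute, only the final layer's transfer stays exposed. When transfers are longer, the link becomes the bottleneck and the total is dominated by transfer time.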

In practice, EXO transfers the KV vectors of a layer while the layer is being processed, since the KV vectors are computed before the heavy compute operations. To hide the communication overhead, we just need the layer processing time (t_comp) to be larger than the KV transfer time (t_send).

Full overlap is possible when the context is large enough

The compute time is t_comp = F / P, where F is the FLOPs per layer and P is machine FLOPs/s. For large contexts, F scales quadratically with the prompt length s: F ∼ c1·s², where c1 is a model-dependent constant.

The transfer time is t_send = D / B, where D is KV data in bits and B is network bandwidth in bits/s. The KV cache has a constant number of vectors per token, so D ∼ q·c2·s, where q is the number of bits per KV value (4-bit, 8-bit, etc.) and c2 is model-dependent.

To fully hide communication, we need the transfer time to be less than the compute time: t_send < t_comp, i.e. D/B < F/P. This means P/B < F/D ∼ (c1/c2)·s/q. With DGX Spark at 100 TFLOPs FP16 and a 10 GbE (10 Gbps) link between the DGX Spark and the M3 Ultra, the ratio P/B = 10,000. This means we need s > 10,000·q/(c1/c2).

The constant K = c1/c2 depends on the attention architecture. For older models with multi-head attention (MHA) like Llama-2 7B, K = 2. For models with grouped query attention (GQA), K is larger: Llama-3 8B has K = 8, while Llama-3 70B and Qwen-2.5 72B have K = 16.

With 8-bit KV streaming and K = 16 (Llama-3 70B), the threshold is s > 5k tokens. For K = 8 (Llama-3 8B), it's s > 10k tokens. For K = 2 (Llama-2 7B), it's s > 40k tokens.
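The thresholds above are just the inequality s > (P/B)·q/K evaluated for each K. A short sketch, using the figures quoted in this section (the function name is hypothetical):

```python
def min_context_for_full_overlap(
    peak_flops: float,       # P, machine FLOP/s
    link_bits_per_s: float,  # B, network bandwidth in bits/s
    kv_bits: int,            # q, bits per streamed KV value
    k: float,                # K = c1/c2, depends on the attention architecture
) -> float:
    """Prompt length s above which KV transfer hides behind prefill compute."""
    return (peak_flops / link_bits_per_s) * kv_bits / k


if __name__ == "__main__":
    P = 100e12  # DGX Spark, FP16
    B = 10e9    # 10 GbE link
    for model, k in (("Llama-3 70B", 16), ("Llama-3 8B", 8), ("Llama-2 7B", 2)):
        s = min_context_for_full_overlap(P, B, kv_bits=8, k=k)
        print(f"{model}: s > {s:,.0f} tokens")
```

The same function shows how the threshold moves with the setup: halving q to 4-bit KV streaming, or using a link with twice the bandwidth, halves the required context length.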

Benchmark results: Llama-3.1 8B with 8k context

Running Llama-3.1 8B (FP16) with an 8,192 token prompt and generating 32 tokens:

| Configuration | Prefill Time | Generation Time | Total Time | Speedup |
|---|---|---|---|---|
| DGX Spark | 1.47s | 2.87s | 4.34s | 1.5× |
| M3 Ultra Mac Studio | 5.57s | 0.85s | 6.42s | 1.0× (baseline) |
| DGX Spark + M3 Ultra | 1.47s | 0.85s | 2.32s | 2.8× |

The combined setup achieves the best of both worlds: DGX Spark's fast prefill (3.8× faster than M3 Ultra) and M3 Ultra's fast generation (3.4× faster than DGX Spark), delivering 2.8× overall speedup compared to M3 Ultra alone.

EXO 1.0 does this automagically

Disaggregated prefill and decode, layer-by-layer KV streaming, and hardware-aware phase placement are all automated in EXO.

When you start EXO, it automatically discovers all devices connected in your ad-hoc mesh network and profiles each for compute throughput, memory bandwidth, memory capacity, and network characteristics.

Given a model and your topology, EXO plans which device should handle prefill, which should handle decode, whether to pipeline across layers, when to stream KV, and how to adapt if network conditions change. You don't write the schedule. You don't compute the thresholds. You just run the model, and EXO figures out how to make your heterogeneous cluster fast.

Inference is no longer constrained by what one box can do, but by what your whole cluster can do together.

NVIDIA DGX Spark and Mac Studio M3 Ultra working together for optimized inference

bernhardbock
8 days ago

Verify Cosign bring-your-own PKI signature on OpenShift | Red Hat Developer

Red Hat OpenShift 4.16 introduced sigstore signature verification as a Technology Preview feature through the ClusterImagePolicy and ImagePolicy custom resource definitions (CRDs). These initial implementations supported two policy types:

  • Fulcio CA with Rekor: Leverages Sigstore's certificate authority and transparency log for verification.
  • Public key: Uses Cosign-generated private and public key pairs.

In this article, we will introduce the bring-your-own PKI (BYO-PKI) signature verification through the ClusterImagePolicy and ImagePolicy API. This Developer Preview feature (available from 4.19) enables you to validate container images using an existing X.509 certificate while aligning with Cosign's BYO-PKI signing workflow. 

Cosign bring-your-own PKI signing

The following example generates the certificate chain using OpenSSL commands. We then use Cosign BYO-PKI to sign the image and attach the signature to the quay.io registry.

ClusterImagePolicy requires a subject alternative name (SAN) to authenticate the user’s identity, which can be either a hostname or an email address. In this case, both a hostname and an email address were specified when generating the certificate.

# Generate Root CA
openssl req -x509 -newkey rsa:4096 -keyout root-ca-key.pem -sha256 -noenc -days 9999 -subj "/C=ES/L=Valencia/O=IT/OU=Security/CN=Linuxera Root Certificate Authority" -out root-ca.pem
# Intermediate CA
openssl req -noenc -newkey rsa:4096 -keyout intermediate-ca-key.pem \
-addext "subjectKeyIdentifier = hash" \
-addext "keyUsage = keyCertSign" \
-addext "basicConstraints = critical,CA:TRUE,pathlen:2"  \
-subj "/C=ES/L=Valencia/O=IT/OU=Security/CN=Linuxera Intermediate Certificate Authority" \
-out intermediate-ca.csr
openssl x509 -req -days 9999 -sha256 -in intermediate-ca.csr -CA root-ca.pem -CAkey root-ca-key.pem -copy_extensions copy -out intermediate-ca.pem
# Leaf certificate (signs the image; carries the SAN)
openssl req -noenc -newkey rsa:4096 -keyout leaf-key.pem \
-addext "subjectKeyIdentifier = hash" \
-addext "keyUsage = digitalSignature" \
-addext "subjectAltName = email:qiwan@redhat.com,DNS:myhost.example.com" \
-subj "/C=ES/L=Valencia/O=IT/OU=Security/CN=Team A Cosign Certificate" -out leaf.csr
openssl x509 -req -in leaf.csr -CA intermediate-ca.pem -CAkey intermediate-ca-key.pem -copy_extensions copy -days 9999 -sha256 -out leaf.pem
# Bundle CA chain (Intermediate + Root)
cat intermediate-ca.pem root-ca.pem > ca-bundle.pem
# Sign the image using cosign
podman pull quay.io/libpod/busybox
podman tag quay.io/libpod/busybox quay.io/qiwanredhat/byo:latest
podman push --tls-verify=false --creds=<username>:<password> quay.io/qiwanredhat/byo:latest
IMAGE=quay.io/qiwanredhat/byo
PAYLOAD=payload.json
cosign generate $IMAGE >$PAYLOAD
openssl dgst -sha256 -sign leaf-key.pem -out $PAYLOAD.sig $PAYLOAD
cat $PAYLOAD.sig | base64 >$PAYLOAD.base64.sig
cosign attach signature $IMAGE \
	--registry-password=<password> \
	--registry-username=<username> \
	--payload $PAYLOAD \
	--signature $PAYLOAD.base64.sig \
	--cert leaf.pem \
	--cert-chain ca-bundle.pem

The next section will show how to configure ClusterImagePolicy to verify this signature.

Configure OpenShift for PKI verification

This section will guide you through verifying the quay.io/qiwanredhat/byo image. This involves enabling DevPreviewNoUpgrade features and configuring the ClusterImagePolicy CRD.

Enable Developer Preview features

First, enable the required Developer Preview features for your cluster by editing the FeatureGate CR named cluster:

$ oc edit featuregate cluster
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: DevPreviewNoUpgrade

Define ClusterImagePolicy

This section creates the following ClusterImagePolicy CR for image verification. The CR spec names the image to be verified and gives the details of the PKI certificate. It also sets matchPolicy to MatchRepository because the image was signed with the repository (the value of docker-reference from payload.json) rather than a specific tag or digest. If matchPolicy is not specified, the default is MatchRepoDigestOrExact, which requires the signature's docker-reference to match the image specified in the pod spec.

apiVersion: config.openshift.io/v1alpha1
kind: ClusterImagePolicy
metadata:
  name: pki-quay-policy
spec:
  scopes:
  - quay.io/qiwanredhat/byo
  policy:
    rootOfTrust:
      policyType: PKI
      pki:
        caRootsData: <base64-encoded-root-ca>
        caIntermediatesData: <base64-encoded-intermediate-ca>
        pkiCertificateSubject:
          email: qiwan@redhat.com
          hostname: myhost.example.com
    signedIdentity:
      # Set matchPolicy (default is MatchRepoDigestOrExact) since the above signature was signed on the repository, not a specific tag or digest
      matchPolicy: MatchRepository

This ClusterImagePolicy object is rolled out to /etc/containers/policy.json, and /etc/containers/registries.d/sigstore-registries.yaml is updated with an entry that enables sigstore verification on the quay.io/qiwanredhat/byo scope.
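To see why matchPolicy matters here, the following simplified sketch compares the image reference from the pod spec with the docker-reference recorded in the signature. It is an illustration of the two policies as described above, not the containers/image implementation: it skips reference normalization, and the helper names are hypothetical.

```python
from typing import Optional, Tuple


def split_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'repo[:tag][@digest]' into (repo, tag, digest)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag = None
    colon = ref.rfind(":")
    # A colon before the last '/' belongs to a registry port, not a tag.
    if colon > ref.rfind("/"):
        ref, tag = ref[:colon], ref[colon + 1:]
    return ref, tag, digest


def identity_matches(match_policy: str, image_ref: str, signed_ref: str) -> bool:
    """Simplified check of a signature's docker-reference against the pod image."""
    img_repo, img_tag, img_digest = split_reference(image_ref)
    sig_repo, _, _ = split_reference(signed_ref)
    if match_policy == "MatchRepository":
        return img_repo == sig_repo
    if match_policy == "MatchRepoDigestOrExact":
        if img_digest is not None and img_tag is None:
            return img_repo == sig_repo   # digest reference: repository must match
        return image_ref == signed_ref    # tagged reference: exact match required
    raise ValueError(f"unsupported matchPolicy in this sketch: {match_policy}")


if __name__ == "__main__":
    pod_image = "quay.io/qiwanredhat/byo:latest"
    signed = "quay.io/qiwanredhat/byo"  # docker-reference signed on the repository
    print(identity_matches("MatchRepository", pod_image, signed))
    print(identity_matches("MatchRepoDigestOrExact", pod_image, signed))
```

With the repository-level signature from this article, the pod image quay.io/qiwanredhat/byo:latest passes under MatchRepository but would fail the exact-match branch of the default policy in this sketch.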

Validate signature requirements

Create a test pod to confirm that CRI-O verifies the signature. To see debug-level log output, follow this documentation to configure ContainerRuntimeConfig.

The test pod is defined as follows:

kind: Pod
apiVersion: v1
metadata:
  generateName: img-test-pod-
spec:
  serviceAccount: default
  containers:
    - name: step-hello
      command:
        - sleep
        - infinity
      image: quay.io/qiwanredhat/byo:latest

Check CRI-O logs for verification.

sh-5.1# journalctl -u crio | grep -A 100 "Pulling image: quay.io/qiwanredhat"
Apr 21 08:09:07 ip-10-0-27-44 crio[2371]: time="2025-04-21T08:09:07.381322395Z" level=debug msg="IsRunningImageAllowed for image docker:quay.io/qiwanredhat/byo:latest" file="signature/policy_eval.go:274"
Apr 21 08:09:07 ip-10-0-27-44 crio[2371]: time="2025-04-21T08:09:07.381485828Z" level=debug msg=" Using transport \"docker\" specific policy section \"quay.io/qiwanredhat/byo\"" file="signature/policy_eval.go:150"

Policy enforcement failure modes and diagnostics

For an image to be accepted by CRI-O during container creation, all the signature requirements must be satisfied. Pod events should show SignatureValidationFailed from the kubelet on verification failures. The CRI-O log provides more details.

The following is the result of an attempt to deploy an unsigned image quay.io/qiwanredhat/byo:latest.

$ oc get pods
NAME                 READY   STATUS             RESTARTS   AGE
img-test-pod-sdk47   0/1     ImagePullBackOff   0          13m

Events:
  Type 	Reason      	Age               	From           	Message
  ---- 	------      	----              	----           	-------
  Normal   Scheduled   	13m               	default-scheduler  Successfully assigned default/img-test-pod-sdk47 to ip-10-0-56-56.us-east-2.compute.internal
  Normal   AddedInterface  13m               	multus         	Add eth0 [10.131.2.23/23] from ovn-kubernetes
  Normal   Pulling     	10m (x5 over 13m) 	kubelet        	Pulling image "quay.io/qiwanredhat/busybox-byo:latest"
  Warning  Failed      	10m (x5 over 13m) 	kubelet        	Failed to pull image "quay.io/qiwanredhat/busybox-byo:latest": SignatureValidationFailed: Source image rejected: A signature was required, but no signature exists
  Warning  Failed      	10m (x5 over 13m) 	kubelet        	Error: SignatureValidationFailed
  Normal   BackOff     	3m16s (x42 over 13m)  kubelet        	Back-off pulling image "quay.io/qiwanredhat/busybox-byo:latest"
  Warning  Failed      	3m16s (x42 over 13m)  kubelet        	Error: ImagePullBackOff

journalctl -u crio | grep "byo"
Apr 23 06:12:38 ip-10-0-56-56 crio[2366]: time="2025-04-23T06:12:38.141197504Z" level=debug msg="Fetching sigstore attachment manifest failed, assuming it does not exist: reading manifest sha256-8677cb90773f20fecd043e6754e548a2ea03a232264c92a17a5c77f1c4eda43e.sig in quay.io/qiwanredhat/byo: manifest unknown" file="docker/docker_client.go:1129"

Final thoughts

This article demonstrated how to perform signature verification on images signed with Cosign's bring-your-own PKI feature in OpenShift using the ClusterImagePolicy CRD. We walked through the end-to-end process of signing an image with Cosign and BYO-PKI, followed by configuring OpenShift to verify that signature.

As we progress toward general availability (GA) for this feature, organizations can leverage their existing PKI infrastructure to enhance the security and integrity of container images running on OpenShift.

bernhardbock
47 days ago

Fil-C

bernhardbock
47 days ago

Tracing the thoughts of a large language model

Language models like Claude aren't programmed directly by humans; instead, they're trained on large amounts of data. During that training process, they learn their own strategies to solve problems. These strategies are encoded in the billions of computations a model performs for every word it writes. They arrive inscrutable to us, the model’s developers. This means that we don’t understand how models do most of the things they do.

Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to. For example:

  • Claude can speak dozens of languages. What language, if any, is it using "in its head"?
  • Claude writes text one word at a time. Is it only focusing on predicting the next word or does it ever plan ahead?
  • Claude can write out its reasoning step-by-step. Does this explanation represent the actual steps it took to get to an answer, or is it sometimes fabricating a plausible argument for a foregone conclusion?

We take inspiration from the field of neuroscience, which has long studied the messy insides of thinking organisms, and try to build a kind of AI microscope that will let us identify patterns of activity and flows of information. There are limits to what you can learn just by talking to an AI model—after all, humans (even neuroscientists) don't know all the details of how our own brains work. So we look inside.

Today, we're sharing two new papers that represent progress on the development of the "microscope", and the application of it to see new "AI biology". In the first paper, we extend our prior work locating interpretable concepts ("features") inside a model to link those concepts together into computational "circuits", revealing parts of the pathway that transforms the words that go into Claude into the words that come out. In the second, we look inside Claude 3.5 Haiku, performing deep studies of simple tasks representative of ten crucial model behaviors, including the three described above. Our method sheds light on a part of what happens when Claude responds to these prompts, which is enough to see solid evidence that:

  • Claude sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal “language of thought.” We show this by translating simple sentences into multiple languages and tracing the overlap in how Claude processes them.
  • Claude will plan what it will say many words ahead, and write to get to that destination. We show this in the realm of poetry, where it thinks of possible rhyming words in advance and writes the next line to get there. This is powerful evidence that even though models are trained to output one word at a time, they may think on much longer horizons to do so.
  • Claude, on occasion, will give a plausible-sounding argument designed to agree with the user rather than to follow logical steps. We show this by asking it for help on a hard math problem while giving it an incorrect hint. We are able to “catch it in the act” as it makes up its fake reasoning, providing a proof of concept that our tools can be useful for flagging concerning mechanisms in models.

We were often surprised by what we saw in the model: In the poetry case study, we had set out to show that the model didn't plan ahead, and found instead that it did. In a study of hallucinations, we found the counter-intuitive result that Claude's default behavior is to decline to speculate when asked a question, and it only answers questions when something inhibits this default reluctance. In a response to an example jailbreak, we found that the model recognized it had been asked for dangerous information well before it was able to gracefully bring the conversation back around. While the problems we study can be (and often have been) analyzed with other methods, the general "build a microscope" approach lets us learn many things we wouldn't have guessed going in, which will be increasingly important as models grow more sophisticated.

These findings aren’t just scientifically interesting—they represent significant progress towards our goal of understanding AI systems and making sure they’re reliable. We also hope they prove useful to other groups, and potentially, in other domains: for example, interpretability techniques have found use in fields such as medical imaging and genomics, as dissecting the internal mechanisms of models trained for scientific applications can reveal new insight about the science.

At the same time, we recognize the limitations of our current approach. Even on short, simple prompts, our method only captures a fraction of the total computation performed by Claude, and the mechanisms we do see may have some artifacts based on our tools which don't reflect what is going on in the underlying model. It currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words. To scale to the thousands of words supporting the complex thinking chains used by modern models, we will need to improve both the method and (perhaps with AI assistance) how we make sense of what we see with it.

As AI systems are rapidly becoming more capable and are deployed in increasingly important contexts, Anthropic is investing in a portfolio of approaches including realtime monitoring, model character improvements, and the science of alignment. Interpretability research like this is one of the highest-risk, highest-reward investments, a significant scientific challenge with the potential to provide a unique tool for ensuring that AI is transparent. Transparency into the model’s mechanisms allows us to check whether it’s aligned with human values—and whether it’s worthy of our trust.

For full details, please read the papers. Below, we invite you on a short tour of some of the most striking "AI biology" findings from our investigations.

How is Claude multilingual?

Claude speaks dozens of languages fluently—from English and French to Chinese and Tagalog. How does this multilingual ability work? Is there a separate "French Claude" and "Chinese Claude" running in parallel, responding to requests in their own language? Or is there some cross-lingual core inside?

Recent research on smaller models has shown hints of shared grammatical mechanisms across languages. We investigate this by asking Claude for the "opposite of small" across different languages, and find that the same core features for the concepts of smallness and oppositeness activate, and trigger a concept of largeness, which gets translated out into the language of the question. We find that the shared circuitry increases with model scale, with Claude 3.5 Haiku sharing more than twice the proportion of its features between languages as compared to a smaller model.

This provides additional evidence for a kind of conceptual universality—a shared abstract space where meanings exist and where thinking can happen before being translated into specific languages. More practically, it suggests Claude can learn something in one language and apply that knowledge when speaking another. Studying how the model shares what it knows across contexts is important to understanding its most advanced reasoning capabilities, which generalize across many domains.

Does Claude plan its rhymes?

How does Claude write rhyming poetry? Consider this ditty:

He saw a carrot and had to grab it,
His hunger was like a starving rabbit

To write the second line, the model had to satisfy two constraints at the same time: the need to rhyme (with "grab it"), and the need to make sense (why did he grab the carrot?). Our guess was that Claude was writing word-by-word without much forethought until the end of the line, where it would make sure to pick a word that rhymes. We therefore expected to see a circuit with parallel paths, one for ensuring the final word made sense, and one for ensuring it rhymes.

Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.

To understand how this planning mechanism works in practice, we conducted an experiment inspired by how neuroscientists study brain function, by pinpointing and altering neural activity in specific parts of the brain (for example using electrical or magnetic currents). Here, we modified the part of Claude’s internal state that represented the "rabbit" concept. When we subtract out the "rabbit" part, and have Claude continue the line, it writes a new one ending in "habit", another sensible completion. We can also inject the concept of "green" at that point, causing Claude to write a sensible (but no-longer rhyming) line which ends in "green". This demonstrates both planning ability and adaptive flexibility—Claude can modify its approach when the intended outcome changes.

Mental math

Claude wasn't designed as a calculator—it was trained on text, not equipped with mathematical algorithms. Yet somehow, it can add numbers correctly "in its head". How does a system trained to predict the next word in a sequence learn to calculate, say, 36+59, without writing out each step?

Maybe the answer is uninteresting: the model might have memorized massive addition tables and simply outputs the answer to any given sum because that answer is in its training data. Another possibility is that it follows the traditional longhand addition algorithms that we learn in school.

Instead, we find that Claude employs multiple computational paths that work in parallel. One path computes a rough approximation of the answer and the other focuses on precisely determining the last digit of the sum. These paths interact and combine with one another to produce the final answer. Addition is a simple behavior, but understanding how it works at this level of detail, involving a mix of approximate and precise strategies, might teach us something about how Claude tackles more complex problems, too.

Strikingly, Claude seems to be unaware of the sophisticated "mental math" strategies that it learned during training. If you ask how it figured out that 36+59 is 95, it describes the standard algorithm involving carrying the 1. This may reflect the fact that the model learns to explain math by simulating explanations written by people, but that it has to learn to do math "in its head" directly, without any such hints, and develops its own internal strategies to do so.

Are Claude’s explanations always faithful?

Recently-released models like Claude 3.7 Sonnet can "think out loud" for extended periods before giving a final answer. Often this extended thinking gives better answers, but sometimes this "chain of thought" ends up being misleading; Claude sometimes makes up plausible-sounding steps to get where it wants to go. From a reliability perspective, the problem is that Claude’s "faked" reasoning can be very convincing. We explored a way that interpretability can help tell apart "faithful" from "unfaithful" reasoning.

When asked to solve a problem requiring it to compute the square root of 0.64, Claude produces a faithful chain-of-thought, with features representing the intermediate step of computing the square root of 64. But when asked to compute the cosine of a large number it can't easily calculate, Claude sometimes engages in what the philosopher Harry Frankfurt would call bullshitting—just coming up with an answer, any answer, without caring whether it is true or false. Even though it does claim to have run a calculation, our interpretability techniques reveal no evidence at all of that calculation having occurred. Even more interestingly, when given a hint about the answer, Claude sometimes works backwards, finding intermediate steps that would lead to that target, thus displaying a form of motivated reasoning.

The ability to trace Claude's actual internal reasoning—and not just what it claims to be doing—opens up new possibilities for auditing AI systems. In a separate, recently-published experiment, we studied a variant of Claude that had been trained to pursue a hidden goal: appeasing biases in reward models (auxiliary models used to train language models by rewarding them for desirable behavior). Although the model was reluctant to reveal this goal when asked directly, our interpretability methods revealed features for the bias-appeasing. This demonstrates how our methods might, with future refinement, help identify concerning "thought processes" that aren't apparent from the model's responses alone.

Multi-step reasoning

As we discussed above, one way a language model might answer complex questions is simply by memorizing the answers. For instance, if asked "What is the capital of the state where Dallas is located?", a "regurgitating" model could just learn to output "Austin" without knowing the relationship between Dallas, Texas, and Austin. Perhaps, for example, it saw the exact same question and its answer during its training.

But our research reveals something more sophisticated happening inside Claude. When we ask Claude a question requiring multi-step reasoning, we can identify intermediate conceptual steps in Claude's thinking process. In the Dallas example, we observe Claude first activating features representing "Dallas is in Texas" and then connecting this to a separate concept indicating that “the capital of Texas is Austin”. In other words, the model is combining independent facts to reach its answer rather than regurgitating a memorized response.

Our method allows us to artificially change the intermediate steps and see how it affects Claude’s answers. For instance, in the above example we can intervene and swap the "Texas" concepts for "California" concepts; when we do so, the model's output changes from "Austin" to "Sacramento." This indicates that the model is using the intermediate step to determine its answer.
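As a loose analogy only, the intervention can be pictured as a two-hop lookup in which the intermediate value is overwritten before the second hop. The model's features are learned activations, not dictionary entries, and everything in the sketch below (tables, function name, `override_state` parameter) is a hypothetical toy.

```python
from typing import Optional

# Two independent "facts", composed at answer time rather than memorized together.
STATE_OF_CITY = {"Dallas": "Texas"}
CAPITAL_OF_STATE = {"Texas": "Austin", "California": "Sacramento"}


def capital_for_city(city: str, override_state: Optional[str] = None) -> str:
    """Answer 'capital of the state containing <city>' via an intermediate step."""
    state = STATE_OF_CITY[city]       # hop 1: city -> state
    if override_state is not None:
        state = override_state        # intervention on the intermediate step
    return CAPITAL_OF_STATE[state]    # hop 2: state -> capital


if __name__ == "__main__":
    print(capital_for_city("Dallas"))                               # Austin
    print(capital_for_city("Dallas", override_state="California"))  # Sacramento
```

A purely memorized question-to-answer table would have no intermediate value to overwrite, which is why the changed output is evidence that the intermediate step is actually used.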

Hallucinations

Why do language models sometimes hallucinate—that is, make up information? At a basic level, language model training incentivizes hallucination: models are always supposed to give a guess for the next word. Viewed this way, the major challenge is how to get models to not hallucinate. Models like Claude have relatively successful (though imperfect) anti-hallucination training; they will often refuse to answer a question if they don’t know the answer, rather than speculate. We wanted to understand how this works.

It turns out that, in Claude, refusal to answer is the default behavior: we find a circuit that is "on" by default and that causes the model to state that it has insufficient information to answer any given question. However, when the model is asked about something it knows well—say, the basketball player Michael Jordan—a competing feature representing "known entities" activates and inhibits this default circuit (see also this recent paper for related findings). This allows Claude to answer the question when it knows the answer. In contrast, when asked about an unknown entity ("Michael Batkin"), it declines to answer.

By intervening in the model and activating the "known answer" features (or inhibiting the "unknown name" or "can’t answer" features), we’re able to cause the model to hallucinate (quite consistently!) that Michael Batkin plays chess.

Sometimes, this sort of “misfire” of the “known answer” circuit happens naturally, without us intervening, resulting in a hallucination. In our paper, we show that such misfires can occur when Claude recognizes a name but doesn't know anything else about that person. In cases like this, the “known entity” feature might still activate, and then suppress the default "don't know" feature—in this case incorrectly. Once the model has decided that it needs to answer the question, it proceeds to confabulate: to generate a plausible—but unfortunately untrue—response.

Jailbreaks

Jailbreaks are prompting strategies that aim to circumvent safety guardrails to get models to produce outputs that an AI’s developer did not intend for it to produce—and which are sometimes harmful. We studied a jailbreak that tricks the model into producing output about making bombs. There are many jailbreaking techniques, but in this example the specific method involves having the model decipher a hidden code, putting together the first letters of each word in the sentence "Babies Outlive Mustard Block" (B-O-M-B), and then acting on that information. This is sufficiently confusing for the model that it’s tricked into producing an output that it never would have otherwise.

Why is this so confusing for the model? Why does it continue to write the sentence, producing bomb-making instructions?

We find that this is partially caused by a tension between grammatical coherence and safety mechanisms. Once Claude begins a sentence, many features “pressure” it to maintain grammatical and semantic coherence, and continue a sentence to its conclusion. This is even the case when it detects that it really should refuse.

In our case study, after the model had unwittingly spelled out "BOMB" and begun providing instructions, we observed that its subsequent output was influenced by features promoting correct grammar and self-consistency. These features would ordinarily be very helpful, but in this case became the model’s Achilles’ Heel.

The model only managed to pivot to refusal after completing a grammatically coherent sentence (and thus having satisfied the pressure from the features that push it towards coherence). It uses the new sentence as an opportunity to give the kind of refusal it failed to give previously: "However, I cannot provide detailed instructions...".

A description of our new interpretability methods can be found in our first paper, "Circuit tracing: Revealing computational graphs in language models". Many more details of all of the above case studies are provided in our second paper, "On the biology of a large language model".

Work with us

If you are interested in working with us to help interpret and improve AI models, we have open roles on our team and we’d love for you to apply. We’re looking for Research Scientists and Research Engineers.

bernhardbock
58 days ago

Authenticating MCP OAuth Clients With SPIFFE and SPIRE

1 Share

In the previous blog, we dug into dynamically registering OAuth clients leveraging SPIFFE and SPIRE. We used SPIRE to issue software statements in the SPIFFE JWT SVID that Keycloak can trust as part of Dynamic Client Registration (RFC 7591). Once we have an OAuth client, we will want to continue to use SPIFFE to authenticate to our Authorization Server. This eliminates the need for a long-lived “client secret”, which is common for confidential OAuth clients. It also means we can use the Agent or MCP client’s identity (based on SPIFFE) for authorization flows based on OAuth. We dig into that in this blog.

TL;DR If you want to see a quick demo of this working:

OAuth Client Authentication

OAuth 2.0 (and extensions like RFC 7523) specify a few ways an OAuth client can authenticate itself to the Authorization Server (AS):

  • client_secret_basic - HTTP Basic (default)
  • client_secret_post - Form POST
  • private_key_jwt - JWT with private key
  • client_secret_jwt - JWT with shared secret (less common)
  • none - Public client (no authentication)
  • tls_client_auth - Mutual TLS
  • self_signed_tls_client_auth - Self-signed mutual TLS

A very common approach in microservice and machine-to-machine environments is to use a confidential client and the “client credentials” flow. When the OAuth client is registered, it is issued a client_id and client_secret. This id/secret pair is presented to authenticate the client to the AS. The big problem with this approach is that these are usually long-lived secrets (rarely rotated) and must be kept safe somehow. Confidential clients are assumed to have some safe storage, but even so, this is an additional burden on the client to not slip up (logs, configs, copy/paste) and reveal these secrets. Lastly, these secrets are “pre-shared secrets” and are not rooted in any cryptography.
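To make the static-secret approach concrete, here is a minimal sketch of how a client might build the client_secret_basic header. The client ID and secret are made-up values, and the sketch assumes both contain only characters that need no form-encoding.

```javascript
// Hypothetical credentials, for illustration only.
const demoClientId = 'mcp-test-client';
const demoClientSecret = 's3cr3t-value';

// client_secret_basic: "Basic " + base64(client_id ":" client_secret).
// Assumes id and secret need no form-encoding before being joined.
function basicAuthHeader(id, secret) {
  return 'Basic ' + Buffer.from(`${id}:${secret}`, 'utf8').toString('base64');
}

const authorization = basicAuthHeader(demoClientId, demoClientSecret);

// Base64 is an encoding, not encryption: decoding recovers the secret.
const decoded = Buffer.from(authorization.slice('Basic '.length), 'base64').toString('utf8');
console.log(authorization);
console.log(decoded);
```

Because the header is only base64-encoded, anything that logs or stores it has effectively stored the secret, which is the "slip up" risk described above.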

In a scenario where SPIFFE is used to issue cryptographically verifiable workload identity / agent identity / MCP client identity, we can use SPIFFE SVIDs for authenticating to the AS. That is, instead of passing static secrets, we can pass short-lived SPIFFE JWT SVIDs (or client certificates) to authenticate. An Internet Draft at the IETF has been started by Pieter Kasselman et al. which describes this scenario. I’ve recently implemented this draft spec in some working examples I’ve been exploring and would like to share how it all works.

SPIFFE SVID Client Authentication

One question I had when digging into this is: can’t we just use private_key_jwt (RFC 7523) to do this? That is, just give the AS the public keys for the SPIFFE/SPIRE implementation, and let the IdP/AS trust JWTs that are issued from that system?

The original intent behind private_key_jwt is for the OAuth client to have a private key that can be used to identify itself while the AS has the public key. So the client can create a JWT, sign it, and send it for authentication. The AS can prove that the JWT was created by the OAuth client and use that for authentication. In this scenario, Authorization Servers may expect the iss and sub claims to be the same since this is a private key scenario where the issuer should be the subject. In the SPIFFE scenario, this is not the case. Additionally, good implementations should also try to prevent replay attacks by tracking jti. For example, Keycloak does both of these things (checks iss==sub and tracks jti) for its implementation of RFC 7523.
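The two checks mentioned above can be sketched as follows. This is an illustrative model only, not Keycloak's actual code: the function name, the in-memory Set used for jti tracking, and the sample claim values are all assumptions.

```javascript
// Illustrative private_key_jwt-style claim checks (not Keycloak's implementation).
// seenJtis is a hypothetical in-memory replay cache.
function checkPrivateKeyJwtClaims(claims, seenJtis) {
  // A self-issued assertion has the client as both issuer and subject.
  if (claims.iss !== claims.sub) {
    return { ok: false, reason: 'iss and sub differ' };
  }
  if (!claims.jti) {
    return { ok: false, reason: 'missing jti' };
  }
  // Reject any token identifier we have already accepted.
  if (seenJtis.has(claims.jti)) {
    return { ok: false, reason: 'jti replayed' };
  }
  seenJtis.add(claims.jti);
  return { ok: true };
}

const seen = new Set();

// A self-issued assertion passes once, then is rejected as a replay.
const selfIssued = { iss: 'my-client', sub: 'my-client', jti: 'abc-123' };
const firstUse = checkPrivateKeyJwtClaims(selfIssued, seen);
const secondUse = checkPrivateKeyJwtClaims(selfIssued, seen);

// An SVID-shaped token: issued by SPIRE, subject is the workload's SPIFFE ID.
const svidLike = {
  iss: 'http://spire-server:8443',
  sub: 'spiffe://example.org/mcp-test-client',
};
const svidResult = checkPrivateKeyJwtClaims(svidLike, seen);

console.log(firstUse, secondUse, svidResult);
```

The SVID-shaped token fails at the first check, which is the mismatch described above: SPIRE is the issuer, while the workload is the subject.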

Additionally, Keycloak allows setting up identity federation/brokering. The problem is, Keycloak expects a full implementation of a token provider. SPIRE, which we use as our SPIFFE implementation, does not support full OAuth/OIDC token endpoints.

Since we cannot use private_key_jwt or identity brokering (in Keycloak), what options do we have? One option is to extend Keycloak to support a new client authentication mechanism.

Extending Keycloak for SPIFFE client authentication

To get this POC to work, we need to extend Keycloak. You can follow along in this GitHub repo to see the code.

Keycloak is written in Java and has a nice “Service Provider Interface” (SPI) model for extending many parts of Keycloak, including client authentication. To extend Keycloak to support a SPIFFE JWT authentication mechanism, we need to implement the ClientAuthenticatorFactory interface. I do this in the SpiffeSvidClientAuthenticator class:

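import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.keycloak.authentication.ClientAuthenticationFlowContext;
import org.keycloak.authentication.authenticators.client.AbstractClientAuthenticator;
import org.keycloak.protocol.oidc.OIDCLoginProtocol;

public class SpiffeSvidClientAuthenticator extends AbstractClientAuthenticator {

    public static final String PROVIDER_ID = "client-spiffe-jwt";

    @Override
    public void authenticateClient(ClientAuthenticationFlowContext context) {
        SpiffeSvidClientValidator validator = new SpiffeSvidClientValidator(context, getId());
        validator.readJws();
        // ...more impl here...
        validator.validateToken();
        context.success();
    }

    @Override
    public Set<String> getProtocolAuthenticatorMethods(String loginProtocol) {
        if (loginProtocol.equals(OIDCLoginProtocol.LOGIN_PROTOCOL)) {
            Set<String> results = new HashSet<>();
            results.add("spiffe_svid_jwt");
            return results;
        }
        // No authenticator methods are offered for other login protocols.
        return Collections.emptySet();
    }
}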

A couple of things to notice here. We specify a PROVIDER_ID of client-spiffe-jwt, which can be used under the covers (i.e., via the Keycloak Admin REST API) to refer to this configuration. We also implement an “authenticator method”, spiffe_svid_jwt, which OAuth clients can use in authorization flows to identify which authentication method to use (i.e., urn:ietf:params:oauth:client-assertion-type:spiffe-svid-jwt). Although it is not shown above (you can check the code), we can also extend the configuration that you see in the UI to specify additional properties for the custom client authenticator. For example, I added an issuer property that can be configured and used in the custom client authentication validation.

From here, we need to load this into a stock Keycloak (we use a recent version at the time of writing). Here’s an example using Docker Compose:

services:
  keycloak-idp:
    image: quay.io/keycloak/keycloak:26.2.5
    environment:
      KC_HEALTH_ENABLED: "true"
      KEYCLOAK_ADMIN: admin
      KEYCLOAK_ADMIN_PASSWORD: admin
    ports:
      - "8080:8080"
    volumes:
      - ./spiffe-svid-client-authenticator-1.0.0.jar:/opt/keycloak/providers/spiffe-svid-client-authenticator-1.0.0.jar:ro
    command: start-dev
    networks:
      - keycloak-shared-network

When we start Keycloak, we should see that our SPI gets loaded:

keycloak-idp-1 | 2025-07-29 02:03:09,255 WARN [org.keycloak.services] (build-38) KC-SERVICES0047: client-spiffe-jwt (com.yourcompany.keycloak.authenticator.SpiffeSvidClientAuthenticator) is implementing the internal SPI client-authenticator. 
This SPI is internal and may change without notice

If we go to an existing OAuth client (or create a new one), and navigate to the Credentials tab, we should see the new SPIFFE SVID JWT authenticator type.

If we select the SPIFFE SVID JWT authenticator, we can see our custom configuration fields (just one in this case, issuer):

We will configure the issuer with the SPIRE server address. We will also need to configure the JWKS that Keycloak should trust, but SPIRE doesn’t support this out of the box. Luckily, they have a pre-built addon to support OIDC style discovery.

SPIRE OIDC Discovery Endpoint

SPIRE is a workload attestation engine and implements the SPIFFE spec. It can issue x509 or JWT SVIDs. For JWTs, it does not expose its public key/JWKS out of the box. Luckily, a simple JWKS discovery endpoint is available to support an OAuth federation / brokering scenario. We need to stand this up and configure it to work with our SPIRE server.

Here’s an example using Docker Compose:

  spire-oidc-discovery:
    image: ghcr.io/spiffe/oidc-discovery-provider:1.12.4
    container_name: spire-oidc-discovery
    depends_on:
      - spire-server
    ports:
      - "18443:8443"
    volumes:
      - ./oidc-discovery-provider.conf:/opt/spire/conf/oidc-discovery-provider.conf:ro
      - spire-server-socket:/tmp/spire-server/private:ro
    working_dir: /opt/spire/conf
    command: ["-config", "oidc-discovery-provider.conf"]
    networks:
      - keycloak_keycloak-shared-network

Note, the SPIRE OIDC discovery endpoint needs its own configuration and access to the SPIRE server. Ideally this endpoint is co-located with the SPIRE server and can access the SPIRE server’s Unix Domain Socket (UDS). Here’s our configuration for the OIDC discovery endpoint (note, for demo purposes, I’m using an insecure/http endpoint):

log_level = "INFO"
domains = ["spire-server", "spire-oidc-discovery", "localhost"]

# Use HTTP for local development (no certificates needed)
insecure_addr = ":8443"
allow_insecure_scheme = true

server_api {
  address = "unix:///tmp/spire-server/private/api.sock"
}

health_checks {}

Lastly, we’ll need to tune some parameters on the server.conf for the SPIRE server itself:

server {
  ...

  # Add JWT issuer for OIDC (using HTTP for local development)
  jwt_issuer = "http://spire-server:8443"
  default_jwt_svid_ttl = "1m"

  # Configure RSA key type (required for OIDC)
  ca_key_type = "rsa-2048"

  # Add federation bundle endpoint
  federation {
    bundle_endpoint {
      address = "0.0.0.0"
      port = 8443
    }
  }
}

If we curl this discovery endpoint, we can see the discovery metadata and keys:

❯ curl -L http://localhost:18443/.well-known/openid-configuration
{
  "issuer": "http://localhost:18443",
  "jwks_uri": "http://localhost:18443/keys",
  "authorization_endpoint": "",
  "response_types_supported": [
    "id_token"
  ],
  "subject_types_supported": [
    "public"
  ],
  "id_token_signing_alg_values_supported": [
    "RS256",
    "ES256",
    "ES384"
  ]
}

JWKS endpoint:

❯ curl -L http://localhost:18443/keys
{
  "keys": [
    {
      "kty": "RSA",
      "kid": "n0xvkL8A2W3DofkHTJPvlGpeEBJeQB6g",
      "alg": "RS256",
      "n": "sAp_Vd-X-W7OllYPm_TTk0zvUj443Y9MfQvy4onBcursyxOajcoeSOeNpTdh4QEmLKV3xC8ZqYv4fkzFp6UTf-_rwPs_uwOpbhPKT-QQZKcconxaf8RkA0m-mzOVHbU7eA3esHLTzN84kbGkr1wozQesyC-MHFE3EwLR9xI1YZfWbHtlXOcnTgBXitgysM5Yw4jkXy7kYvjs21MyEJ01_WSSHCLaISAjlAvnDLWiGV3xx0Vd29m8-mrR5pg4_eicBifxnQnksO_LWRy8jXKk2JTftRKnmIxwqHML_fbVej8RSsaGpu0askj83gZ4wNDi8KNh7c9ir6yWl9jgDJ3lYQ",
      "e": "AQAB"
    }
  ]
}

See the SPIRE OIDC Discovery Provider for more.
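A verifier typically uses the JWKS by matching the kid in a token's header against the published keys. The sketch below illustrates that lookup only. It assumes the SVID header carries a kid, uses a dummy token built in place rather than a real SVID, and omits the RSA modulus; signature verification itself is out of scope here.

```javascript
// Read the (unverified) JWT header so we know which key to look up.
function jwtHeader(token) {
  const [headerB64] = token.split('.');
  return JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
}

// Find the JWKS entry whose kid matches, or null if there is none.
function findKey(jwks, kid) {
  return jwks.keys.find((k) => k.kid === kid) ?? null;
}

// JWKS shaped like the response above; the modulus is omitted for brevity.
const jwks = {
  keys: [
    {
      kty: 'RSA',
      kid: 'n0xvkL8A2W3DofkHTJPvlGpeEBJeQB6g',
      alg: 'RS256',
      n: '<modulus omitted>',
      e: 'AQAB',
    },
  ],
};

// Dummy token for illustration: real header shape, placeholder payload and signature.
const dummyHeader = { alg: 'RS256', kid: 'n0xvkL8A2W3DofkHTJPvlGpeEBJeQB6g', typ: 'JWT' };
const dummyToken =
  Buffer.from(JSON.stringify(dummyHeader)).toString('base64url') + '.e30.signature';

const matchedKey = findKey(jwks, jwtHeader(dummyToken).kid);
const missingKey = findKey(jwks, 'no-such-kid');
console.log(matchedKey, missingKey);
```

A token whose kid is not in the JWKS should be rejected, or trigger a JWKS refresh, since SPIRE rotates its signing keys.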

With this setup, we can now configure the Keycloak JWKS endpoint to point to the SPIRE OIDC Discovery endpoint:

OAuth Client Authentication with SPIFFE in Action

With Keycloak configured to use our SPIFFE SVID JWT authenticator, and correctly pointing to the SPIRE JWKS, we can now get a workload SVID and make a call to Keycloak for an authorization flow / client credentials flow to get an access token. To get a SPIFFE JWT SVID, we can call the spire-agent workload API. Here’s an example SPIFFE JWT SVID:

{
  "aud": [
    "http://localhost:8080/realms/mcp-realm"
  ],
  "client_auth": "client-spiffe-jwt",
  "environment": "production",
  "exp": 1753800643,
  "iat": 1753800583,
  "iss": "http://spire-server:8443",
  "jwks_url": "http://spire-oidc-discovery:8443/keys",
  "organization": "Solo.io Agent IAM",
  "scope": "mcp:read mcp:tools mcp:prompts",
  "sub": "spiffe://example.org/mcp-test-client"
}

This JWT is signed by SPIRE and carries the correct SPIFFE ID (spiffe://example.org/mcp-test-client) in its sub claim. It has a tight expiration period, and it has additional software statements. Note that the client_auth software statement / claim points to client-spiffe-jwt, which was the PROVIDER_ID we specified in our SpiffeSvidClientAuthenticator class.
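As a rough illustration of what an authenticator could check on these claims, here is a minimal sketch. It is not the validator from the repo: the function name, option names, and error strings are assumptions, and signature verification against the JWKS is deliberately left out.

```javascript
// Hypothetical claim checks for a SPIFFE JWT SVID; signature verification is omitted.
function validateSvidClaims(claims, { expectedIssuer, expectedAudience, nowSeconds }) {
  const errors = [];
  if (claims.iss !== expectedIssuer) errors.push('unexpected issuer');
  const aud = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!aud.includes(expectedAudience)) errors.push('audience mismatch');
  if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) errors.push('expired');
  if (typeof claims.sub !== 'string' || !claims.sub.startsWith('spiffe://')) {
    errors.push('sub is not a SPIFFE ID');
  }
  return errors;
}

// Claims copied from the example SVID above (only the ones checked here).
const svidClaims = {
  aud: ['http://localhost:8080/realms/mcp-realm'],
  exp: 1753800643,
  iat: 1753800583,
  iss: 'http://spire-server:8443',
  sub: 'spiffe://example.org/mcp-test-client',
};

const expectations = {
  expectedIssuer: 'http://spire-server:8443',
  expectedAudience: 'http://localhost:8080/realms/mcp-realm',
};

// Within the 60-second lifetime the SVID is accepted; afterwards it is rejected.
const freshErrors = validateSvidClaims(svidClaims, { ...expectations, nowSeconds: 1753800600 });
const staleErrors = validateSvidClaims(svidClaims, { ...expectations, nowSeconds: 1753800700 });
console.log(freshErrors, staleErrors);
```

The gap between iat and exp in the example is 60 seconds, matching the default_jwt_svid_ttl of "1m" set in server.conf.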

With this SPIFFE JWT SVID, we can call the token endpoint, setting client_assertion_type to the spiffe-svid-jwt URN and client_assertion to the SVID itself ($JWT). In this particular example, we are using a client_credentials flow:

curl -s -X POST \
  "$KEYCLOAK_URL/realms/$KEYCLOAK_REALM/protocol/openid-connect/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "client_id=$CLIENT_ID" \
  -d "grant_type=client_credentials" \
  -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:spiffe-svid-jwt" \
  -d "client_assertion=$JWT" \
  -d "scope=mcp:read mcp:tools mcp:prompts"
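For a client written in code rather than shell, the same request could be assembled roughly like this. The helper name and the sample values passed to it are placeholders; the parameter names and the assertion-type URN are taken from the curl command above.

```javascript
// Build the token request from the curl example as a plain object.
function buildTokenRequest({ keycloakUrl, realm, clientId, jwt, scope }) {
  const body = new URLSearchParams({
    client_id: clientId,
    grant_type: 'client_credentials',
    client_assertion_type: 'urn:ietf:params:oauth:client-assertion-type:spiffe-svid-jwt',
    client_assertion: jwt,
    scope,
  });
  return {
    url: `${keycloakUrl}/realms/${realm}/protocol/openid-connect/token`,
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body.toString(),
  };
}

// Placeholder values; the JWT would be a real SVID fetched from the SPIRE agent.
const tokenRequest = buildTokenRequest({
  keycloakUrl: 'http://localhost:8080',
  realm: 'mcp-realm',
  clientId: 'spiffe://example.org/mcp-test-client',
  jwt: 'header.payload.signature',
  scope: 'mcp:read mcp:tools mcp:prompts',
});

console.log(tokenRequest.url);
console.log(tokenRequest.body);
// To send it: fetch(tokenRequest.url, tokenRequest)
```

URLSearchParams takes care of form-encoding the spaces in the scope and the colons and slashes in the SPIFFE ID, which the curl version leaves to the caller.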

If this is successful, Keycloak will issue an access token:

{
  "exp": 1753804189,
  "iat": 1753800589,
  "jti": "trrtcc:35d1fb20-31fa-4055-afb8-e902d0dc25d4",
  "iss": "http://localhost:8080/realms/mcp-realm",
  "sub": "6e4b5bc5-9a5c-4f87-aa1e-06ad279da0c8",
  "typ": "Bearer",
  "azp": "spiffe://example.org/mcp-test-client",
  "acr": "1",
  "scope": "profile email",
  "email_verified": false,
  "clientHost": "192.168.65.1",
  "preferred_username": "service-account-spiffe://example.org/mcp-test-client",
  "clientAddress": "192.168.65.1",
  "client_id": "spiffe://example.org/mcp-test-client"
}

Wrapping Up

In this post, we explored how Agent / MCP identity based on SPIFFE can be used as a first-class authentication mechanism for OAuth clients. By integrating SPIFFE JWT SVIDs with Keycloak’s client authentication flow, we eliminated the need for static secrets and created a more secure, scalable model for authenticating MCP clients, especially in environments where agents and services need short-lived, verifiable credentials.

While this approach required some customization in Keycloak (through its SPI model) and configuration of the SPIRE OIDC Discovery endpoint, the end result is a working OAuth flow powered by cryptographically-verifiable, zero-trust-friendly identity. This isn’t just a more secure option, it’s a necessary evolution as we shift toward AI-native, agentic architectures that demand dynamic trust relationships and automated credential management.

bernhardbock
58 days ago

A Few Things About the Anchor Element’s href You Might Not Have Known

1 Share

I’ve written previously about reloading a document using only HTML but that got me thinking: What are all the values you can put in an anchor tag’s href attribute?

Well, I looked around. I found some things I already knew about, e.g.

  • Link protocols like mailto:, tel:, sms: and javascript: which deal with specific ways of handling links.
  • Protocol-relative links, e.g. href="//"
  • Text fragments for linking to specific pieces of text on a page, e.g. href="#:~:text=foo"

But I also found some things I didn’t know about (or only vaguely knew about) so I wrote them down in an attempt to remember them.

href="#"

Scrolls to the top of a document. I knew that.

But I’m writing because #top will also scroll to the top if there isn’t another element with id="top" in the document. I didn’t know that.

(Spec: “If decodedFragment is an ASCII case-insensitive match for the string top, then return the top of the document.”)
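The ordering implied by that spec sentence can be modeled with a small function. This is a simplified sketch, not the spec algorithm: it represents the document as a Set of element ids, skips percent-decoding and named anchors, and returns descriptive strings instead of DOM nodes.

```javascript
// Simplified model of fragment resolution: ids is the set of element ids in the document.
function resolveFragment(fragment, ids) {
  // An empty fragment (href="#") means the top of the document.
  if (fragment === '') return 'top of document';
  // An element with a matching id always wins, including one with id="top".
  if (ids.has(fragment)) return `element #${fragment}`;
  // Only if no element matched does the special "top" keyword apply (case-insensitively).
  if (fragment.toLowerCase() === 'top') return 'top of document';
  return null;
}

const withoutTopId = resolveFragment('top', new Set(['intro']));
const withTopId = resolveFragment('top', new Set(['top']));
const upperCase = resolveFragment('TOP', new Set(['top']));
const unknown = resolveFragment('missing', new Set(['intro']));
console.log(withoutTopId, withTopId, upperCase, unknown);
```

Note that ids are matched case-sensitively while the top keyword is not, so in this model #TOP scrolls to the top of the page even when an element with id="top" exists.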

Update: HTeuMeuLeu pointed out to me on Mastodon that you can use #page= to deep-link to a specific page in a PDF, e.g. my-file.pdf#page=42 would link to page 42 in the file.

href=""

Reloads the current page, preserving the search string but removing the hash string (if present).

URL                   Resolves to
/path/                /path/
/path/#foo            /path/
/path/?id=foo         /path/?id=foo
/path/?id=foo#bar     /path/?id=foo

href="."

Reloads the current page, removing both the search and hash strings (if present).

Note: If you’re using href="." as a link to the current page, ensure your URLs have a trailing slash or you may get surprising navigation behavior. The path is interpreted as a file, so "." resolves to the parent directory of the current location.

URL                   Resolves to
/path                 /
/path#foo             /
/path?id=foo          /
/path/                /path/
/path/#foo            /path/
/path/?id=foo         /path/
/path/index.html      /path/

Update 2025-08-15: as pointed out by @AmeliaBR on Mastodon, “reloads the current page” probably isn’t the best terminology for this. It’s more like “loads the default index page for the current directory, based on the URL structure” which might be a reload, but might be something else based on the current URL (see my note and table above).

href="?"

Reloads the current page, removing both the search and hash strings (if present). However, it preserves the ? character.

Note: Unlike href=".", trailing slashes don’t matter. The search parameters will be removed but the path will be preserved as-is.

URL                   Resolves to
/path                 /path?
/path#foo             /path?
/path?id=foo          /path?
/path?id=foo#bar      /path?
/index.html           /index.html?

href="data:"

You can make links that navigate to data URLs. The super-readable version of this would be:

<a href="data:text/plain,hello world">
  View plain text data URL
</a>

But you probably want data: URLs to be encoded so you don’t get unexpected behavior, e.g.

<a href="data:text/plain,hello%20world">
  View plain text data URL
</a>
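If you are generating these links from a script, one way to get the encoding right is to let encodeURIComponent do it. A minimal sketch (the helper name is just for illustration):

```javascript
// Build a plain-text data: URL with the payload percent-encoded.
function textDataUrl(text) {
  return 'data:text/plain,' + encodeURIComponent(text);
}

const simple = textDataUrl('hello world');
const tricky = textDataUrl('50% off #1');

console.log(simple); // data:text/plain,hello%20world
console.log(tricky); // data:text/plain,50%25%20off%20%231
```

The second example shows why encoding matters: left unencoded, the # would be treated as the start of a fragment and everything after it would be dropped from the data.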

Go ahead and try it (FYI: may not work in your user agent). Here’s a plain-text file and an HTML file.

href="video.mp4#t=10,20"

Media fragments allow linking to specific parts of a media file, like audio or video.

For example, video.mp4#t=10,20 links to a video that starts playing at 10 seconds and stops at 20 seconds.

(Support is limited at the time of this writing.)
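To see what a user agent has to work with, here is a small sketch that pulls the start and end times out of such a link. It only handles plain seconds (the media fragments syntax also allows clock-style times, which this ignores), and the function name and base URL are made up for the example.

```javascript
// Parse a temporal media fragment such as "#t=10,20" into start/end seconds.
// Only plain-second values are handled in this sketch.
function parseTemporalFragment(href) {
  const { hash } = new URL(href, 'https://example.com/');
  const t = new URLSearchParams(hash.slice(1)).get('t');
  if (t === null) return null;
  const [start, end] = t.split(',');
  return {
    // A missing start means "from the beginning".
    start: start ? Number(start) : 0,
    // A missing end means "play to the end of the media".
    end: end !== undefined && end !== '' ? Number(end) : null,
  };
}

const both = parseTemporalFragment('video.mp4#t=10,20');
const endOnly = parseTemporalFragment('video.mp4#t=,20');
const startOnly = parseTemporalFragment('video.mp4#t=10');
const none = parseTemporalFragment('video.mp4');
console.log(both, endOnly, startOnly, none);
```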

See For Yourself

I tested a lot of this stuff in the browser and via JS. I think I got all these right.

Thanks to JavaScript’s URL constructor (and the ability to pass a base URL), I could programmatically explore how a lot of these href’s would resolve.

Here’s a snippet of the test code I wrote. You can copy/paste this in your console and they should all pass 🤞

const assertions = [
  { href: '', location: '/path', resolves_to: '/path' },
  { href: '', location: '/path/', resolves_to: '/path/' },
  { href: '', location: '/path/#foo', resolves_to: '/path/' },
  { href: '', location: '/path/?id=foo', resolves_to: '/path/?id=foo' },
  { href: '', location: '/path/?id=foo#bar', resolves_to: '/path/?id=foo' },
  { href: '.', location: '/path', resolves_to: '/' },
  { href: '.', location: '/path#foo', resolves_to: '/' },
  { href: '.', location: '/path?id=foo', resolves_to: '/' },
  { href: '.', location: '/path/', resolves_to: '/path/' },
  { href: '.', location: '/path/#foo', resolves_to: '/path/' },
  { href: '.', location: '/path/?id=foo', resolves_to: '/path/' },
  { href: '.', location: '/path/index.html', resolves_to: '/path/' },
  { href: '?', location: '/path', resolves_to: '/path?' },
  { href: '?', location: '/path#foo', resolves_to: '/path?' },
  { href: '?', location: '/path?id=foo', resolves_to: '/path?' },
  { href: '?', location: '/path/', resolves_to: '/path/?' },
  { href: '?', location: '/path/?id=foo#bar', resolves_to: '/path/?' },
  { href: '?', location: '/index.html#foo', resolves_to: '/index.html?' }
];

const assertions_evaluated = assertions.map(({ href, location, resolves_to }) => {
  const domain = 'https://example.com';
  // What the URL parser actually resolves the href to, relative to the current location.
  const received = new URL(href, domain + location).toString();
  // What the tables above claim it resolves to.
  const expected = new URL(domain + resolves_to).toString();
  return {
    href,
    location,
    expected: expected.replace(domain, ''),
    received: received.replace(domain, ''),
    passed: expected === received
  };
});

console.table(assertions_evaluated);
bernhardbock
60 days ago