Research Note · methodology

Put your gradient-norm on a control chart: a Six-Sigma view of training stability

Tetracta AI Teams · 15 June 2026

A methodology note. It describes how we measure training stability — not the mechanism that produces it. No formula, threshold value, learning rate, or recipe is disclosed.

Ask ten engineers whether a training run was "stable" and you will get ten judgments, each formed by squinting at a loss curve. Stability, in practice, is usually a vibe. We think that is a mistake — and that a discipline manufacturing solved decades ago fixes it.

Treat the gradient-norm as a process

On a factory line you do not certify a part by glancing at it. You measure a characteristic, you set control limits, and you ask a quantitative question: is this process capable of staying inside its limits, and how often does it stray outside them? That is statistical process control (SPC). The Six-Sigma vocabulary built on top of it, with its process capability index (Cpk), defect rates, and control charts, is exactly the language a training run needs.
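To make the control-limit idea concrete, here is a minimal sketch of a textbook Shewhart-style chart: limits are set from a baseline sample assumed to be in control, and later points are flagged when they fall outside. The function names, the sample data, and the multiplier k=3 are illustrative assumptions (k=3 is the common textbook convention), not the calibration used in our work.

```python
import statistics

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) from a baseline assumed to be in control.

    Simplification: sigma is the plain sample standard deviation. Classical
    individuals charts often estimate it from moving ranges instead.
    """
    center = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)  # needs at least two baseline points
    return center - k * sigma, center, center + k * sigma

def out_of_control(values, lower, upper):
    """Indices of the points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical measurements, for illustration only.
baseline = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98]
lower, center, upper = control_limits(baseline)
flags = out_of_control([1.0, 1.03, 2.5, 0.97], lower, upper)
```

Here only the third new point (2.5) lands outside the band, so it is the one flagged.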

The gradient-norm is a natural candidate for the controlled characteristic. It is cheap to log, it is sensitive to the onset of instability, and — crucially — it is a process that unfolds over thousands of steps, which is precisely what SPC was designed for.
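As an illustration of how cheap the logging is, the sketch below computes one common definition of the gradient-norm: the L2 norm over all gradient entries, treated as a single flat vector. It uses plain Python lists in place of framework tensors, and the names global_grad_norm, log_step, and history are hypothetical. In practice you would take the value from whatever your training framework already computes.

```python
import math

def global_grad_norm(grads):
    """L2 norm over every entry of every gradient tensor.

    `grads` is a list of flat lists of floats, standing in for the
    per-parameter gradient tensors of a real framework.
    """
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))

history = []  # (step, norm) pairs: the series that goes on the control chart

def log_step(step, grads):
    """Append this step's gradient-norm to the logged series."""
    history.append((step, global_grad_norm(grads)))

# Hypothetical gradients for two parameter tensors.
log_step(0, [[3.0, 0.0], [4.0]])
```

One scalar per step is all the chart needs, which is why the measurement adds essentially nothing to the cost of a run.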

What you get

Once the gradient-norm lives on a control chart, three things stop being subjective:

  1. Capability (Cpk). A single number for how comfortably the run stays within its control band. A high Cpk is a process with margin to spare; a low Cpk is one living on the edge of divergence even if it never quite blows up.
  2. Excursion (defect) rate. The fraction of windows that breach the control limit — the spikes. "A little spiky" becomes "this run is out of control on N% of its windows," a statement a reader can audit.
  3. A fair, early signal. A falling Cpk or a rising excursion rate can flag an unhealthy run before the loss visibly diverges, and you can compare two configurations on stability as a measured quantity, not an impression.
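The first two quantities can be sketched with the standard textbook definitions. This note does not disclose our own formula or limits, so everything here is an assumption for illustration: Cpk is the usual min((USL - mean) / 3 sigma, (mean - LSL) / 3 sigma), reduced to the upper side when no lower limit is given, and a window counts as an excursion when any point in it exceeds the upper limit. The non-overlapping windows, the window length, the limit value, and the sample series are all hypothetical.

```python
import statistics

def cpk(values, usl, lsl=None):
    """Textbook capability index against an upper (and optional lower) limit.

    Assumes at least two points and a series that actually varies
    (a constant series has sigma == 0 and raises ZeroDivisionError).
    """
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    upper = (usl - mu) / (3 * sigma)
    if lsl is None:
        return upper  # one-sided: only upward spikes matter
    return min(upper, (mu - lsl) / (3 * sigma))

def excursion_rate(values, usl, window=4):
    """Fraction of non-overlapping windows with at least one point above usl.

    Assumes a non-empty series; the last window may be shorter than the rest.
    """
    windows = [values[i:i + window] for i in range(0, len(values), window)]
    breached = sum(1 for w in windows if max(w) > usl)
    return breached / len(windows)

# Hypothetical gradient-norm series with a single spike.
norms = [1.0, 1.1, 0.9, 1.0, 1.0, 3.0, 1.0, 0.9, 1.1, 1.0, 0.9, 1.0]
rate = excursion_rate(norms, usl=2.0, window=4)  # one of three windows breaches
capability = cpk(norms, usl=2.0)
```

Running the same two functions over two configurations, with the same limit and window, gives the like-for-like comparison the third item describes.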

That third point matters more than it looks. A very common way to manufacture a fake "win" for a new method is to compare it against a baseline that quietly destabilized — the baseline's worse number is an artifact of its instability, not the new method's quality. If you are scoring stability on a control chart, that collapse is visible and quantified instead of hidden. (In our own runs, a stabilizer-free softmax at high learning rate fell to a low capability — a Cpk on the order of 0.3, with excursions on roughly a fifth of its windows — while a stabilized variant stayed spike-free at high capability. The point is not the operator; it is that the gap was measured, not asserted.)

Stability you can put in a contract

The quiet payoff is commercial. "Stable training" is usually an aspiration. A capability number is a specification. A team that monitors gradient-norm capability can write training stability into an SLA — promise a Cpk floor, alert on an excursion-rate ceiling — the same way a supplier guarantees a tolerance on a machined part. Instability is GPU budget set on fire: a run that diverges is compute you paid for and threw away, plus the engineer-hours spent babysitting learning-rate sweeps. Turning stability into a measured, contract-able number is how you stop paying that tax blindly.
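A sketch of what such a contract check might look like, assuming capability and excursion rate have already been measured for a run. The StabilitySla class, the sla_violations function, and the floor and ceiling values are hypothetical placeholders, not terms we offer or thresholds we use. The example inputs echo the destabilized run mentioned earlier (Cpk on the order of 0.3, excursions on roughly a fifth of windows).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StabilitySla:
    cpk_floor: float          # hypothetical contractual minimum capability
    excursion_ceiling: float  # hypothetical maximum fraction of breaching windows

def sla_violations(sla, cpk, excursion_rate):
    """Return readable violations; an empty list means the run complies."""
    violations = []
    if cpk < sla.cpk_floor:
        violations.append(
            f"Cpk {cpk:.2f} is below the floor {sla.cpk_floor:.2f}"
        )
    if excursion_rate > sla.excursion_ceiling:
        violations.append(
            f"excursion rate {excursion_rate:.1%} exceeds "
            f"the ceiling {sla.excursion_ceiling:.1%}"
        )
    return violations

# Placeholder terms, for illustration only.
sla = StabilitySla(cpk_floor=1.0, excursion_ceiling=0.05)
alerts = sla_violations(sla, cpk=0.3, excursion_rate=0.2)
```

With these placeholder terms the example run breaches both clauses, which is the kind of outcome an alerting rule would page on.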

What we deliberately do not share

The calibration of the control limit — where the line sits, and how it is set for a given architecture and budget — is the part that takes judgment, and it is the part we keep. The technique above is general and we are glad to see it used; the specific thresholds that make it sharp for our work are not in this note. (Separately, the attention mechanism that produces our stability is not discussed here at all — that is a different subject, with a method patent application in preparation.)

The honest caveats

SPC measures stability, not quality — a perfectly capable run can still be a mediocre model. Control limits must be calibrated per setup; a threshold borrowed from someone else's run will mislead. And capability is one signal among several, not a verdict on its own. None of that diminishes the core point: if you are going to claim a run was stable, you should be able to put a number on it — and gradient-norm SPC is how.

— Tetracta AI Teams · for humans, like humans.
