Stealth · deep-tech AI · method patent application in preparation

Attention, reimagined for stability.

A drop-in attention operator that stays stable where bare softmax needs external crutches, with zero extra parameters and compute at parity with softmax. Not a quality miracle: a quieter, cheaper, more predictable way to train.

for humans, like humans.

We're seeking compute and the right partner to validate at 3B scale. Fund the experiment that de-risks the thesis, not a claim that it's already proven.

Figure: training-gradient stability (stabilizer-free, high LR). Gradient norm over training steps against an excursion threshold: bare softmax spikes, drifts and collapses at high LR, while Tetracta rational stays bounded and calm. The shape is illustrative; the statistics are measured: softmax Cpk 0.30 with 29 excursions, rational Cpk ~1.59 with 0.
0 · extra parameters
≤ 1% · measured compute tax
drop-in · op behaviour preserved (bf16 cos ≈ 0.99999)
reproducible · pre-registered, md5-sealed
Pre-registered · Deterministic recompute · md5-sealed evidence · Gradient-norm SPC · Six-Sigma · Method patent application in preparation · 4-engineer team · ~20 yr average industrial experience

Two identities, one team

Revenue · today

An applied industrial-AI studio

Custom models, RAG & SFT, data→signal→model and digital-twin consulting, and quantitative research infrastructure — grounded in ~20 years of multi-sector industrial & analytical engineering. Real delivery, not slideware.

Reputation · IP

A frontier efficiency-research lab

A new attention foundation — measured training stability, drop-in / zero-parameter, capability-per-dollar. A fundable research thesis with a clean, auditable evidence chain; method patent application in preparation.

Technology & Research

Tetracta Rational Attention · method patent application in preparation

A drop-in replacement for softmax attention. Its weights do not sum to one; it draws on a bounded, divisive-normalization intuition, the way biological neurons compete and self-limit. We describe what it does and the effects we measured; we never disclose the recipe.

Parameterless stability · strongest leg

Where bare softmax needs external stabilizers and still spikes at high learning rates, rational stays calm — by construction, because its denominator is bounded. This is our strongest, most-measured result.

Drop-in · zero extra params

Identical parameter count, op behaviour preserved (bf16 cosine ≈ 0.99999), lighter memory. It replaces the softmax op in place and keeps the entire softmax ecosystem: no architectural surgery, no new training infrastructure, no new dependencies.

Capability per dollar

MFU is at parity with vanilla; the measured compute tax is ≤ 1%. Higher learning-rate tolerance converts into steadier convergence on the same hardware — fewer failed runs, less babysitting, lower training risk.

A modest, honest edge

A small, direction-consistent loss improvement at 3B and 7B, pre-registered and md5-sealed. We claim a direction, not a scaling law, and attribute it mainly to learning-rate headroom. No quality-leap claim.

Methodology as a discipline

Pre-registration, deterministic recompute, gradient-norm SPC / Six-Sigma gates, md5 evidence chains. Diligence-grade rigour is the trust layer behind every number — and a product in its own right.

Abstain · research hypothesis

The bounded design provides a natural "I'd rather not answer" sink. Whether this reduces hallucination is not yet proven: a controlled test returned an honest null. Testing it at 3B behavioural scale is exactly what we are raising compute for.

Compatibility

Drops into your entire softmax stack — proven, not promised

Rational replaces one operator and leaves everything else untouched. It was trained inside a modern transformer stack, not in a toy setting, so it composes with the parts you already use. Nothing else in your pipeline has to change.

  • qk-norm
  • z-loss
  • µP
  • Muon + AdamW
  • Mixture-of-Experts
  • GQA
  • RoPE
  • bf16
  • gradient checkpointing
  • fused / Flash-style kernels
  • multi-GPU (DDP)

Positioned honestly next to prior work — Apple's FlashSigmoid, NVIDIA's nGPT, attention-sink / softmax1 (Miller, 2023), PolaFormer, Microsoft's Differential Transformer. Each component exists somewhere in the literature; our angle is the intersection — bounded divisive-norm denominator + drop-in/zero-param + a measured parameterless-stability effect. We cite, we don't overclaim.
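To make "weights that do not sum to one" concrete, here is a minimal Python sketch of the published softmax1 from Miller's 2023 "Attention Is Off By One" post, set against standard softmax. It is cited prior art, not Tetracta's operator, whose formula stays undisclosed. The extra 1 in the denominator lets a head place almost no weight anywhere, which is the kind of "abstain" sink discussed above.

```python
import math

def softmax(scores):
    """Standard softmax: weights always sum to 1."""
    m = max(scores)                          # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax1(scores):
    """softmax1 (Miller, 2023): exp(x_i) / (1 + sum_j exp(x_j)); weights sum to < 1."""
    m = max(max(scores), 0.0)                # shift so that no exponent overflows
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)         # the "+1" term, rescaled by exp(-m)
    return [e / total for e in exps]

print(sum(softmax([-10.0, -10.0])))   # 1.0: softmax must attend somewhere
print(sum(softmax1([-10.0, -10.0])))  # close to 0: softmax1 can effectively abstain
```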

Results & Evidence

What we measured — and you can verify

Diagnostic-scale (sub-Chinchilla, ≤236M tokens) — provisional, pre-registered, reproducible from sealed result files. Headline effects below; the downloadable bundle lets you check the numbers yourself. No formula is disclosed.

Training-process defect rate, gradient-norm excursions (↓ better; stabilizer-free, high LR):
  • softmax ~18%
  • rational ~0%

Process capability, Cpk (↑ better; stabilizer-free, high LR):
  • softmax 0.30
  • rational ~1.59

Compute efficiency, MFU (parity is the result; tax ≤ 1%):
  • softmax baseline
  • rational ≈ equal

Best-vs-best held-out loss edge, direction across scale (lower = rational ahead):
  • softmax baseline (0)
  • rational −0.0146 at 3B, −0.0172 at 7B

Read this honestly. Two scale points (3B, 7B) establish a direction, not a scaling law. The edge (~1.5% lower perplexity) is best attributed to learning-rate headroom, not to a separate nonlinearity advantage: at equal LR with both arms stabilized, the two are at parity.
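The ~1.5% figure can be checked from the two loss deltas. A minimal sketch, assuming the held-out loss is mean cross-entropy in nats per token, so that a loss delta Δ scales perplexity by exp(Δ); if the deltas were in bits per byte, the factor would instead be 2^Δ per byte. The helper name is ours, for illustration only.

```python
import math

def perplexity_change(delta_loss_nats):
    """Relative perplexity change implied by a loss delta in nats/token (assumed unit)."""
    return math.exp(delta_loss_nats) - 1.0

for scale, delta in [("3B", -0.0146), ("7B", -0.0172)]:
    print(f"{scale}: {perplexity_change(delta):+.2%}")
# 3B: -1.45%
# 7B: -1.71%
```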

Stability, dramatic

Stabilizer-free, bare softmax shows a ~6.5× larger peak gradient norm (≈42.9 vs ≈6.6) and 29 spikes vs. 0 for rational, with the same model body, optimizer, seed and data.

Scale tested

≈601M · 1B · 3B · 7B on H100 / H200. The clean comparison is 3B→7B; the decisive next step is a properly budgeted 3B run, where we expect rational to separate from vanilla.

The ask

Compute + capital + the right partner to convert provisional numbers into production-grade evidence, and to test downstream + abstain for the first time.

Downloads · sanitized evidence

Don't trust us — verify

The full data, not summaries: every experiment, every training step, every stability statistic. Apart from the internal raw-log chain (available under NDA), the only things withheld are the recipe (formula, kernel internals, exact learning rates, schedule, SPC thresholds) and quant performance figures; those are the IP. Checksums let you verify each file.

★ Full results report

The complete write-up — isolation design, all results with numbers, every table, methodology, provenance and honest limits. Start here.

Download .md

Experiment ledger — all 17 runs

Per-run BPB, gradient-norm median/peak, excursions, Cpk, defect rate, MFU. Every experiment, nothing cherry-picked.

Download .csv

3B convergence — full 36-step curve

Held-out BPB at every step for rational vs. tuned softmax, plus the per-step delta. Rational leads at every converged step.

Download .csv
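A matched-step comparison scores only the steps that both arms logged. A minimal sketch, assuming each run's curve has been loaded as a mapping from step to held-out BPB. The function names and the converged-phase cutoff are hypothetical; they are not columns or settings of the published CSV.

```python
def matched_step_deltas(rational, softmax):
    """Per-step held-out BPB delta (rational - softmax) over steps both runs logged."""
    shared = sorted(set(rational) & set(softmax))
    return {step: rational[step] - softmax[step] for step in shared}

def leads_at_every_step(deltas, converged_from):
    """True if rational is ahead (delta < 0) at every step >= converged_from."""
    tail = [d for step, d in deltas.items() if step >= converged_from]
    return bool(tail) and all(d < 0 for d in tail)
```

Requiring a non-empty tail keeps a cutoff beyond the last logged step from passing vacuously.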

Stability pairs — gradient-norm SPC

Stabilizer-free softmax vs. rational at the same LR: gn median/peak, excursions, Cpk, defect rate. The collapse, in numbers.

Download .csv
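A minimal sketch of how per-run stability statistics like these can be computed from a gradient-norm trace. The assumptions are stated plainly. The upper limit `usl` is hypothetical, because the real SPC thresholds are withheld. An excursion is counted per step above the limit; counting contiguous spike episodes would be an equally valid choice. Cpk is one-sided because only large gradient norms count as defects.

```python
import statistics

def stability_stats(grad_norms, usl):
    """Gradient-norm SPC summary for one run; `usl` is a hypothetical upper limit."""
    n = len(grad_norms)
    mu = statistics.mean(grad_norms)
    sigma = statistics.stdev(grad_norms)           # sample std dev, needs >= 2 steps
    excursions = sum(g > usl for g in grad_norms)  # steps above the limit
    return {
        "gn_median": statistics.median(grad_norms),
        "gn_peak": max(grad_norms),
        "excursions": excursions,
        "defect_rate": excursions / n,
        "cpk": (usl - mu) / (3 * sigma),           # one-sided: only large norms are defects
    }
```

A Cpk of 1.33 is a common minimum for a capable process. By that yardstick, 0.30 versus ~1.59 separates an incapable process from a capable one.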

Kernel equivalence & efficiency

Forward/backward cosine ≈ 0.99999, VRAM −86% vs. our naive reference, plus a corrupted control that's correctly rejected.

Download .csv
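Equivalence testing of this kind compares a candidate kernel's outputs, and separately its gradients, with a reference implementation by cosine similarity. It must also reject a deliberately broken control. Below is a minimal sketch in plain Python. The 0.9999 acceptance threshold, the small multiplicative noise standing in for low-precision rounding, and the sign-flip corruption are all illustrative assumptions, not our actual test harness.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity of two flattened tensors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def equivalent(reference, candidate, threshold=0.9999):  # illustrative threshold
    return cosine(reference, candidate) >= threshold

rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(4096)]
# Candidate: small multiplicative noise, standing in for low-precision rounding.
candidate = [x * (1.0 + rng.uniform(-1e-3, 1e-3)) for x in reference]
# Corrupted control: every other element sign-flipped; a sound check must reject it.
corrupted = [-x if i % 2 else x for i, x in enumerate(reference)]

print(equivalent(reference, candidate))  # True
print(equivalent(reference, corrupted))  # False
```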

Scale summary & 7B edge

3B / 7B best-vs-best edge (−0.0146 → −0.0172) + the 7B converged-phase deltas. Direction, not a law.

Download .md · .csv

Stability benchmark — summary

The headline stability table (Cpk, defect, peak gn, MFU parity, drop-in equivalence) in prose + csv.

Download .md · .csv

Evaluation methodology — one-pager

Pre-registration, matched-step, deterministic recompute, gradient-norm SPC, md5 chain.

Download .md

Quant research — methodology card

Walk-forward, multi-horizon, leak-hardened, placebo-tested. Methodology only — no performance figures.

Download .md
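Walk-forward evaluation trains only on the past, tests on the window that follows, then rolls forward. Below is a minimal rolling-window sketch. The embargo gap between train and test is one common leak-hardening measure; it, the function name and the parameter names are our illustrative assumptions, not a description of our engine.

```python
def walk_forward_splits(n_samples, train_size, test_size, embargo=0, step=None):
    """Yield (train, test) index ranges for rolling-window walk-forward evaluation.

    Each test window starts `embargo` samples after its training window ends,
    so labels built over a forward-looking horizon are less likely to leak
    across the boundary.
    """
    step = step or test_size
    start = 0
    while True:
        train_end = start + train_size
        test_start = train_end + embargo
        test_end = test_start + test_size
        if test_end > n_samples:
            return
        yield range(start, train_end), range(test_start, test_end)
        start += step

for train, test in walk_forward_splits(10, train_size=4, test_size=2, embargo=1):
    print(list(train), list(test))
# [0, 1, 2, 3] [5, 6]
# [2, 3, 4, 5] [7, 8]
```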

Reproducibility manifest

md5 of every public artifact above — verify with md5sum -c. Internal raw-log chain available under NDA.

Download .txt
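`md5sum -c` is the simplest check. For environments without coreutils, here is a minimal Python equivalent built on `hashlib`. It assumes the manifest uses md5sum's `<digest>  <filename>` line format. It resolves filenames relative to the manifest's own folder, whereas `md5sum -c` resolves them relative to the current directory.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """md5 hex digest of a file, read in chunks so large files need little memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest_path):
    """Check '<md5>  <filename>' lines; returns {filename: True/False}."""
    base = Path(manifest_path).parent
    results = {}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        results[name] = md5_of(base / name) == digest.lower()
    return results
```

For example, `verify_manifest("manifest.txt")` returns one boolean per listed file; `manifest.txt` stands in for whatever name the downloaded manifest actually has.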
Want the full, mechanism-level technical brief, the internal raw-log evidence chain, and team intros? Those are shared under NDA with qualified investors and partners; request them by email.
Capabilities · portfolio

What the team can build

Architectural range that signals one thing to investors and partners: this team ships; it doesn't just talk. ML engineering is one capability among several; the rarer edge is analytical engineering beyond ML.

Quantitative research engine · ● live

A production-grade, walk-forward, multi-horizon cross-sectional pipeline — data → signal → model → decision — producing models today, with leak-hardened, placebo-tested evaluation and in-production SPC monitoring. (B2B, licensed institutions only; not investment advice.)

Industrial data & digital twins

Turning raw industrial/market data into decisions: predictive maintenance, time-series, FFT / spectral signal analysis — and digital-twin data strategy (what to collect, how, the real ROI) before a single model is built.

Brain-inspired architecture R&D

Research proof-of-concepts outside the softmax ecosystem (concept binding, spreading activation). Early-stage, framed honestly — evidence of deep architectural flexibility, not a shipped product.

Applied LLM systems

On-prem / data-sovereign assistants: Turkish-fluent SFT, tool-routing, live-web grounding with sources, data-cleaning & QA pipelines, and a rigorous training-QC discipline.

Services · for partners

How we engage

Ordered by how quickly and reliably they create value. We work with partners worldwide; Türkiye is our R&D base, not a ceiling. Under NDA we share enough to evaluate seriously.

Custom model + SFT + RAG · core
A capable assistant on your data & tools, vanilla or rational (a stability / cost option, not a quality claim).

Training-data pipelines · data
You bring raw text; we return a clean, deduplicated, domain-mixed, packed & tokenized training-ready stream, leak-checked, with a quality report. Your corpus, made model-ready by automated pipelines.

Data→model & digital-twin consulting · wedge
Low-commitment diagnostic report → full pipeline. Predictive maintenance, FFT / DSP, regime detection.

Strategic & analytical engineering advisory · advisory
For companies and individuals: where the value is, which technical-business move to make. Results-oriented, realistic.

Quant research & signal infrastructure · B2B
Engine licence / signal-as-research for licensed institutions only. Technology & rigour, not investment advice.

"Türkiye arm" / local presence · partnership
A TR R&D & delivery arm for a foreign firm or lab, including a frontier-lab office option. Our IP stays ours.

Training-stability / QC audit · methodology
We make training failures visible: pre-registered, reproducible, SPC-gated. Live-monitor your own runs.

Flexible / hourly micro-services · fast cash
Editorial, data & evaluation services for teams building Turkish models. Hourly / fractional.

Data-sovereign / on-prem assistant · vertical
For regulated SMBs that can't use public APIs (KVKK / data residency): a grounded assistant behind your firewall.
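Two of the stages named under training-data pipelines, exact deduplication and packing, can be sketched in a few lines. This is a minimal illustration, not our pipeline. Normalization here is only whitespace collapse plus lowercasing. Packing concatenates tokenized documents with a hypothetical `eos_id` separator and drops the final partial block.

```python
import hashlib

def dedupe(docs):
    """Exact dedup on a normalized key (collapsed whitespace, lowercased)."""
    seen, out = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.split()).lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(doc)
    return out

def pack(token_streams, block_size, eos_id):
    """Concatenate tokenized docs, each followed by eos_id, into fixed-length blocks."""
    buffer, blocks = [], []
    for tokens in token_streams:
        buffer.extend(tokens)
        buffer.append(eos_id)
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks  # any leftover partial block is dropped
```

For Turkish text, note that Python's `str.lower()` maps "I" to "i" rather than "ı", so a locale-aware normalization may be preferable.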
IP · patent in preparation

Protected, narrowly and honestly

We show the effects; we never publish the mechanism. The patent is a defensive shield, not a broad claim — the real moat is the recipe (trade secret) plus execution and capability-per-dollar.

Narrow composition + measured effect

The filing covers a specific composition of bounded components together with a measured technical effect (training stability + learning-rate headroom) — not a broad formula monopoly.

Prior art, cited not hidden

We position openly against attention-sink / softmax1, FlashSigmoid, nGPT, PolaFormer and Differential Transformer. Hiding precedent fails diligence; citing it is the defensible position.

Recipe = trade secret

The formula, kernel internals, exact learning rates, schedule, and SPC thresholds are never disclosed — more valuable kept secret than patented. Effects are public; "how" is not.

Filing-first, then disclosure

Priority will be filed before any public disclosure of the mechanism (EU/EPO has no general grace period). This site carries effects only and is publication-safe; the mechanism stays under NDA.

Public disclosure follows IP protection — by design. The moment our priority filing is locked in, we'll open the full results and a technical paper to the world. Until then the complete picture — methodology, raw-log evidence, and the team behind it — is already on the table under NDA for serious investors and partners. The evidence is ready; the public unveiling is just a matter of timing.

About

Stealth by design

We are a four-engineer team with ~20 years of industrial experience on average — top-tier expertise across logistics, industry and software, plus advanced analytical engineering beyond ML: systems, optimization, numerical methods, signal processing, economics. ML engineering is one capability; our rarer edge is the combination. We are results-oriented and realistic — engineering that works and can be measured, not academic abstraction.

We are in stealth to protect our IP while a method patent application is in preparation. Full technical disclosure and team introductions are available under NDA to qualified investors and partners. Don't trust us — run it: our headline numbers regenerate from sealed, pre-registered result files.

Our ambition is global and our mindset is borderless: we're open to the right strategic partners, investors and support wherever in the world they are. Türkiye is our R&D base and a cost-effective talent pool, not a ceiling. We are investment-ready and can incorporate cleanly the moment the right partner and terms appear.

Investors · partners

Fund the experiment that de-risks the thesis

We measured rational's parameterless stability, drop-in / zero-param behaviour, capability-per-dollar and modest edge at sub-Chinchilla diagnostic budgets (≤236M tokens). The support we're seeking (compute, capital, the right partner) lets us validate at 3B with a Chinchilla-sufficient budget. We expect that run to show our difference from vanilla more clearly and bring our capabilities into the open:

  • we expect the edge over vanilla to hold — and we'll learn whether it comes from the nonlinearity or from learning-rate headroom;
  • the first real test of our downstream-capability and calibration / abstain (anti-hallucination) hypotheses: behaviour we hypothesise might set rational apart, but which is not yet demonstrated (a controlled test returned an honest null) and which we have not yet been able to measure at behavioural scale;
  • it turns provisional signal into production-grade evidence.

A virtual data-room — full technical brief, internal raw-log evidence chain, team intros — opens under NDA.

Request the technical brief

NDA-gated. Email us and we’ll reply from a Tetracta address to arrange access.


No mechanism is shared before an NDA. Effects & methodology are public above.

Contact

Let's talk

Anonymous in public, but ready for online meetings with qualified counterparts. Tell us who you are and we'll share what's appropriate — under NDA where it matters.

  • Investors & partners: the 3B-validation brief, results and team intros — under NDA.
  • Customers: book a pilot (custom model, RAG / SFT, data & digital-twin, quant).
  • Researchers: methodology and an honest discussion of what's proven and what isn't.

Email us directly

This site uses no contact forms and no cookies, only privacy-first analytics that never store your IP. Email us and tell us which one you are (investor, partner, customer, researcher, or hire-us-hourly); we’ll take it from there, under NDA where it matters.

Email [email protected]

We never share your details, and we don’t store anything from this page. No technical formulae are exchanged before an NDA.