Parameterless stability: our strongest leg
Where bare softmax needs external stabilizers and still spikes at high learning rates, rational stays calm — by construction, because its denominator is bounded. This is our strongest, most-measured result.
A drop-in attention operator that stays stable where bare softmax needs external crutches — at zero extra parameters and parity compute cost. Not a quality miracle: a quieter, cheaper, more predictable way to train.
for humans, like humans.
We're seeking compute and the right partner to validate at 3B scale: fund the experiment that de-risks the thesis, not a claim that it's already proven.
Custom models, RAG & SFT, data→signal→model and digital-twin consulting, and quantitative research infrastructure — grounded in ~20 years of multi-sector industrial & analytical engineering. Real delivery, not slideware.
What we build
A new attention foundation: measured training stability, drop-in / zero-parameter, capability-per-dollar. A fundable research thesis with a clean, auditable evidence chain; method patent application in preparation.
The research
A drop-in replacement for softmax attention. Its weights do not sum to one; it draws on a bounded, divisive-normalization intuition, the way biological neurons compete and self-limit. We describe what it does and the effects we measured. We never disclose the recipe.
Identical parameter count, op-behaviour preserved (bf16 cosine ≈ 0.99999), lighter memory. It replaces the softmax op in place and keeps the entire softmax ecosystem — no architectural surgery, no retraining infrastructure, no new dependencies.
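A drop-in equivalence check of this kind can be sketched as a cosine-similarity gate between a reference output and a candidate output, with a deliberately corrupted control that must fail. This is a minimal sketch under assumptions: the `to_bf16` truncation helper, the `0.9999` threshold and the random test vector are hypothetical stand-ins, not the project's actual test harness.

```python
import math
import random
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32
    encoding (plain truncation; an assumption made for illustration)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def passes_equivalence(reference, candidate, threshold=0.9999):
    """Hypothetical gate: accept the candidate only if it points the same way."""
    return cosine_similarity(reference, candidate) >= threshold

rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for a reference output
candidate = [to_bf16(x) for x in reference]             # same values at reduced precision
corrupted = reference[:]                                # control: same values, wrong order
rng.shuffle(corrupted)
```

The corrupted control matters as much as the passing case: a gate that accepts a shuffled output would not be measuring equivalence at all.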
MFU is at parity with vanilla; the measured compute tax is ≤ 1%. Higher learning-rate tolerance converts into steadier convergence on the same hardware — fewer failed runs, less babysitting, lower training risk.
A small, direction-consistent loss improvement at 3B and 7B, pre-registered and md5-sealed. We claim a direction, not a scaling law, and the improvement is best attributed to learning-rate headroom. No quality-leap claim.
Pre-registration, deterministic recompute, gradient-norm SPC / Six-Sigma gates, md5 evidence chains. Diligence-grade rigour is the trust layer behind every number — and a product in its own right.
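For readers unfamiliar with SPC terms, a one-sided process capability index and a defect rate over gradient norms can be sketched as below. This uses the textbook definition Cpk = (USL − mean) / (3·sigma) for a process with only an upper specification limit. The limit `usl` and the sample values are hypothetical; the actual thresholds are not published.

```python
import statistics

def upper_cpk(samples, usl):
    """One-sided capability index: distance from the mean to the upper
    spec limit, measured in units of three standard deviations."""
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    return (usl - mean) / (3 * sigma)

def defect_rate(samples, usl):
    """Fraction of samples that exceed the upper spec limit."""
    return sum(1 for s in samples if s > usl) / len(samples)

# Illustrative gradient norms and limit (assumed values, not measured data).
grad_norms = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
cpk = upper_cpk(grad_norms, usl=2.0)
defects = defect_rate(grad_norms, usl=2.0)
```

A higher Cpk means the gradient-norm distribution sits further below the limit relative to its own spread; a common rule of thumb in manufacturing treats values above roughly 1.33 as capable.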
The bounded design provides a natural "I'd rather not answer" sink. Whether this reduces hallucination is not yet proven — a controlled test was an honest null. It is exactly what we are raising compute to test at 3B behavioural scale.
Rational replaces one operator and leaves everything else untouched. It was trained inside a modern transformer stack — not in a toy setting — so it composes with the parts you already use. Nothing in your pipeline has to change.
Positioned honestly next to prior work — Apple's FlashSigmoid, NVIDIA's nGPT, attention-sink / softmax1 (Miller, 2023), PolaFormer, Microsoft's Differential Transformer. Each component exists somewhere in the literature; our angle is the intersection — bounded divisive-norm denominator + drop-in/zero-param + a measured parameterless-stability effect. We cite, we don't overclaim.
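To make one of the cited precedents concrete, here is a minimal sketch of softmax1 as published by Miller (2023), next to ordinary softmax. It is prior work only, shown to illustrate what "weights that do not sum to one" looks like; it is not the rational operator, whose formula this page does not disclose.

```python
import math

def softmax(scores):
    """Standard softmax: weights always sum to exactly one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax1(scores):
    """softmax1: exp(x_i) / (1 + sum_j exp(x_j)).
    Shifted by m >= 0 so every exponent is <= 0 and nothing overflows."""
    m = max(max(scores), 0.0)
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)  # the exp(-m) term is the extra "+1"
    return [e / total for e in exps]

weights_confident = softmax1([4.0, 1.0, 0.5])    # sum is close to, but below, one
weights_unsure = softmax1([-20.0, -20.0, -20.0])  # sum is close to zero
```

When every score is strongly negative, softmax still forces the weights to sum to one, while softmax1 lets the total shrink toward zero, which is the "attend to nothing" behaviour the attention-sink literature describes.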
Diagnostic-scale (sub-Chinchilla, ≤236M tokens) — provisional, pre-registered, reproducible from sealed result files. Headline effects below; the downloadable bundle lets you check the numbers yourself. No formula is disclosed.
Read this honestly. Two width points establish a direction, not a scaling law. The edge (~1.5% perplexity) is best attributed to learning-rate headroom, not a separate nonlinearity advantage — at equal LR with both arms stabilized, the two are at parity.
Stabilizer-free, bare softmax shows ~6.5× larger peak gradient norm (≈42.9 vs ≈6.6) and 29 spikes vs. 0 for rational — same body, optimizer, seed and data.
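The two statistics quoted here, peak ratio and spike count, can be sketched as follows. The spike rule (a gradient norm above `factor` times the run's median) is an assumption chosen for illustration, since the actual thresholds are withheld, and the traces are synthetic apart from the two peak values taken from the text.

```python
import statistics

def peak_ratio(norms_a, norms_b):
    """Ratio of the largest gradient norm in run A to the largest in run B."""
    return max(norms_a) / max(norms_b)

def count_spikes(norms, factor=5.0):
    """Count steps whose gradient norm exceeds factor x the run's median
    (hypothetical spike definition)."""
    threshold = factor * statistics.median(norms)
    return sum(1 for g in norms if g > threshold)

# Synthetic traces; only the peaks 42.9 and 6.6 come from the text above.
softmax_trace = [2.0, 2.1, 1.9, 42.9, 2.0, 2.2]
rational_trace = [2.0, 2.1, 1.9, 6.6, 2.0, 2.2]

ratio = round(peak_ratio(softmax_trace, rational_trace), 1)  # 6.5
```

Using the median rather than the mean as the baseline keeps a single large excursion from raising its own threshold.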
≈601M · 1B · 3B · 7B on H100 / H200. The clean comparison is 3B→7B; the decisive next step is a properly-budgeted 3B run where we expect rational to separate from vanilla.
Compute, capital and the right partner to convert provisional numbers into production-grade evidence, and to test downstream behaviour and abstention for the first time.
The full data — not summaries: every experiment, every training step, every stability statistic. The only things withheld are the recipe (formula, kernel internals, exact learning rates, schedule, SPC thresholds) and quant performance figures — those are the IP. Checksums let you verify each file.
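Checking a downloaded file against its checksum can be sketched in a few lines. This assumes the checksum file follows the usual `md5sum` line format (hex digest, whitespace, file name) and sits in the same directory as the files it lists; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large artifacts need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest_path):
    """Return {file name: True/False} for every entry in an md5sum-style manifest."""
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        results[name] = md5_of(manifest.parent / name) == expected.lower()
    return results
```

Any entry that maps to `False` means the local copy differs from the published one and should be downloaded again before its numbers are trusted.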
The complete write-up: isolation design, all results with numbers, every table, methodology, provenance and honest limits. Start here.
Download .md
Per-run BPB, gradient-norm median/peak, excursions, Cpk, defect rate, MFU. Every experiment, nothing cherry-picked.
Download .csv
Held-out BPB at every step for rational vs. tuned softmax, plus the per-step delta. Rational leads at every converged step.
Download .csv
Stabilizer-free softmax vs. rational at the same LR: gn median/peak, excursions, Cpk, defect rate. The collapse, in numbers.
Download .csv
Forward/backward cosine ≈ 0.99999, VRAM −86% vs. our naive reference, plus a corrupted control that's correctly rejected.
Download .csv
3B / 7B best-vs-best edge (−0.0146 → −0.0172) + the 7B converged-phase deltas. Direction, not a law.
Download .md .csv
The headline stability table (Cpk, defect, peak gn, MFU parity, drop-in equivalence) in prose + csv.
Download .md .csv
Pre-registration, matched-step, deterministic recompute, gradient-norm SPC, md5 chain.
Download .md
Walk-forward, multi-horizon, leak-hardened, placebo-tested. Methodology only; no performance figures.
Download .md
md5 of every public artifact above; verify with md5sum -c. Internal raw-log chain available under NDA.
Download .txt
Architectural range that signals one thing to investors and partners: this team ships, it doesn't just talk. ML engineering is one capability among several; the rarer edge is analytical engineering beyond ML.
A production-grade, walk-forward, multi-horizon cross-sectional pipeline — data → signal → model → decision — producing models today, with leak-hardened, placebo-tested evaluation and in-production SPC monitoring. (B2B, licensed institutions only; not investment advice.)
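For readers new to the term, walk-forward evaluation can be sketched as a sequence of time-ordered splits in which the model only ever trains on data that precedes the test window. This is a generic illustration with an expanding training window and an optional embargo gap; the function name and parameters are hypothetical and say nothing about the actual pipeline's settings.

```python
def walk_forward_splits(n_samples, train_size, test_size, gap=0):
    """Yield (train_indices, test_indices) pairs in time order.

    The training window starts at index 0 and expands each round; `gap`
    leaves an embargo between train and test to limit label leakage."""
    test_start = train_size + gap
    while test_start + test_size <= n_samples:
        train = range(0, test_start - gap)
        test = range(test_start, test_start + test_size)
        yield train, test
        test_start += test_size  # slide forward by one test window

# Illustrative call: 10 time steps, 4 initial training steps, 2-step tests, 1-step gap.
splits = list(walk_forward_splits(10, train_size=4, test_size=2, gap=1))
```

Because every test window lies strictly after its training window plus the gap, no split can use future information, which is the property "leak-hardened" evaluation depends on.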
Turning raw industrial/market data into decisions: predictive maintenance, time-series, FFT / spectral signal analysis — and digital-twin data strategy (what to collect, how, the real ROI) before a single model is built.
Research proof-of-concepts outside the softmax ecosystem (concept binding, spreading activation). Early-stage, framed honestly — evidence of deep architectural flexibility, not a shipped product.
On-prem / data-sovereign assistants: Turkish-fluent SFT, tool-routing, live-web grounding with sources, data-cleaning & QA pipelines, and a rigorous training-QC discipline.
Ordered by how quickly and reliably they create value. We work with partners worldwide; Türkiye is our R&D base, not a ceiling. Under NDA we share enough to evaluate seriously.
We show the effects; we never publish the mechanism. The patent is a defensive shield, not a broad claim — the real moat is the recipe (trade secret) plus execution and capability-per-dollar.
The filing covers a specific composition of bounded components together with a measured technical effect (training stability + learning-rate headroom) — not a broad formula monopoly.
We position openly against attention-sink / softmax1, FlashSigmoid, nGPT, PolaFormer and Differential Transformer. Hiding precedent fails diligence; citing it is the defensible position.
The formula, kernel internals, exact learning rates, schedule, and SPC thresholds are never disclosed — more valuable kept secret than patented. Effects are public; "how" is not.
Priority is filed before any public mechanism disclosure (EU/EPO has no grace period). This site carries effects only and is publication-safe; mechanism stays under NDA.
Public disclosure follows IP protection — by design. The moment our priority filing is locked in, we'll open the full results and a technical paper to the world. Until then the complete picture — methodology, raw-log evidence, and the team behind it — is already on the table under NDA for serious investors and partners. The evidence is ready; the public unveiling is just a matter of timing.
We are a four-engineer team with ~20 years of industrial experience on average — top-tier expertise across logistics, industry and software, plus advanced analytical engineering beyond ML: systems, optimization, numerical methods, signal processing, economics. ML engineering is one capability; our rarer edge is the combination. We are results-oriented and realistic — engineering that works and can be measured, not academic abstraction.
We are in stealth to protect our IP while a method patent application is in preparation. Full technical disclosure and team introductions are available under NDA to qualified investors and partners. Don't trust us — run it: our headline numbers regenerate from sealed, pre-registered result files.
Our ambition is global and our mindset is borderless — we're open to the right strategic partners, investors and support wherever in the world they are. Türkiye is our R&D base and cost-talent root, not a ceiling. We are investment-ready and can incorporate cleanly the moment the right partner and terms appear.
We measured rational's parameterless stability, drop-in / zero-param behaviour, capability-per-dollar and modest edge at a diagnostic scale of roughly 200M tokens. The support we're seeking (compute, capital, the right partner) lets us validate at 3B with a Chinchilla-sufficient budget. We expect that run to show our difference from vanilla more clearly and bring our capabilities into the open:
A virtual data-room — full technical brief, internal raw-log evidence chain, team intros — opens under NDA.
NDA-gated. Email us and we’ll reply from a Tetracta address to arrange access.
Email to request access
No mechanism is shared before an NDA. Effects & methodology are public above.
Anonymous in public, but ready for online meetings with qualified counterparts. Tell us who you are and we'll share what's appropriate — under NDA where it matters.
Or email [email protected]
This site uses no contact forms and no cookies, only privacy-first analytics that never store your IP address. Email us and tell us which one you are (investor, partner, customer, researcher, or hire-us-hourly); we’ll take it from there, under NDA where it matters.
Email [email protected]
We never share your details, and we don’t store anything from this page. No technical formulae are exchanged before an NDA.