Research Note · small-model engineering

A 1B model should route, not remember

Tetracta AI Teams · 15 June 2026

A small-model engineering note from our 1-billion-parameter assistant work. It shares a design lesson and honestly stated results. No architecture, training recipe, dataset, or competitor comparison is disclosed.

There is a persistent temptation, when you build a small language model, to treat it like a small encyclopedia — to cram as much world-knowledge into the weights as they will hold and hope it answers like a big model on a budget. We spent real effort on a 1-billion-parameter assistant, and the most useful thing we learned is that this instinct is backwards.

Small models are bad encyclopedias — and that's fine

Be honest about what a 1B model is. Its internal, parametric recall of specific facts is weak; pushed on open-ended factual questions, a model this size is not a reliable knowledge store, and no amount of wishful prompting changes that. Pretending otherwise is how you ship a confident, wrong assistant. We measured this on our own model and did not flinch from it: raw, internal knowledge at this scale is not competitive, and we do not sell it as if it were.

The mistake is treating that as a failure. It isn't — it's a design constraint that points at a better architecture.

Route, don't remember

The useful role for a small model is not knowing; it is orchestrating. A 1B model can be very good at the things that don't require a huge memory: understanding what the user wants, deciding which tool to call, forming a clean query, and composing a grounded answer from what the tools return. Arithmetic goes to a calculator, not the weights. Dates and computations are resolved exactly, by code. Facts come from a live search with the source attached, not from a hazy parametric memory. The model's job is to be the fast, cheap conductor — not the orchestra.
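To make the conductor idea concrete, here is a minimal, hypothetical sketch of a routing layer. It is not our assistant's implementation, which this note does not disclose. A crude regex classifier stands in for the small model's tool-selection step, and the names `route`, `calculator`, `days_between`, and `web_search` are illustrative assumptions. The point it shows is that arithmetic and date answers come out of code, so they are exact, while anything else is handed to a search backend (stubbed here) rather than answered from memory.

```python
import ast
import operator
import re
from datetime import date

# Hypothetical sketch: a rule-based classifier stands in for the small
# model's tool-selection step; the tools themselves compute exactly in code.

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculator(expr: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        # Anything else (names, calls, attributes) is rejected outright.
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr.strip(), mode="eval"))


def days_between(a: str, b: str) -> int:
    """Exact day count between two ISO dates (YYYY-MM-DD)."""
    return (date.fromisoformat(b) - date.fromisoformat(a)).days


def web_search(query: str) -> dict:
    """Placeholder for a live search backend (hypothetical; returns no hits)."""
    return {"query": query, "results": []}


def route(user_text: str) -> tuple[str, object]:
    """Pick a tool, build its input, and return (tool_name, tool_output)."""
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", user_text)
    if len(dates) == 2:
        return "days_between", days_between(*dates)
    expr = re.fullmatch(r"\s*what is\s+([\d\s.+\-*/()]+)\??\s*", user_text, re.I)
    if expr:
        return "calculator", calculator(expr.group(1))
    # Default: do not answer from the weights; send the question to search.
    return "web_search", web_search(user_text.strip())
```

For example, `route("what is 17 * 23?")` returns `("calculator", 391)` and `route("How many days from 2026-01-01 to 2026-06-15?")` returns `("days_between", 165)`. In a real assistant the classifier would be the model itself emitting a tool call, but the division of labour stays the same: the model decides and phrases, the code computes.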

When we rebuilt our assistant around that idea, the honest results were the ones that actually matter for a product: tool selection that lands reliably on the right tool, exact answers where exactness is checkable (math, dates) because those are delegated to code, web-grounded answers that carry their sources, and fast responses — on the order of a second or two from cache. None of those depend on the model "knowing" much. They depend on it routing well.
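A grounded answer that carries its sources can be sketched the same way. The following is a hypothetical illustration, not our pipeline: the `SearchHit` type and `compose_grounded_answer` function are assumed names, and simple snippet concatenation stands in for the model writing prose from the retrieved text. The behaviour worth noting is that every sentence is tied to a numbered source, and when the tools return nothing usable the function declines to answer instead of falling back on parametric memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    """One retrieved passage and where it came from (hypothetical shape)."""
    url: str
    snippet: str


def compose_grounded_answer(question: str, hits: list[SearchHit],
                            max_sources: int = 3) -> str:
    """Build an answer only from retrieved snippets, with numbered citations."""
    # Drop empty snippets; keep at most `max_sources` hits.
    usable = [h for h in hits if h.snippet.strip()][:max_sources]
    if not usable:
        # No grounding available: abstain rather than guess.
        return f"I couldn't find a source for: {question}"
    body = " ".join(
        f"{h.snippet.strip()} [{i}]" for i, h in enumerate(usable, start=1)
    )
    refs = "\n".join(f"[{i}] {h.url}" for i, h in enumerate(usable, start=1))
    return f"{body}\n\nSources:\n{refs}"
```

This also makes one limit visible: the function faithfully cites whatever it is given, so a bad source still yields a bad (if traceable) answer, which is why the quality of the tool layer matters as much as the model.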

Why this is the right trade for small teams

Grounding-over-parameters is not just an accuracy story; it is an economics one. A small routing model is cheap to run, cheap to host, and small enough to deploy on-premise — which is exactly what budget-bound teams, sovereign-AI efforts, and data-resident (regulated) deployments need. You get an assistant whose facts are as current as your search index and as auditable as the sources it cites, without renting a large model's inference bill. The capability you care about per dollar spent is high precisely because you stopped asking the parameters to do a job retrieval does better.

The honest limits

We state the ceiling plainly. A 1B assistant is strongest in its primary language and weaker in others; its unaided, open-domain language quality is not at the frontier, and we don't claim it is. Grounding reduces hallucination relative to ungrounded small-model output, but it does not abolish it — a bad source or a bad query still produces a bad answer, so the tool layer and the sources matter as much as the model. And "route, don't remember" is a design philosophy, not a magic switch: the routing itself has to be good, and making it good is most of the work.

None of that undercuts the lesson, which we think generalizes well beyond our own assistant: if your model is small, spend your effort on what it routes to, not on what it memorizes. The cheapest, most honest assistant is the one that knows when to look something up — and then actually does.

— Tetracta AI Teams · for humans, like humans.
