Iterative self-labeling: how we taught a compression model to compress itself

The silver label problem (previous post) is this: kompress was trained on Q&A compression pairs where 28% of must-keep tokens (numbers, error names, paths, flags) were labeled as "drop" because the Q&A compressor didn't need them to answer the question. The model faithfully learned the wrong policy.

We tried four training runs with different data mixes and loss weights. exact_keep_pct plateaued at 0.877-0.882 across all of them.

Then we ran Experiment A.

Experiment A: use the model to label itself

The insight: kompress-v3 with the hard inference override (PR #1400) produces correct compressions. Numbers, ALLCAPS error names, paths — all forced-kept regardless of model score. If we compress the training texts with this oracle, the output is a reference with near-correct must-keep labels.

def compress_with_override(text):
    # run v3 normally
    scores = model.get_scores(tokenize(text))
    keep = scores > 0.5
    # hard override: force-keep must-keep tokens
    for i, token in enumerate(tokens):
        if MUST_KEEP_RE.search(detokenize(token)):
            keep[i] = True
    return detokenize([t for t, k in zip(tokens, keep) if k])

We ran this on 1802 training texts on a GPU (RTX 4090, ~5 minutes). Then trained kompress-v4 on the output.

mk_in_ref before: 0.72 (ultrawhale Q&A references) mk_in_ref after: 0.823 (self-labeled via v3+override)

Not 1.0 — the override doesn't perfectly reconstruct every must-keep token at the tokenization boundary. But 0.823 is meaningfully better than 0.72.

The result

Eval on heretic-style adversarial prompts (dense with must-keep tokens: chemical formulas, memory addresses, CVEs, error codes):

Version	Training labels	mk_in_ref	Heretic exact_pct	Override delta
v3	ultrawhale Q&A	0.72	0.942	+0.027
v3.1	+domain data	~0.85*	0.925	+0.002
v3.2	+domain, LoRA	~0.85*	0.929	+0.002
v3.3	domain only	1.00	0.942*	—
v4	self-labeled	0.823	0.967	+0.000

*estimated

The override delta for v4 is zero. The model learned to preserve must-keep tokens on its own. It no longer needs the inference-time fallback.

This answers the central question from the silver label post: the problem was label quality, not model capacity. With mk_in_ref=0.823, the model internalized what the override was enforcing. The override becomes a defense-in-depth no-op.

The iterative loop

v4 was trained using v3+override as the reference generator. But v4 is a better compressor — it preserves more must-keep tokens with higher confidence. If we use v4+override to generate v5's references, mk_in_ref should be higher. Each generation might bring the labels closer to 1.0.

This is the self-improvement loop:

v3 → (v3+override labels) → v4 → (v4+override labels) → v5 → ...

Until the override delta stays at zero and mk_in_ref stops improving. That's the convergence criterion.

v5 result: exact_pct = 0.961, override delta = 0.000. The loop converged. v5 is slightly worse than v4 (0.961 vs 0.967) on one prompt (SSL cert bypass regressed: 0.895 → 0.789), all others held or improved. The iterative self-labeling has a natural ceiling — the gain from v3→v4 was the large jump; v4→v5 adds noise rather than signal.

The convergence point is v4. Further iterations would need qualitatively different training data (C3 self-distillation from real production traffic), not another round of the same self-labeling loop.

What this means for the architecture

The inference-time override (PR #1400, approved by JerrettDavis) is still the right thing to ship:

Current model is v2-base. v4 isn't the default yet. Until it ships as the default, the override is essential.
Defense in depth. Even once v4 is default, the override costs one regex pass per chunk (~0.1ms). It catches edge cases the model hasn't seen.
The override makes training possible. Without the override at inference time, we couldn't generate good self-labels to train the next version.

The override and the training improvement are not alternatives — the override enables the training loop that makes itself redundant.

Cost

Run	Instance	Time	Cost
v3 training	RTX 4090	~15 min	$0.09
v4 self-label + train	RTX 4090	~25 min	$0.15
v5 self-label + train	RTX 4090	~25 min	$0.15
Total compute			~$0.55

All of today's training runs: $0.55. One cold brew.

v5 results

(updating when the run completes)

Metric	v4	v5
mk_in_ref	0.823	~0.86 (est.)
Heretic exact_pct	0.967	0.961
Override delta	0.000	0.000
Verdict	breakthrough	converged — ceiling reached

Model: PeetPedro/kompress-v4 Code: ultrawhale/scripts/run_training_v4.sh PR: headroomlabs-ai/headroom#1400