LoopKit — your first loop engineering starter kit

After 15 versions of kompress, 8 teachers, 4 data sources, 3 architectures, and one council that said RETRAIN three times in a row — we extracted the pattern.

It's called LoopKit. And it's yours.

What is LoopKit?

A monorepo starter kit for building self-improving systems. The same four-phase loop that produced every kompress model — plan, execute, evaluate, decide — wrapped in a box that anyone can clone and extend.

git clone https://github.com/peterlodri-sec/loopkit
cd loopkit
python -m loops.hello.loop
# 5 iterations. SHIP. You just ran your first loop.

▶️ Open in Colab

The pattern that produced 15 models

Every kompress model was one iteration of a loop:

Version	What we tried	Heretic	Decision
v2	—	0.975	Baseline established
v4	Self-labels	0.943	Override internalized — ship
v6	Agent-distribution	0.962	Dead end — pivot
v8	Qwen2.5 teacher	0.955	Production — ship
v9	C3-only	0.921	Overfit — retrain with diversity
v11	Larger encoder	0.906	Capacity ≠ precision — pivot
v14	Council training	0.882	Concept proven — retrain

Each row is plan → execute → evaluate → decide. The outer loop — the decision about what to try next — was us: a human and an AI agent, reviewing results, brainstorming ideas, launching experiments.

LoopKit automates the outer loop so you can scale it.
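As a rough illustration of the four phases, here is a minimal sketch that assumes a hypothetical abstract base class. The names `Loop`, `Result`, and `HelloLoop`, the method signatures, and the decision strings are illustrative assumptions, not the actual contents of loops/base.py or loops/hello/.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Result:
    iteration: int
    score: float
    decision: str  # assumed vocabulary: "ship", "retrain", or "pivot"


class Loop(ABC):
    """Hypothetical base class: one iteration = plan, execute, evaluate, decide."""

    def __init__(self, max_iterations: int = 5):
        self.max_iterations = max_iterations
        self.history: list[Result] = []

    @abstractmethod
    def plan(self, history): ...

    @abstractmethod
    def execute(self, plan): ...

    @abstractmethod
    def evaluate(self, output) -> float: ...

    @abstractmethod
    def decide(self, score: float, history) -> str: ...

    def run(self) -> list[Result]:
        for i in range(1, self.max_iterations + 1):
            plan = self.plan(self.history)               # what to try next
            output = self.execute(plan)                  # run the experiment
            score = self.evaluate(output)                # measure the result
            decision = self.decide(score, self.history)  # choose the next move
            self.history.append(Result(i, score, decision))
            if decision == "ship":
                break  # stop as soon as an iteration is good enough
        return self.history


class HelloLoop(Loop):
    """Toy loop: each iteration takes a bigger step until the score reaches 1.0."""

    def plan(self, history):
        return len(history) + 1  # step count grows by one per iteration

    def execute(self, plan):
        return plan * 0.2

    def evaluate(self, output) -> float:
        return round(output, 2)

    def decide(self, score: float, history) -> str:
        return "ship" if score >= 1.0 else "retrain"
```

In this toy, `HelloLoop().run()` records four "retrain" decisions and stops on the fifth iteration, when the score first reaches 1.0.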

What's in the box

loopkit/
├── GUIDE.md              ← The full guide (tiered: quick start → deep dive)
├── README.md             ← You are here
├── pyproject.toml
├── bot/                  ← Telegram bot (your outer loop operator)
│   ├── main.py           ← /new, /run, /decide, natural chat
│   ├── memory.py         ← SQLite — remembers across restarts
│   └── council.py        ← LLM reviews results, suggests next actions
├── loops/
│   ├── base.py           ← Abstract Loop class
│   ├── hello/            ← Minimal example (5 lines of logic)
│   ├── template/         ← cp -r template myloop → start building
│   └── kompress/         ← The full 15-model pipeline
├── concepts/             ← Reference implementations
│   ├── self_labeling.py
│   ├── evaluator_optimizer.py
│   └── council.py
├── evals/
│   └── heretic.py        ← Portable adversarial benchmark
└── notebooks/
    └── loopkit_hello.ipynb  ← Colab-ready

The Telegram bot — your outer loop in chat

User: /new kompress-v15
Bot:   ✅ Created loop kompress-v15.

User: /run kompress-v15
Bot:   🔄 Running kompress-v15...
       ✅ kompress-v15-001 complete
       Results: heretic 0.961, keep_rate 0.85
       Decision: Council says SHIP 🚀

User: /history kompress-v15
Bot:   📜 kompress-v15 — 1 experiment
       🚀 v15-001: ship — "Beats v8 (0.955), ready to deploy"

User: my model regressed, what should I try?
Bot:   Regression happens! Here's what I'd check:
       1. Label quality — is your teacher too aggressive?
       2. Data diversity — are you mixing in generic data?
       3. Epochs — 3 was the sweet spot, more = overfitting

The bot remembers everything in SQLite, uses an LLM council (GLM-5.1 by default, configurable to anything), and falls back to heuristic rules when no LLM is available. It's the outer loop operator — the thing that asks "what next?" and then executes it.
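The LLM-first, rules-second behavior could look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the prompt, the decision vocabulary, the 0.02 tolerance, and the premise that a higher score is better. The `llm` argument stands in for whatever client you configure, and it is not the real interface of bot/council.py.

```python
from typing import Callable, Optional

VALID_DECISIONS = {"ship", "retrain", "pivot"}  # assumed vocabulary


def heuristic_decision(score: float, best_previous: Optional[float],
                       tolerance: float = 0.02) -> str:
    """Rule-based fallback. Thresholds are illustrative, not tuned values."""
    if best_previous is None or score > best_previous:
        return "ship"      # first result, or a new best
    if best_previous - score <= tolerance:
        return "retrain"   # close to the best: adjust and try again
    return "pivot"         # far behind: change the approach


def council_decide(score: float, best_previous: Optional[float],
                   llm: Optional[Callable[[str], str]] = None) -> str:
    """Ask the LLM first; fall back to rules if it is missing or unusable."""
    if llm is not None:
        prompt = (
            f"Latest score: {score}. Best previous score: {best_previous}. "
            "Answer with exactly one word: ship, retrain, or pivot."
        )
        try:
            answer = llm(prompt).strip().lower()
            if answer in VALID_DECISIONS:
                return answer
        except Exception:
            pass  # any LLM failure falls through to the rules below
    return heuristic_decision(score, best_previous)
```

With no LLM configured, `council_decide(0.961, 0.955)` returns "ship" under these assumed rules. Validating the LLM's answer against a fixed vocabulary keeps a chatty or malformed reply from leaking into the decision log.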

Group Engineering meets Loop Engineering

Anthropic's Engineering Groups of AI Agents (June 2025) describes how multiple agents collaborate in structured groups. Loop engineering is the temporal version:

Group Engineering	Loop Engineering
Multiple agents collaborate in parallel	Multiple iterations build on each other
Coordinator delegates tasks	Council decides next experiment
Agents have specialized roles	Each iteration has a hypothesis
Results merge into a solution	Results converge toward a target

The council is the coordinator. The loops are the agents. Time is the orchestrator.

Patterns you can use today

The GUIDE.md documents five patterns extracted from the kompress loop:

Self-Labeling — model labels its own training data
Evaluator-Optimizer — stronger teacher corrects student's mistakes
C3 Self-Distillation — Collect → Curate → Compress on real-world data
Council — LLM reviews results and decides what to try next
The Loop Pattern — combine all four into a self-improving pipeline

Each has a reference implementation in concepts/. Each is production-tested on real models.
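To make one of these concrete, here is a hedged sketch of the Evaluator-Optimizer idea on a toy keep/drop token task. The function name `collect_corrections`, the toy labelers, and the task itself are assumptions chosen for illustration. The real concepts/evaluator_optimizer.py may be organized quite differently.

```python
from typing import Callable, List, Sequence, Tuple

Label = bool  # toy task (an assumption): True = keep the token, False = drop it


def collect_corrections(
    examples: Sequence[str],
    student: Callable[[str], Label],
    teacher: Callable[[str], Label],
) -> List[Tuple[str, Label, Label]]:
    """Return (example, student_label, teacher_label) wherever the two disagree.

    The disagreements are the cases the student would be retrained on.
    """
    corrections = []
    for x in examples:
        guess = student(x)
        target = teacher(x)
        if guess != target:
            corrections.append((x, guess, target))
    return corrections


def toy_student(token: str) -> Label:
    return len(token) > 3  # weak rule: keep only long tokens


def toy_teacher(token: str) -> Label:
    return len(token) > 3 or token.isdigit()  # stronger rule: also keep numbers
```

On the tokens `["the", "error", "42", "at", "line"]` the two labelers disagree only on "42", so that single correction is what would feed the student's next training round.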

The Loop Engineering Ecosystem

LoopKit didn't emerge in a vacuum. It's part of a growing movement sparked by Addy Osmani's Loop Engineering essay, the canonical text that defined the 5 building blocks every loop needs (automations, worktrees, skills, plugins, and sub-agents) plus memory.

Here's how the pieces connect:

Piece	What it is	Link
Addy Osmani's essay	The canonical text — 5 building blocks + memory, practical patterns	addyosmani.com
Cobus Greyling's reference impl	npm tools (loop-audit, loop-init, loop-cost), 7 patterns, pattern picker, goal engineering	github.com/cobusgreyling
LangChain: The Art of Loop Engineering	4 stacked loops (Agent → Verification → Event-Driven → Hill Climbing), "loopcraft"	langchain.com
LoopKit (this post)	Python-native starter kit, Telegram bot, council, Colab notebook, kompress case study	github.com/peterlodri-sec

The key quotes that drive this:

"You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." — Peter Steinberger

"I don't prompt Claude anymore. I have loops running that prompt Claude. My job is to write loops." — Boris Cherny (Head of Claude Code, Anthropic)

The 4 stacked loops (LangChain)

LangChain's framework describes loop engineering as stacking four levels of loops:

Agent Loop — model calls tools until done (LoopKit: Loop.run())
Verification Loop — grader checks output, retries on failure (LoopKit: loops/verification/)
Event-Driven Loop — webhooks/cron trigger agents (LoopKit: Telegram bot)
Hill Climbing Loop — analysis agent reviews traces, rewrites the harness (LoopKit: Council + Ralph)
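The second level, the verification loop, is the easiest to show in a few lines. The sketch below is a generic illustration under stated assumptions: `verification_loop`, `toy_generate`, and `toy_grade` are hypothetical names, and nothing here is taken from loops/verification/ or from LangChain's code.

```python
from typing import Callable, Optional, Tuple


def verification_loop(
    generate: Callable[[Optional[str]], str],
    grade: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Generate, grade, and retry with the grader's feedback until it passes."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        output = generate(feedback)       # the agent sees the previous feedback
        passed, feedback = grade(output)  # the grader returns a verdict and a reason
        if passed:
            return output, attempt
    raise RuntimeError(
        f"no passing output after {max_attempts} attempts: {feedback}"
    )


def toy_generate(feedback: Optional[str]) -> str:
    # Stand-in for an agent: it only adds tests once the grader asks for them.
    return "patch" if feedback is None else "patch + tests"


def toy_grade(output: str) -> Tuple[bool, str]:
    if "tests" in output:
        return True, ""
    return False, "missing tests"
```

Here `verification_loop(toy_generate, toy_grade)` fails the first attempt, passes the second, and returns `("patch + tests", 2)`. Capping the attempts matters: a grader that can never be satisfied should surface as an error, not as an endless retry.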

The insight from level 4: the return arrow "reaches inside and updates the agent loop directly." The loop that watches the loop that watches the loop. Meta-stability through recursion.

7 battle-tested patterns (Cobus Greyling)

Cobus's reference implementation includes 7 production patterns with real win/failure stories:

Daily Triage — morning routine for any repo (LoopKit: loops/daily_triage/)
PR Review — sub-agent drafts, second reviews, opens PR
Dependency Update — checks deps, updates, tests, opens PR
Release Notes — reads commits, generates changelog
Code Migration — finds deprecated patterns, replaces, tests
Bug Hunt — reads reports, searches codebase, proposes fix
Documentation Drift — compares code to docs, flags gaps

Every pattern follows the same loop: discover → plan → execute → verify → ship.

Interactive Docs Site

We built a single-page guide site for LoopKit — same style as cobusgreyling.github.io/loop-engineering. It includes the full loop pattern visualization, kompress results table, ecosystem links, 5 loops, 7 production patterns, and the Telegram bot setup — all on one page.

→ loopkit docs

The Full Stack: How LoopKit + Cobus Greyling Work Together

LoopKit and Cobus Greyling's loop-engineering are complementary. Here's how they fit:

Layer	Cobus Greyling	LoopKit
Audit & Planning	`loop-audit` — scores loop readiness, suggests improvements	Council — LLM reviews results, decides next action
Scaffolding	`loop-init` — scaffolds from 7 proven patterns	`loops/template/` — `cp -r template myloop`
Cost Estimation	`loop-cost` — estimates token spend before running	Budget tracking in `state.json`
Execution	Grok/Claude Code/Codex native loops	Python `Loop.run()` — plan→execute→evaluate→decide
Persistence	Markdown files (LOOP.md, STATE.md, loop-run-log.md)	SQLite + `state.json` per loop
Monitoring	GitHub Actions, loop-audit dogfood	Ralph Loop — loop watching loops with OpenTelemetry
Sharing	Stories directory — real wins + failures	HuggingFace Datasets — experiment history as queryable datasets

Use Cobus's tools to plan and audit your loops. Use LoopKit to run and scale them. Together they form the complete loop engineering stack.
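For the cost row in the table above, a budget guard backed by a per-loop JSON file might look like this sketch. The field names `spent_tokens` and `runs`, the function `charge`, and the hard-stop behavior are assumptions for illustration only. The actual layout of LoopKit's state.json is not shown in this post.

```python
import json
from pathlib import Path


def charge(state_path: Path, tokens: int, limit_tokens: int) -> dict:
    """Record token spend in a JSON state file, refusing to exceed the limit."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"spent_tokens": 0, "runs": 0}  # hypothetical schema

    if state["spent_tokens"] + tokens > limit_tokens:
        # Fail before spending, and leave the file untouched.
        raise RuntimeError(
            f"budget exceeded: {state['spent_tokens']} + {tokens} > {limit_tokens}"
        )

    state["spent_tokens"] += tokens
    state["runs"] += 1
    state_path.write_text(json.dumps(state, indent=2))
    return state
```

Calling `charge(Path("state.json"), 1500, 5000)` before each run turns the budget into a precondition of the loop instead of something checked after the bill arrives.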

Cobus's key patterns we've adopted:

AGENTS.md — project conventions (our concepts/ directory)
LOOP.md — loop design document (our GUIDE.md)
STATE.md — durable memory outside conversation (our state.json + SQLite)
loop-run-log.md — experiment log (our Experiment dataclass + history)
loop-budget.md — cost tracking (our budget section in GUIDE)
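The durable-memory and experiment-log items above could be combined along these lines. This is a sketch under assumptions: the `Experiment` fields, the `Memory` class, and the table schema are hypothetical and are not copied from bot/memory.py. The sample values come from the chat transcript earlier in this post.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class Experiment:
    loop_name: str
    run_id: str
    score: float
    decision: str
    note: str


class Memory:
    """SQLite-backed experiment history. Pass a file path to survive restarts."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS experiments ("
            "loop_name TEXT, run_id TEXT, score REAL, decision TEXT, note TEXT)"
        )

    def record(self, exp: Experiment) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO experiments VALUES (?, ?, ?, ?, ?)", astuple(exp)
            )

    def history(self, loop_name: str) -> List[Experiment]:
        rows = self.db.execute(
            "SELECT loop_name, run_id, score, decision, note "
            "FROM experiments WHERE loop_name = ? ORDER BY rowid",
            (loop_name,),
        )
        return [Experiment(*row) for row in rows]


memory = Memory()
memory.record(Experiment("kompress-v15", "kompress-v15-001", 0.961, "ship",
                         "Beats v8 (0.955), ready to deploy"))
```

After this, `memory.history("kompress-v15")` returns the single recorded experiment, which is the kind of data a /history command could render.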

Why this matters

Most ML experimentation is ad-hoc. You try something, get a result, and think "what next?" The loop pattern makes it systematic. Every experiment has a hypothesis. Every result has a decision. Every decision feeds into the next plan.

LoopKit gives you the scaffolding. You bring the idea. The loop does the rest.

GitHub: peterlodri-sec/loopkit
Guide: GUIDE.md
Colab: loopkit_hello.ipynb
Models: PeetPedro on HuggingFace
The kompress story

This post is part of the LoopKit project. See also: the kompress heretic eval, all kompress models on HuggingFace, the ultrawhale training repo, and headroom.