LoopKit — your first loop engineering starter kit
After 15 versions of kompress, 8 teachers, 4 data sources, 3 architectures, and one council that said RETRAIN three times in a row — we extracted the pattern.
It's called LoopKit. And it's yours.
What is LoopKit?
A monorepo starter kit for building self-improving systems. The same four-phase loop that produced every kompress model — plan, execute, evaluate, decide — wrapped in a box that anyone can clone and extend.
git clone https://github.com/peterlodri-sec/loopkit
cd loopkit
python -m loops.hello.loop
# 5 iterations. SHIP. You just ran your first loop.
The pattern that produced 15 models
Every kompress model was one iteration of a loop:
| Version | What we tried | Heretic score | Decision |
|---|---|---|---|
| v2 | — | 0.975 | Baseline established |
| v4 | Self-labels | 0.943 | Override internalized — ship |
| v6 | Agent-distribution | 0.962 | Dead end — pivot |
| v8 | Qwen2.5 teacher | 0.955 | Production — ship |
| v9 | C3-only | 0.921 | Overfit — retrain with diversity |
| v11 | Larger encoder | 0.906 | Capacity ≠ precision — pivot |
| v14 | Council training | 0.882 | Concept proven — retrain |
Each row is plan → execute → evaluate → decide. The outer loop — the decision about what to try next — was us: a human and an AI agent, reviewing results, brainstorming ideas, launching experiments.
LoopKit automates the outer loop so you can scale it.
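To make the four phases concrete, here is a minimal sketch of what a loop base class could look like. It is not the actual contents of loops/base.py: the method names, the `max_iterations` field, and the toy `CountingLoop` subclass are assumptions chosen to mirror plan → execute → evaluate → decide.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Loop(ABC):
    """Hypothetical four-phase loop; LoopKit's real base class may differ."""

    max_iterations: int = 5
    history: list = field(default_factory=list)

    @abstractmethod
    def plan(self, iteration: int) -> dict: ...

    @abstractmethod
    def execute(self, plan: dict) -> dict: ...

    @abstractmethod
    def evaluate(self, result: dict) -> float: ...

    @abstractmethod
    def decide(self, score: float) -> str: ...

    def run(self) -> str:
        """Repeat the four phases until a 'ship' decision or the iteration cap."""
        decision = "retrain"
        for i in range(1, self.max_iterations + 1):
            score = self.evaluate(self.execute(self.plan(i)))
            decision = self.decide(score)
            self.history.append((i, score, decision))
            if decision == "ship":
                break
        return decision


class CountingLoop(Loop):
    """Toy loop: the score rises by 1/5 each iteration and ships at 1.0."""

    def plan(self, iteration):
        return {"step": iteration}

    def execute(self, plan):
        return {"value": plan["step"] / 5}

    def evaluate(self, result):
        return result["value"]

    def decide(self, score):
        return "ship" if score >= 1.0 else "retrain"


loop = CountingLoop()
print(loop.run(), len(loop.history))  # prints: ship 5
```

A real loop would replace the toy methods with training and benchmarking code. The point of the shape is that every iteration ends in a recorded decision.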
What's in the box
loopkit/
├── GUIDE.md ← The full guide (tiered: quick start → deep dive)
├── README.md ← You are here
├── pyproject.toml
├── bot/ ← Telegram bot (your outer loop operator)
│ ├── main.py ← /new, /run, /decide, natural chat
│ ├── memory.py ← SQLite — remembers across restarts
│ └── council.py ← LLM reviews results, suggests next actions
├── loops/
│ ├── base.py ← Abstract Loop class
│ ├── hello/ ← Minimal example (5 lines of logic)
│ ├── template/ ← cp -r template myloop → start building
│ └── kompress/ ← The full 15-model pipeline
├── concepts/ ← Reference implementations
│ ├── self_labeling.py
│ ├── evaluator_optimizer.py
│ └── council.py
├── evals/
│ └── heretic.py ← Portable adversarial benchmark
└── notebooks/
  └── loopkit_hello.ipynb ← Colab-ready
The Telegram bot — your outer loop in chat
User: /new kompress-v15
Bot: ✅ Created loop kompress-v15.
User: /run kompress-v15
Bot: 🔄 Running kompress-v15...
✅ kompress-v15-001 complete
Results: heretic 0.961, keep_rate 0.85
Decision: Council says SHIP 🚀
User: /history kompress-v15
Bot: 📜 kompress-v15 — 1 experiment
🚀 v15-001: ship — "Beats v8 (0.955), ready to deploy"
User: my model regressed, what should I try?
Bot: Regression happens! Here's what I'd check:
1. Label quality — is your teacher too aggressive?
2. Data diversity — are you mixing in generic data?
3. Epochs — 3 was the sweet spot, more = overfitting
The bot remembers everything in SQLite, uses an LLM council (GLM-5.1 by default, configurable to anything), and falls back to heuristic rules when no LLM is available. It's the outer loop operator — the thing that asks "what next?" and then executes it.
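The memory-plus-fallback behaviour can be sketched with the standard library alone. Everything named here is an assumption made for illustration: the `experiments` table and its columns, the `council` callable and its signature, and the rule that a higher score than the best stored run means "ship". The bot's real schema and rules live in bot/memory.py and bot/council.py and may look different.

```python
import sqlite3


def open_memory(path=":memory:"):
    """Open (or create) the experiment store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiments ("
        "id INTEGER PRIMARY KEY, loop TEXT, score REAL, decision TEXT)"
    )
    return conn


def heuristic_decision(score, best_so_far):
    """Fallback rule for when no LLM is reachable (assumes higher is better)."""
    if best_so_far is None:
        return "baseline"
    return "ship" if score > best_so_far else "retrain"


def record_run(conn, loop, score, council=None):
    """Ask the council for a decision, fall back to the heuristic, then persist."""
    best = conn.execute(
        "SELECT MAX(score) FROM experiments WHERE loop = ?", (loop,)
    ).fetchone()[0]
    decision = None
    if council is not None:
        try:
            decision = council(loop, score, best)
        except Exception:
            decision = None  # council unavailable: use the heuristic instead
    if decision is None:
        decision = heuristic_decision(score, best)
    conn.execute(
        "INSERT INTO experiments (loop, score, decision) VALUES (?, ?, ?)",
        (loop, score, decision),
    )
    conn.commit()
    return decision
```

Passing a file path instead of ":memory:" is what would let the history survive a restart.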
Group Engineering meets Loop Engineering
Anthropic's Engineering Groups of AI Agents (June 2025) describes how multiple agents collaborate in structured groups. Loop engineering is the temporal version:
| Group Engineering | Loop Engineering |
|---|---|
| Multiple agents collaborate in parallel | Multiple iterations build on each other |
| Coordinator delegates tasks | Council decides next experiment |
| Agents have specialized roles | Each iteration has a hypothesis |
| Results merge into a solution | Results converge toward a target |
The council is the coordinator. The loops are the agents. Time is the orchestrator.
Patterns you can use today
The GUIDE.md documents five patterns extracted from the kompress loop:
- Self-Labeling — model labels its own training data
- Evaluator-Optimizer — stronger teacher corrects student's mistakes
- C3 Self-Distillation — Collect → Curate → Compress on real-world data
- Council — LLM reviews results and decides what to try next
- The Loop Pattern — combine all four into a self-improving pipeline
Each has a reference implementation in concepts/. Each is production-tested on real models.
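As a rough illustration of the Evaluator-Optimizer idea, the sketch below collects the inputs where a student and a stronger teacher disagree and keeps the teacher's label as the correction. It is a toy under stated assumptions, not the code in concepts/evaluator_optimizer.py: `student` and `teacher` are plain callables standing in for models, and the threshold rules are made up.

```python
def evaluator_optimizer_round(inputs, student, teacher):
    """Return (input, teacher_label) pairs for every student mistake.

    The teacher is treated as ground truth here, which is an assumption:
    a real pipeline would also need to check the teacher's label quality.
    """
    corrections = []
    for x in inputs:
        student_label = student(x)
        teacher_label = teacher(x)
        if student_label != teacher_label:
            corrections.append((x, teacher_label))
    return corrections


# Toy stand-ins for models: each labels an integer as keep (True) or drop (False).
def student(x):
    return x > 5


def teacher(x):
    return x >= 3


corrections = evaluator_optimizer_round(range(8), student, teacher)
print(corrections)  # prints: [(3, True), (4, True), (5, True)]
```

The corrected pairs would then be added to the student's next training set, which is what turns one evaluation pass into one loop iteration.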
The Loop Engineering Ecosystem
LoopKit didn't emerge in a vacuum. It's part of a growing movement sparked by Addy Osmani's Loop Engineering essay, the canonical text that defined the five building blocks every loop needs (automations, worktrees, skills, plugins, and sub-agents), plus memory.
Here's how the pieces connect:
| Piece | What it is | Link |
|---|---|---|
| Addy Osmani's essay | The canonical text — 5 building blocks + memory, practical patterns | addyosmani.com |
| Cobus Greyling's reference impl | npm tools (loop-audit, loop-init, loop-cost), 7 patterns, pattern picker, goal engineering | github.com/cobusgreyling |
| LangChain: The Art of Loop Engineering | 4 stacked loops (Agent → Verification → Event-Driven → Hill Climbing), "loopcraft" | langchain.com |
| LoopKit (this post) | Python-native starter kit, Telegram bot, council, Colab notebook, kompress case study | github.com/peterlodri-sec |
The key quotes that drive this:
"You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." — Peter Steinberger
"I don't prompt Claude anymore. I have loops running that prompt Claude. My job is to write loops." — Boris Cherny (Head of Claude Code, Anthropic)
The 4 stacked loops (LangChain)
LangChain's framework describes loop engineering as stacking four levels of loops:
- Agent Loop — model calls tools until done (LoopKit: Loop.run())
- Verification Loop — grader checks output, retries on failure (LoopKit: loops/verification/)
- Event-Driven Loop — webhooks/cron trigger agents (LoopKit: Telegram bot)
- Hill Climbing Loop — analysis agent reviews traces, rewrites the harness (LoopKit: Council + Ralph)
The insight from level 4: the return arrow "reaches inside and updates the agent loop directly." The loop that watches the loop that watches the loop. Meta-stability through recursion.
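The second level, the verification loop, is small enough to sketch in full. This is an illustrative version, not LangChain's or LoopKit's implementation: `agent` and `grader` are hypothetical callables, and feeding the grader's feedback into the next attempt is one possible retry policy among several.

```python
def run_with_verification(agent, grader, task, max_attempts=3):
    """Call the agent until the grader accepts its output or attempts run out.

    agent(task, feedback) -> output
    grader(output) -> (passed, feedback)
    Both signatures are assumptions made for this sketch.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        output = agent(task, feedback)
        passed, feedback = grader(output)
        if passed:
            return output, attempt
    raise RuntimeError(
        f"no passing output after {max_attempts} attempts: {feedback}"
    )


# Toy agent: only produces upper-case text once it has been told to.
def shouting_agent(task, feedback):
    return task.upper() if feedback else task


def upper_case_grader(output):
    if output.isupper():
        return True, None
    return False, "use upper case"


print(run_with_verification(shouting_agent, upper_case_grader, "hello"))
# prints: ('HELLO', 2)
```

Capping the attempts matters: without `max_attempts`, a grader the agent can never satisfy would spin forever.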
7 battle-tested patterns (Cobus Greyling)
Cobus's reference implementation includes 7 production patterns with real win/failure stories:
- Daily Triage — morning routine for any repo (LoopKit: loops/daily_triage/)
- PR Review — sub-agent drafts, second reviews, opens PR
- Dependency Update — checks deps, updates, tests, opens PR
- Release Notes — reads commits, generates changelog
- Code Migration — finds deprecated patterns, replaces, tests
- Bug Hunt — reads reports, searches codebase, proposes fix
- Documentation Drift — compares code to docs, flags gaps
Every pattern follows the same loop: discover → plan → execute → verify → ship.
Interactive Docs Site
We built a single-page guide site for LoopKit — same style as cobusgreyling.github.io/loop-engineering. It includes the full loop pattern visualization, kompress results table, ecosystem links, 5 loops, 7 production patterns, and the Telegram bot setup — all on one page.
The Full Stack: How LoopKit + Cobus Greyling Work Together
LoopKit and Cobus Greyling's loop-engineering are complementary. Here's how they fit:
| Layer | Cobus Greyling | LoopKit |
|---|---|---|
| Audit & Planning | loop-audit — scores loop readiness, suggests improvements | Council — LLM reviews results, decides next action |
| Scaffolding | loop-init — scaffolds from 7 proven patterns | loops/template/ — cp -r template myloop |
| Cost Estimation | loop-cost — estimates token spend before running | Budget tracking in state.json |
| Execution | Grok/Claude Code/Codex native loops | Python Loop.run() — plan→execute→evaluate→decide |
| Persistence | Markdown files (LOOP.md, STATE.md, loop-run-log.md) | SQLite + state.json per loop |
| Monitoring | GitHub Actions, loop-audit dogfood | Ralph Loop — loop watching loops with OpenTelemetry |
| Sharing | Stories directory — real wins + failures | HuggingFace Datasets — experiment history as queryable datasets |
Use Cobus's tools to plan and audit your loops. Use LoopKit to run and scale them. Together they form the complete loop engineering stack.
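Budget tracking in a per-loop state.json could look roughly like the sketch below. The `tokens_spent` key, the `charge_budget` helper, and the `BudgetExceeded` exception are all hypothetical names for illustration; LoopKit's actual state.json layout is not shown in this post.

```python
import json
from pathlib import Path


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push the loop past its token limit."""


def charge_budget(state_path, tokens, limit):
    """Add `tokens` to the running total in state.json, refusing to overspend."""
    path = Path(state_path)
    state = json.loads(path.read_text()) if path.exists() else {}
    spent = state.get("tokens_spent", 0)
    if spent + tokens > limit:
        raise BudgetExceeded(
            f"{spent} spent + {tokens} requested exceeds limit {limit}"
        )
    state["tokens_spent"] = spent + tokens
    path.write_text(json.dumps(state, indent=2))
    return state["tokens_spent"]
```

Checking before writing means a rejected charge leaves the file untouched, so the loop can stop cleanly and report how much budget remains.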
Cobus's key patterns we've adopted:
- AGENTS.md — project conventions (our concepts/ directory)
- LOOP.md — loop design document (our GUIDE.md)
- STATE.md — durable memory outside conversation (our state.json + SQLite)
- loop-run-log.md — experiment log (our Experiment dataclass + history)
- loop-budget.md — cost tracking (our budget section in GUIDE)
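For the experiment log in particular, a dataclass plus a list already covers most of what loop-run-log.md records. The field names below are guesses at what an `Experiment` record might hold, not LoopKit's actual definition. The sample values come from the bot transcript earlier in this post, except the hypothesis, which is a placeholder.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Experiment:
    """One loop iteration. Field names are assumptions for this sketch."""

    run_id: str
    hypothesis: str
    score: float
    decision: str
    rationale: str = ""


def render_history(log):
    """Format the log the way a /history reply might list it."""
    return "\n".join(
        f'{e.run_id}: {e.decision} ({e.score:.3f}) "{e.rationale}"' for e in log
    )


def to_jsonl(log):
    """Serialize one experiment per line, e.g. for export as a dataset."""
    return "\n".join(json.dumps(asdict(e)) for e in log)


log = [
    Experiment(
        run_id="v15-001",
        hypothesis="(placeholder hypothesis)",
        score=0.961,
        decision="ship",
        rationale="Beats v8 (0.955), ready to deploy",
    )
]
print(render_history(log))
# prints: v15-001: ship (0.961) "Beats v8 (0.955), ready to deploy"
```

Making the record frozen keeps history append-only: a past decision can be superseded by a new experiment, but not silently edited.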
Why this matters
Most ML experimentation is ad-hoc. You try something, get a result, and think "what next?" The loop pattern makes it systematic. Every experiment has a hypothesis. Every result has a decision. Every decision feeds into the next plan.
LoopKit gives you the scaffolding. You bring the idea. The loop does the rest.
- GitHub: peterlodri-sec/loopkit
- Guide: GUIDE.md
- Colab: loopkit_hello.ipynb
- Models: PeetPedro on HuggingFace
- The kompress story
This post is part of the LoopKit project. See also: the kompress heretic eval, all kompress models on HuggingFace, the ultrawhale training repo, and headroom.