From coding to orchestrating
Most teams today have AI writing code while humans review it line by line. The hardest transition is moving beyond that: replacing ad-hoc human review with structured, repeatable verification that you actually trust. Dan Shapiro’s five-level framework describes this progression well.

What makes it work
The dark factory isn’t a single tool or practice. It’s a set of capabilities that compound:

- **Declarative workflows over imperative prompts.** When the process is a version-controlled graph, not a chat transcript, you can review, iterate, and share it like any other source file. The workflow itself becomes the specification of how work gets done.
- **Deterministic verification over human review.** Test suites, linters, type checkers, and LLM-as-judge evaluations replace line-by-line code review. Failures route back to fix loops automatically. Humans define the criteria; the system enforces them.
- **Multi-model ensembles over single-model dependence.** Using different models for implementation and verification breaks the circularity problem, where the builder and inspector share the same blind spots. Cross-critique with fresh eyes catches what self-review misses.
- **Checkpointed execution over black-box runs.** Git commits after every stage create an audit trail. When something goes wrong, you can inspect, revert, or fork from any point without having watched the run live.
- **Continuous improvement over static processes.** Automatic retrospectives after every run feed a learning loop. Workflows get better over time, not just the code they produce.

The human role in a dark factory
The dark factory doesn’t eliminate engineering judgment. It redirects it:

| Before | After |
|---|---|
| Writing code | Defining workflows and prompts |
| Reviewing diffs | Defining verification criteria |
| Debugging test failures | Designing fix loops |
| Watching agent sessions | Reviewing retrospectives |
| Manual quality checks | Tuning goal gates and evals |
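The capabilities above can be sketched in a few lines. This is a hypothetical illustration, not any real tool’s API: `Stage`, `run_workflow`, and the toy verifiers are invented names standing in for a declarative workflow graph with deterministic gates, automatic fix loops, and a per-stage checkpoint log (a stand-in for a git commit after every stage).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[str], str]        # transforms the artifact
    verify: Callable[[str], bool]    # deterministic gate, not human review
    max_fix_attempts: int = 2        # failures route back automatically

def run_workflow(stages: list[Stage], artifact: str) -> tuple[str, list[str]]:
    """Run each stage, retrying until its gate passes or attempts run out.
    Returns the final artifact plus a checkpoint log, the audit trail a
    real system would record as one git commit per stage."""
    checkpoints: list[str] = []
    for stage in stages:
        for attempt in range(1 + stage.max_fix_attempts):
            candidate = stage.run(artifact)
            if stage.verify(candidate):       # humans defined this criterion;
                artifact = candidate          # the system enforces it
                checkpoints.append(f"checkpoint: {stage.name} (attempt {attempt + 1})")
                break
        else:
            raise RuntimeError(f"stage {stage.name!r} failed its verification gate")
    return artifact, checkpoints

# Usage: a toy two-stage pipeline whose gates are simple predicates.
stages = [
    Stage("implement", run=lambda a: a + " impl", verify=lambda a: "impl" in a),
    Stage("test", run=lambda a: a + " tested", verify=lambda a: "tested" in a),
]
result, log = run_workflow(stages, "spec")
print(result)  # spec impl tested
```

Because the workflow is plain data rather than a chat transcript, it can be diffed, reviewed, and version-controlled like any other source file, and a second model could be slotted in as a `verify` callable to get the cross-critique the ensemble approach describes.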