The term “dark factory” comes from manufacturing. Since 2001, FANUC has operated a factory near Mt. Fuji where robots build other robots, running unsupervised for up to 30 days at a time. The factory is “dark” because no humans are present and robots don’t need light.

In software, the dark factory concept is different. It doesn’t mean zero human involvement; it means minimal human interaction with the code itself. Humans supervise the specs, guardrails, and outcomes, not each line of code. Engineers shift from writing and reviewing code to defining what should be built, how quality is measured, and when to intervene. The dark factory is an aspirational target, and teams reach it iteratively rather than in a single step.

From coding to orchestrating

Most teams today have AI writing code while humans review it line by line. The hardest transition is moving beyond that — replacing ad-hoc human review with structured, repeatable verification that you actually trust. Dan Shapiro’s five-level framework describes this progression well.

What makes it work

The dark factory isn’t a single tool or practice. It’s a set of capabilities that compound:

Declarative workflows over imperative prompts. When the process is a version-controlled graph rather than a chat transcript, you can review, iterate on, and share it like any other source file. The workflow itself becomes the specification of how work gets done.

Deterministic verification over human review. Test suites, linters, and type checkers provide deterministic checks, and LLM-as-judge evaluations cover criteria that are harder to test mechanically; together they replace line-by-line code review. Failures route back to fix loops automatically. Humans define the criteria; the system enforces them.

Multi-model ensembles over single-model dependence. Using different models for implementation and verification breaks the circularity problem, where the builder and the inspector share the same blind spots. Cross-critique from a model that didn’t write the code catches what self-review misses.

Checkpointed execution over black-box runs. A Git commit after every stage creates an audit trail. When something goes wrong, you can inspect, revert, or fork from any point without having watched the run live.

Continuous improvement over static processes. An automatic retrospective after every run feeds a learning loop, so the workflows improve over time, not just the code they produce.
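The workflow-as-graph and fix-loop ideas can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular tool’s design: the workflow is plain data (named stages with hypothetical `on_pass`/`on_fail` edges), stub functions stand in for the implementing agent, the test suite, and the fixing agent, and `run_workflow` and its `max_visits` cap are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stage:
    """One node in a (hypothetical) workflow graph: an action plus outcome edges."""
    action: Callable[[dict], bool]   # returns True on pass, False on fail
    on_pass: Optional[str] = None    # next stage on success (None = finished)
    on_fail: Optional[str] = None    # next stage on failure (None = abort)


def run_workflow(graph: dict[str, Stage], start: str, state: dict,
                 max_visits: int = 5) -> list[str]:
    """Walk the graph from `start`, following pass/fail edges.

    Returns the trace of visited stages. Raises RuntimeError if a stage is
    visited more than `max_visits` times (so a fix loop cannot spin forever)
    or if a stage fails and has no fix route.
    """
    trace: list[str] = []
    visits: dict[str, int] = {}
    current: Optional[str] = start
    while current is not None:
        visits[current] = visits.get(current, 0) + 1
        if visits[current] > max_visits:
            raise RuntimeError(f"stage {current!r} exceeded {max_visits} visits")
        trace.append(current)
        stage = graph[current]
        if stage.action(state):
            current = stage.on_pass
        elif stage.on_fail is not None:
            current = stage.on_fail
        else:
            raise RuntimeError(f"stage {current!r} failed with no fix route")
    return trace


# Stub actions standing in for real agents and a real test runner.
def implement(state: dict) -> bool:
    state["code_ok"] = False  # pretend the first draft has a bug
    return True


def run_tests(state: dict) -> bool:
    return state["code_ok"]   # deterministic check on the current state


def fix(state: dict) -> bool:
    state["code_ok"] = True   # pretend the fix agent repairs the bug
    return True


# Hypothetical workflow graph: stage names and edges are illustrative.
WORKFLOW = {
    "implement": Stage(implement, on_pass="test"),
    "test": Stage(run_tests, on_pass=None, on_fail="fix"),
    "fix": Stage(fix, on_pass="test"),
}

if __name__ == "__main__":
    print(" -> ".join(run_workflow(WORKFLOW, "implement", {})))
    # implement -> test -> fix -> test
```

Because the graph is ordinary data, it can be diffed and reviewed in version control, and the visit cap keeps a fix loop that never converges from running forever. A fuller version might commit after each stage and use a different model for verification than for implementation.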

The human role in a dark factory

The dark factory doesn’t eliminate engineering judgment. It redirects it:
Before | After
Writing code | Defining workflows and prompts
Reviewing diffs | Defining verification criteria
Debugging test failures | Designing fix loops
Watching agent sessions | Reviewing retrospectives
Manual quality checks | Tuning goal gates and evals
The goal is to spend your time on the parts that require human judgment — what to build, how to verify it, and when something doesn’t look right — while the factory handles the rest.
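The goal gates and evals mentioned above can be pictured as human-authored criteria that the pipeline checks mechanically. The sketch below assumes each check reports a numeric score; `Criterion`, `evaluate_gate`, the metric names, and the thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One human-defined check: a named metric and its minimum acceptable value."""
    metric: str
    minimum: float


# Hypothetical gate. Metric names and thresholds are illustrative only.
GOAL_GATE = [
    Criterion("tests_passed_ratio", 1.0),  # every test must pass
    Criterion("lint_clean", 1.0),          # 1.0 = no lint findings
    Criterion("judge_score", 0.8),         # e.g. an LLM-as-judge rubric score in [0, 1]
]


def evaluate_gate(gate: list[Criterion], results: dict[str, float]) -> list[str]:
    """Return failure messages; an empty list means the gate passes.

    A metric missing from `results` counts as a failure, so a run that
    skipped a check cannot slip through.
    """
    failures = []
    for c in gate:
        value = results.get(c.metric)
        if value is None:
            failures.append(f"{c.metric}: missing")
        elif value < c.minimum:
            failures.append(f"{c.metric}: {value} < {c.minimum}")
    return failures


if __name__ == "__main__":
    results = {"tests_passed_ratio": 1.0, "lint_clean": 1.0, "judge_score": 0.72}
    print(evaluate_gate(GOAL_GATE, results))  # ['judge_score: 0.72 < 0.8']
```

In this framing, tuning the gate means editing data (adding a criterion or raising a threshold) rather than reviewing individual diffs, and a non-empty failure list is what would be routed back into a fix loop.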

Further reading