Event stream
The event stream is the foundation of Fabro’s observability. Every workflow run emits a sequence of WorkflowRunEvent records that are:
- Written to progress.jsonl in the run’s directory (one JSON object per line)
- Broadcast via SSE to connected API clients in real time
- Logged via tracing to the daily log file at ~/.fabro/logs/
Event types
Events fall into several categories.
Run lifecycle (bookend events for the entire run):
| Event | Key fields | Description |
|---|---|---|
| WorkflowRunStarted | name, run_id, base_sha, run_branch | Run begins |
| WorkflowRunCompleted | duration_ms, artifact_count, total_cost | Run finishes successfully |
| WorkflowRunFailed | error, duration_ms | Run terminates with an error |
Stage lifecycle:
| Event | Key fields | Description |
|---|---|---|
| StageStarted | node_id, name, handler_type, attempt, max_attempts | Node begins executing |
| StageCompleted | node_id, duration_ms, status, usage, files_touched | Node finishes |
| StageFailed | node_id, failure, will_retry | Node fails (may or may not retry) |
| StageRetrying | node_id, attempt, max_attempts, delay_ms | Retry scheduled after failure |
Agent events (names prefixed with Agent.):
| Event | Key fields | Description |
|---|---|---|
| Agent.SessionStarted | stage | Agent session begins |
| Agent.ToolCallStarted | stage, tool_name, arguments | Agent invokes a tool |
| Agent.ToolCallCompleted | stage, tool_name, output, is_error | Tool returns a result |
| Agent.AssistantMessage | stage, text, model, usage | LLM responds with text |
| Agent.Error | stage, error | Agent-level error |
| Agent.LoopDetected | stage | Repeated tool call pattern detected |
| Agent.SteeringInjected | stage, text | Human steering message injected |
| Agent.ContextWindowWarning | stage, estimated_tokens, context_window_size, usage_percent | Token usage exceeds context window threshold |
| Agent.CompactionStarted | stage, estimated_tokens, context_window_size | Context compaction triggered |
| Agent.CompactionCompleted | stage, original_turn_count, preserved_turn_count, summary_token_estimate, tracked_file_count | Context compaction finished |
| Agent.LlmRetry | stage, provider, model, attempt, delay_secs | LLM API call retried |
| Agent.SubAgentSpawned | stage, agent_id, task | Sub-agent launched |
| Agent.SubAgentCompleted | stage, agent_id, success, turns_used | Sub-agent finished |
Routing, checkpoints, and failover:
| Event | Key fields | Description |
|---|---|---|
| EdgeSelected | from_node, to_node, label, condition | Transition between nodes |
| LoopRestart | from_node, to_node | Loop restart edge taken |
| CheckpointSaved | node_id | Checkpoint written to disk |
| GitCheckpoint | node_id, git_commit_sha | Checkpoint committed to Git |
| Failover | stage, from_provider, to_provider, error | LLM provider failover |
Parallel execution:
| Event | Key fields | Description |
|---|---|---|
| ParallelStarted | branch_count, join_policy, error_policy | Fan-out begins |
| ParallelBranchStarted | branch, index | Individual branch begins |
| ParallelBranchCompleted | branch, duration_ms, status | Branch finishes |
| ParallelCompleted | duration_ms, success_count, failure_count | All branches done |
Human input:
| Event | Key fields | Description |
|---|---|---|
| InterviewStarted | question, stage, question_type | Human input requested |
| InterviewCompleted | question, answer, duration_ms | Human responded |
| InterviewTimeout | question, stage, duration_ms | Human didn’t respond in time |
Sandbox, setup, and watchdog:
| Event | Key fields | Description |
|---|---|---|
| Sandbox.Initializing | provider | Sandbox creation started |
| Sandbox.Ready | provider, duration_ms, cpu, memory | Sandbox ready |
| SetupStarted | command_count | Setup commands beginning |
| SetupCommandCompleted | command, exit_code, duration_ms | Single setup command finished |
| SetupFailed | command, exit_code, stderr | Setup command failed |
| SshAccessReady | ssh_command | SSH connection command printed |
| StallWatchdogTimeout | node, idle_seconds | No activity for too long |
Event envelope format
Each line in progress.jsonl is a JSON object with a standard envelope.
The ts, run_id, and event fields are always present; the remaining fields vary by event type. Events are flattened: nested variants such as agent and sandbox events use dot notation (e.g. Agent.ToolCallStarted, Sandbox.Ready).
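To make the envelope concrete, here is a minimal Python sketch that parses one line, checks the three always-present fields, and splits a dotted event name. The sample line is hypothetical: its values (including the timestamp format) are invented for illustration, and the helper names are not part of Fabro.

```python
import json
from typing import Optional, Tuple

# Fields the text above says are always present.
REQUIRED_FIELDS = ("ts", "run_id", "event")


def parse_event_line(line: str) -> dict:
    """Parse one progress.jsonl line and verify the envelope fields."""
    record = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"envelope is missing fields: {missing}")
    return record


def split_event_name(event: str) -> Tuple[Optional[str], str]:
    """Split 'Agent.ToolCallStarted' into ('Agent', 'ToolCallStarted').

    Names without a dot, such as 'StageStarted', yield (None, name).
    """
    namespace, dot, variant = event.rpartition(".")
    return (namespace if dot else None, variant)


# Hypothetical sample line; real values and timestamp format may differ.
sample = (
    '{"ts": "2025-01-01T00:00:00Z", "run_id": "run-123", '
    '"event": "Agent.ToolCallStarted", "stage": "build", "tool_name": "shell"}'
)
record = parse_event_line(sample)
print(split_event_name(record["event"]))  # ('Agent', 'ToolCallStarted')
```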
Log files
Fabro writes two kinds of logs: per-run event logs and application logs.
Run logs (progress.jsonl)
Every run writes its event stream to {run_dir}/progress.jsonl. This is the primary data source for post-run analysis: retros read it, and you can query it directly with standard tools.
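As one way to query the file without extra tooling, the following Python sketch counts events by type and lists the slowest stages. It assumes that fields such as node_id and duration_ms sit at the top level of each JSON object; the function names are illustrative, not part of Fabro.

```python
import json
from collections import Counter
from pathlib import Path


def load_events(path):
    """Yield one dict per non-empty line of a progress.jsonl file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_event(events):
    """Count how often each event type occurs."""
    return Counter(e["event"] for e in events)


def slowest_stages(events, top=5):
    """Return (node_id, duration_ms) pairs for the slowest completed stages.

    Assumes StageCompleted carries node_id and duration_ms at the top level.
    """
    done = [
        (e["node_id"], e["duration_ms"])
        for e in events
        if e["event"] == "StageCompleted"
    ]
    return sorted(done, key=lambda pair: pair[1], reverse=True)[:top]
```

Typical use would be `events = list(load_events(run_dir / "progress.jsonl"))` followed by `count_by_event(events)` or `slowest_stages(events)`, where `run_dir` is the path to a run's directory.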
Live snapshot (live.json)
During execution, Fabro also writes live.json, a pretty-printed copy of the most recent event. This is useful for quick status checks while a run is in progress.
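A status check against live.json could look like the Python sketch below. It assumes the snapshot uses the same envelope as progress.jsonl, and it treats a parse failure as "try again later", since the file is overwritten during the run and could in principle be read mid-write. The function name is hypothetical.

```python
import json
from pathlib import Path


def read_live_event(run_dir):
    """Return the most recent event from live.json, or None if unavailable."""
    path = Path(run_dir) / "live.json"
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None  # no snapshot has been written yet
    except json.JSONDecodeError:
        return None  # possibly read mid-write; the caller can retry
```

A caller might poll this every second or so and print `event["event"]` whenever the value changes.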
Application logs
Fabro uses the tracing crate to write structured logs to ~/.fabro/logs/YYYY-MM-DD.log. Control the log level with the FABRO_LOG environment variable:
| Level | What’s logged |
|---|---|
| error | Run failures, stage failures that won’t retry |
| warn | Retries, timeouts, interview timeouts, failovers, early terminations |
| info | Run start/complete, SSH access ready |
| debug | Stage start/complete, edge selections, checkpoints, parallel branches, tool calls |
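To locate the application log for a given day from a script, a small Python helper can build the path from the naming scheme above. This is a sketch: it assumes the file name follows the local calendar date (the source does not say whether local time or UTC is used), and `daily_log_path` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def daily_log_path(day=None, log_dir=None):
    """Build the path to the log file for `day` (default: today).

    Assumes the YYYY-MM-DD in the file name is the local date.
    """
    day = day or date.today()
    log_dir = Path(log_dir) if log_dir else Path.home() / ".fabro" / "logs"
    return log_dir / f"{day.isoformat()}.log"


print(daily_log_path(date(2025, 3, 9)).name)  # 2025-03-09.log
```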
Real-time monitoring
API: Server-Sent Events
When running workflows through the API server, subscribe to a live event stream via the run events endpoint. Each event is a JSON-serialized WorkflowRunEvent. The stream stays open until the run completes.
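On the client side, consuming such a stream mostly means decoding the SSE framing. The Python sketch below parses `data:` lines from any iterable of text lines (for example, a streaming HTTP response body) and stops once a terminal run event arrives. The endpoint URL is deliberately left out because it is not given here; the sketch also assumes that each message's data is the JSON envelope with an `event` key and that WorkflowRunCompleted and WorkflowRunFailed mark the end of a run.

```python
import json

# Assumed terminal events, taken from the run lifecycle table.
TERMINAL_EVENTS = {"WorkflowRunCompleted", "WorkflowRunFailed"}


def iter_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current SSE message.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith(":"):
            continue  # SSE comment, commonly used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other SSE fields (event:, id:, retry:) are ignored in this sketch.


def follow_run(lines):
    """Collect events until a terminal run event is seen."""
    seen = []
    for event in iter_sse_events(lines):
        seen.append(event)
        if event.get("event") in TERMINAL_EVENTS:
            break
    return seen
```

Keeping the parser independent of any HTTP library means it can be fed from whichever streaming client is already in use.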
Web UI
The web frontend connects to the SSE stream automatically and displays run progress in real time — stage transitions, agent tool calls, and human gate prompts are all visible as they happen.
CLI progress
The CLI displays a live progress bar during execution with per-stage status, duration, and cost tracking. This is rendered to stderr so it doesn’t interfere with output piping.
Post-run analysis
Listing runs
Browse your run history with fabro ps. The command reads ~/.fabro/runs/ and displays each run’s ID, workflow name, status, and start time. Use --json for machine-readable output.
Run artifacts
Each run’s directory contains a standard set of files:
| File | Description |
|---|---|
| manifest.json | Run metadata: ID, workflow name, start time, labels |
| progress.jsonl | Full event stream |
| live.json | Last event snapshot (overwritten during run) |
| checkpoint.json | Final execution state |
| retro.json | Retrospective (if retro generation is enabled) |
| conclusion.json | Terminal status (completed, failed, canceled) |
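When scripting over run directories, it can help to check which of these files are present before reading them (retro.json, for instance, exists only if retro generation is enabled). The Python sketch below does only that; it makes no assumptions about the contents of any file, and `artifact_report` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the table above.
STANDARD_FILES = (
    "manifest.json",
    "progress.jsonl",
    "live.json",
    "checkpoint.json",
    "retro.json",
    "conclusion.json",
)


def artifact_report(run_dir):
    """Map each standard file name to whether it exists in run_dir."""
    run_dir = Path(run_dir)
    return {name: (run_dir / name).is_file() for name in STANDARD_FILES}
```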
Inspecting stages and turns
The API provides endpoints for drilling into individual stages and the agent turns within them. See the stages and turns API reference for details.
Insights (SQL analytics)
The Insights feature lets you run SQL queries across your run data using DuckDB. This is useful for aggregate analysis: finding slow workflows, tracking failure rates, comparing model costs, and spotting trends. Insights is managed through the Insights API endpoints. You can save, update, and execute queries programmatically.
Example queries
Average run duration by workflow:
Aggregate usage
The API server tracks aggregate usage counters across all runs — total run count, total runtime, and per-model breakdowns of token usage and cost. See the usage endpoint in the API reference. Counters reset on server restart.
Credential redaction
All event output (progress.jsonl, SSE streams, and log files) is automatically redacted before being written. Fabro detects patterns that look like API keys, AWS credentials, bearer tokens, and other secrets and replaces them with REDACTED. This ensures that sensitive values appearing in tool call arguments or command output are never persisted to disk.
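The general technique, pattern-based redaction, can be sketched in a few lines of Python. The patterns below are illustrative assumptions only: Fabro's actual detection rules are not described here, and the "sk-" prefix in particular is a hypothetical example of an API key shape.

```python
import re

# Illustrative patterns; these are NOT Fabro's real rules.
SECRET_PATTERNS = [
    # Shape of an AWS access key ID: "AKIA" plus 16 uppercase letters/digits.
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Token following the word "Bearer" (case-insensitive).
    re.compile(r"(?<=bearer )[A-Za-z0-9._~+/=-]{8,}", re.IGNORECASE),
    # Hypothetical "sk-" style API key.
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern with REDACTED."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("REDACTED", text)
    return text


print(redact("Authorization: Bearer abc123def456ghi"))
# Authorization: Bearer REDACTED
```

Applying the filter at the single point where events are serialized, rather than in each sink, is one way to ensure that all three outputs see the same redacted text.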