Observatory Dashboard

A static HTML dashboard for runs, benchmarks, and lessons learned from your CCO sessions.

What is the Observatory?

The Observatory is a static HTML dashboard that visualizes your CCO runtime history. It provides a comprehensive view of past runs, benchmark comparisons, and cross-run insights to help you understand patterns and improve your workflow.

Unlike dynamic dashboards that require a running server, the Observatory generates static HTML files that can be viewed offline and shared easily. The dashboard is built from run artifacts and benchmark data collected during CCO execution.

ℹ️ The Observatory is regenerated after notable runs or benchmarks land. Use python3 scripts/build_report_dashboard.py to rebuild it manually.

Dashboard Location

The Observatory dashboard is available at:

File                                     Description
docs/observatory/index.html              The main dashboard view - open this file in a browser
docs/observatory/dashboard-source.html   The source template for generating the dashboard

Quick Access

Open docs/observatory/index.html directly in your browser to view the dashboard. No server required.

Features

Run Tracking

The Observatory maintains a complete history of all CCO runs with full artifact preservation:

  • View past runs with artifacts - Browse through previous executions including their outputs, metrics, and logs
  • Trace execution flow via run_id - Use the unique run identifier to track exactly what happened during each session (see the loading sketch after this list)
  • Branch-based isolation - Each run maintains its own branch context for clean reproducibility
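
As an illustration, the sketch below loads one run's preserved artifacts by run_id. The directory layout it assumes (.cco/runs/<run_id>/ containing a metadata.json) is invented for the example, not a documented CCO contract; adapt the paths to wherever your run artifacts actually live.

python
import json
from pathlib import Path

# ASSUMED layout: .cco/runs/<run_id>/ holds metadata.json plus whatever
# outputs, metrics, and logs the run preserved. Adjust to your repo.
RUNS_DIR = Path(".cco/runs")

def load_run(run_id: str) -> dict:
    """Load one run's metadata and list its preserved artifacts."""
    run_dir = RUNS_DIR / run_id
    metadata = json.loads((run_dir / "metadata.json").read_text())
    metadata["artifacts"] = sorted(p.name for p in run_dir.iterdir())
    return metadata

if __name__ == "__main__":
    run = load_run("2025-01-15-abc123")  # illustrative run_id
    print(run["artifacts"])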

Benchmark Comparisons

Compare performance across different runs and configurations (see the comparison sketch after this list):

  • Context usage metrics - Track how efficiently CCO uses context across different scenarios
  • Role distribution charts - Visualize which agents are being used most frequently
  • Task completion rates - Measure success rates and identify bottlenecks
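
A comparison can be as simple as a percentage diff of the numeric metrics two runs share. This is a minimal sketch; the metric names (context_tokens, completion_rate) are illustrative, not a fixed CCO schema.

python
def compare(baseline: dict, candidate: dict) -> dict:
    """Percentage change of each shared numeric metric, candidate vs baseline."""
    return {
        key: round(100.0 * (candidate[key] - baseline[key]) / baseline[key], 1)
        for key in baseline.keys() & candidate.keys()
        if isinstance(baseline[key], (int, float)) and baseline[key] != 0
    }

# Invented numbers for two runs, purely to show the shape of the output.
run_a = {"context_tokens": 52_000, "completion_rate": 0.88}
run_b = {"context_tokens": 47_500, "completion_rate": 0.91}
print(compare(run_a, run_b))  # e.g. {'context_tokens': -8.7, 'completion_rate': 3.4}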

Lessons Learned

The Observatory captures institutional knowledge from your runs (see the distillation sketch after this list):

  • Cross-run memory insights - Patterns that emerge across multiple runs are automatically distilled
  • Pattern recognition - Common success paths and failure modes are highlighted
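
To make the distillation idea concrete, here is a small sketch that surfaces tags recurring across runs. The lesson records and tag names are invented for the example; real entries would be read from run artifacts.

python
from collections import Counter

# Invented per-run lesson records; real ones would come from run artifacts.
lessons = [
    {"run_id": "r1", "tags": ["flaky-test", "retry-helped"]},
    {"run_id": "r2", "tags": ["flaky-test", "context-gap"]},
    {"run_id": "r3", "tags": ["context-gap"]},
]

# A tag that shows up in several runs is a cross-run pattern worth surfacing.
counts = Counter(tag for entry in lessons for tag in entry["tags"])
print([tag for tag, n in counts.most_common() if n >= 2])
# ['flaky-test', 'context-gap']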

Paper Alignment Metrics

Measure how well your CCO usage aligns with the research paper's framework (arxiv:2602.20478); a computation sketch follows the table:

Metric                       Description
Tasks Count                  Total number of tasks executed, showing workflow volume
Agent Role Distribution      Which specialized agents are being invoked and how often
Context Retrieval Hit Rate   Percentage of context lookups that found relevant information
Completion Rates             Share of tasks completed versus abandoned
Task Patterns                Recurring patterns identified in task types and execution flows
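
For concreteness, here is a sketch computing two of these metrics from hypothetical run records; the field names are invented for the example.

python
# Invented run records; the field names are not a documented CCO schema.
runs = [
    {"tasks_done": 9, "tasks_total": 10, "ctx_hits": 14, "ctx_lookups": 20},
    {"tasks_done": 7, "tasks_total": 8, "ctx_hits": 11, "ctx_lookups": 12},
]

done = sum(r["tasks_done"] for r in runs)
total = sum(r["tasks_total"] for r in runs)
hits = sum(r["ctx_hits"] for r in runs)
lookups = sum(r["ctx_lookups"] for r in runs)

print(f"completion rate: {done / total:.0%}")        # 89%
print(f"retrieval hit rate: {hits / lookups:.0%}")   # 78%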

Parallel Workstream Visualization

The Observatory includes an SVG Gantt chart showing concurrent runs:

  • Concurrent run tracking - See multiple CCO sessions running in parallel
  • Timeline visualization - Understand temporal relationships between runs
  • Resource utilization - Identify periods of high and low activity

Gantt Chart Features

The SVG-based Gantt visualization is interactive and shows (see the rendering sketch after this list):

  • Run start/end times
  • Duration and overlap between concurrent sessions
  • Agent invocation density across time
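
The sketch below shows the core of any such view: render each run's interval as an SVG rectangle, one row per run, so overlapping bars expose concurrency. It is analogous in spirit to the Observatory's chart, not its actual implementation, and the run data is invented.

python
# Invented (run_id, start_minute, end_minute) tuples for three runs.
runs = [("run-a", 0, 40), ("run-b", 25, 70), ("run-c", 55, 90)]

ROW_H, SCALE = 24, 6  # pixels per row and per minute
bars = []
for i, (run_id, start, end) in enumerate(runs):
    # <title> gives a hover tooltip, a minimal form of interactivity.
    bars.append(
        f'<rect x="{start * SCALE}" y="{i * ROW_H}" '
        f'width="{(end - start) * SCALE}" height="{ROW_H - 6}" '
        f'fill="steelblue"><title>{run_id}</title></rect>'
    )

svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="600" '
       f'height="{len(runs) * ROW_H}">' + "".join(bars) + "</svg>")
with open("gantt.svg", "w") as f:
    f.write(svg)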

Accessing the Observatory

bash
# Open directly in browser
open docs/observatory/index.html

# Or rebuild the dashboard first
python3 scripts/build_report_dashboard.py

Files Reference

Path                                     Purpose
docs/observatory/index.html              Generated dashboard - view this in a browser
docs/observatory/dashboard-source.html   Source template - edit this to customize the dashboard structure

Metrics Dashboard

The Observatory includes a comprehensive metrics dashboard that provides deep visibility into your CCO usage patterns.

Available Metrics

  • Total Tasks - Cumulative count of all tasks across runs
  • Role Distribution - Breakdown of which specialized agents handled work
  • Context Retrieval Hits - How often relevant context was found in memory
  • Completion Rates - Percentage of tasks that finished successfully
  • Task Patterns - Identified recurring execution patterns

Using Metrics for Continuous Improvement

Review the metrics dashboard regularly to:

  • Identify underutilized agents that could help with certain task types
  • Spot context retrieval patterns that indicate memory gaps
  • Track completion rate trends to catch degradation early (see the trend sketch after this list)
  • Compare benchmark results across different model configurations
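
One way to catch degradation early, sketched below with invented numbers: compare the average completion rate of the most recent runs against the long-run average. The window size and tolerance are illustrative knobs, not CCO defaults.

python
# Completion rates per run, oldest to newest (invented numbers).
rates = [0.92, 0.91, 0.93, 0.82, 0.78, 0.75]

WINDOW, TOLERANCE = 3, 0.05  # illustrative; tune to your history
recent = sum(rates[-WINDOW:]) / WINDOW
overall = sum(rates) / len(rates)

if recent < overall - TOLERANCE:
    print(f"degrading: recent {recent:.0%} vs overall {overall:.0%}")
    # degrading: recent 78% vs overall 85%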

Interpreting the Dashboard

The dashboard uses visual indicators to help you quickly assess status (see the classification sketch after this list):

  • Green indicators - Metrics within expected ranges
  • Yellow indicators - Metrics approaching threshold limits
  • Red indicators - Metrics requiring attention or correction
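
A threshold check like the one below is enough to reproduce that traffic-light logic for a higher-is-better metric. The cutoffs shown are illustrative; the Observatory's own thresholds may differ.

python
def indicator(value: float, warn: float, alert: float) -> str:
    """Classify a higher-is-better metric as green, yellow, or red."""
    if value >= warn:
        return "green"   # within the expected range
    if value >= alert:
        return "yellow"  # approaching the threshold
    return "red"         # needs attention or correction

for rate in (0.91, 0.78, 0.55):
    print(rate, indicator(rate, warn=0.85, alert=0.70))
# 0.91 green / 0.78 yellow / 0.55 red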