# Session Metrics & Coherence
Track development session quality and measure coherence with the paper's evaluation framework.
## What are Session Metrics?
Session metrics track the quality and efficiency of your development sessions. CCO collects detailed telemetry during each task execution, measuring everything from context retrieval effectiveness to token usage efficiency.
These metrics align with the research paper's evaluation framework, allowing you to compare your workflow against established baselines and identify areas for improvement.
Metrics are stored in memory and can be retrieved later for historical analysis or team sharing.
## Paper Reference
The metrics system is inspired by and aligned with the research paper's findings:
### Key Statistics from Research

Baseline measurements from the paper's evaluation:

> 2,801 prompts → 1,197 agent invocations → 16,522 agent turns

This translates to key baseline ratios:

- **~2.34 prompts per invocation** - how many prompts trigger each agent call
- **~0.072 invocations per turn** - agent invocation density
- **~13.8 turns per invocation** - conversation depth per agent call
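The ratios follow directly from the raw counts. A quick sketch of the arithmetic (plain Python, nothing here is CCO API):

```python
# Raw counts reported in the paper's evaluation
prompts = 2_801
invocations = 1_197
turns = 16_522

# Derived baseline ratios
prompts_per_invocation = prompts / invocations  # ~2.34
invocations_per_turn = invocations / turns      # ~0.072
turns_per_invocation = turns / invocations      # ~13.8

print(f"{prompts_per_invocation:.2f} {invocations_per_turn:.3f} {turns_per_invocation:.1f}")
# → 2.34 0.072 13.8
```

Note that the last two ratios are reciprocals of each other, so tracking either one tells the same story about conversation depth.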
## SessionMetrics Class

The `SessionMetrics` class captures comprehensive data about each development session.
### Properties

| Property | Description |
|---|---|
| `tasks_completed` | Number of tasks successfully completed during the session |
| `tasks_abandoned` | Number of tasks started but not completed |
| `context_retrieval_hits` | Times relevant context was found in memory |
| `context_retrieval_misses` | Times relevant context was *not* found in memory |
| `role_distribution` | Dictionary mapping agent roles to invocation counts |
| `turns_per_task` | Average conversation turns per task |
| `session_duration` | Total session time in seconds |
| `token_usage` | Object with `prompt_tokens`, `completion_tokens`, and `total_tokens` |
| `prompts_per_invocation` | Ratio of prompts to agent invocations (baseline: ~2.34) |
| `invocations_per_turn` | Ratio of invocations to turns (baseline: ~0.072) |
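As a mental model, the property list above maps naturally onto a small dataclass. This sketch is illustrative only: the names `SessionMetricsSketch` and `TokenUsage` are placeholders, not the CCO API, and the real `SessionMetrics` class adds the collection logic behind these fields.

```python
from dataclasses import dataclass, field

@dataclass
class TokenUsage:
    # Illustrative stand-in for the token_usage object
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

@dataclass
class SessionMetricsSketch:
    tasks_completed: int = 0
    tasks_abandoned: int = 0
    context_retrieval_hits: int = 0
    context_retrieval_misses: int = 0
    role_distribution: dict = field(default_factory=dict)  # role -> invocation count
    turns_per_task: float = 0.0
    session_duration: float = 0.0          # seconds
    token_usage: TokenUsage = field(default_factory=TokenUsage)
    prompts_per_invocation: float = 0.0    # baseline: ~2.34
    invocations_per_turn: float = 0.0      # baseline: ~0.072
```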
## CoherenceScore Class

The `CoherenceScore` class provides a single quality score (0-100) for your session, based on multiple weighted factors:
### Components
| Factor | Weight | Description |
|---|---|---|
| Task completion rate | 25% | Ratio of completed vs abandoned tasks |
| Context retrieval rate | 25% | Hits divided by total retrieval attempts |
| Efficiency factor | 25% | How your metrics compare to paper baselines |
| Role diversity | 25% | Distribution of agent roles used |
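Because the four weights are equal, the overall score is a weighted average of the factor scores. Here is a minimal sketch of that combination step, assuming each factor has already been computed on a 0-100 scale (how CCO derives each individual factor is not shown, and `combine_coherence` is an illustrative name, not the CCO API):

```python
def combine_coherence(task_rate: float, context_rate: float,
                      efficiency: float, role_diversity: float) -> float:
    """Combine the four factor scores (each 0-100) using the 25% weights."""
    return (0.25 * task_rate
            + 0.25 * context_rate
            + 0.25 * efficiency
            + 0.25 * role_diversity)

print(combine_coherence(100, 80, 90, 70))  # → 85.0
```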
### Grade Scale

| Grade | Rating | Description |
|---|---|---|
| A (90-100) | Excellent | Outstanding session coherence with high completion and efficiency |
| B (80-89) | Good | Solid performance with minor optimization opportunities |
| C (70-79) | Average | Acceptable session with some areas for improvement |
| D (60-69) | Below Average | Noticeable inefficiencies or completion issues |
| F (<60) | Poor | Significant issues requiring attention |
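The scale maps mechanically onto score thresholds. A small helper, sketched here for illustration only (not part of the CCO API):

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 coherence score onto the A-F scale above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

print(letter_grade(84))  # → B
```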
## MetricCollector Class

The `MetricCollector` class is the main interface for collecting and storing session metrics.
### Basic Usage

```python
from codified_orchestrator.session_metrics import MetricCollector

collector = MetricCollector()
collector.start_session()

# Record task lifecycle events
collector.record_task_start("task-123")
collector.record_agent_invocation("planner")
collector.record_context_hit()

# Continue task work...
collector.record_task_completion()

# End session and get results
results = collector.end_session()

# Store in memory for later retrieval
collector.store_in_memory(source_ref="session-123")
```
### Key Methods

| Method | Description |
|---|---|
| `start_session()` | Initialize a new metrics collection session |
| `record_task_start(task_id)` | Mark when a task begins execution |
| `record_task_completion()` | Mark the current task as successfully completed |
| `record_task_abandon()` | Mark the current task as abandoned |
| `record_agent_invocation(role)` | Record an agent call, optionally specifying the role |
| `record_context_hit()` | Log a successful context retrieval from memory |
| `record_context_miss()` | Log a failed context retrieval attempt |
| `record_token_usage(prompt, completion, total)` | Log token consumption for the current operation |
| `end_session()` | Finalize collection and return a `SessionMetrics` object |
| `store_in_memory(source_ref)` | Persist results to the AOMA memory store |
## CLI Command

The `cco metrics` command provides quick access to session statistics:

```shell
cco metrics
```
### Parameters

| Parameter | Description |
|---|---|
| `--json` | Output results in JSON format for programmatic use |
| `--session` | Specify a session ID to retrieve (default: latest) |
| `--store` | Store the current session metrics after display |
### Example Output

```
Session Metrics Summary
========================
Session ID: abc-123-def
Duration: 45m 32s
Tasks: 3 completed, 1 abandoned

Context Retrieval:
  Hits: 47
  Misses: 8
  Hit Rate: 85.5%

Token Usage:
  Prompt: 124,500
  Completion: 89,200
  Total: 213,700

Coherence Score: B (84/100)
  Task Rate: 75% (weight: 25%)
  Context Rate: 85% (weight: 25%)
  Efficiency: 92% (weight: 25%)
  Role Diversity: 80% (weight: 25%)
```
Use `cco metrics --json` for automation scripts or integration with external dashboards.
## Interpreting Metrics

### Context Retrieval Rate
A higher context retrieval hit rate indicates effective use of memory. Aim for 80%+ to ensure the agent has relevant context for decisions.
If your context retrieval rate is below 60%, consider adding more relevant documents to memory or improving your context patterns.
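The hit rate is simply hits divided by total retrieval attempts, as a percentage. A quick sketch (the function name is illustrative; CCO reports this figure for you):

```python
def context_hit_rate(hits: int, misses: int) -> float:
    """Hit rate as a percentage; returns 0.0 when no retrievals were attempted."""
    attempts = hits + misses
    return 100.0 * hits / attempts if attempts else 0.0

# The figures from the example output: 47 hits, 8 misses
print(round(context_hit_rate(47, 8), 1))  # → 85.5
```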
### Token Efficiency

Compare your `prompts_per_invocation` ratio against the ~2.34 baseline. Lower values indicate more efficient prompting; higher values may suggest room for prompt optimization.
### Role Diversity
A healthy session should involve multiple roles (planner, executor, reviewer). Sessions using only one or two roles may be missing important perspectives.
### Task Completion
Track your completed vs abandoned task ratio over time. A high abandonment rate may indicate:
- Tasks too complex for single-session completion
- Poor task scoping at the start
- Context gaps causing direction changes