Session Metrics & Coherence

Track development session quality and measure coherence with the paper's evaluation framework.

What are Session Metrics?

Session metrics track the quality and efficiency of your development sessions. CCO collects detailed telemetry during each task execution, measuring everything from context retrieval effectiveness to token usage efficiency.

These metrics align with the research paper's evaluation framework, allowing you to compare your workflow against established baselines and identify areas for improvement.

ℹ️

Metrics are stored in memory and can be retrieved later for historical analysis or team sharing.

Paper Reference

The metrics system is inspired by and aligned with the research paper's findings:

Key Statistics from Research

Baseline measurements from the paper's evaluation

```text
2,801 prompts → 1,197 agent invocations → 16,522 agent turns
```

This translates to key baseline ratios:

  • ~2.34 prompts per invocation - how many prompts trigger each agent call
  • ~0.072 invocations per turn - agent invocation density
  • ~13.8 turns per invocation - conversation depth per agent call
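These ratios follow directly from the raw counts; a quick sketch (plain division, no CCO APIs involved) reproduces them:

```python
# Raw counts from the paper's evaluation
prompts = 2_801
invocations = 1_197
turns = 16_522

prompts_per_invocation = prompts / invocations  # ≈ 2.34
invocations_per_turn = invocations / turns      # ≈ 0.072
turns_per_invocation = turns / invocations      # ≈ 13.8

print(round(prompts_per_invocation, 2))  # 2.34
```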

SessionMetrics Class

The SessionMetrics class captures comprehensive data about each development session:

Properties

| Property | Description |
| --- | --- |
| `tasks_completed` | Number of tasks successfully completed during the session |
| `tasks_abandoned` | Number of tasks started but not completed |
| `context_retrieval_hits` | Times relevant context was found in memory |
| `context_retrieval_misses` | Times relevant context was not found in memory |
| `role_distribution` | Dictionary mapping agent roles to invocation counts |
| `turns_per_task` | Average conversation turns per task |
| `session_duration` | Total session time in seconds |
| `token_usage` | Object with `prompt_tokens`, `completion_tokens`, `total_tokens` |
| `prompts_per_invocation` | Ratio of prompts to agent invocations (baseline: ~2.34) |
| `invocations_per_turn` | Ratio of invocations to turns (baseline: ~0.072) |
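The property list above can be pictured as a plain dataclass. This is an illustrative sketch of the shape, not the actual class definition from codified_orchestrator; the field names follow the table, while `TokenUsage` and the `context_hit_rate` helper are hypothetical additions:

```python
from dataclasses import dataclass, field

@dataclass
class TokenUsage:
    # Hypothetical helper mirroring the token_usage object's three fields
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

@dataclass
class SessionMetrics:
    tasks_completed: int = 0
    tasks_abandoned: int = 0
    context_retrieval_hits: int = 0
    context_retrieval_misses: int = 0
    role_distribution: dict[str, int] = field(default_factory=dict)
    turns_per_task: float = 0.0
    session_duration: float = 0.0  # seconds
    token_usage: TokenUsage = field(default_factory=TokenUsage)
    prompts_per_invocation: float = 0.0  # baseline: ~2.34
    invocations_per_turn: float = 0.0    # baseline: ~0.072

    @property
    def context_hit_rate(self) -> float:
        """Hits as a fraction of all retrieval attempts (hypothetical helper)."""
        total = self.context_retrieval_hits + self.context_retrieval_misses
        return self.context_retrieval_hits / total if total else 0.0
```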

CoherenceScore Class

The CoherenceScore class provides a single quality score from 0 to 100 for your session, computed from multiple weighted factors:

Components

| Factor | Weight | Description |
| --- | --- | --- |
| Task completion rate | 25% | Completed tasks as a fraction of all tasks started |
| Context retrieval rate | 25% | Hits divided by total retrieval attempts |
| Efficiency factor | 25% | How your metrics compare to the paper's baselines |
| Role diversity | 25% | Distribution of agent roles used |
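With four equal 25% weights, the overall score is simply the mean of the four component percentages. A minimal sketch, assuming each component is already scored on a 0-100 scale (the component names here are illustrative; the real CoherenceScore internals may differ):

```python
# Equal weighting per the components table; all components on a 0-100 scale.
WEIGHTS = {
    "task_completion": 0.25,
    "context_retrieval": 0.25,
    "efficiency": 0.25,
    "role_diversity": 0.25,
}

def coherence_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores."""
    return sum(WEIGHTS[name] * value for name, value in components.items())

score = coherence_score({
    "task_completion": 75.0,    # e.g. 3 of 4 tasks completed
    "context_retrieval": 85.5,  # e.g. 47 hits / 55 attempts
    "efficiency": 92.0,
    "role_diversity": 80.0,
})
print(round(score))  # 83
```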

Grade Scale

| Grade | Label | Description |
| --- | --- | --- |
| A (90-100) | Excellent | Outstanding session coherence with high completion and efficiency |
| B (80-89) | Good | Solid performance with minor optimization opportunities |
| C (70-79) | Average | Acceptable session with some areas for improvement |
| D (60-69) | Below Average | Noticeable inefficiencies or completion issues |
| F (<60) | Poor | Significant issues requiring attention |
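The scale maps scores to letters with simple thresholds; a sketch:

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 coherence score to the letter scale above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```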

MetricCollector Class

The MetricCollector is the main interface for collecting and storing session metrics:

Basic Usage

```python
from codified_orchestrator.session_metrics import MetricCollector

collector = MetricCollector()
collector.start_session()

# Record task lifecycle events
collector.record_task_start("task-123")
collector.record_agent_invocation("planner")
collector.record_context_hit()

# Continue task work...
collector.record_task_completion()

# End session and get results
results = collector.end_session()

# Store in memory for later retrieval
collector.store_in_memory(source_ref="session-123")
```

Key Methods

| Method | Description |
| --- | --- |
| `start_session()` | Initialize a new metrics collection session |
| `record_task_start(task_id)` | Mark when a task begins execution |
| `record_task_completion()` | Mark the current task as successfully completed |
| `record_task_abandon()` | Mark the current task as abandoned |
| `record_agent_invocation(role)` | Record an agent call, optionally specifying the role |
| `record_context_hit()` | Log a successful context retrieval from memory |
| `record_context_miss()` | Log a failed context retrieval attempt |
| `record_token_usage(prompt, completion, total)` | Log token consumption for the current operation |
| `end_session()` | Finalize collection and return a SessionMetrics object |
| `store_in_memory(source_ref)` | Persist results to the AOMA memory store |

CLI Command

The cco metrics command provides quick access to session statistics:

```bash
cco metrics
```

Parameters

| Parameter | Description |
| --- | --- |
| `--json` | Output results in JSON format for programmatic use |
| `--session` | Specify a session ID to retrieve (default: latest) |
| `--store` | Store the current session metrics after display |

Example Output

```text
Session Metrics Summary
========================
Session ID:    abc-123-def
Duration:      45m 32s
Tasks:         3 completed, 1 abandoned

Context Retrieval:
  Hits:         47
  Misses:       8
  Hit Rate:     85.5%

Token Usage:
  Prompt:       124,500
  Completion:   89,200
  Total:        213,700

Coherence Score: B (83/100)
  Task Rate:     75% (weight: 25%)
  Context Rate:  85% (weight: 25%)
  Efficiency:    92% (weight: 25%)
  Role Diversity: 80% (weight: 25%)
```
ℹ️

Use cco metrics --json for automation scripts or integration with external dashboards.

Interpreting Metrics

Context Retrieval Rate

A higher context retrieval hit rate indicates effective use of memory. Aim for 80%+ to ensure the agent has relevant context for decisions.

⚠️

If your context retrieval rate is below 60%, consider adding more relevant documents to memory or improving your context patterns.
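The hit rate itself is just hits over total retrieval attempts; a one-line sketch:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of context retrievals that found relevant memory."""
    attempts = hits + misses
    return hits / attempts if attempts else 0.0

print(f"{hit_rate(47, 8):.1%}")  # 85.5%
```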

Token Efficiency

Compare your prompts_per_invocation ratio against the 2.34 baseline. Lower values indicate more efficient prompting; higher values may suggest room for prompt optimization.
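One way to read your own ratio against the 2.34 baseline; the comparison logic here is an arbitrary illustration, not part of CCO:

```python
BASELINE_PPI = 2.34  # prompts per invocation, from the paper's evaluation

def efficiency_note(prompts: int, invocations: int) -> str:
    """Classify a session's prompts-per-invocation ratio against the baseline."""
    ratio = prompts / invocations
    if ratio < BASELINE_PPI:
        return f"{ratio:.2f} - more efficient than baseline"
    if ratio > BASELINE_PPI:
        return f"{ratio:.2f} - above baseline; consider prompt optimization"
    return f"{ratio:.2f} - at baseline"
```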

Role Diversity

A healthy session should involve multiple roles (planner, executor, reviewer). Sessions using only one or two roles may be missing important perspectives.

Task Completion

Track your completed vs abandoned task ratio over time. A high abandonment rate may indicate:

  • Tasks too complex for single-session completion
  • Poor task scoping at the start
  • Context gaps causing direction changes