Session Metrics & Coherence

Track development session quality and measure coherence with the paper's evaluation framework.

What are Session Metrics?

Session metrics track the quality and efficiency of your development sessions. CCO collects detailed telemetry during each task execution, measuring everything from context retrieval effectiveness to token usage efficiency.

These metrics align with the research paper's evaluation framework, allowing you to compare your workflow against established baselines and identify areas for improvement.

ℹ️

Metrics are stored in memory and can be retrieved later for historical analysis or team sharing.

Paper Reference

The metrics system is inspired by and aligned with the research paper's findings:

Key Statistics from Research

Baseline measurements from the paper's evaluation

```text
2,801 prompts → 1,197 agent invocations → 16,522 agent turns
```

This translates to key baseline ratios:

  • ~2.34 prompts per invocation - how many prompts trigger each agent call
  • ~0.072 invocations per turn - agent invocation density
  • ~13.8 turns per invocation - conversation depth per agent call
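These ratios follow directly from the raw counts; a quick sketch (plain division, no CCO APIs involved) reproduces them:

```python
# Raw counts from the paper's evaluation
prompts = 2_801
invocations = 1_197
turns = 16_522

prompts_per_invocation = prompts / invocations  # ≈ 2.34
invocations_per_turn = invocations / turns      # ≈ 0.072
turns_per_invocation = turns / invocations      # ≈ 13.8

print(round(prompts_per_invocation, 2))  # 2.34
```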

SessionMetrics Class

The SessionMetrics class captures comprehensive data about each development session:

Properties

| Property | Description |
| --- | --- |
| `tasks_completed` | Number of tasks successfully completed during the session |
| `tasks_abandoned` | Number of tasks started but not completed |
| `context_retrieval_hits` | Times relevant context was found in memory |
| `context_retrieval_misses` | Times relevant context was not found in memory |
| `role_distribution` | Dictionary mapping agent roles to invocation counts |
| `turns_per_task` | Average conversation turns per task |
| `session_duration` | Total session time in seconds |
| `token_usage` | Object with `prompt_tokens`, `completion_tokens`, `total_tokens` |
| `prompts_per_invocation` | Ratio of prompts to agent invocations (baseline: ~2.34) |
| `invocations_per_turn` | Ratio of invocations to turns (baseline: ~0.072) |
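The property list above can be pictured as a plain dataclass. This is an illustrative sketch of the shape, not the actual class definition from codified_orchestrator; the field names follow the table, while `TokenUsage` and the `context_hit_rate` helper are hypothetical additions:

```python
from dataclasses import dataclass, field

@dataclass
class TokenUsage:
    # Hypothetical helper mirroring the token_usage object's three fields
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

@dataclass
class SessionMetrics:
    tasks_completed: int = 0
    tasks_abandoned: int = 0
    context_retrieval_hits: int = 0
    context_retrieval_misses: int = 0
    role_distribution: dict[str, int] = field(default_factory=dict)
    turns_per_task: float = 0.0
    session_duration: float = 0.0  # seconds
    token_usage: TokenUsage = field(default_factory=TokenUsage)
    prompts_per_invocation: float = 0.0  # baseline: ~2.34
    invocations_per_turn: float = 0.0    # baseline: ~0.072

    @property
    def context_hit_rate(self) -> float:
        """Hits as a fraction of all retrieval attempts (hypothetical helper)."""
        total = self.context_retrieval_hits + self.context_retrieval_misses
        return self.context_retrieval_hits / total if total else 0.0
```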

CoherenceScore Class

The CoherenceScore class provides a single quality score from 0 to 100 for your session, computed from multiple weighted factors:

Components

| Factor | Weight | Description |
| --- | --- | --- |
| Task completion rate | 25% | Completed tasks as a fraction of all tasks started |
| Context retrieval rate | 25% | Hits divided by total retrieval attempts |
| Efficiency factor | 25% | How your metrics compare to the paper's baselines |
| Role diversity | 25% | Distribution of agent roles used |
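With four equal 25% weights, the overall score is simply the mean of the four component percentages. A minimal sketch, assuming each component is already scored on a 0-100 scale (the component names here are illustrative; the real CoherenceScore internals may differ):

```python
# Equal weighting per the components table; all components on a 0-100 scale.
WEIGHTS = {
    "task_completion": 0.25,
    "context_retrieval": 0.25,
    "efficiency": 0.25,
    "role_diversity": 0.25,
}

def coherence_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores."""
    return sum(WEIGHTS[name] * value for name, value in components.items())

score = coherence_score({
    "task_completion": 75.0,    # e.g. 3 of 4 tasks completed
    "context_retrieval": 85.5,  # e.g. 47 hits / 55 attempts
    "efficiency": 92.0,
    "role_diversity": 80.0,
})
print(round(score))  # 83
```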

Grade Scale

| Grade | Label | Description |
| --- | --- | --- |
| A (90-100) | Excellent | Outstanding session coherence with high completion and efficiency |
| B (80-89) | Good | Solid performance with minor optimization opportunities |
| C (70-79) | Average | Acceptable session with some areas for improvement |
| D (60-69) | Below Average | Noticeable inefficiencies or completion issues |
| F (<60) | Poor | Significant issues requiring attention |
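The scale maps scores to letters with simple thresholds; a sketch:

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 coherence score to the letter scale above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```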

MetricCollector Class

The MetricCollector is the main interface for collecting and storing session metrics:

Basic Usage

```python
from codified_orchestrator.session_metrics import MetricCollector

collector = MetricCollector()
collector.start_session()

# Record task lifecycle events
collector.record_task_start("task-123")
collector.record_agent_invocation("planner")
collector.record_context_hit()

# Continue task work...
collector.record_task_completion()

# End session and get results
results = collector.end_session()

# Store in memory for later retrieval
collector.store_in_memory(source_ref="session-123")
```

Key Methods

| Method | Description |
| --- | --- |
| `start_session()` | Initialize a new metrics collection session |
| `record_task_start(task_id)` | Mark when a task begins execution |
| `record_task_completion()` | Mark the current task as successfully completed |
| `record_task_abandon()` | Mark the current task as abandoned |
| `record_agent_invocation(role)` | Record an agent call, optionally specifying the role |
| `record_context_hit()` | Log a successful context retrieval from memory |
| `record_context_miss()` | Log a failed context retrieval attempt |
| `record_token_usage(prompt, completion, total)` | Log token consumption for the current operation |
| `end_session()` | Finalize collection and return a SessionMetrics object |
| `store_in_memory(source_ref)` | Persist results to the AOMA memory store |

CLI Command

The cco metrics command provides quick access to session statistics:

```bash
cco metrics
```

Parameters

| Parameter | Description |
| --- | --- |
| `--json` | Output results in JSON format for programmatic use |
| `--session` | Specify a session ID to retrieve (default: latest) |
| `--store` | Store the current session metrics after display |

Example Output

```text
Session Metrics Summary
========================
Session ID:    abc-123-def
Duration:      45m 32s
Tasks:         3 completed, 1 abandoned

Context Retrieval:
  Hits:         47
  Misses:       8
  Hit Rate:     85.5%

Token Usage:
  Prompt:       124,500
  Completion:   89,200
  Total:        213,700

Coherence Score: B (83/100)
  Task Rate:     75% (weight: 25%)
  Context Rate:  85% (weight: 25%)
  Efficiency:    92% (weight: 25%)
  Role Diversity: 80% (weight: 25%)
```
ℹ️

Use cco metrics --json for automation scripts or integration with external dashboards.

Interpreting Metrics

Context Retrieval Rate

A higher context retrieval hit rate indicates effective use of memory. Aim for 80%+ to ensure the agent has relevant context for decisions.

⚠️

If your context retrieval rate is below 60%, consider adding more relevant documents to memory or improving your context patterns.
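The hit rate itself is just hits over total retrieval attempts; a one-line sketch:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of context retrievals that found relevant memory."""
    attempts = hits + misses
    return hits / attempts if attempts else 0.0

print(f"{hit_rate(47, 8):.1%}")  # 85.5%
```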

Token Efficiency

Compare your prompts_per_invocation ratio against the 2.34 baseline. Lower values indicate more efficient prompting; higher values may suggest room for prompt optimization.
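One way to read your own ratio against the 2.34 baseline; the comparison logic here is an arbitrary illustration, not part of CCO:

```python
BASELINE_PPI = 2.34  # prompts per invocation, from the paper's evaluation

def efficiency_note(prompts: int, invocations: int) -> str:
    """Classify a session's prompts-per-invocation ratio against the baseline."""
    ratio = prompts / invocations
    if ratio < BASELINE_PPI:
        return f"{ratio:.2f} - more efficient than baseline"
    if ratio > BASELINE_PPI:
        return f"{ratio:.2f} - above baseline; consider prompt optimization"
    return f"{ratio:.2f} - at baseline"
```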

Role Diversity

A healthy session should involve multiple roles (planner, executor, reviewer). Sessions using only one or two roles may be missing important perspectives.

Task Completion

Track your completed vs abandoned task ratio over time. A high abandonment rate may indicate:

  • Tasks too complex for single-session completion
  • Poor task scoping at the start
  • Context gaps causing direction changes