Reasoning traces

Use the reasoning-trace flow to capture the chain-of-thought output of thinking-capable language and vision-language models alongside the visible response. v0.3.0 introduces three DTOs in model-service/src/application/dto/reasoning.py:

from dataclasses import dataclass

@dataclass(frozen=True)
class ThinkingStep:
    content: str
    tokens_used: int | None = None

@dataclass(frozen=True)
class ThinkingTrace:
    steps: list[ThinkingStep]
    total_tokens: int | None = None
    model_id: str = ""

@dataclass(frozen=True)
class ReasonedText:
    text: str
    thinking: ThinkingTrace | None = None
    tokens_used: int | None = None

ReasonedText is what the ILanguageModel.generate_reasoned port returns. text is the visible response; thinking is the parsed reasoning trace, or None for non-thinking models.
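
As a rough illustration of how a caller might consume the return value, the sketch below repeats the three DTOs and adds two stand-in models. StubThinkingModel, StubPlainModel, describe, and the single-argument generate_reasoned(prompt) signature are all hypothetical; the real port may take more arguments or be asynchronous.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkingStep:
    content: str
    tokens_used: int | None = None


@dataclass(frozen=True)
class ThinkingTrace:
    steps: list[ThinkingStep]
    total_tokens: int | None = None
    model_id: str = ""


@dataclass(frozen=True)
class ReasonedText:
    text: str
    thinking: ThinkingTrace | None = None
    tokens_used: int | None = None


class StubThinkingModel:
    """Hypothetical stand-in for a thinking-capable ILanguageModel adapter."""

    def generate_reasoned(self, prompt: str) -> ReasonedText:
        # Assumed signature: the real port is not shown in this document.
        trace = ThinkingTrace(
            steps=[ThinkingStep(content=f"Consider the prompt: {prompt}")],
            model_id="stub-thinking",
        )
        return ReasonedText(text="42", thinking=trace)


class StubPlainModel:
    """Hypothetical stand-in for a non-thinking model."""

    def generate_reasoned(self, prompt: str) -> ReasonedText:
        # Non-thinking models leave thinking at its default of None.
        return ReasonedText(text="42")


def describe(result: ReasonedText) -> str:
    """Summarize a result, handling the thinking=None case explicitly."""
    if result.thinking is None:
        return f"{result.text} (no trace)"
    return f"{result.text} ({len(result.thinking.steps)} reasoning step(s))"
```

Callers should always handle thinking=None, since the same port serves both kinds of model.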

When the trace is populated

Thinking-capable models populate ReasonedText.thinking from the <think>...</think> blocks in the raw output. The current catalog lists the following options as thinking-capable:

qwen-3-vl-8b-thinking               video summarization
qwen-3-vl-30b-a3b-thinking          video summarization
deepseek-r1-distill-qwen-14b        ontology / claims
deepseek-r1-distill-qwen-32b        ontology / claims
deepseek-r1-distill-qwen-1-5b-gguf  ontology / claims (CPU)

Non-thinking models (Qwen3-VL non-thinking variants, Llama-4, Pixtral, Claude family, GPT family, Gemini family) return ReasonedText with thinking=None.
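
The extraction of <think>...</think> blocks could look roughly like the sketch below. It is not the service's actual parser: parse_reasoned is a hypothetical helper, and mapping each block to exactly one ThinkingStep is an assumption (the real code may split blocks into finer steps or record token counts).

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkingStep:
    content: str
    tokens_used: int | None = None


@dataclass(frozen=True)
class ThinkingTrace:
    steps: list[ThinkingStep]
    total_tokens: int | None = None
    model_id: str = ""


@dataclass(frozen=True)
class ReasonedText:
    text: str
    thinking: ThinkingTrace | None = None
    tokens_used: int | None = None


# Non-greedy match so that several separate blocks are captured individually;
# DOTALL lets a block span multiple lines.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_reasoned(raw: str, model_id: str = "") -> ReasonedText:
    """Split raw model output into visible text and a reasoning trace.

    Hypothetical helper: one ThinkingStep per non-empty <think> block.
    """
    blocks = [block.strip() for block in _THINK_BLOCK.findall(raw)]
    visible = _THINK_BLOCK.sub("", raw).strip()
    steps = [ThinkingStep(content=block) for block in blocks if block]
    if not steps:
        # Assumption: output without any reasoning block yields thinking=None.
        return ReasonedText(text=visible)
    trace = ThinkingTrace(steps=steps, model_id=model_id)
    return ReasonedText(text=visible, thinking=trace)
```

For example, parse_reasoned("<think>2 + 2 is 4</think>\nThe answer is 4.") yields text "The answer is 4." and a trace with a single step, while output with no <think> block yields thinking=None.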

Where the trace surfaces

The model-service FastAPI schemas in infrastructure/adapters/inbound/fastapi/schemas/reasoning.py expose the trace on every response produced by a use case that calls a thinking-capable model. The backend forwards the trace unchanged, and the frontend renders it in a collapsible "thinking" panel below the visible response.

Properties

ThinkingTrace.is_empty           True when steps is empty
ThinkingTrace.combined_text      steps joined by blank lines
ReasonedText.has_thinking        True iff thinking is not None
                                 and thinking.is_empty is False

These properties are what the FastAPI schemas use to decide whether to emit a thinking block in the response body; the frontend uses them to decide whether to render the panel.
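
One way these properties could be implemented, following the descriptions in the table above, is sketched here. The property bodies are inferred from those descriptions rather than copied from reasoning.py, and to_response_body with its "text" and "thinking" keys is a hypothetical stand-in for the real FastAPI schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkingStep:
    content: str
    tokens_used: int | None = None


@dataclass(frozen=True)
class ThinkingTrace:
    steps: list[ThinkingStep]
    total_tokens: int | None = None
    model_id: str = ""

    @property
    def is_empty(self) -> bool:
        # True when steps is empty.
        return not self.steps

    @property
    def combined_text(self) -> str:
        # Steps joined by blank lines.
        return "\n\n".join(step.content for step in self.steps)


@dataclass(frozen=True)
class ReasonedText:
    text: str
    thinking: ThinkingTrace | None = None
    tokens_used: int | None = None

    @property
    def has_thinking(self) -> bool:
        # True only when a trace exists and contains at least one step.
        return self.thinking is not None and not self.thinking.is_empty


def to_response_body(result: ReasonedText) -> dict[str, object]:
    """Hypothetical serializer: emit a thinking block only when one exists."""
    body: dict[str, object] = {"text": result.text}
    if result.has_thinking:
        body["thinking"] = result.thinking.combined_text
    return body
```

Under this sketch, a result whose trace has no steps is serialized the same way as a result from a non-thinking model, so the frontend has a single condition to check.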
