Reasoning traces

Use the reasoning-trace flow to capture the chain-of-thought output of thinking-capable language and vision-language models alongside the visible response. v0.3.0 introduces three DTOs in model-service/src/application/dto/reasoning.py:

from dataclasses import dataclass

@dataclass(frozen=True)
class ThinkingStep:
    content: str
    tokens_used: int | None = None

@dataclass(frozen=True)
class ThinkingTrace:
    steps: list[ThinkingStep]
    total_tokens: int | None = None
    model_id: str = ""

@dataclass(frozen=True)
class ReasonedText:
    text: str
    thinking: ThinkingTrace | None = None
    tokens_used: int | None = None

ReasonedText is what the ILanguageModel.generate_reasoned port returns. text is the visible response; thinking is the parsed reasoning trace, or None for non-thinking models.

When the trace is populated

Thinking-capable models populate ReasonedText.thinking from the <think>...</think> blocks in the raw output. The current catalog lists the following models as thinking-capable:

qwen-3-vl-8b-thinking                 video summarization
qwen-3-vl-30b-a3b-thinking            video summarization
deepseek-r1-distill-qwen-14b          ontology / claims
deepseek-r1-distill-qwen-32b          ontology / claims
deepseek-r1-distill-qwen-1-5b-gguf    ontology / claims (CPU)

Non-thinking models (Qwen3-VL non-thinking variants, Llama-4, Pixtral, Claude family, GPT family, Gemini family) return ReasonedText with thinking=None.

Where the trace surfaces

The model-service FastAPI schemas in infrastructure/adapters/inbound/fastapi/schemas/reasoning.py expose the trace on every response produced by a use case that calls a thinking-capable model. The backend forwards the trace unchanged, and the frontend renders it in a collapsible "thinking" panel below the visible response.

Properties

ThinkingTrace.is_empty         True when steps is empty
ThinkingTrace.combined_text    steps joined by blank lines
ReasonedText.has_thinking      True iff thinking is not None and not empty

The FastAPI schemas use these properties to decide whether to emit a thinking block in the response body; the frontend uses them to decide whether to render the panel.