claim_extraction
Claim extraction from video summaries using LLMs.
This module provides functions for extracting atomic factual claims from summary text using language models. It supports three extraction strategies ("sentence-based", "semantic-units", and "hierarchical"), contextual enrichment from ontology and annotation data, and hierarchical decomposition of claims into subclaims.
json
logging
re
Any
GenerationConfig
LLMLoader
ExtractedClaim
logger
extract_claims_from_summary
async def extract_claims_from_summary(
    summary_text: str,
    sentences: list[str] | None,
    strategy: str,
    max_claims: int,
    min_confidence: float,
    llm_loader: LLMLoader,
    ontology_context: dict[str, Any] | None = None,
    annotation_context: list[dict[str, Any]] | None = None
) -> list[ExtractedClaim]
Extract atomic claims from summary text.
Parameters
summary_text : str
    Full summary text to extract claims from.
sentences : list[str] | None
    Pre-split sentences (if None, will split automatically).
strategy : str
    Extraction strategy: "sentence-based", "semantic-units", or "hierarchical".
max_claims : int
    Maximum number of claims to extract.
min_confidence : float
    Minimum confidence threshold.
llm_loader : LLMLoader
    Loaded LLM for generation.
ontology_context : dict[str, Any] | None
    Ontology types and glosses for context.
annotation_context : list[dict[str, Any]] | None
    Annotation data for context.
Returns
list[ExtractedClaim]
    List of extracted claims with subclaims.
build_extraction_prompt
def build_extraction_prompt(summary_text: str, sentences: list[str],
                            strategy: str,
                            ontology_context: dict[str, Any] | None,
                            annotation_context: list[dict[str, Any]] | None,
                            max_claims: int) -> str
Build LLM prompt for claim extraction.
Parameters
summary_text : str
    Full summary text.
sentences : list[str]
    Split sentences.
strategy : str
    Extraction strategy.
ontology_context : dict[str, Any] | None
    Ontology types and glosses.
annotation_context : list[dict[str, Any]] | None
    Annotation data.
max_claims : int
    Maximum claims to extract.
Returns
str
    Formatted prompt for LLM.
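
To make the role of each parameter concrete, here is a minimal sketch of how such a prompt could be assembled. It is not the module's actual implementation: the function name `build_extraction_prompt_sketch`, the `STRATEGY_HINTS` table, and all prompt wording are assumptions made for illustration, and the context arguments are rendered generically because their exact schema is not documented here.

```python
from __future__ import annotations

from typing import Any

# Hypothetical per-strategy instructions; the real wording is not documented.
STRATEGY_HINTS = {
    "sentence-based": "Extract the claims made by each numbered sentence in turn.",
    "semantic-units": "Group related sentences and extract claims per semantic unit.",
    "hierarchical": "Extract top-level claims and nest supporting subclaims under them.",
}


def build_extraction_prompt_sketch(
    summary_text: str,
    sentences: list[str],
    strategy: str,
    ontology_context: dict[str, Any] | None,
    annotation_context: list[dict[str, Any]] | None,
    max_claims: int,
) -> str:
    """Assemble a plain-text prompt (illustrative stand-in only)."""
    if strategy not in STRATEGY_HINTS:
        raise ValueError(f"Unknown strategy: {strategy!r}")

    lines = [
        "Extract atomic factual claims from the summary below.",
        STRATEGY_HINTS[strategy],
        f"Return at most {max_claims} claims as a JSON array.",
        "",
        "Summary:",
        summary_text,
        "",
        "Sentences:",
    ]
    # Number the sentences so the model can refer back to them.
    lines += [f"{i}. {s}" for i, s in enumerate(sentences, start=1)]

    # Optional context sections are only emitted when data is supplied.
    if ontology_context:
        lines += ["", "Ontology context:"]
        lines += [f"- {name}: {gloss}" for name, gloss in ontology_context.items()]
    if annotation_context:
        lines += ["", "Annotation context:"]
        lines += [f"- {annotation}" for annotation in annotation_context]

    return "\n".join(lines)


prompt = build_extraction_prompt_sketch(
    summary_text="A man walks. He waves.",
    sentences=["A man walks.", "He waves."],
    strategy="hierarchical",
    ontology_context={"Person": "A human being."},
    annotation_context=None,
    max_claims=5,
)
print(prompt)
```

Passing None for either context argument simply omits that section, which mirrors the optional nature of `ontology_context` and `annotation_context` in the signature above.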
parse_claims_response
def parse_claims_response(response: str, summary_text: str,
                          sentences: list[str],
                          min_confidence: float) -> list[ExtractedClaim]
Parse LLM response into structured claims.
Parameters
response : str
    Raw LLM response text.
summary_text : str
    Original summary text.
sentences : list[str]
    Split sentences.
min_confidence : float
    Minimum confidence threshold.
Returns
list[ExtractedClaim]
    Parsed and validated claims.
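
LLM responses often wrap the requested JSON in extra prose, so a parser of this kind typically has to locate the JSON payload before validating it. The sketch below shows one common way to do that with `re` and `json`. It assumes the model was asked for a JSON array of claim objects; the name `extract_claim_dicts_sketch` and the `text` and `confidence` keys in the example are hypothetical, not taken from the module.

```python
import json
import re
from typing import Any


def extract_claim_dicts_sketch(response: str) -> list[dict[str, Any]]:
    """Pull a JSON array of objects out of raw LLM text (illustrative only)."""
    # Greedy match from the first '[' to the last ']' skips surrounding chatter.
    match = re.search(r"\[.*\]", response, re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        # Malformed JSON yields no claims rather than an exception.
        return []
    if not isinstance(data, list):
        return []
    # Keep only object entries; stray strings or numbers are discarded.
    return [item for item in data if isinstance(item, dict)]


raw = 'Sure, here are the claims: [{"text": "A dog runs", "confidence": 0.8}] Hope this helps.'
claim_dicts = extract_claim_dicts_sketch(raw)
print(claim_dicts)
```

Returning an empty list on malformed output is a design choice of this sketch; the real function may log or raise instead.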
parse_single_claim
def parse_single_claim(claim_data: dict[str, Any],
                       min_confidence: float) -> ExtractedClaim | None
Parse a single claim, recursing into any nested subclaims.
Parameters
claim_data : dict[str, Any]
    Claim data dictionary.
min_confidence : float
    Minimum confidence threshold.
Returns
ExtractedClaim | None
    Parsed claim or None if below threshold.
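
The recursion and threshold behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `ClaimSketch` is a hypothetical stand-in for `ExtractedClaim`, and the dictionary keys `text`, `confidence`, and `subclaims` are assumed rather than documented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ClaimSketch:
    """Hypothetical stand-in for ExtractedClaim; the real fields may differ."""
    text: str
    confidence: float
    subclaims: list[ClaimSketch] = field(default_factory=list)


def parse_single_claim_sketch(
    claim_data: dict[str, Any], min_confidence: float
) -> ClaimSketch | None:
    """Parse one claim dict and, recursively, its subclaims."""
    text = str(claim_data.get("text", "")).strip()
    try:
        confidence = float(claim_data.get("confidence", 0.0))
    except (TypeError, ValueError):
        return None
    # Reject empty claims and claims below the confidence threshold.
    if not text or confidence < min_confidence:
        return None

    subclaims = []
    for child in claim_data.get("subclaims") or []:
        if isinstance(child, dict):
            parsed = parse_single_claim_sketch(child, min_confidence)
            # Low-confidence subclaims are dropped; the parent is kept.
            if parsed is not None:
                subclaims.append(parsed)
    return ClaimSketch(text=text, confidence=confidence, subclaims=subclaims)


example = {
    "text": "A red car drives past",
    "confidence": 0.9,
    "subclaims": [
        {"text": "There is a car", "confidence": 0.95},
        {"text": "The car is speeding", "confidence": 0.2},
    ],
}
claim = parse_single_claim_sketch(example, min_confidence=0.5)
print(claim)
```

In this sketch a subclaim below the threshold is pruned without affecting its parent; whether the real function behaves the same way is not stated in the documentation.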
split_into_sentences
def split_into_sentences(text: str) -> list[str]
Split text into sentences using simple heuristics.
Parameters
text : str
    Text to split.
Returns
list[str]
    List of sentences.
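
The documentation does not say which heuristics are used, so the following is only a plausible sketch of a "simple heuristics" splitter: it breaks on whitespace that follows a period, exclamation mark, or question mark. The function name `split_into_sentences_sketch` and the regular expression are assumptions.

```python
import re


def split_into_sentences_sketch(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (illustrative only)."""
    # The lookbehind keeps the terminal punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty fragments produced by empty or whitespace-only input.
    return [part.strip() for part in parts if part.strip()]


sentences = split_into_sentences_sketch("A man enters the room. He sits down! Why?")
print(sentences)
```

A heuristic this simple will also split after abbreviations such as "Dr." followed by a space, which is one reason callers may prefer to pass pre-split `sentences` to `extract_claims_from_summary`.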