API Reference¶
Complete API documentation for Docs2Synth modules. This reference covers the public Python APIs for programmatic usage.
Getting Started
For practical usage examples and complete workflow instructions, see:
- Complete Workflow Guide - End-to-end workflow from documents to RAG deployment
- CLI Reference - Command-line interface documentation with examples
Overview¶
Docs2Synth provides both a CLI and a programmatic Python API. This reference documents the Python API for users who want to:
- Build custom workflows
- Integrate Docs2Synth into existing applications
- Extend functionality with custom components
- Automate document processing pipelines
Core Package¶
Installation & Import¶
Package-Level Exports¶
__version__¶
Type: str
The installed version of Docs2Synth.
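Example:
from docs2synth import __version__

print(__version__)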
Configuration & Logging¶
The package auto-initializes global configuration on import:
from docs2synth.utils.config import get_config
from docs2synth.utils.logging import get_logger
config = get_config()
logger = get_logger(__name__)
Configuration (docs2synth.utils.config)¶
Classes¶
Config¶
Configuration manager supporting dot notation and YAML serialization.
Constructor:
Class Methods:
@classmethod
def from_yaml(cls, yaml_path: str | Path) -> Config:
"""Load configuration from YAML file."""
Methods:
def get(self, key: str, default: Any = None) -> Any:
"""Get config value using dot notation (e.g., 'data.root_dir')."""
def set(self, key: str, value: Any) -> None:
"""Set config value using dot notation."""
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary."""
def save(self, yaml_path: str | Path) -> None:
"""Save configuration to YAML file."""
Example:
from docs2synth.utils.config import Config
# Load from YAML
config = Config.from_yaml("config.yml")
# Access with dot notation
root_dir = config.get("data.root_dir")
model_name = config.get("agent.model", default="gpt-4")
# Update values
config.set("agent.temperature", 0.8)
config.save("config.yml")
Functions¶
get_config()¶
set_config()¶
load_config()¶
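Signatures for these helpers are not reproduced here. A usage sketch; get_config() matches the auto-initialization example above, while the arguments to load_config() and set_config() are assumptions:
from docs2synth.utils.config import Config, get_config, load_config, set_config

# Load global configuration from a YAML file (path argument is an assumption)
load_config("config.yml")

# Read the current global configuration
config = get_config()
print(config.get("data.root_dir"))

# Swap in a different Config instance (argument type is an assumption)
set_config(Config.from_yaml("other.yml"))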
Logging (docs2synth.utils.logging)¶
Constants¶
LOG_FORMAT¶
Type: str
Default log format, including the filename and line number of each log call.
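A usage sketch passing the constant to the standard library logger:
import logging

from docs2synth.utils.logging import LOG_FORMAT

logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)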
Functions¶
setup_logging()¶
def setup_logging(
level: int | str = logging.INFO,
log_file: str | Path = "./logs/docs2synth.log"
) -> None:
"""Setup logging with file and console handlers."""
setup_logging_from_config()¶
setup_cli_logging()¶
def setup_cli_logging(verbose: int = 0, config: Any = None) -> None:
"""Setup logging for CLI with verbosity levels (0-2)."""
get_logger()¶
configure_third_party_loggers()¶
def configure_third_party_loggers(level: int = logging.WARNING) -> None:
"""Configure third-party library loggers to reduce noise."""
log_function_call() (Decorator)¶
def log_function_call(logger: logging.Logger, level: int = logging.DEBUG):
"""Decorator to log function calls with arguments."""
Example:
from docs2synth.utils.logging import get_logger, log_function_call
logger = get_logger(__name__)
@log_function_call(logger)
def process_document(path: str) -> dict:
logger.info(f"Processing {path}")
return {"status": "success"}
Classes¶
LoggerContext¶
Context manager for temporary log level changes.
import logging

from docs2synth.utils.logging import LoggerContext

with LoggerContext(logger, logging.DEBUG):
    # Temporarily enable debug logging
    logger.debug("Detailed debug info")
ProgressLogger¶
Progress tracking with periodic logging.
from docs2synth.utils.logging import ProgressLogger

progress = ProgressLogger("Processing", total=100)
for i in range(100):
progress.update(i)
progress.complete()
Methods:
def __init__(
self,
name: str,
total: int,
logger: logging.Logger | None = None,
log_interval: int = 10
)
def update(self, current: int | None = None) -> None:
"""Update progress."""
def complete(self) -> None:
"""Mark as complete."""
Timing Utilities (docs2synth.utils.timer)¶
Functions¶
timer() (Context Manager)¶
timeit() (Decorator)¶
format_time()¶
def format_time(seconds: float) -> str:
"""Format seconds to human-readable string (e.g., '1h 23m 45s')."""
Classes¶
Timer¶
Reusable timer for multiple operations.
from docs2synth.utils.timer import Timer

timer = Timer()
timer.start("preprocessing")
# ... do work ...
timer.stop("preprocessing")
print(f"Took {timer.elapsed('preprocessing')} seconds")
Document Processing Schema (docs2synth.preprocess.schema)¶
Type Aliases¶
BBox = Tuple[float, float, float, float] # (x1, y1, x2, y2)
Polygon = List[Tuple[float, float]] # [(x, y), ...]
Enums¶
LabelType¶
Semantic granularity labels for document objects.
class LabelType(Enum):
TEXT = "text"
SENTENCE = "sentence"
PARAGRAPH = "paragraph"
DOCUMENT = "document"
OTHER = "other"
Dataclasses¶
QAPair¶
Question-answer pair with verification metadata.
Attributes:
question: str
answer: Optional[str]
strategy: Optional[str]
verification: Optional[Dict[str, Dict[str, Any]]]
extra: Dict[str, Any]
Methods:
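The method listing is not reproduced here. QAPair is a dataclass; a minimal construction sketch (which fields carry defaults is an assumption):
from docs2synth.preprocess.schema import QAPair

pair = QAPair(
    question="What is the invoice total?",
    answer="$1,250.00",
    strategy="semantic",
    verification=None,
    extra={},
)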
DocumentObject¶
Extracted document object with layout information.
Attributes:
object_id: int
text: str
bbox: BBox
label: LabelType
polygon: Optional[Polygon]
page: Optional[int]
score: Optional[float]
qa: List[QAPair]
extra: Dict[str, Any]
DocumentProcessResult¶
Complete document processing result.
Attributes:
objects: Dict[int, DocumentObject]
object_list: List[DocumentObject]
bbox_list: List[BBox]
context: str
reading_order_ids: List[int]
process_metadata: ProcessMetadata
document_metadata: DocumentMetadata
qa_metadata: Optional[RunMetadata]
verify_metadata: Optional[RunMetadata]
Methods:
def to_dict(self) -> Dict[str, Any]
def to_json(self, *, indent: Optional[int] = None) -> str
@staticmethod
def from_dict(data: Mapping[str, Any]) -> DocumentProcessResult
Example:
import json

from docs2synth.preprocess.schema import DocumentProcessResult
# Load processed document
with open("output/doc.json") as f:
result = DocumentProcessResult.from_dict(json.load(f))
# Access objects
for obj in result.object_list:
print(f"Object {obj.object_id}: {obj.text}")
print(f" BBox: {obj.bbox}")
print(f" QA Pairs: {len(obj.qa)}")
Preprocessing (docs2synth.preprocess)¶
Functions¶
run_preprocess()¶
Process documents to extract layout and text.
def run_preprocess(
path: str | Path,
*,
processor: str = "paddleocr",
output_dir: Optional[str | Path] = None,
lang: Optional[str] = None,
device: Optional[str] = None,
config: Optional[Config] = None
) -> Tuple[int, int, List[Path]]:
"""
Process documents with specified processor.
Args:
path: Path to document or directory
processor: Processor name ("paddleocr", "docling", "pdfplumber")
output_dir: Output directory for results
lang: Language code (e.g., "en")
device: Device for processing ("cpu", "cuda")
config: Configuration object
Returns:
Tuple of (processed_count, failed_count, output_paths)
"""
Example:
from docs2synth.preprocess.runner import run_preprocess
processed, failed, outputs = run_preprocess(
"data/raw/documents/",
processor="docling",
output_dir="data/processed/",
device="cuda"
)
print(f"Processed {processed} documents, {failed} failed")
for path in outputs:
print(f" -> {path}")
QA Generation (docs2synth.qa)¶
Configuration Classes¶
QAStrategyConfig¶
Configuration for a single QA generation strategy.
def __init__(
self,
strategy: str,
provider: Optional[str] = None,
model: Optional[str] = None,
prompt_template: Optional[str] = None,
temperature: Optional[float] = None,
max_tokens: Optional[int] = None,
group_id: Optional[str] = None,
**kwargs: Any
)
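Example; that QAStrategyConfig is importable from docs2synth.qa.config alongside the other configuration classes is an assumption:
from docs2synth.qa.config import QAStrategyConfig

strategy_config = QAStrategyConfig(
    strategy="semantic",
    provider="openai",
    model="gpt-4",
    temperature=0.7,
    max_tokens=500,
)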
QAGenerationConfig¶
Container for multiple QA strategies.
from docs2synth.qa.config import QAGenerationConfig

# Load from config
qa_config = QAGenerationConfig.from_yaml("config.yml")
# Get specific strategy
semantic_config = qa_config.get_strategy_config("semantic")
# List all strategies
strategies = qa_config.list_strategies()
QAVerificationConfig¶
Configuration for QA verification.
from docs2synth.qa.config import QAVerificationConfig

# Load from config
verify_config = QAVerificationConfig.from_yaml("config.yml")
# Get specific verifier
meaningful_config = verify_config.get_verifier_config("meaningful")
# List all verifiers
verifiers = verify_config.list_verifiers()
Generators¶
BaseQAGenerator (Abstract)¶
Base class for QA generators.
class BaseQAGenerator(ABC):
def __init__(
self,
agent: Optional[AgentWrapper] = None,
provider: Optional[str] = None,
model: Optional[str] = None,
**kwargs: Any
)
@abstractmethod
def generate(
self,
context: str,
target: str,
temperature: float = 0.7,
max_tokens: Optional[int] = None,
**kwargs: Any
) -> str:
"""Generate QA pair from context."""
Concrete Generators¶
- SemanticQAGenerator: Generates semantic questions from document context.
- LayoutAwareQAGenerator: Generates questions that require layout understanding.
- LogicalAwareQAGenerator: Generates questions that require logical reasoning.
Factory¶
from docs2synth.qa.generators import QAGeneratorFactory
generator = QAGeneratorFactory.create_from_config(
strategy_config,
config_path="config.yml"
)
result = generator.generate(context="...", target="...")
Verifiers¶
BaseQAVerifier (Abstract)¶
Base class for QA verifiers.
@abstractmethod
def verify(
self,
question: str,
answer: str,
context: Optional[str] = None,
image: Optional[Any] = None,
temperature: float = 0.7,
max_tokens: Optional[int] = None,
**kwargs: Any
) -> Dict[str, Any]:
"""Verify QA pair quality."""
Concrete Verifiers¶
- MeaningfulVerifier: Verifies if the question is meaningful given the context.
- CorrectnessVerifier: Verifies if the answer is correct for the question.
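A usage sketch; the import path and constructor arguments are assumptions modeled on the generator constructors above, while verify() follows the signature shown:
from docs2synth.qa.verifiers import MeaningfulVerifier  # import path is an assumption

verifier = MeaningfulVerifier(provider="openai", model="gpt-4")  # constructor args are assumptions
result = verifier.verify(
    question="What is the invoice total?",
    answer="$1,250.00",
    context="Invoice #1042 ... Total due: $1,250.00",
)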
Batch Processing¶
process_batch()¶
Process multiple documents for QA generation.
from pathlib import Path

from docs2synth.qa.qa_batch import process_batch
from docs2synth.qa.config import QAGenerationConfig
qa_config = QAGenerationConfig.from_yaml("config.yml")
processed, generated, failed = process_batch(
input_path=Path("data/processed/"),
output_dir=Path("data/qa/"),
qa_config=qa_config,
processor_name="docling"
)
process_batch_verification()¶
Verify QA pairs in batch.
from pathlib import Path

from docs2synth.qa.verify_batch import process_batch_verification
from docs2synth.qa.config import QAVerificationConfig
verify_config = QAVerificationConfig.from_yaml("config.yml")
processed, verified, rejected, failed = process_batch_verification(
input_path=Path("data/qa/"),
verification_config=verify_config
)
Agent System (docs2synth.agent)¶
AgentWrapper¶
High-level unified interface for LLM providers (OpenAI, Anthropic, Gemini, vLLM).
from docs2synth.agent import AgentWrapper
# Create agent from config
agent = AgentWrapper(provider="openai", model="gpt-4")
# Generate response
response = agent.generate(
prompt="Extract key information from this text",
system_prompt="You are a helpful assistant",
temperature=0.7,
max_tokens=500
)
print(response.content)
# Chat with messages
messages = [
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hi! How can I help?"},
{"role": "user", "content": "What's the weather?"}
]
response = agent.chat(messages)
# Switch provider dynamically
agent.switch_provider("anthropic", model="claude-3-sonnet")
Methods:
def generate(
self,
prompt: str,
system_prompt: Optional[str] = None,
temperature: float = 0.7,
max_tokens: Optional[int] = None,
response_format: Optional[str] = None,
image: Optional[Any] = None,
**kwargs: Any
) -> LLMResponse
def chat(
self,
messages: List[Dict[str, str]],
temperature: float = 0.7,
max_tokens: Optional[int] = None,
response_format: Optional[str] = None,
image: Optional[Any] = None,
**kwargs: Any
) -> LLMResponse
def switch_provider(
self,
provider: str,
model: Optional[str] = None,
**kwargs: Any
) -> None
Properties:
@property
def model(self) -> str:
"""Current model name."""
@property
def provider_type(self) -> str:
"""Current provider type."""
LLMResponse¶
@dataclass
class LLMResponse:
content: str # Generated text
model: str # Model used
usage: Dict[str, Any] # Token usage info
metadata: Dict[str, Any] # Additional metadata
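For example, continuing the AgentWrapper usage above:
response = agent.generate(prompt="Summarize the document in one sentence.")

print(response.content)  # generated text
print(response.model)    # model that produced it
print(response.usage)    # token usage info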
Retriever Training (docs2synth.retriever)¶
Dataset Functions¶
load_verified_qa_pairs()¶
Load verified QA pairs for training.
from docs2synth.retriever.dataset import load_verified_qa_pairs
qa_pairs = load_verified_qa_pairs(
data_dir="data/qa/",
processor_name="docling",
verifier_types=["meaningful", "correctness"]
)
for question, answer, metadata in qa_pairs:
print(f"Q: {question}")
print(f"A: {answer}")
Model Creation¶
create_model_for_qa()¶
Create LayoutLM model for document QA.
from docs2synth.retriever.model import create_model_for_qa
model, tokenizer = create_model_for_qa(
model_name="microsoft/layoutlmv3-base",
num_labels=2
)
Training Functions¶
train()¶
Standard training function.
from docs2synth.retriever.training import train
train(
model=model,
train_dataset=train_data,
eval_dataset=eval_data,
output_dir="models/retriever",
learning_rate=1e-5,
num_epochs=10,
batch_size=8
)
Metrics¶
calculate_anls()¶
Calculate Average Normalized Levenshtein Similarity.
from docs2synth.retriever.metrics import calculate_anls
score = calculate_anls(
prediction="The answer is 42",
ground_truth="The answer is 42.",
threshold=0.5
)
RAG System (docs2synth.rag)¶
Types¶
DocumentChunk¶
Document chunk with metadata.
@dataclass
class DocumentChunk:
text: str
metadata: Optional[Mapping[str, object]]
chunk_id: str
embedding: Optional[np.ndarray]
RetrievedDocument¶
Retrieved document with similarity score.
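The field listing is not reproduced here. Based on the vector store example below, a retrieved document exposes at least the matched chunk and a similarity score; this inferred shape is an assumption:
@dataclass
class RetrievedDocument:
    chunk: DocumentChunk  # the matched chunk
    score: float          # similarity score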
RAGResult¶
Complete RAG generation result.
@dataclass
class RAGResult:
query: str
answer: str
retrieved_documents: Sequence[RetrievedDocument]
iterations: Sequence[IterationResult]
metadata: Mapping[str, object]
Embedding Model¶
from docs2synth.rag.embeddings import EmbeddingModel
embedding_model = EmbeddingModel(
model_name="BAAI/bge-base-en-v1.5",
device="cuda"
)
# Embed query
query_embedding = embedding_model.embed_query("What is RAG?")
# Embed batch
doc_embeddings = embedding_model.embed_texts([
"Document 1 text",
"Document 2 text"
])
print(f"Embedding dimension: {embedding_model.dimension}")
Vector Store¶
FaissVectorStore¶
FAISS-based vector storage.
from docs2synth.rag.vector_store import FaissVectorStore
from docs2synth.rag.types import DocumentChunk
# Create store
vector_store = FaissVectorStore(dimension=768)
# Add embeddings
chunks = [
DocumentChunk(text="Document 1", metadata={"source": "file1.pdf"}, chunk_id="1"),
DocumentChunk(text="Document 2", metadata={"source": "file2.pdf"}, chunk_id="2")
]
# Embed the chunk texts, then add them to the store
embeddings = embedding_model.embed_texts([chunk.text for chunk in chunks])
vector_store.add_embeddings(embeddings, chunks)
# Search
query_embedding = embedding_model.embed_query("search query")
results = vector_store.search(query_embedding, top_k=5)
for doc in results:
print(f"Score: {doc.score} - {doc.chunk.text}")
# Persist
vector_store.save("vector_store.faiss")
loaded_store = FaissVectorStore.load("vector_store.faiss")
RAG Pipeline¶
RAGPipeline¶
Main orchestration class for RAG workflows.
from docs2synth.rag.pipeline import RAGPipeline
from docs2synth.utils.config import Config
# Load from config
config = Config.from_yaml("config.yml")
pipeline = RAGPipeline.from_config(config)
# Add documents
documents = [
"The capital of France is Paris.",
"Python is a popular programming language.",
"Machine learning is a subset of AI."
]
chunks = pipeline.add_documents(documents)
# Run query with different strategies
result = pipeline.run(
query="What is the capital of France?",
strategy_name="naive"
)
print(f"Answer: {result.answer}")
print(f"Sources: {[doc.chunk.metadata for doc in result.retrieved_documents]}")
# List available strategies
print(f"Available strategies: {pipeline.strategies}")
# Reset
pipeline.reset()
RAG Strategies¶
NaiveRAGStrategy¶
Simple retrieval + generation strategy.
from docs2synth.rag.strategies import NaiveRAGStrategy
strategy = NaiveRAGStrategy(
embedding_model=embedding_model,
vector_store=vector_store,
agent=agent,
top_k=5
)
result = strategy.generate("What is machine learning?")
EnhancedIterativeRAGStrategy¶
Iterative refinement with similarity checks.
from docs2synth.rag.strategies import EnhancedIterativeRAGStrategy
strategy = EnhancedIterativeRAGStrategy(
embedding_model=embedding_model,
vector_store=vector_store,
agent=agent,
top_k=5,
max_iterations=3,
similarity_threshold=0.7
)
result = strategy.generate("Complex query requiring iteration")
Utility Functions¶
PDF Image Processing (docs2synth.utils.pdf_images)¶
from docs2synth.utils.pdf_images import (
convert_pdf_to_images,
get_pdf_images,
convert_pdfs_in_directory
)
# Convert single PDF
images = convert_pdf_to_images("document.pdf", dpi=300)
# Get cached images
cached_images = get_pdf_images("document.pdf")
# Batch convert directory
count = convert_pdfs_in_directory("data/pdfs/", dpi=300)
Text Processing (docs2synth.utils.text)¶
from docs2synth.utils.text import truncate_context, calculate_max_context_length
# Truncate text to fit model context
truncated, was_truncated = truncate_context(
context=long_text,
provider="openai",
model="gpt-4",
max_tokens=4000
)
# Calculate max context length
max_length = calculate_max_context_length(
provider="openai",
model="gpt-4",
prompt_template_overhead=300
)
Configuration Options¶
Key Configuration Paths¶
Data paths:
data:
root_dir: "./data"
datasets_dir: "./data/datasets"
processed_dir: "./data/processed"
qa_pairs_dir: "./data/qa"
models_dir: "./models"
logs_dir: "./logs"
Logging:
logging:
level: "INFO"
file:
path: "./logs/docs2synth.log"
max_bytes: 10485760 # 10MB
backup_count: 5
third_party:
level: "WARNING"
Agent/LLM:
agent:
provider: "openai"
model: "gpt-4"
openai:
api_key: "sk-..."
model: "gpt-4"
anthropic:
api_key: "..."
model: "claude-3-sonnet"
QA Generation:
qa:
- strategy: "semantic"
provider: "openai"
model: "gpt-4"
temperature: 0.7
max_tokens: 500
- strategy: "layout_aware"
provider: "anthropic"
model: "claude-3-sonnet"
RAG:
rag:
embedding:
model_name: "BAAI/bge-base-en-v1.5"
vector_store:
type: "faiss"
persist_path: "./data/vector_store"
strategies:
naive:
type: "naive"
top_k: 5
enhanced:
type: "enhanced"
top_k: 5
max_iterations: 3
similarity_threshold: 0.7
Complete Example¶
Here's a complete example using the Python API:
import json
from pathlib import Path
from docs2synth.utils.config import Config
from docs2synth.utils.logging import setup_logging, get_logger
from docs2synth.preprocess.runner import run_preprocess
from docs2synth.preprocess.schema import DocumentProcessResult
from docs2synth.qa.config import QAGenerationConfig, QAVerificationConfig
from docs2synth.qa.qa_batch import process_batch
from docs2synth.qa.verify_batch import process_batch_verification
from docs2synth.rag.pipeline import RAGPipeline
# Setup
setup_logging()
logger = get_logger(__name__)
config = Config.from_yaml("config.yml")
# 1. Preprocess documents
logger.info("Preprocessing documents...")
processed, failed, outputs = run_preprocess(
"data/raw/documents/",
processor="docling",
output_dir="data/processed/",
config=config
)
# 2. Generate QA pairs
logger.info("Generating QA pairs...")
qa_config = QAGenerationConfig.from_config(config)
processed, generated, failed = process_batch(
input_path=Path("data/processed/"),
output_dir=Path("data/qa/"),
qa_config=qa_config,
config=config
)
# 3. Verify QA pairs
logger.info("Verifying QA pairs...")
verify_config = QAVerificationConfig.from_config(config)
processed, verified, rejected, failed = process_batch_verification(
input_path=Path("data/qa/"),
verification_config=verify_config
)
# 4. Setup RAG pipeline
logger.info("Setting up RAG pipeline...")
pipeline = RAGPipeline.from_config(config)
# Load documents into RAG
documents = []
for json_file in Path("data/qa/").glob("*.json"):
with open(json_file) as f:
result = DocumentProcessResult.from_dict(json.load(f))
documents.append(result.context)
pipeline.add_documents(documents)
# 5. Query the RAG system
query = "What are the key features of the product?"
result = pipeline.run(query, strategy_name="enhanced")
logger.info(f"Query: {query}")
logger.info(f"Answer: {result.answer}")
logger.info(f"Retrieved {len(result.retrieved_documents)} documents")
Next Steps¶
- Complete Workflow Guide - End-to-end workflow
- CLI Reference - Command-line interface
- Development Guide - Contributing to Docs2Synth