API Reference

Detailed documentation for Kreuzberg's public API.

Core Components

Extraction Functions - Functions for text extraction (Guide)
Types - Data structures for results and configuration (Guide)
OCR Configuration - OCR engine settings (Guide)
Extractor Registry - Document extractor management (Guide)
Exceptions - Error handling (Examples)

Public API

All components documented in this section are exported directly from the kreuzberg package and can be imported as follows:

from kreuzberg import extract_file, ExtractionConfig, TesseractConfig  # etc.

API Overview

Kreuzberg's API has four main components:

Extraction Functions: Extract text from documents
Configuration Objects: Control extraction behavior
Result Objects: Contain extracted text and metadata
OCR Backends: Pluggable OCR engines
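The four components above can be sketched together in one call. This is a minimal illustration, not the library's reference usage: `extract_with_tesseract` is a hypothetical helper, and the `ocr_backend` and `ocr_config` parameters, the `language` argument, and the `content` and `metadata` attributes are assumptions not confirmed in this section.

```python
from typing import Any


async def extract_with_tesseract(path: str) -> tuple[str, Any]:
    """Force OCR on one file and return its text and metadata."""
    # Imported inside the function so this module still loads where
    # Kreuzberg is not installed.
    from kreuzberg import ExtractionConfig, TesseractConfig, extract_file

    # Configuration object: controls extraction behavior.
    config = ExtractionConfig(
        force_ocr=True,
        ocr_backend="tesseract",  # assumed parameter name
        ocr_config=TesseractConfig(language="eng"),  # assumed parameter names
    )

    # Extraction function: returns a result object.
    result = await extract_file(path, config=config)

    # Result object: assumed to expose `content` and `metadata`.
    return result.content, result.metadata
```

From synchronous code, the helper could be driven with `asyncio.run(extract_with_tesseract("document.pdf"))`.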

Examples

Async API (Recommended)

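import asyncio
from kreuzberg import extract_file, ExtractionConfig

async def main() -> None:
    # Basic usage
    result = await extract_file("document.pdf")

    # With configuration
    result = await extract_file("document.pdf", config=ExtractionConfig(force_ocr=True))

# extract_file is a coroutine function, so await it inside a running event loop
asyncio.run(main())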
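One practical benefit of an async extraction function is that several documents can be processed concurrently. The sketch below shows the general pattern with the standard library only. `extract_many` and `max_concurrency` are hypothetical names, not part of Kreuzberg; the extraction coroutine is passed in as a parameter, so the helper makes no assumptions about the library beyond "an awaitable that takes a path".

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def extract_many(
    paths: Sequence[str],
    extract: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 4,
) -> list[Any]:
    """Run `extract` on every path, at most `max_concurrency` at a time.

    Results are returned in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(path: str) -> Any:
        # The semaphore caps how many extractions run at once.
        async with semaphore:
            return await extract(path)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(bounded(p) for p in paths))
```

With Kreuzberg installed, this could be called inside a coroutine as `results = await extract_many(["a.pdf", "b.pdf"], extract_file)`.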

Sync API

from kreuzberg import extract_file_sync

# Basic usage
result = extract_file_sync("document.pdf")