
API Reference

Detailed documentation for Kreuzberg's public API.

Core Components

Public API

All components documented in this section are exported directly from the kreuzberg package and can be imported as follows:

from kreuzberg import extract_file, ExtractionConfig, TesseractConfig  # etc.

API Overview

Kreuzberg's API has four main components:

  1. Extraction Functions: Extract text from documents
  2. Configuration Objects: Control extraction behavior
  3. Result Objects: Contain extracted text and metadata
  4. OCR Backends: Pluggable OCR engines
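The toy sketch below shows how these four kinds of components typically fit together. A configuration object selects a pluggable backend, an extraction function runs it, and a result object carries the text plus metadata. Every name here (OCRBackend, EchoBackend, UpperBackend, Config, Result, extract, BACKENDS) is hypothetical and is not part of Kreuzberg's API. The backends only decode bytes, so the example runs without any OCR engine installed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class OCRBackend(Protocol):
    """Hypothetical interface: any object with this method can act as an OCR engine."""

    def recognize(self, image_bytes: bytes) -> str: ...


class EchoBackend:
    """Toy engine that 'recognizes' text by decoding the bytes as UTF-8."""

    def recognize(self, image_bytes: bytes) -> str:
        return image_bytes.decode("utf-8")


class UpperBackend:
    """Second toy engine, added to show that backends are interchangeable."""

    def recognize(self, image_bytes: bytes) -> str:
        return image_bytes.decode("utf-8").upper()


@dataclass
class Config:
    """Hypothetical configuration object: chooses a backend by name."""

    ocr_backend: str = "echo"


@dataclass
class Result:
    """Hypothetical result object: extracted text plus metadata."""

    content: str
    metadata: dict[str, str] = field(default_factory=dict)


# Registry of pluggable engines; new backends are registered by name.
BACKENDS: dict[str, OCRBackend] = {"echo": EchoBackend(), "upper": UpperBackend()}


def extract(image_bytes: bytes, config: Config | None = None) -> Result:
    """Hypothetical extraction function: look up the backend, run it, wrap the output."""
    if config is None:
        config = Config()
    backend = BACKENDS[config.ocr_backend]
    text = backend.recognize(image_bytes)
    return Result(content=text, metadata={"ocr_backend": config.ocr_backend})
```

Because backends satisfy a structural Protocol rather than a base class, swapping engines only requires a different configuration value, for example `extract(b"hi", Config(ocr_backend="upper"))`.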

Examples

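import asyncio

from kreuzberg import ExtractionConfig, extract_file


async def main() -> None:
    # Basic usage
    result = await extract_file("document.pdf")

    # With configuration (here: force OCR)
    result = await extract_file("document.pdf", config=ExtractionConfig(force_ocr=True))


# extract_file is a coroutine, so it must run inside an event loop
asyncio.run(main())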
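extract_file is awaited, so several files can be processed concurrently with the standard asyncio.gather. The sketch below uses a hypothetical stand-in coroutine, fake_extract_file, in place of kreuzberg.extract_file so that it runs without Kreuzberg or real documents. The actual speed-up with the real function depends on how much of its work is asynchronous.

```python
import asyncio


async def fake_extract_file(path: str) -> str:
    """Hypothetical stand-in for kreuzberg.extract_file; returns placeholder text."""
    await asyncio.sleep(0.01)  # simulate I/O-bound work
    return f"text of {path}"


async def extract_many(paths: list[str]) -> list[str]:
    # gather schedules all coroutines together and returns results in input order
    return await asyncio.gather(*(fake_extract_file(p) for p in paths))


def main() -> list[str]:
    # asyncio.run creates an event loop, runs the coroutine, then closes the loop
    return asyncio.run(extract_many(["a.pdf", "b.docx", "c.png"]))


if __name__ == "__main__":
    print(main())
```

Results come back in the same order as the input paths, regardless of which call finishes first.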

Sync API

from kreuzberg import extract_file_sync

# Basic usage: a blocking call, no await or event loop needed
result = extract_file_sync("document.pdf")
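One common way to offer a blocking twin of a coroutine function is to drive it with asyncio.run, as sketched below. This is a generic pattern, not necessarily how Kreuzberg implements extract_file_sync. make_sync and fake_extract_file are hypothetical names. The sketch also shows why the sync API is meant for ordinary code: asyncio.run raises RuntimeError if an event loop is already running in the thread. Inside async code, await the async function instead.

```python
import asyncio
import functools
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def make_sync(func: Callable[..., Coroutine[Any, Any, T]]) -> Callable[..., T]:
    """Hypothetical helper: wrap a coroutine function for use from sync code."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> T:
        # asyncio.run starts a fresh event loop for this call and closes it afterwards;
        # it refuses to run if an event loop is already running in this thread.
        return asyncio.run(func(*args, **kwargs))

    return wrapper


async def fake_extract_file(path: str) -> str:
    """Hypothetical stand-in for an async extractor."""
    await asyncio.sleep(0)
    return f"text of {path}"


# Blocking twin of the coroutine above
fake_extract_file_sync = make_sync(fake_extract_file)
```

functools.wraps keeps the wrapped function's name and docstring, so the sync variant still introspects like the original.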