Skip to content

Advanced Topics

This section covers advanced features and usage patterns for Kreuzberg.

Topics

Advanced Configuration

Kreuzberg allows you to deeply customize the extraction process through:

  1. Validation Hooks: Apply custom validation rules to extraction results
  2. Post-Processing Hooks: Transform extraction results before they're returned
  3. Custom OCR Settings: Fine-tune OCR behavior for specific document types

Custom Validation Example

from kreuzberg import extract_file, ExtractionConfig, ExtractionResult
from kreuzberg import ValidationError

# Define a custom validator that requires a minimum text length
def validate_text_length(result: ExtractionResult) -> None:
    if len(result.content) < 100:
        raise ValidationError("Extracted text is too short", context={"content_length": len(result.content)})

# Use the validator in extraction
async def extract_with_validation(file_path):
    config = ExtractionConfig(validators=[validate_text_length])

    try:
        result = await extract_file(file_path, config=config)
        return result.content
    except ValidationError as e:
        print(f"Validation failed: {e}")
        return None