Kreuzberg

Kreuzberg is a Python library for text extraction from documents. It provides a unified async interface for extracting text from PDFs, images, office documents, and more.

Why Kreuzberg?

  • Simple and Hassle-Free: Clean API that just works, without complex configuration
  • Local Processing: No external API calls or cloud dependencies required
  • Resource Efficient: Lightweight processing without GPU requirements
  • Small Package Size: A few curated dependencies and a minimal footprint
  • Format Support: Comprehensive support for documents, images, and text formats
  • Modern Python: Built with async/await, type hints, and a functional-first approach
  • Permissive OSS: Kreuzberg and its dependencies are released under permissive open-source licenses

Kreuzberg was built for RAG (Retrieval-Augmented Generation) applications, focusing on local processing with minimal dependencies. It is designed for modern async applications, serverless functions, and Dockerized applications.
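
To illustrate why an async interface suits these settings, here is a minimal sketch of running several extractions concurrently with `asyncio.gather`. Note that `extract_text` and `extract_many` are hypothetical stand-ins written for this example and are not Kreuzberg's API; the file names are made up as well. Consult the API reference for the library's actual functions.

```python
import asyncio
from pathlib import Path


async def extract_text(path: Path) -> str:
    """Hypothetical stand-in for an async extraction call (not Kreuzberg's API)."""
    # Simulate non-blocking work such as file I/O or OCR done off the event loop.
    await asyncio.sleep(0.01)
    return f"text extracted from {path.name}"


async def extract_many(paths: list[Path]) -> dict[str, str]:
    """Run all extractions concurrently and map each file name to its text."""
    texts = await asyncio.gather(*(extract_text(p) for p in paths))
    return {p.name: t for p, t in zip(paths, texts)}


# Illustrative file names only; nothing is read from disk in this sketch.
documents = [Path("report.pdf"), Path("scan.png"), Path("notes.docx")]
results = asyncio.run(extract_many(documents))
```

Because the extractions are awaited together rather than one after another, the event loop stays free to serve other work, which is the property that matters in async web services and serverless handlers.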