Kreuzberg
Kreuzberg is a Python library for text extraction from documents. It provides a unified async interface for extracting text from PDFs, images, office documents, and more.
Why Kreuzberg?
- Simple and Hassle-Free: Clean API that just works, without complex configuration
- Local Processing: No external API calls or cloud dependencies required
- Resource Efficient: Lightweight processing without GPU requirements
- Small Package Size: A few curated dependencies and a minimal footprint
- Format Support: Comprehensive support for documents, images, and text formats
- Modern Python: Built with async/await, type hints, and a functional-first approach
- Permissive OSS: Kreuzberg and its dependencies have a permissive OSS license
Kreuzberg was built for RAG (Retrieval Augmented Generation) applications, focusing on local processing with minimal dependencies. It is designed for modern async applications, serverless functions, and Dockerized applications.
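To illustrate why an async interface suits these workloads, here is a minimal sketch of extracting several documents concurrently. It uses only the standard library: `extract_text` and `extract_many` are hypothetical stand-ins invented for this example, not Kreuzberg's API, so check the API reference for the real function names and return types.

```python
import asyncio
from pathlib import Path


async def extract_text(path: Path) -> str:
    """Hypothetical stand-in for an async extraction call (not Kreuzberg's API)."""
    # Read the file in a worker thread so the event loop stays responsive.
    raw = await asyncio.to_thread(path.read_bytes)
    # Assumption: plain UTF-8 text; a real extractor would dispatch on file type.
    return raw.decode("utf-8", errors="replace")


async def extract_many(paths: list[Path]) -> dict[str, str]:
    """Extract several documents concurrently, keyed by file name."""
    texts = await asyncio.gather(*(extract_text(p) for p in paths))
    return {p.name: text for p, text in zip(paths, texts)}
```

In a RAG ingestion step, a call such as `asyncio.run(extract_many(paths))` would let slow documents overlap instead of being processed one after another, which is the main benefit of an async extraction interface.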