Getting Started with Kreuzberg
Welcome to Kreuzberg! This section will help you get up and running quickly with text extraction.
Overview
Kreuzberg is a Python library for extracting text from a range of document formats, including PDFs, images, and Office documents. It provides both asynchronous and synchronous APIs, so it fits into async applications and ordinary synchronous scripts alike.
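To illustrate the two API styles, here is a minimal sketch. It assumes the `extract_file` and `extract_file_sync` functions and a result object with a `content` attribute, as exposed by Kreuzberg 3.x; check the API reference for your installed version. The wrapper names `extract_text_async` and `extract_text_sync` are hypothetical helpers, not part of the library.

```python
import asyncio
from pathlib import Path


async def extract_text_async(path: Path) -> str:
    """Extract text with the asynchronous API (hypothetical wrapper)."""
    # Imported lazily so this module loads even without Kreuzberg installed.
    # Assumption: Kreuzberg 3.x exposes `extract_file` as a coroutine function.
    from kreuzberg import extract_file

    result = await extract_file(path)
    return result.content


def extract_text_sync(path: Path) -> str:
    """Extract text with the synchronous API (hypothetical wrapper)."""
    # Assumption: `extract_file_sync` is the blocking counterpart.
    from kreuzberg import extract_file_sync

    result = extract_file_sync(path)
    return result.content


# Example usage (requires Kreuzberg and a real file; "document.pdf" is a placeholder):
#   text = asyncio.run(extract_text_async(Path("document.pdf")))
#   text = extract_text_sync(Path("document.pdf"))
```

The asynchronous variant suits code that already runs inside an event loop; the synchronous variant is the simpler choice for scripts and notebooks.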
Quick Navigation
- Installation - Install Kreuzberg and its dependencies
- Quick Start - Basic usage examples to get you started
Key Features
- Multi-format Support: Extract text from PDFs, images, Word, Excel, and PowerPoint files, and more
- OCR Capabilities: Process scanned documents and images with OCR
- Multiple OCR Engines: Choose from Tesseract, EasyOCR, or PaddleOCR
- Async First: Built around Python's async/await, with synchronous equivalents available
- Metadata Extraction: Get document metadata alongside text content
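The OCR engine choice and metadata access listed above could be combined roughly as follows. This is a sketch under stated assumptions: it presumes the Kreuzberg 3.x style `ExtractionConfig(ocr_backend=...)` with the backend identifiers `"tesseract"`, `"easyocr"`, and `"paddleocr"`, and a result exposing `content` and `metadata`. Other versions may configure OCR differently, and EasyOCR and PaddleOCR typically need extra dependencies installed. The helper names `normalize_backend` and `extract_with_metadata` are hypothetical.

```python
from pathlib import Path
from typing import Any

# Assumed backend identifiers, matching the three engines named in the feature list.
OCR_BACKENDS = ("tesseract", "easyocr", "paddleocr")


def normalize_backend(name: str) -> str:
    """Map a display name such as 'EasyOCR' to an assumed backend identifier."""
    backend = name.strip().lower()
    if backend not in OCR_BACKENDS:
        raise ValueError(f"Unknown OCR backend: {name!r}; expected one of {OCR_BACKENDS}")
    return backend


async def extract_with_metadata(path: Path, ocr_backend: str = "Tesseract") -> tuple[str, dict[str, Any]]:
    """Return (text, metadata) for a document, using the chosen OCR engine."""
    backend = normalize_backend(ocr_backend)

    # Imported lazily so the validation helper works without Kreuzberg installed.
    # Assumption: Kreuzberg 3.x API with `ExtractionConfig(ocr_backend=...)`.
    from kreuzberg import ExtractionConfig, extract_file

    config = ExtractionConfig(ocr_backend=backend)
    result = await extract_file(path, config=config)
    return result.content, dict(result.metadata)


# Example usage (requires Kreuzberg; "scan.png" is a placeholder file name):
#   import asyncio
#   text, metadata = asyncio.run(extract_with_metadata(Path("scan.png"), "EasyOCR"))
```

Validating the backend name before importing the library keeps configuration errors separate from missing-dependency errors.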