
Getting Started with Kreuzberg

Welcome to Kreuzberg! This section will help you get up and running quickly with text extraction.

Overview

Kreuzberg is a Python library for extracting text from a wide range of document formats, including PDFs, images, and Office documents. It provides both asynchronous and synchronous APIs, so it fits into async applications as well as ordinary synchronous scripts.

Installation: Install Kreuzberg and its dependencies
Quick Start: Basic usage examples to get you started

Key Features

Multi-format Support: Extract text from PDFs, images, Word, Excel, and PowerPoint files, and more
OCR Capabilities: Process scanned documents and images with OCR
Multiple OCR Engines: Choose from Tesseract, EasyOCR, or PaddleOCR
Async First: Designed around Python's async/await, with synchronous counterparts also available
Metadata Extraction: Get document metadata alongside text content