
Getting Started with Kreuzberg

Welcome to Kreuzberg! This section will help you get up and running quickly with text extraction.

Overview

Kreuzberg is a Python library for extracting text from a range of document formats, including PDFs, images, Office documents, and more. It provides both an asynchronous and a synchronous API, so it fits into asyncio-based applications as well as ordinary blocking code.
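The sketch below shows the two calling styles side by side. It assumes Kreuzberg exposes a coroutine function `kreuzberg.extract_file(path)` and a blocking `kreuzberg.extract_file_sync(path)`, both returning a result object with a `.content` attribute; check the API reference for your installed version. The wrapper names `extract_text_sync` and `extract_text_async` and their `extractor` parameter are hypothetical, added only so the functions can be exercised with a stub when Kreuzberg is not installed.

```python
from typing import Any, Awaitable, Callable, Optional


def extract_text_sync(path: str, extractor: Optional[Callable[[str], Any]] = None) -> str:
    """Blocking extraction, suited to scripts and synchronous frameworks."""
    if extractor is None:
        # Assumed API: kreuzberg.extract_file_sync(path) returns a result with .content.
        # Imported lazily so this module loads even without kreuzberg installed.
        from kreuzberg import extract_file_sync
        extractor = extract_file_sync
    return extractor(path).content


async def extract_text_async(
    path: str, extractor: Optional[Callable[[str], Awaitable[Any]]] = None
) -> str:
    """Non-blocking extraction for asyncio applications."""
    if extractor is None:
        # Assumed API: kreuzberg.extract_file(path) is a coroutine function
        # whose result has a .content attribute.
        from kreuzberg import extract_file
        extractor = extract_file
    result = await extractor(path)
    return result.content
```

In a script you would call `extract_text_sync("document.pdf")`; inside async code you would `await extract_text_async("document.pdf")`, or run it from synchronous code with `asyncio.run(extract_text_async("document.pdf"))`.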


Key Features

  • Multi-format Support: Extract text from PDFs, images, Word documents, Excel spreadsheets, PowerPoint presentations, and more
  • OCR Capabilities: Process scanned documents and images with OCR
  • Multiple OCR Engines: Choose from Tesseract, EasyOCR, or PaddleOCR
  • Async First: Built with modern Python async/await support
  • Metadata Extraction: Get document metadata alongside text content
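Because the API is async first, several documents can be processed concurrently with `asyncio.gather`, collecting both the text and the metadata for each file. This is a minimal sketch, assuming `kreuzberg.extract_file(path)` is a coroutine function whose result carries `.content` and `.metadata` attributes; the helper `extract_many` and its `extractor` parameter are hypothetical and exist only to make the pattern easy to test.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional


async def extract_many(
    paths: List[str],
    extractor: Optional[Callable[[str], Awaitable[Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Extract several documents concurrently, keeping text and metadata per path."""
    if extractor is None:
        # Assumed API: kreuzberg.extract_file(path) is async and returns a result
        # with .content and .metadata attributes.
        from kreuzberg import extract_file
        extractor = extract_file

    # Start every extraction at once; gather returns results in input order.
    results = await asyncio.gather(*(extractor(p) for p in paths))

    # Pair each path with its extracted text and metadata.
    return {
        path: {"content": r.content, "metadata": r.metadata}
        for path, r in zip(paths, results)
    }
```

For example, `asyncio.run(extract_many(["report.pdf", "scan.png", "slides.pptx"]))` would return one entry per file. Note that `asyncio.gather` raises the first exception it encounters by default; pass `return_exceptions=True` if one unreadable file should not abort the whole batch.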