Basic Usage
Kreuzberg offers a simple API for text extraction from documents and images.
Core Functions
Kreuzberg exports the following main functions:
Single Item Processing
- extract_file(): Async function to extract text from a file (accepts a string path or a pathlib.Path)
- extract_bytes(): Async function to extract text from bytes (accepts a byte string)
- extract_file_sync(): Synchronous version of extract_file()
- extract_bytes_sync(): Synchronous version of extract_bytes()
Batch Processing
- batch_extract_file(): Async function to extract text from multiple files concurrently
- batch_extract_bytes(): Async function to extract text from multiple byte contents concurrently
- batch_extract_file_sync(): Synchronous version of batch_extract_file()
- batch_extract_bytes_sync(): Synchronous version of batch_extract_bytes()
Async Examples
Extract Text from a File
Process Multiple Files Concurrently
Synchronous Examples
Extract Text from a File
Process Multiple Files
Working with Byte Content
If you already have the file content in memory, you can use the bytes extraction functions:
Extraction Result
The single-item functions return an ExtractionResult object, and the batch functions return a list of these objects, one per input. Each ExtractionResult contains:
- content: Extracted text
- mime_type: Document MIME type
- metadata: Document metadata (see Metadata Extraction)