API Server¶
Kreuzberg includes a built-in REST API server powered by Litestar for document extraction over HTTP.
Installation¶
Install Kreuzberg with the API extra:
Running the API Server¶
Using Python¶
Using Litestar CLI¶
With Custom Settings¶
API Endpoints¶
Health Check¶
Returns the server status:
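A minimal client-side sketch of calling the health check from Python, using only the standard library. The `/health` path, the `http://localhost:8000` address, and the helper names are assumptions for illustration; check your running server for the actual route and port.

```python
import json
from urllib.request import Request, urlopen


def build_health_request(base_url: str) -> Request:
    """Build a GET request for the health check endpoint.

    The "/health" path is an assumption, not confirmed by this page.
    """
    return Request(base_url.rstrip("/") + "/health", method="GET")


def check_health(base_url: str, timeout: float = 5.0) -> dict:
    """Send the health check and return the decoded JSON body."""
    with urlopen(build_health_request(base_url), timeout=timeout) as response:
        # A successful health check is documented as 200 OK.
        if response.status != 200:
            raise RuntimeError(f"unexpected status: {response.status}")
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running server; the address below is hypothetical.
    print(check_health("http://localhost:8000"))
```

Separating request construction from sending keeps the URL logic easy to verify without a live server.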
Extract Files¶
Extract text from one or more files.
Request:
- Method: POST
- Content-Type: multipart/form-data
- Body: One or more files with field name data
Response:
- Status: 201 Created
- Body: Array of extraction results
Example:
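A minimal sketch of building the request described above with the Python standard library: a multipart/form-data POST in which every file is sent under the field name data. The `/extract` path, the base URL, and the helper names `encode_multipart` and `build_extract_request` are assumptions for illustration, not part of Kreuzberg.

```python
import mimetypes
import uuid
from urllib.request import Request


def encode_multipart(
    files: list[tuple[str, bytes]], field_name: str = "data"
) -> tuple[bytes, str]:
    """Encode (filename, content) pairs as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    Filenames are assumed not to contain double quotes.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content in files:
        # Guess the MIME type from the file extension, with a generic fallback.
        mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {mime_type}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # The closing boundary marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extract_request(base_url: str, files: list[tuple[str, bytes]]) -> Request:
    """Build the POST request; the "/extract" path is an assumption."""
    body, content_type = encode_multipart(files)
    return Request(
        base_url.rstrip("/") + "/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. On success the server is documented to answer 201 Created with a JSON array containing one result per uploaded file.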
Response Format:
Error Handling¶
The API uses standard HTTP status codes:
- 200 OK: Successful health check
- 201 Created: Successful extraction
- 400 Bad Request: Validation error (e.g., invalid file format)
- 422 Unprocessable Entity: Parsing error (e.g., corrupted file)
- 500 Internal Server Error: Unexpected error
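A client can translate the status codes above into readable outcomes. The following is a sketch only: the function names and the choice to treat 4xx responses as caller errors are illustrative assumptions, not part of Kreuzberg.

```python
# Documented status codes and what they mean for the caller.
DOCUMENTED_STATUSES = {
    200: "health check succeeded",
    201: "extraction succeeded",
    400: "validation error",
    422: "parsing error",
    500: "unexpected error",
}


def describe_status(code: int) -> str:
    """Return a short description for a status code from the API."""
    return DOCUMENTED_STATUSES.get(code, f"undocumented status {code}")


def is_client_error(code: int) -> bool:
    """True when the request itself was at fault (4xx), so retrying as-is will not help."""
    return 400 <= code < 500


if __name__ == "__main__":
    for code in (201, 422, 500):
        print(code, describe_status(code), is_client_error(code))
```

Keeping the mapping in one table makes it easy to extend if the server starts returning additional codes.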
Error responses include:
Features¶
- Batch Processing: Extract from multiple files in a single request
- Automatic Format Detection: Detects each file's type from its MIME type
- OCR Support: Automatically applies OCR to images and scanned PDFs
- Structured Logging: Uses structlog for detailed logging
- OpenTelemetry: Built-in observability support
- Async Processing: High-performance async request handling
Configuration¶
The API server uses the default Kreuzberg extraction configuration:
- Tesseract OCR is included by default
- PDF, image, and document extraction is supported
- Table extraction with GMFT is available if GMFT is installed
To use custom configuration, modify the extraction call in your own API wrapper:
Production Deployment¶
For production use, consider:
- Reverse Proxy: Use nginx or similar for SSL termination
- Process Manager: Use systemd, supervisor, or similar
- Workers: Run multiple workers with uvicorn or gunicorn
- Monitoring: Enable OpenTelemetry exporters
- Rate Limiting: Add rate limiting middleware
- Authentication: Add authentication middleware if needed
Example production command: