OCR Configuration

Configuration classes for the supported OCR engines.

TesseractConfig

Default OCR engine configuration:

kreuzberg.TesseractConfig dataclass

Configuration options for Tesseract OCR engine.

Source code in kreuzberg/_ocr/_tesseract.py
@dataclass(unsafe_hash=True, frozen=True)
class TesseractConfig:
    """Configuration options for Tesseract OCR engine."""

    classify_use_pre_adapted_templates: bool = True
    """Whether to use pre-adapted templates during classification to improve recognition accuracy."""
    language: str = "eng"
    """Language code to use for OCR.
    Examples:
            -   'eng' for English
            -   'deu' for German
            -   multiple languages combined with '+', e.g. 'eng+deu'
    """
    language_model_ngram_on: bool = True
    """Enable or disable the use of n-gram-based language models for improved text recognition."""
    psm: PSMMode = PSMMode.AUTO
    """Page segmentation mode (PSM) to guide Tesseract on how to segment the image (e.g., single block, single line)."""
    tessedit_dont_blkrej_good_wds: bool = True
    """If True, prevents block rejection of words identified as good, improving text output quality."""
    tessedit_dont_rowrej_good_wds: bool = True
    """If True, prevents row rejection of words identified as good, avoiding unnecessary omissions."""
    tessedit_enable_dict_correction: bool = True
    """Enable or disable dictionary-based correction for recognized text to improve word accuracy."""
    tessedit_use_primary_params_model: bool = True
    """If True, forces the use of the primary parameters model for text recognition."""
    textord_space_size_is_variable: bool = True
    """Allow variable spacing between words, useful for text with irregular spacing."""
    thresholding_method: bool = True
    """Enable or disable specific thresholding methods during image preprocessing for better OCR accuracy."""

Attributes

classify_use_pre_adapted_templates: bool = True class-attribute instance-attribute

Whether to use pre-adapted templates during classification to improve recognition accuracy.

language: str = 'eng' class-attribute instance-attribute

Language code to use for OCR. Examples:

- 'eng' for English
- 'deu' for German
- multiple languages combined with '+', e.g. 'eng+deu'
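
As an illustration of the '+' convention described above, the following minimal sketch joins individual language codes into one string suitable for the `language` field. The helper name `join_languages` is hypothetical and is not part of kreuzberg.

```python
def join_languages(codes: list[str]) -> str:
    """Combine Tesseract language codes into a single '+'-separated string."""
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


language = join_languages(["eng", "deu"])
print(language)  # eng+deu
```

The resulting string could then be passed as `language=...` when constructing the configuration.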

language_model_ngram_on: bool = True class-attribute instance-attribute

Enable or disable the use of n-gram-based language models for improved text recognition.

psm: PSMMode = PSMMode.AUTO class-attribute instance-attribute

Page segmentation mode (PSM) to guide Tesseract on how to segment the image (e.g., single block, single line).

tessedit_dont_blkrej_good_wds: bool = True class-attribute instance-attribute

If True, prevents block rejection of words identified as good, improving text output quality.

tessedit_dont_rowrej_good_wds: bool = True class-attribute instance-attribute

If True, prevents row rejection of words identified as good, avoiding unnecessary omissions.

tessedit_enable_dict_correction: bool = True class-attribute instance-attribute

Enable or disable dictionary-based correction for recognized text to improve word accuracy.

tessedit_use_primary_params_model: bool = True class-attribute instance-attribute

If True, forces the use of the primary parameters model for text recognition.

textord_space_size_is_variable: bool = True class-attribute instance-attribute

Allow variable spacing between words, useful for text with irregular spacing.

thresholding_method: bool = True class-attribute instance-attribute

Enable or disable specific thresholding methods during image preprocessing for better OCR accuracy.
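
Because the class is declared with `@dataclass(unsafe_hash=True, frozen=True)`, instances are immutable and hashable. The sketch below shows what that implies in practice, using a hypothetical stand-in class (`TesseractConfigSketch`) with only two of the documented fields so that it runs without kreuzberg installed; the same standard-library behavior should apply to the real dataclass.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


# Hypothetical stand-in with two of the documented fields; the real class is
# kreuzberg.TesseractConfig.
@dataclass(unsafe_hash=True, frozen=True)
class TesseractConfigSketch:
    language: str = "eng"
    tessedit_enable_dict_correction: bool = True


base = TesseractConfigSketch()

# Frozen instances reject attribute assignment.
try:
    base.language = "deu"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace() builds a modified copy and leaves the original untouched.
german = replace(base, language="eng+deu")

# A field-based __hash__ is generated, so equal configs can share a dict key.
cache = {base: "result for eng"}
hit = cache.get(TesseractConfigSketch())
```

To vary a single option, derive a new instance with `replace()` rather than assigning to a field.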

PSMMode

Page Segmentation Mode options for Tesseract:

kreuzberg.PSMMode

Bases: Enum

Enum for Tesseract Page Segmentation Modes (PSM) with human-readable values.

Source code in kreuzberg/_ocr/_tesseract.py
class PSMMode(Enum):
    """Enum for Tesseract Page Segmentation Modes (PSM) with human-readable values."""

    OSD_ONLY = 0
    """Orientation and script detection only."""
    AUTO_OSD = 1
    """Automatic page segmentation with orientation and script detection."""
    AUTO_ONLY = 2
    """Automatic page segmentation without OSD."""
    AUTO = 3
    """Fully automatic page segmentation (default)."""
    SINGLE_COLUMN = 4
    """Assume a single column of text."""
    SINGLE_BLOCK_VERTICAL = 5
    """Assume a single uniform block of vertically aligned text."""
    SINGLE_BLOCK = 6
    """Assume a single uniform block of text."""
    SINGLE_LINE = 7
    """Treat the image as a single text line."""
    SINGLE_WORD = 8
    """Treat the image as a single word."""
    CIRCLE_WORD = 9
    """Treat the image as a single word in a circle."""
    SINGLE_CHAR = 10
    """Treat the image as a single character."""

Attributes

AUTO = 3 class-attribute instance-attribute

Fully automatic page segmentation (default).

AUTO_ONLY = 2 class-attribute instance-attribute

Automatic page segmentation without OSD.

AUTO_OSD = 1 class-attribute instance-attribute

Automatic page segmentation with orientation and script detection.

CIRCLE_WORD = 9 class-attribute instance-attribute

Treat the image as a single word in a circle.

OSD_ONLY = 0 class-attribute instance-attribute

Orientation and script detection only.

SINGLE_BLOCK = 6 class-attribute instance-attribute

Assume a single uniform block of text.

SINGLE_BLOCK_VERTICAL = 5 class-attribute instance-attribute

Assume a single uniform block of vertically aligned text.

SINGLE_CHAR = 10 class-attribute instance-attribute

Treat the image as a single character.

SINGLE_COLUMN = 4 class-attribute instance-attribute

Assume a single column of text.

SINGLE_LINE = 7 class-attribute instance-attribute

Treat the image as a single text line.

SINGLE_WORD = 8 class-attribute instance-attribute

Treat the image as a single word.
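
The integer values of the enum match the numbers accepted by the `--psm` option of the Tesseract command line. The following is a rough sketch of how a mode could be turned into CLI arguments. The stand-in enum `PSMModeSketch` and the helper `tesseract_args` are hypothetical, and how kreuzberg itself invokes Tesseract is not shown in this page, so treat this as an assumption.

```python
from enum import Enum


# Hypothetical subset mirroring three of the documented PSMMode members.
class PSMModeSketch(Enum):
    AUTO = 3
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7


def tesseract_args(image_path: str, language: str, psm: PSMModeSketch) -> list[str]:
    """Build an argument list for the tesseract CLI, writing text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", language, "--psm", str(psm.value)]


args = tesseract_args("scan.png", "eng", PSMModeSketch.SINGLE_LINE)
print(args)
```

A mode such as SINGLE_LINE is typically a better fit for cropped snippets, while AUTO suits full pages.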

EasyOCRConfig

Configuration for the EasyOCR engine:

kreuzberg.EasyOCRConfig dataclass

Configuration options for EasyOCR.

Source code in kreuzberg/_ocr/_easyocr.py
@dataclass(unsafe_hash=True, frozen=True)
class EasyOCRConfig:
    """Configuration options for EasyOCR."""

    add_margin: float = 0.1
    """Extend bounding boxes in all directions."""
    adjust_contrast: float = 0.5
    """Target contrast level for low contrast text."""
    beam_width: int = 5
    """Beam width for beam search in recognition."""
    canvas_size: int = 2560
    """Maximum image dimension for detection."""
    contrast_ths: float = 0.1
    """Contrast threshold for preprocessing."""
    decoder: Literal["greedy", "beamsearch", "wordbeamsearch"] = "greedy"
    """Decoder method. Options: 'greedy', 'beamsearch', 'wordbeamsearch'."""
    height_ths: float = 0.5
    """Maximum difference in box height for merging."""
    language: str | list[str] = "en"
    """Language or languages to use for OCR."""
    link_threshold: float = 0.4
    """Link confidence threshold."""
    low_text: float = 0.4
    """Text low-bound score."""
    mag_ratio: float = 1.0
    """Image magnification ratio."""
    min_size: int = 10
    """Minimum text box size in pixels."""
    rotation_info: list[int] | None = None
    """List of angles to try for detection."""
    slope_ths: float = 0.1
    """Maximum slope for merging text boxes."""
    text_threshold: float = 0.7
    """Text confidence threshold."""
    use_gpu: bool = False
    """Whether to use GPU for inference."""
    width_ths: float = 0.5
    """Maximum horizontal distance for merging boxes."""
    x_ths: float = 1.0
    """Maximum horizontal distance for paragraph merging."""
    y_ths: float = 0.5
    """Maximum vertical distance for paragraph merging."""
    ycenter_ths: float = 0.5
    """Maximum shift in y direction for merging."""

Attributes

add_margin: float = 0.1 class-attribute instance-attribute

Extend bounding boxes in all directions.

adjust_contrast: float = 0.5 class-attribute instance-attribute

Target contrast level for low contrast text.

beam_width: int = 5 class-attribute instance-attribute

Beam width for beam search in recognition.

canvas_size: int = 2560 class-attribute instance-attribute

Maximum image dimension for detection.

contrast_ths: float = 0.1 class-attribute instance-attribute

Contrast threshold for preprocessing.

decoder: Literal['greedy', 'beamsearch', 'wordbeamsearch'] = 'greedy' class-attribute instance-attribute

Decoder method. Options: 'greedy', 'beamsearch', 'wordbeamsearch'.

height_ths: float = 0.5 class-attribute instance-attribute

Maximum difference in box height for merging.

language: str | list[str] = 'en' class-attribute instance-attribute

Language or languages to use for OCR.
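
Since `language` accepts either a single code or a list of codes, code that consumes the value usually needs to normalize it first. A minimal sketch, assuming a hypothetical helper named `normalize_languages` (not part of kreuzberg or EasyOCR):

```python
from typing import Union


def normalize_languages(language: Union[str, list[str]]) -> list[str]:
    """Return the language setting as a list of codes."""
    if isinstance(language, str):
        return [language]
    # Copy the list so the caller's value is not aliased.
    return list(language)


single = normalize_languages("en")
several = normalize_languages(["en", "de"])
```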

link_threshold: float = 0.4 class-attribute instance-attribute

Link confidence threshold.

low_text: float = 0.4 class-attribute instance-attribute

Text low-bound score.

mag_ratio: float = 1.0 class-attribute instance-attribute

Image magnification ratio.

min_size: int = 10 class-attribute instance-attribute

Minimum text box size in pixels.

rotation_info: list[int] | None = None class-attribute instance-attribute

List of angles to try for detection.

slope_ths: float = 0.1 class-attribute instance-attribute

Maximum slope for merging text boxes.

text_threshold: float = 0.7 class-attribute instance-attribute

Text confidence threshold.

use_gpu: bool = False class-attribute instance-attribute

Whether to use GPU for inference.

width_ths: float = 0.5 class-attribute instance-attribute

Maximum horizontal distance for merging boxes.

x_ths: float = 1.0 class-attribute instance-attribute

Maximum horizontal distance for paragraph merging.

y_ths: float = 0.5 class-attribute instance-attribute

Maximum vertical distance for paragraph merging.

ycenter_ths: float = 0.5 class-attribute instance-attribute

Maximum shift in y direction for merging.

PaddleOCRConfig

Configuration for the PaddleOCR engine:

kreuzberg.PaddleOCRConfig dataclass

Configuration options for PaddleOCR.

This dataclass provides type hints and documentation for all PaddleOCR parameters.

Source code in kreuzberg/_ocr/_paddleocr.py
@dataclass(unsafe_hash=True, frozen=True)
class PaddleOCRConfig:
    """Configuration options for PaddleOCR.

    This dataclass provides type hints and documentation for all PaddleOCR parameters.
    """

    cls_image_shape: str = "3,48,192"
    """Image shape for classification algorithm in format 'channels,height,width'."""
    det_algorithm: Literal["DB", "EAST", "SAST", "PSE", "FCE", "PAN", "CT", "DB++", "Layout"] = "DB"
    """Detection algorithm."""
    det_db_box_thresh: float = 0.5
    """Score threshold for detected boxes. Boxes below this value are discarded."""
    det_db_thresh: float = 0.3
    """Binarization threshold for DB output map."""
    det_db_unclip_ratio: float = 2.0
    """Expansion ratio for detected text boxes."""
    det_east_cover_thresh: float = 0.1
    """Score threshold for EAST output boxes."""
    det_east_nms_thresh: float = 0.2
    """NMS threshold for EAST model output boxes."""
    det_east_score_thresh: float = 0.8
    """Binarization threshold for EAST output map."""
    det_max_side_len: int = 960
    """Maximum size of image long side. Images exceeding this will be proportionally resized."""
    drop_score: float = 0.5
    """Filter recognition results by confidence score. Results below this are discarded."""
    enable_mkldnn: bool = False
    """Whether to enable MKL-DNN acceleration (Intel CPU only)."""
    gpu_mem: int = 8000
    """GPU memory size (in MB) to use for initialization."""
    language: str = "en"
    """Language to use for OCR."""
    max_text_length: int = 25
    """Maximum text length that the recognition algorithm can recognize."""
    rec: bool = True
    """Enable text recognition when using the ocr() function."""
    rec_algorithm: Literal[
        "CRNN",
        "SRN",
        "NRTR",
        "SAR",
        "SEED",
        "SVTR",
        "SVTR_LCNet",
        "ViTSTR",
        "ABINet",
        "VisionLAN",
        "SPIN",
        "RobustScanner",
        "RFL",
    ] = "CRNN"
    """Recognition algorithm."""
    rec_image_shape: str = "3,32,320"
    """Image shape for recognition algorithm in format 'channels,height,width'."""
    table: bool = True
    """Whether to enable table recognition."""
    use_angle_cls: bool = True
    """Whether to use text orientation classification model."""
    use_gpu: bool = False
    """Whether to use GPU for inference. Requires installing the paddlepaddle-gpu package."""
    use_space_char: bool = True
    """Whether to recognize spaces."""
    use_zero_copy_run: bool = False
    """Whether to enable zero_copy_run for inference optimization."""

Attributes

cls_image_shape: str = '3,48,192' class-attribute instance-attribute

Image shape for classification algorithm in format 'channels,height,width'.

det_algorithm: Literal['DB', 'EAST', 'SAST', 'PSE', 'FCE', 'PAN', 'CT', 'DB++', 'Layout'] = 'DB' class-attribute instance-attribute

Detection algorithm.

det_db_box_thresh: float = 0.5 class-attribute instance-attribute

Score threshold for detected boxes. Boxes below this value are discarded.

det_db_thresh: float = 0.3 class-attribute instance-attribute

Binarization threshold for DB output map.

det_db_unclip_ratio: float = 2.0 class-attribute instance-attribute

Expansion ratio for detected text boxes.

det_east_cover_thresh: float = 0.1 class-attribute instance-attribute

Score threshold for EAST output boxes.

det_east_nms_thresh: float = 0.2 class-attribute instance-attribute

NMS threshold for EAST model output boxes.

det_east_score_thresh: float = 0.8 class-attribute instance-attribute

Binarization threshold for EAST output map.

det_max_side_len: int = 960 class-attribute instance-attribute

Maximum size of image long side. Images exceeding this will be proportionally resized.
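
The arithmetic behind proportional resizing can be sketched as follows. This only illustrates the scaling rule described above; the helper `fit_long_side` is hypothetical, and the engine's actual resizing may round dimensions differently.

```python
def fit_long_side(width: int, height: int, max_side_len: int = 960) -> tuple[int, int]:
    """Scale (width, height) down so the longer side does not exceed max_side_len."""
    long_side = max(width, height)
    if long_side <= max_side_len:
        # Already within the limit: leave the size unchanged.
        return width, height
    scale = max_side_len / long_side
    # Apply the same factor to both sides to preserve the aspect ratio.
    return max(1, round(width * scale)), max(1, round(height * scale))


resized = fit_long_side(1920, 1080)
unchanged = fit_long_side(800, 600)
```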

drop_score: float = 0.5 class-attribute instance-attribute

Filter recognition results by confidence score. Results below this are discarded.
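
The effect of this threshold can be illustrated with a small filter. The result shape used here, a list of `(text, confidence)` tuples, and the helper `drop_low_confidence` are assumptions made for the example and do not describe the engine's real output format.

```python
def drop_low_confidence(
    results: list[tuple[str, float]], drop_score: float = 0.5
) -> list[tuple[str, float]]:
    """Keep only results whose confidence is at least drop_score."""
    return [(text, score) for text, score in results if score >= drop_score]


raw = [("Invoice", 0.98), ("l1I|", 0.31), ("Total", 0.5)]
kept = drop_low_confidence(raw)
```

Raising the threshold trades recall for precision: fewer noisy fragments survive, but faint text may be lost.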

enable_mkldnn: bool = False class-attribute instance-attribute

Whether to enable MKL-DNN acceleration (Intel CPU only).

gpu_mem: int = 8000 class-attribute instance-attribute

GPU memory size (in MB) to use for initialization.

language: str = 'en' class-attribute instance-attribute

Language to use for OCR.

max_text_length: int = 25 class-attribute instance-attribute

Maximum text length that the recognition algorithm can recognize.

rec: bool = True class-attribute instance-attribute

Enable text recognition when using the ocr() function.

rec_algorithm: Literal['CRNN', 'SRN', 'NRTR', 'SAR', 'SEED', 'SVTR', 'SVTR_LCNet', 'ViTSTR', 'ABINet', 'VisionLAN', 'SPIN', 'RobustScanner', 'RFL'] = 'CRNN' class-attribute instance-attribute

Recognition algorithm.

rec_image_shape: str = '3,32,320' class-attribute instance-attribute

Image shape for recognition algorithm in format 'channels,height,width'.
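
The shape fields (`cls_image_shape` and `rec_image_shape`) are plain strings in 'channels,height,width' order. A minimal sketch of parsing such a string, using a hypothetical helper `parse_image_shape` that is not part of kreuzberg or PaddleOCR:

```python
def parse_image_shape(shape: str) -> tuple[int, int, int]:
    """Parse a 'channels,height,width' string into a tuple of integers."""
    parts = [int(part) for part in shape.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated integers, got {shape!r}")
    channels, height, width = parts
    return channels, height, width


rec_shape = parse_image_shape("3,32,320")
cls_shape = parse_image_shape("3,48,192")
```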

table: bool = True class-attribute instance-attribute

Whether to enable table recognition.

use_angle_cls: bool = True class-attribute instance-attribute

Whether to use text orientation classification model.

use_gpu: bool = False class-attribute instance-attribute

Whether to use GPU for inference. Requires installing the paddlepaddle-gpu package.

use_space_char: bool = True class-attribute instance-attribute

Whether to recognize spaces.

use_zero_copy_run: bool = False class-attribute instance-attribute

Whether to enable zero_copy_run for inference optimization.