Supported Formats

Kreuzberg supports a wide range of document, markup, data, and image formats, grouped below by category.

Document Formats

  • PDF (.pdf), both searchable and scanned, with detailed metadata extraction
  • Microsoft Word (.docx)
  • PowerPoint presentations (.pptx)
  • OpenDocument Text (.odt)
  • Rich Text Format (.rtf)
  • EPUB (.epub)
  • DocBook XML (.dbk, .xml)
  • FictionBook (.fb2)
  • LaTeX (.tex, .latex)
  • Typst (.typ)

Markup and Text Formats

  • HTML (.html, .htm)
  • Plain text (.txt) and Markdown (.md, .markdown)
  • reStructuredText (.rst)
  • Org-mode (.org)
  • DokuWiki (.txt)
  • Pod (.pod)
  • Troff/Man (.1, .2, etc.)

Data and Research Formats

  • Spreadsheets (.xlsx, .xls, .xlsm, .xlsb, .xlam, .xla, .ods)
  • CSV (.csv) and TSV (.tsv) files
  • OPML files (.opml)
  • Jupyter Notebooks (.ipynb)
  • BibTeX (.bib) and BibLaTeX (.bib)
  • CSL-JSON (.json)
  • EndNote and JATS XML (.xml)
  • RIS (.ris)

Image Formats

  • JPEG (.jpg, .jpeg, .pjpeg)
  • PNG (.png)
  • TIFF (.tiff, .tif)
  • BMP (.bmp)
  • GIF (.gif)
  • JPEG 2000 family (.jp2, .jpm, .jpx, .mj2)
  • WebP (.webp)
  • Portable anymap formats (.pbm, .pgm, .ppm, .pnm)
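You may want to pre-filter a directory before passing files to Kreuzberg. In that case, you can transcribe the lists above into a simple extension lookup. The sketch below is a hypothetical helper, not part of Kreuzberg's API. `SUPPORTED_EXTENSIONS` and `format_category` are illustrative names, and it only mirrors this page; it does not reproduce how Kreuzberg actually detects formats. Some extensions are shared by several formats (.txt, .xml, .bib). These map to the first category they appear under. For man pages, only .1 and .2 are included, because those are the only section numbers this page names explicitly.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical helper (not Kreuzberg's API): extension -> category,
# transcribed from the "Supported Formats" lists on this page.
SUPPORTED_EXTENSIONS: dict[str, str] = {}


def _register(category: str, *exts: str) -> None:
    for ext in exts:
        # Shared extensions (.txt, .xml, .bib) keep the first category seen.
        SUPPORTED_EXTENSIONS.setdefault(ext, category)


_register("document", ".pdf", ".docx", ".pptx", ".odt", ".rtf", ".epub",
          ".dbk", ".xml", ".fb2", ".tex", ".latex", ".typ")
_register("markup", ".html", ".htm", ".txt", ".md", ".markdown", ".rst",
          ".org", ".pod",
          ".1", ".2")  # Troff/Man: only .1 and .2 are named on this page
_register("data", ".xlsx", ".xls", ".xlsm", ".xlsb", ".xlam", ".xla", ".ods",
          ".csv", ".tsv", ".opml", ".ipynb", ".bib", ".json", ".ris")
_register("image", ".jpg", ".jpeg", ".pjpeg", ".png", ".tiff", ".tif", ".bmp",
          ".gif", ".jp2", ".jpm", ".jpx", ".mj2", ".webp",
          ".pbm", ".pgm", ".ppm", ".pnm")


def format_category(path: str | Path) -> str | None:
    """Return the listed category for a file's extension, or None if unlisted."""
    # Path.suffix is the last extension ("" if none); compare case-insensitively.
    return SUPPORTED_EXTENSIONS.get(Path(path).suffix.lower())
```

For example, `format_category("scan.TIFF")` returns `"image"` and `format_category("archive.zip")` returns `None`. Note that a `None` result only means the extension is not listed here. A lookup like this cannot tell DokuWiki from plain text, or JATS from DocBook, so content-based detection is still needed for shared extensions.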