Supported Formats
Kreuzberg handles a wide range of document, image, and text formats.
Document Formats
- PDF (.pdf, both searchable and scanned); includes detailed metadata extraction
- Microsoft Word (.docx)
- PowerPoint presentations (.pptx)
- OpenDocument Text (.odt)
- Rich Text Format (.rtf)
- EPUB (.epub)
- DocBook XML (.dbk, .xml)
- FictionBook (.fb2)
- LaTeX (.tex, .latex)
- Typst (.typ)
Markup and Text Formats
- HTML (.html, .htm)
- Plain text (.txt) and Markdown (.md, .markdown)
- reStructuredText (.rst)
- Org-mode (.org)
- DokuWiki (.txt)
- Pod (.pod)
- Troff/Man (.1, .2, etc.)
Data and Research Formats
- Spreadsheets (.xlsx, .xls, .xlsm, .xlsb, .xlam, .xla, .ods)
- CSV (.csv) and TSV (.tsv) files
- OPML files (.opml)
- Jupyter Notebooks (.ipynb)
- BibTeX (.bib) and BibLaTeX (.bib)
- CSL-JSON (.json)
- EndNote and JATS XML (.xml)
- RIS (.ris)
Image Formats
- JPEG (.jpg, .jpeg, .pjpeg)
- PNG (.png)
- TIFF (.tiff, .tif)
- BMP (.bmp)
- GIF (.gif)
- JPEG 2000 family (.jp2, .jpm, .jpx, .mj2)
- WebP (.webp)
- Portable anymap formats (.pbm, .pgm, .ppm, .pnm)
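
Several extensions in the lists above are shared by more than one format (.txt, .xml, .bib), so an extension alone does not always identify a format. The following is a minimal sketch, using only the Python standard library, of how the lists could be turned into a pre-flight check. The names `EXTENSIONS_BY_CATEGORY` and `candidate_categories` are hypothetical and not part of Kreuzberg's API, and treating any purely numeric suffix as a Troff/Man page is an assumption based on the ".1, .2, etc." entry.

```python
from pathlib import Path

# Extension sets copied from the lists above. The category names are
# illustrative only and are not identifiers used by Kreuzberg.
EXTENSIONS_BY_CATEGORY = {
    "document": {
        ".pdf", ".docx", ".pptx", ".odt", ".rtf", ".epub",
        ".dbk", ".xml", ".fb2", ".tex", ".latex", ".typ",
    },
    "markup_text": {
        ".html", ".htm", ".txt", ".md", ".markdown",
        ".rst", ".org", ".pod",
    },
    "data_research": {
        ".xlsx", ".xls", ".xlsm", ".xlsb", ".xlam", ".xla", ".ods",
        ".csv", ".tsv", ".opml", ".ipynb", ".bib", ".json", ".xml", ".ris",
    },
    "image": {
        ".jpg", ".jpeg", ".pjpeg", ".png", ".tiff", ".tif", ".bmp", ".gif",
        ".jp2", ".jpm", ".jpx", ".mj2", ".webp",
        ".pbm", ".pgm", ".ppm", ".pnm",
    },
}


def candidate_categories(path):
    """Return every category whose extension list contains the file's suffix."""
    suffix = Path(path).suffix.lower()
    # Assumption: a purely numeric suffix (.1, .2, ...) is a Troff/Man page.
    if suffix[1:].isdigit():
        return ["markup_text"]
    return [
        name
        for name, extensions in EXTENSIONS_BY_CATEGORY.items()
        if suffix in extensions
    ]
```

A file such as `refs.xml` matches both the document and the data/research lists, so a caller that needs an exact format would have to inspect the file contents rather than rely on the suffix.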