Talk
Intermediate

Is ColPali the new OCR!?

Approved

Even today, document retrieval systems struggle with PDFs or scanned files that have complex layouts — think tables, charts, images, or multi-column structures. The standard approach involves OCR → layout detection → chunking → embedding → search. It works… but it’s clunky, brittle, and doesn’t scale well across real-world data.

ColPali introduces a new method: skip OCR completely. Instead, it uses a Vision-Language Model (VLM) to directly process the document image and generate multi-vector embeddings that capture both the content and the layout in a single pass.

This is particularly useful for documents where structure matters: contracts, forms, invoices, academic papers. On the ViDoRe benchmark, ColPali outperforms OCR-based text retrieval pipelines on these types of documents.

Example scenarios:

  • A user wants to search across scanned contracts for a clause that appears in a footnote or table.

  • A company wants to make old regulatory PDFs searchable without reformatting or running OCR on thousands of pages.

  • You’re building a chatbot that needs to retrieve information from visual documents like forms or handwritten PDFs.

Traditional pipelines would require several fragile steps. ColPali simplifies this by doing everything — layout understanding, text encoding, and visual structure — in one shot using PaliGemma and a late interaction retrieval mechanism.
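To make the late interaction step concrete, here is a minimal NumPy sketch of MaxSim-style scoring. It is an illustration under stated assumptions, not ColPali's actual code: the random arrays stand in for the embeddings a VLM would produce, the sizes are toy values, and the helper names (l2_normalize, maxsim_score, rank_pages) are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row vector to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score between one query and one page.

    query_vecs: (n_query_tokens, dim) array, one vector per query token.
    page_vecs:  (n_patches, dim) array, one vector per image patch.
    For every query token, keep its best-matching patch, then sum those maxima.
    """
    sim = query_vecs @ page_vecs.T  # (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())


def rank_pages(query_vecs, pages):
    """Return page indices sorted by descending score, plus the raw scores."""
    scores = [maxsim_score(query_vecs, p) for p in pages]
    order = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data (assumption): 4 "pages", each with 16 patch vectors of dimension 8.
rng = np.random.default_rng(0)
pages = [l2_normalize(rng.normal(size=(16, 8))) for _ in range(4)]

# Build a 3-token "query" from patches of page 2, so page 2 should rank first.
query = pages[2][[1, 5, 9]].copy()

order, scores = rank_pages(query, pages)
print("ranking:", order)
print("scores:", [round(s, 3) for s in scores])
```

Because each query token is matched against every patch independently, a clause sitting in a footnote or a table cell can still contribute a strong match even when the rest of the page is unrelated. A single pooled page embedding would average that signal away.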

In this session, I’ll walk through:

  • The limitations of traditional OCR-based document retrieval

  • ColPali’s architecture

  • How these components work together

  • Demo/Tutorial to get started

  • ColPali vs. OCR: when to choose which

Technology architecture
Knowledge Commons (Open Hardware, Open Science, Open Data etc.)
Engineering practice - productivity, debugging
Tutorial about using a FOSS project

Antara Raman Sahay
SWE Trainee, Helmerich and Payne

Approvability: 100%
Approvals: 1
Rejections: 0
Not Sure: 0

Would be interesting for you to demo a comparison of where this would shine against OCR

Reviewer #1
Approved