How PDF to Excel works
This tool reads each PDF page's text layer, captures the X / Y position of every text item, then reconstructs rows by grouping items at the same vertical position and columns by clustering items at similar horizontal positions. Each PDF page becomes a separate sheet in the output XLSX file.
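The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `PositionedItem` type, the function names, and both tolerance values are assumptions. In pdf.js, similar data comes from `page.getTextContent()`, where each item carries a `str` and a `transform` matrix whose last two entries give the X and Y position.

```typescript
// Hypothetical shape for one text item with its position in PDF points.
interface PositionedItem {
  text: string;
  x: number; // left edge
  y: number; // baseline (PDF Y grows upward)
}

// Group items into rows: an item joins the current row when its Y is
// within `yTolerance` of that row's first item (tolerance is an assumption).
function groupIntoRows(items: PositionedItem[], yTolerance = 2): PositionedItem[][] {
  // Sort by descending Y so rows come out top to bottom.
  const sorted = [...items].sort((a, b) => b.y - a.y);
  const rows: PositionedItem[][] = [];
  for (const item of sorted) {
    const current = rows[rows.length - 1];
    if (current && Math.abs(current[0].y - item.y) <= yTolerance) {
      current.push(item);
    } else {
      rows.push([item]);
    }
  }
  // Order each row left to right.
  for (const row of rows) row.sort((a, b) => a.x - b.x);
  return rows;
}

// Cluster X positions into column anchors: a sorted X value starts a new
// column when it is more than `xTolerance` away from the previous X value.
function findColumnAnchors(items: PositionedItem[], xTolerance = 10): number[] {
  const xs = items.map((i) => i.x).sort((a, b) => a - b);
  const anchors: number[] = [];
  let last = -Infinity;
  for (const x of xs) {
    if (x - last > xTolerance) anchors.push(x);
    last = x;
  }
  return anchors;
}
```

Fixed tolerances like these are the simplest choice. They also hint at why tables with varying column widths convert poorly: one threshold cannot fit every row.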
Best on: bank statements with consistent column widths, invoice line-items, exported reports from systems like Tally / SAP / Salesforce, simple data tables.
Won't handle well: tables with merged cells, multi-row headers, varying column widths within the same table, mixed text+image content, scanned PDFs (need OCR first).
Tips for better output
- If your PDF was generated from Excel / Google Sheets, the column alignment is usually exact, so it converts cleanly.
- If your PDF is a scanned bank statement, OCR it first using our Image to Text tool, paste the recognized text into Excel, then format manually.
- For PDFs with multiple tables per page (e.g. report dashboards), expect the extraction to merge them — split manually after import.
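One way to tell whether a PDF is scanned before converting it is to check each page for an empty text layer. The sketch below assumes the page text has already been extracted into a simple `PageText` shape. That type, the function name, and the `minChars` threshold are hypothetical and not part of this tool.

```typescript
// Hypothetical container for the text items extracted from one page.
interface PageText {
  pageNumber: number;
  items: { text: string }[];
}

// Return the page numbers whose text layer holds fewer than `minChars`
// non-whitespace-trimmed characters; those pages likely need OCR first.
function pagesNeedingOcr(pages: PageText[], minChars = 1): number[] {
  return pages
    .filter((p) => {
      const chars = p.items.reduce((n, it) => n + it.text.trim().length, 0);
      return chars < minChars;
    })
    .map((p) => p.pageNumber);
}
```

A page with no text items at all, or only whitespace, is flagged. A scan that carries a partial or low-quality OCR layer would pass this check, so treat it as a first filter only.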
FAQs
How accurate is the conversion?
Best-effort. We extract text positions from the PDF and group items into rows by Y coordinate, then into columns by X position. Clean tables with well-defined columns convert well; tables with merged cells, varying column widths, or mixed-content cells (text + images) convert poorly. Always eyeball the output before using it for anything important.
What kinds of PDFs work best?
Bank statements, invoice line items, simple data tables, CSV-style PDFs. PDFs that started as Excel and were exported to PDF are often perfect candidates — the column structure is preserved in the text layer.
What kinds don't work?
Scanned PDFs (no text layer, so they need OCR first), PDFs with complex multi-row headers, PDFs where rows wrap across multiple lines without clear delimiters, PDFs with floating elements (charts, images mixed with tables). Apart from scanned PDFs, these will still convert, but expect to clean up the output manually.
What's the output format?
An XLSX (Excel) file. Each PDF page becomes a separate sheet. Within each sheet, each detected row maps to one Excel row. Cells are split by column-X gaps in the source PDF.
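The mapping from detected rows to sheet cells might look like the sketch below. It is an illustration under assumptions, not the tool's implementation: the `RowItem` type, the 15-point gap threshold, and the "Page N" sheet names are all hypothetical.

```typescript
// Hypothetical text item with a left edge and a width, in PDF points.
interface RowItem {
  text: string;
  x: number;
  width: number;
}

// Split one row (already sorted left to right) into cells. A new cell
// starts when the gap to the previous item's right edge exceeds the
// threshold; otherwise the text is appended to the current cell.
function splitRowIntoCells(row: RowItem[], gapThreshold = 15): string[] {
  const cells: string[] = [];
  let prevRight = -Infinity;
  for (const item of row) {
    const gap = item.x - prevRight;
    if (cells.length === 0 || gap > gapThreshold) {
      cells.push(item.text);
    } else {
      cells[cells.length - 1] += " " + item.text;
    }
    prevRight = item.x + item.width;
  }
  return cells;
}

// One sheet per page: each page is a list of rows, each row a list of items.
function pagesToSheets(pages: RowItem[][][]): { name: string; rows: string[][] }[] {
  return pages.map((rows, i) => ({
    name: `Page ${i + 1}`,
    rows: rows.map((r) => splitRowIntoCells(r)),
  }));
}
```

Each `rows` array is a plain array of arrays, which is the shape SheetJS accepts in `XLSX.utils.aoa_to_sheet`. The resulting worksheet can then be added to a workbook with `XLSX.utils.book_append_sheet`. Note that this sketch splits each row independently and does not align cells to shared columns across rows.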
Is it private?
Yes. The PDF is processed entirely in your browser using pdf.js + SheetJS. You can verify this in DevTools → Network tab: no request uploads your file.